Research on Penalised Regression Methods in Econometrics and Statistics
Description

Penalised regression refers to regression estimation methods in which the criterion used to estimate the coefficients includes a penalty term on those coefficients. A distinguishing feature of many penalised regression methods is that they perform automatic variable selection and coefficient estimation simultaneously. This matters because both tasks are at the heart of statistical analysis. Moreover, penalised regression methods can handle cases where the number of parameters to be estimated exceeds the sample size. These features make penalised regression an attractive approach, and a sizable body of literature on the subject has developed and continues to grow. Examples of highly successful penalised regression methods include the least absolute shrinkage and selection operator (LASSO) and its adaptive variant, the elastic net (EN), and the smoothly clipped absolute deviation (SCAD) estimator. Different branches of applied statistics have also developed their own distinct types of penalised regression methods; for example, the octagonal shrinkage and clustering algorithm for regression (OSCAR) was developed mainly in response to statistical problems encountered in genetic studies.

Despite this progress, the literature remains relatively silent on several important issues. For example, penalised regression has not yet been extended to statistical problems characterised by length-biased data. Also, despite the great popularity of penalised regression in biostatistics, the methodology has so far been largely ignored in fields such as econometrics, where variable selection is always a major issue.
Moreover, considerable work remains to be done on the properties of some recently proposed penalised regression methods, especially under non-standard conditions.

With this motivation, this project is designed to extend the penalised regression method in several important directions. To achieve this, the project is divided into five parts. Part 1 focuses on the development of penalised regression methods in a class of censored regression models known collectively in econometrics as Tobit models. Here, a major issue to be addressed is how to combine penalised regression with Heckman’s popular two-step estimation procedure in univariate Tobit models. Also to be considered are ways of implementing penalised regression in the bivariate Tobit model, with a specific focus on some of the recently proposed semiparametric methods for this model. Part 2 considers penalised regression under length-biased sampling, where the sample observations are not selected randomly, but with probability proportional to their length. Building on the principal investigator’s recent work on modelling length-biased data using a varying-coefficient quantile approach, we will develop penalised regression methods within the same context, although our initial analysis will likely focus on the linear regression model as an exploratory step towards obtaining results in more complex contexts. Parts 3 and 4 will extend the Pairwise Absolute Clustering and Sparsity (PACS) method, an improved variant of OSCAR, to problems characterised by censored and missing data; in particular, Part 4 considers penalised regression in conjunction with the estimating equations approach with missing data, as developed in a recent paper by the principal investigator. Part 5 applies penalised regression to a large Hong Kong housing data set to illustrate the usefulness of this technique in social science research.
Some theoretical work for this project is already underway, and the investigators have succeeded in obtaining some preliminary results. The amount requested will be used mainly for the recruitment of research support staff to conduct simulation and real data analysis.
Effective start/end date: 1/05/12 → 13/03/15