Some Aspects of Frequentist Model Averaging in Statistics

Project: Research



Model selection has always been an integral part of statistical practice. It is so widely practised that there can be few statisticians who have not employed criteria such as the AIC or BIC to choose between models. Unfortunately, practitioners do not commonly recognize the additional uncertainty that model selection introduces into the process of statistical modeling. In reality, the properties of estimators and tests subsequent to model selection depend on the way the model has been selected, in addition to the stochastic nature of the chosen model. However, practitioners usually take only the latter into account and report estimates obtained from the chosen model as if they were unconditional, when they are actually conditional estimates. An extensive literature on post-model-selection inference has shown that the resulting under-reporting of variability can be a very serious problem when the additional variability introduced by model selection is ignored.

Many statisticians have argued that a simple way to overcome this under-reporting problem is model averaging. A model average estimator compromises across a set of competing models and, in doing so, incorporates model uncertainty into the conclusions about the unknown parameters. Model averaging has long been a popular technique among Bayesian statisticians, and lately there have also been several seminal developments from a frequentist standpoint. The proposed project is motivated by some of the unanswered questions in this emerging literature. Of the numerous interesting avenues of research in this growing area, five have been selected for particular attention in the proposed project. For some of these topics the Principal Investigator (P.I.) has carried out preliminary analysis and succeeded in deriving the key theoretical results.

Part 1 of the project develops frequentist model average estimators for discrete choice models, with weights based on scores from a variety of Focused Information Criteria (FIC); special attention is paid to multinomial, ordinal and nested logit models. Monte Carlo studies will be undertaken to compare model average estimators using different weight choices, and empirical analysis using real data will also be performed. Part 2 deals with truncated and censored regression models and examines model combining in a similar manner to Part 1. Part 3 considers the use of the Mallows criterion for model averaging in linear regression. A recent working paper by the P.I. shows that the Mallows criterion retains the optimality property established in Hansen (2007, Econometrica) even when some of the crucial assumptions are relaxed. The use of the Mallows criterion as a whole requires more thorough investigation, especially when the observations are dependent, and the project will take some steps in this direction. Part 4 focuses on the threshold regression model and develops a model combining scheme in which the model weights are selected by minimizing the trace of an unbiased estimate of the MSE matrix of the model average estimator. Model selection in the face of incomplete data has received considerable attention in recent years, and Part 5 is devoted to investigating the properties of model average estimators whose weights are based on model selection scores developed for different incomplete-data circumstances. We also consider the scenario in which model averaging is preceded by imputation of the missing data. Monte Carlo studies will be conducted to examine the performance of model average estimators under different methods of missing-data correction and different choices of model weights.
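To make the Mallows criterion in Part 3 concrete, here is a minimal sketch of Mallows model averaging in the basic form of Hansen (2007), not the P.I.'s relaxed-assumption version. It assumes nested linear regressions in which model m uses the first m columns of X, and chooses weights w on the unit simplex to minimize C_n(w) = ||y − Σ_m w_m ŷ_m||² + 2σ̂² Σ_m w_m k_m, where k_m is the number of regressors in model m and σ̂² is estimated from the largest model. The function names (`mallows_weights`, `project_to_simplex`) and the projected-gradient solver are illustrative assumptions; a quadratic-programming routine is the more usual choice for this step.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]                     # entries in descending order
    cssv = np.cumsum(u) - 1.0                # cumulative sums minus the target total
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - cssv / idx > 0)[0][-1]
    theta = cssv[rho] / (rho + 1.0)          # shift that makes the result sum to one
    return np.maximum(v - theta, 0.0)


def mallows_weights(X, y, n_iter=5000):
    """Sketch of Mallows model averaging weights for nested linear models.

    Hypothetical setup: candidate model m regresses y on the first m columns
    of X (m = 1..K). Requires n > K so the largest model leaves residual
    degrees of freedom for estimating sigma^2.
    """
    n, K = X.shape
    residuals = []
    for k in range(1, K + 1):
        Xk = X[:, :k]
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        residuals.append(y - Xk @ beta)
    E = np.column_stack(residuals)           # n x K matrix of residual vectors
    k_vec = np.arange(1, K + 1, dtype=float) # number of regressors per model
    sigma2 = residuals[-1] @ residuals[-1] / (n - K)

    # Because the weights sum to one, y - sum_m w_m yhat_m = E w, so
    # C_n(w) = w'E'E w + 2 sigma2 k'w is a convex quadratic in w.
    A = E.T @ E
    step = 1.0 / (2.0 * np.linalg.eigvalsh(A).max())  # 1 / Lipschitz constant of the gradient
    w = np.full(K, 1.0 / K)                  # start from equal weights
    for _ in range(n_iter):
        grad = 2.0 * A @ w + 2.0 * sigma2 * k_vec
        w = project_to_simplex(w - step * grad)
    return w
```

The averaged fitted values are then Σ_m w_m ŷ_m. Because the candidate models are nested, the ordering of the columns of X matters; the project's interest in dependent observations and relaxed assumptions is not reflected in this sketch.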


Project number: 7002428
Grant type: SRG
Effective start/end date: 1/04/09 to 30/06/09