The Identification and Dimension Reduction Problem in Missing Data Analysis


Student thesis: Doctoral Thesis





Award date: 14 Jul 2021


In this dissertation, we focus on three topics within the missing data framework: (1) the identification problem with nonignorable missing covariates; (2) parameter estimation based on dimension reduction under the estimating equation framework with data missing at random; (3) low-dimensional structure recovery with nonignorable nonresponse.

In the first topic, we give general sufficient conditions for identifying the observed likelihood and the parameters of interest when covariate data are subject to nonignorable missingness. Under these conditions, a semiparametric logistic model with a tilting parameter is adopted to model the missing mechanism. A generalized method of moments (GMM) with an imputed estimating equation is implemented to estimate all unknown parameters, including the parameters of interest and the tilting parameter, simultaneously, without the need for other independent surveys or a validation sample to estimate the unknown tilting parameter. The asymptotic properties of the proposed estimators are derived. We also prove that the proposed estimators based on the inverse probability weighted (IPW), augmented inverse probability weighted (AIPW), and estimating equation projection (EEP) methods have the same asymptotic efficiency, both when the tilting parameter is known and when it is unknown. In the simulation studies, the finite-sample performance of our methods is compared with that of some existing methods; the results show that our methods are more robust and effective.
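For orientation, the following is a sketch of the kind of model and estimating equation described in this topic. It is written in a form that is common in the nonignorable-missingness literature, not taken from the dissertation itself, so every symbol is an assumption: \delta is the response indicator, x the covariate subject to missingness, u the always-observed variables, g(\cdot) an unspecified function, \gamma the tilting parameter, and \psi a generic estimating function for the parameter of interest \theta.

```latex
% Semiparametric logistic propensity with tilting parameter \gamma
% (illustrative form; the dissertation's exact specification may differ)
\pi(u, x; \gamma) = P(\delta = 1 \mid u, x)
  = \frac{1}{1 + \exp\{ g(u) + \gamma x \}}

% IPW estimating equation for \theta, given \gamma
\frac{1}{n} \sum_{i=1}^{n}
  \frac{\delta_i}{\pi(u_i, x_i; \gamma)} \, \psi(x_i, u_i; \theta) = 0
```

When \gamma = 0 the propensity no longer depends on the possibly missing x, and the mechanism reduces to missing at random.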

In the second topic, we propose a two-step procedure for estimating the unknown parameters in a high-dimensional model with more moment conditions than unknown parameters and with data missing at random. In the first step, a sufficient dimension reduction (SDR) method is implemented and its statistical guarantee is proved. In the second step, three well-known methods for handling missing data, namely the IPW, AIPW, and EEP methods, are applied together with the GMM on the dimension reduction subspace to obtain estimates of the unknown parameters. The theoretical properties of the proposed methods, including the effects of dimension reduction on the asymptotic distributions of the estimators, are investigated. Moreover, our results refute a claim in an earlier study that dimension reduction yields the same asymptotic distributions of the estimators as when the reduced-dimensional structure is the true structure. A simulation study and a real data example from clinical trials are used to demonstrate the performance of the proposed method.
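To make the IPW idea concrete, here is a minimal Python sketch under simplifying assumptions that are not from the dissertation: the target is a simple mean rather than a GMM estimate, there is a single fully observed covariate (so no dimension reduction step), the data are simulated, and the propensity is a parametric logistic model fitted by Newton-Raphson. All function names are hypothetical.

```python
import math
import random


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n, seed=0):
    """Simulated MAR data (illustrative): y is observed with probability expit(0.5 + x)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + xi + rng.gauss(0.0, 1.0) for xi in x]  # true mean of y is 1
    d = [1 if rng.random() < expit(0.5 + xi) else 0 for xi in x]
    return x, y, d


def fit_logistic(x, d, iters=25):
    """Newton-Raphson fit of P(d = 1 | x) = expit(a + b * x)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = expit(a + b * xi)
            w = p * (1.0 - p)
            g0 += di - p            # score for the intercept
            g1 += (di - p) * xi     # score for the slope
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def ipw_mean(x, y, d):
    """Normalized IPW estimate of E[y]; uses y only where d == 1."""
    a, b = fit_logistic(x, d)
    num = den = 0.0
    for xi, yi, di in zip(x, y, d):
        if di:
            w = 1.0 / expit(a + b * xi)  # inverse estimated propensity
            num += w * yi
            den += w
    return num / den


def complete_case_mean(y, d):
    """Naive mean over observed responses only (biased under this design)."""
    obs = [yi for yi, di in zip(y, d) if di]
    return sum(obs) / len(obs)


if __name__ == "__main__":
    x, y, d = simulate(20000)
    print("complete-case:", complete_case_mean(y, d))
    print("IPW:", ipw_mean(x, y, d))
```

In this design, units with larger x are more likely to respond and also have larger y, so the complete-case mean overshoots the true value of 1, while reweighting by the inverse estimated propensity corrects for the selection.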

In the third topic, we consider SDR methods under a nonignorable missingness assumption. Specifically, we propose two SDR estimators for nonignorable nonresponse under a semiparametric missing propensity score assumption: the dimension-reduction-based imputed estimator and the fusion-refined estimator. We develop estimation methods for the cases where the parameter in the missing propensity model is known and where it is unknown, and investigate the theoretical properties of the proposed methods. Simulations are conducted to examine their performance and to identify the drawback of the fusion-refined estimator. In addition, two real datasets are analyzed to illustrate the proposed methods.