TY - JOUR
T1 - Dimensionality Reduction and Variable Selection in Multivariate Varying-Coefficient Models With a Large Number of Covariates
AU - He, Kejun
AU - Lian, Heng
AU - Ma, Shujie
AU - Huang, Jianhua Z.
PY - 2018
Y1 - 2018
N2 - Motivated by the study of gene and environment interactions, we consider a multivariate-response varying-coefficient model with a large number of covariates. The need to nonparametrically estimate a large number of coefficient functions given relatively limited data poses a major challenge in fitting such a model. To overcome this challenge, we develop a method that combines three ideas: (i) reduce the number of unknown functions to be estimated by using (noncentered) principal components; (ii) approximate the unknown functions by polynomial splines; and (iii) apply sparsity-inducing penalization to select relevant covariates. The three ideas are integrated into a penalized least-squares framework. Our asymptotic theory shows that the proposed method can consistently identify relevant covariates and can estimate the corresponding coefficient functions with the same convergence rate as when only the relevant variables are included in the model. We also develop a novel computational algorithm to solve the penalized least-squares problem by combining proximal algorithms and optimization over Stiefel manifolds. Our method is illustrated using data from the Framingham Heart Study. Supplementary materials for this article are available online.
AB - Motivated by the study of gene and environment interactions, we consider a multivariate-response varying-coefficient model with a large number of covariates. The need to nonparametrically estimate a large number of coefficient functions given relatively limited data poses a major challenge in fitting such a model. To overcome this challenge, we develop a method that combines three ideas: (i) reduce the number of unknown functions to be estimated by using (noncentered) principal components; (ii) approximate the unknown functions by polynomial splines; and (iii) apply sparsity-inducing penalization to select relevant covariates. The three ideas are integrated into a penalized least-squares framework. Our asymptotic theory shows that the proposed method can consistently identify relevant covariates and can estimate the corresponding coefficient functions with the same convergence rate as when only the relevant variables are included in the model. We also develop a novel computational algorithm to solve the penalized least-squares problem by combining proximal algorithms and optimization over Stiefel manifolds. Our method is illustrated using data from the Framingham Heart Study. Supplementary materials for this article are available online.
KW - Multivariate regression
KW - Oracle property
KW - Polynomial splines
UR - http://www.scopus.com/inward/record.url?scp=85042925743&partnerID=8YFLogxK
U2 - 10.1080/01621459.2017.1285774
DO - 10.1080/01621459.2017.1285774
M3 - RGC 21 - Publication in refereed journal
VL - 113
SP - 746
EP - 754
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
SN - 0162-1459
IS - 522
ER -