Semiparametric penalized quadratic inference functions for longitudinal data in ultra-high dimensions

Research output: Journal Publications and ReviewsRGC 21 - Publication in refereed journalpeer-review

1 Scopus Citations
View graph of relations

Author(s)

Related Research Unit(s)

Detail(s)

Original languageEnglish
Article number105175
Journal / PublicationJournal of Multivariate Analysis
Volume196
Online published23 Mar 2023
Publication statusPublished - Jul 2023

Abstract

In many biomedical and health studies, multivariate data arise from repeated measurements on a sample of subjects over time. In order to analyze such longitudinal data, we need to consider the correlations from the same subject, and it is inappropriate to use a simple multivariate model assuming independence structure. Motivated by a large scale longitudinal public health study that requires longitudinal data analysis with correlated multivariate discrete responses from repeated measurements and very high dimensional covariates, we adopt a flexible semiparametric approach for simultaneous variable selection and estimation without the requirement of specifying the full likelihood. Specifically, we propose generalized partially linear single-index models using penalized quadratic inference functions for longitudinal data in ultra-high dimensions. A key feature is that we allow the number of single-index covariates in the nonparametric term to diverge and even to be in ultra-high dimensions. The penalized quadratic inference functions easily incorporate within-subject correlation and pursue efficient estimation, and the single-index models can incorporate nonlinearity and some interactions while avoiding the curse of dimensionality. In this challenging setting, we contribute both an efficient algorithm and new asymptotic theory for our proposed approach for diverging and even ultra-dimensional covariates and a multivariate correlated response in longitudinal data. We apply our method to investigate diabetes status within a continuing longitudinal public health study with very high-dimensional genetic variables and phenotype variables. © 2023 Elsevier Inc.

Research Area(s)

  • Longitudinal data, Model selection, Multivariate correlated response, Partially linear model, Single-index model