Ultra-High Dimensional Quantile Regression for Longitudinal Data : An Application to Blood Pressure Analysis

Research output: Journal Publications and Reviews (RGC: 21, 22, 62)21_Publication in refereed journalpeer-review

4 Scopus Citations
View graph of relations

Author(s)

Related Research Unit(s)

Detail(s)

Original languageEnglish
Pages (from-to)97–108
Journal / PublicationJournal of the American Statistical Association
Volume118
Issue number541
Online published11 Nov 2022
Publication statusPublished - Mar 2023

Abstract

Despite major advances in research and treatment, identifying important genotype risk factors for high blood pressure remains challenging. Traditional genome-wide association studies (GWAS) focus on one single nucleotide polymorphism (SNP) at a time. We aim to select among over half a million SNPs along with time-varying phenotype variables via simultaneous modeling and variable selection, focusing on the most dangerous blood pressure levels at high quantiles. Taking advantage of rich data from a large-scale public health study, we develop and apply a novel quantile penalized generalized estimating equations (GEE) approach, incorporating several key aspects including ultra-high dimensional genetic SNPs, the longitudinal nature of blood pressure measurements, time-varying covariates, and conditional high quantiles of blood pressure. Importantly, we identify interesting new SNPs for high blood pressure. Besides, we find blood pressure levels are likely heterogeneous, where the important risk factors identified differ among quantiles. This comprehensive picture of conditional quantiles of blood pressure can allow more insights and targeted treatments. We provide an efficient computational algorithm and prove consistency, asymptotic normality, and the oracle property for the quantile penalized GEE estimators with ultra-high dimensional predictors. Moreover, we establish model-selection consistency for high-dimensional BIC. Simulation studies show the promise of the proposed approach. Supplementary materials for this article are available online.

Research Area(s)

  • Correlated data, Generalized estimating equations, GWAS, SCAD, Variable selection

Citation Format(s)