Dynamic Statistical Learning in Massive Datastreams

Jingshen Wang, Lilun Du*, Changliang Zou, Zhenke Wu

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

1 Citation (Scopus)

Abstract

Technological advances have necessitated statistical methodologies for analyzing large-scale datastreams comprising multiple indefinitely long time series. This manuscript proposes a dynamic tracking and screening (DTS) framework for online learning and model updating. Exploiting the sequential nature of datastreams, a robust estimation approach is developed under a linear varying coefficient model framework. The approach accommodates unequally spaced design points and updates coefficient estimates without storing historical data. A data-driven choice of an optimal smoothing parameter is proposed, alongside a new multiple testing procedure tailored to the streaming environment. Statistical guarantees of the procedure are provided, along with simulation studies of its finite-sample performance. The methods are demonstrated through a mobile health example that estimates when subjects’ sleep and physical activities have an unusual influence on their mood. © 2024 Institute of Statistical Science. All rights reserved.
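The idea of updating varying-coefficient estimates without storing historical data can be illustrated with a minimal sketch. It is not the DTS estimator from the paper: the one-sided exponential kernel, the ridge term, and the names `OnlineVaryingCoefficient`, `update`, and `coef` are all assumptions made here for illustration. With an exponential kernel, the kernel-weighted least-squares sums can be decayed by a factor that depends only on the elapsed time, so unequally spaced observations are handled and only two running summaries are kept.

```python
import math


class OnlineVaryingCoefficient:
    """Exponentially weighted least squares for y_t = x_t' beta(t) + noise.

    Illustrative only: the one-sided exponential kernel and the ridge term
    are assumptions of this sketch, not details taken from the paper.
    """

    def __init__(self, dim, bandwidth, ridge=1e-8):
        self.dim = dim
        self.h = bandwidth      # smoothing parameter, in time units
        self.ridge = ridge      # small stabiliser for the linear solve
        self.last_t = None
        # Running summaries: S = sum w_i x_i x_i', r = sum w_i x_i y_i.
        self.S = [[0.0] * dim for _ in range(dim)]
        self.r = [0.0] * dim

    def update(self, t, x, y):
        """Fold in one observation (t, x, y); past data are never revisited."""
        if self.last_t is None:
            decay = 1.0
        else:
            # Unequal spacing enters only through the elapsed time.
            decay = math.exp(-(t - self.last_t) / self.h)
        for j in range(self.dim):
            self.r[j] = decay * self.r[j] + x[j] * y
            for k in range(self.dim):
                self.S[j][k] = decay * self.S[j][k] + x[j] * x[k]
        self.last_t = t

    def coef(self):
        """Solve (S + ridge * I) beta = r by Gaussian elimination with pivoting."""
        n = self.dim
        a = [row[:] + [self.r[j]] for j, row in enumerate(self.S)]
        for j in range(n):
            a[j][j] += self.ridge
        for col in range(n):
            pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
            a[col], a[pivot] = a[pivot], a[col]
            for i in range(col + 1, n):
                f = a[i][col] / a[col][col]
                for k in range(col, n + 1):
                    a[i][k] -= f * a[col][k]
        beta = [0.0] * n
        for i in reversed(range(n)):
            s = sum(a[i][k] * beta[k] for k in range(i + 1, n))
            beta[i] = (a[i][n] - s) / a[i][i]
        return beta
```

Each call to `update` costs a fixed amount of work in the number of covariates and does not grow with the length of the stream. The bandwidth is fixed by hand in this sketch, whereas the abstract describes a data-driven choice of the smoothing parameter.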
Original language: English
Article number: SS-2023-0195
Journal: Statistica Sinica
Publication status: Accepted/In press/Filed - 2024

Funding

The authors thank the Associate Editor and two anonymous referees for their exceptional comments, which led to improvements in the paper. Wang acknowledges the support of NSF DMS-2220537. Du’s research is supported by Hong Kong RGC-GRF-16302620 and a CityU Start-up Grant (Grant No. 7200774). Zou was supported by the National Key R&D Program of China (Grant Nos. 2022YFA1003703, 2022YFA1003800) and the National Natural Science Foundation of China (Grant Nos. 11925106, 12231011, 11931001, 12226007, 12326325). This work was partially supported by grants from the National Institutes of Health (R01 MH101459 to ZW) and an investigator award from the Precision Health Initiative at the University of Michigan to ZW. We thank Dr. Srijan Sen for generous support with IHS data access.

Research Keywords

  • Consistency
  • Kernel smoothing
  • Multiple testing
  • Varying coefficient
