Abstract
Big data present new theoretical and computational challenges as well as tremendous opportunities in many fields. Motivated by health care research, we develop a novel divide-and-conquer (DAC) approach for massive right-censored data under the accelerated failure time model, where the sample size is extraordinarily large and the dimension of predictors is large but smaller than the sample size. Specifically, we construct a penalized loss function that approximates the weighted least squares loss function by combining the unpenalized estimation results from all subsets. The resulting adaptive LASSO penalized DAC estimator enjoys the oracle property. Simulation studies demonstrate that the proposed DAC procedure reduces computation time while performing comparably to estimation using the full data. Our proposed DAC approach is applied to a massive dataset from the Chinese Longitudinal Healthy Longevity Survey. © 2022 Statistical Society of Canada / Société statistique du Canada.
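The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: Kaplan-Meier (Stute) weights are used for the weighted least squares loss, the full-data loss is approximated by a quadratic form built from the subset estimates, and the adaptive LASSO problem is solved by coordinate descent. The function names (`stute_weights`, `subset_fit`, `dac_adaptive_lasso`), the tuning value `lam`, and the simulated data are all hypothetical.

```python
import numpy as np

def stute_weights(y, delta):
    """Kaplan-Meier (Stute) weights for log-times y and event indicators delta."""
    n = len(y)
    order = np.argsort(y, kind="stable")
    d = delta[order].astype(float)
    w = np.zeros(n)
    surv = 1.0  # running product over earlier ordered observations
    for i in range(n):
        w[i] = d[i] / (n - i) * surv
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    out = np.empty(n)
    out[order] = w  # map weights back to the original ordering
    return out

def subset_fit(X, y, delta):
    """Unpenalized weighted least squares on one subset: returns (beta_k, H_k)."""
    w = stute_weights(y, delta)
    Z = np.column_stack([np.ones(len(y)), X])  # intercept plus predictors
    H = Z.T @ (w[:, None] * Z)                 # weighted Gram matrix
    beta = np.linalg.solve(H, Z.T @ (w * y))
    return beta, H

def dac_adaptive_lasso(X, y, delta, n_subsets, lam, n_iter=200):
    """Combine subset fits, then minimize
    0.5 * (b - b_tilde)' H (b - b_tilde) + lam * sum_j |b_j| / |b_tilde_j|
    by coordinate descent (assumed quadratic approximation of the loss)."""
    idx = np.array_split(np.arange(len(y)), n_subsets)
    fits = [subset_fit(X[i], y[i], delta[i]) for i in idx]
    H = sum(Hk for _, Hk in fits)
    b = sum(Hk @ bk for bk, Hk in fits)        # equals H @ beta_tilde
    beta_tilde = np.linalg.solve(H, b)         # combined unpenalized estimate
    pen = lam / np.maximum(np.abs(beta_tilde), 1e-12)  # adaptive weights
    pen[0] = 0.0                               # intercept is not penalized
    beta = beta_tilde.copy()
    for _ in range(n_iter):
        for j in range(len(beta)):
            z = b[j] - H[j] @ beta + H[j, j] * beta[j]
            beta[j] = np.sign(z) * max(abs(z) - pen[j], 0.0) / H[j, j]
    return beta, beta_tilde

# Simulated example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, true_beta = 4000, np.array([1.0, -0.5, 0.0, 0.0, 0.8, 0.0])
X = rng.normal(size=(n, true_beta.size))
log_t = 0.5 + X @ true_beta + 0.5 * rng.normal(size=n)  # AFT model on log scale
log_c = rng.normal(loc=3.0, scale=1.0, size=n)          # independent censoring
y = np.minimum(log_t, log_c)
delta = (log_t <= log_c).astype(int)

beta_hat, beta_tilde = dac_adaptive_lasso(X, y, delta, n_subsets=8, lam=0.1)
print(np.round(beta_hat, 3))  # first entry is the intercept
```

Only the subset-level summaries (`beta_k`, `H_k`) are needed in the combination step, so each subset can be processed separately and the penalized problem is solved once on a small matrix. The consecutive split used here assumes the rows are already in random order; in practice a random partition may be preferable.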
| Original language | English |
|---|---|
| Pages (from-to) | 400-419 |
| Journal | Canadian Journal of Statistics |
| Volume | 51 |
| Issue number | 2 |
| Online published | 27 Aug 2022 |
| DOIs | |
| Publication status | Published - Jun 2023 |
| Externally published | Yes |
Research Keywords
- Accelerated failure time model
- adaptive LASSO
- divide and conquer
- oracle property
- survival data
RGC Funding Information
- RGC-funded