Abstract
Semi-supervised learning has become increasingly prevalent in recent years. In the semi-supervised setting, most of the data are unlabeled because acquiring high-quality labels requires expensive scientific measurements and/or laborious human labeling. Hence, it is common to employ black-box machine learning methods as predictive models that generate outcomes on unlabeled data for subsequent statistical inference. In this paper, we consider sparse regression in the semi-supervised setting with prediction assistance. Although the predictions may be imperfect and/or noisy, the empirical risk based on a rectified loss function is unbiased with respect to the population risk, thereby yielding a relatively safe imputation strategy. Under some regularity conditions, a near-optimal statistical rate is established. Notably, the prediction error bound is derived on the unlabeled data rather than on the labeled data. More importantly, asymptotic normality and confidence intervals are also studied via a debiasing strategy. Under mild conditions on the predictive models, it is shown that the proposed method, which integrates unlabeled data for combined analysis, is asymptotically more efficient than supervised sparse regression. Both numerical evidence and real-world data analysis demonstrate the effectiveness of the method.
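The rectified-loss idea in the abstract can be illustrated with a minimal sketch. It assumes the standard prediction-powered form of the squared-error risk (the average loss of the predictions on unlabeled data, minus a labeled-data correction term) with an L1 penalty, solved by plain proximal gradient descent. The abstract does not give the exact loss, tuning, or algorithm, so the function names `soft_threshold` and `rectified_lasso`, the penalty level `lam`, and the simulated data are all hypothetical choices made for illustration only.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def rectified_lasso(X_lab, y_lab, f_lab, X_unl, f_unl, lam, n_iter=500):
    """L1-penalized minimizer of an assumed rectified squared-error risk.

    Assumed risk (not taken from the paper):
      (1/2N) sum_unl (f - x'b)^2
      - (1/2n) sum_lab [(f - x'b)^2 - (y - x'b)^2] + lam * ||b||_1
    Its smooth part is the quadratic 0.5 * b'Sb - b'c with S and c below.
    """
    n, N = X_lab.shape[0], X_unl.shape[0]
    S = X_unl.T @ X_unl / N  # Gram matrix of the unlabeled covariates
    c = X_unl.T @ f_unl / N + X_lab.T @ (y_lab - f_lab) / n  # rectified cross term
    step = 1.0 / max(np.linalg.eigvalsh(S)[-1], 1e-12)  # 1 / Lipschitz constant
    beta = np.zeros(X_lab.shape[1])
    for _ in range(n_iter):
        grad = S @ beta - c
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Simulated example: a deliberately biased, noisy predictor (hypothetical).
rng = np.random.default_rng(0)
n, N, p = 200, 5000, 10
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.5]

def predictor(X):
    # Imperfect black-box stand-in: shrunken signal plus noise.
    return X @ (0.8 * beta_true) + 0.3 * rng.standard_normal(X.shape[0])

X_lab = rng.standard_normal((n, p))
y_lab = X_lab @ beta_true + 0.5 * rng.standard_normal(n)
X_unl = rng.standard_normal((N, p))
beta_hat = rectified_lasso(X_lab, y_lab, predictor(X_lab),
                           X_unl, predictor(X_unl), lam=0.05)
max_err = float(np.max(np.abs(beta_hat - beta_true)))
```

In this sketch the curvature of the objective comes entirely from the unlabeled covariates, while the labeled sample only corrects the bias of the predictions. That is one way to read the abstract's remark that the prediction error bound is stated on the unlabeled data.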
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025
| Original language | English |
|---|---|
| Article number | 123 |
| Journal | Statistics and Computing |
| Volume | 35 |
| Online published | 11 Jun 2025 |
| DOIs | |
| Publication status | Published - 2025 |
Funding
This work was supported in part by the National Social Science Fund (22BTJ025), the National Natural Science Foundation of China (12271272), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX24_3622) and the Humanities and Social Sciences Youth Foundation of Ministry of Education of China (23YJC910003).
Research Keywords
- Semi-supervised inference
- Prediction imputation
- Asymptotic normality
- Confidence intervals
- Debiased Lasso
Title
Sparse estimation and inference for prediction-powered semi-supervised linear regression