
Sparse estimation and inference for prediction-powered semi-supervised linear regression

Zihao Song, Jicai Liu, Lei Wang, Heng Lian, Weihua Zhao*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Semi-supervised learning has become increasingly prevalent in recent years. In the semi-supervised setting, most of the data are unlabeled, since acquiring high-quality labels requires expensive scientific measurements and/or laborious human labeling. Hence, it is common to employ black-box machine learning methods as predictive models to generate outcomes on the unlabeled data for subsequent statistical inference. In this paper, we consider sparse regression in the semi-supervised setting with prediction assistance. Although the predictions may be imperfect and/or noisy, the empirical risk based on a rectified loss function is unbiased for the population risk, thereby yielding a relatively safe imputation strategy. Under some regularity conditions, a near-optimal statistical rate is established. Interestingly, the prediction error bound is derived on the unlabeled data rather than on the labeled data. More importantly, asymptotic normality and confidence intervals are also studied via a debiasing strategy. Under mild conditions on the predictive models, it is shown that the proposed method, which integrates unlabeled data into a combined analysis, is asymptotically more efficient than supervised sparse regression. Both numerical evidence and a real-world data analysis demonstrate the effectiveness of the method.
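The abstract's central device is a rectified loss whose empirical risk stays unbiased even when the black-box predictions are biased. It can be illustrated for squared-error loss. The sketch below is not the authors' implementation. It assumes the prediction-powered form of the rectified risk: the unlabeled-data loss on predictions, minus the labeled-data loss on predictions, plus the labeled-data loss on true labels. It solves the Lasso by plain proximal gradient (ISTA) and uses a low-dimensional one-step debiasing with M = S^{-1}. The helpers `rectified_lasso`, `debiased_ci`, and `predict`, the tuning value `lam=0.2`, and the synthetic data are all hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def rectified_lasso(X_lab, y_lab, f_lab, X_unl, f_unl, lam, n_iter=1000):
    """Hypothetical helper: Lasso on a rectified squared-error risk.

    Smooth part: (1/2N)||f_unl - X_unl b||^2
                 - (1/2n)||f_lab - X_lab b||^2 + (1/2n)||y_lab - X_lab b||^2.
    The two labeled quadratic terms in b cancel, so the gradient is S @ b - c with
    S = X_unl^T X_unl / N and c = X_unl^T f_unl / N + X_lab^T (y_lab - f_lab) / n.
    """
    n, N = X_lab.shape[0], X_unl.shape[0]
    S = X_unl.T @ X_unl / N
    c = X_unl.T @ f_unl / N + X_lab.T @ (y_lab - f_lab) / n
    step = 1.0 / np.linalg.eigvalsh(S)[-1]  # 1 / (Lipschitz constant of the gradient)
    beta = np.zeros(X_lab.shape[1])
    for _ in range(n_iter):
        # ISTA: gradient step on the smooth risk, then soft-threshold for the L1 penalty
        beta = soft_threshold(beta - step * (S @ beta - c), step * lam)
    return beta, S, c


def debiased_ci(X_lab, y_lab, f_lab, X_unl, f_unl, beta, S, c, z=1.96):
    """Hypothetical helper: one-step debiasing with M = S^{-1} (requires p < N)."""
    n, N = X_lab.shape[0], X_unl.shape[0]
    M = np.linalg.inv(S)
    beta_db = beta + M @ (c - S @ beta)
    # Per-sample score terms; the two samples are independent, so covariances add
    g_unl = X_unl * (f_unl - X_unl @ beta_db)[:, None]
    g_lab = X_lab * (y_lab - f_lab)[:, None]
    V = np.cov(g_unl, rowvar=False) / N + np.cov(g_lab, rowvar=False) / n
    se = np.sqrt(np.diag(M @ V @ M.T))
    return beta_db, beta_db - z * se, beta_db + z * se


# Synthetic demo: sparse truth and a deliberately biased, hypothetical predictor
rng = np.random.default_rng(0)
p, n, N = 20, 200, 2000
theta = np.zeros(p)
theta[:3] = [3.0, -2.0, 1.5]
X_lab = rng.standard_normal((n, p))
X_unl = rng.standard_normal((N, p))
y_lab = X_lab @ theta + rng.standard_normal(n)


def predict(X):
    # Imperfect black-box model: shrunken slopes plus a constant offset
    return X @ (0.8 * theta) + 0.5


f_lab, f_unl = predict(X_lab), predict(X_unl)
beta, S, c = rectified_lasso(X_lab, y_lab, f_lab, X_unl, f_unl, lam=0.2)
beta_db, lower, upper = debiased_ci(X_lab, y_lab, f_lab, X_unl, f_unl, beta, S, c)
print(np.round(beta[:5], 2))
```

Because the labeled quadratic terms cancel, the rectified risk stays convex, with its curvature supplied entirely by the unlabeled design. The labeled sample enters only through the correction term X_lab^T (y_lab - f_lab) / n, which removes the predictor's bias in expectation. With M = S^{-1}, the one-step correction simply returns the unpenalized minimizer S^{-1} c. In the high-dimensional regime the paper targets, M would instead need to be a sparse approximate inverse (for example, from nodewise Lasso), which this sketch does not attempt.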

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025
Original language: English
Article number: 123
Journal: Statistics and Computing
Volume: 35
Online published: 11 Jun 2025
Publication status: Published - 2025

Funding

This work was supported in part by the National Social Science Fund (22BTJ025), the National Natural Science Foundation of China (12271272), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX24_3622) and the Humanities and Social Sciences Youth Foundation of Ministry of Education of China (23YJC910003).

Research Keywords

  • Semi-supervised inference
  • Prediction imputation
  • Asymptotic normality
  • Confidence intervals
  • Debiased Lasso
