Abstract
In this paper, we study subsampling for hypothesis testing in generalized linear models with large-scale datasets, focusing on testing simple null hypotheses against composite linear alternatives. We propose a subsample-based test statistic and show that it converges in distribution to a non-central chi-square distribution under Pitman’s local alternatives. The optimal subsampling distribution that maximizes power requires iterative calculations on the full data, which is computationally infeasible. Furthermore, it depends on the true parameter, which cannot be consistently estimated under Pitman’s local alternatives. We therefore maximize a lower bound of the non-centrality parameter to define the power enhancing probability, and we use side information under the alternative in place of the true parameter. Extensive simulations and an application to a real dataset on flight delays and cancellations show that the proposed method offers a computationally viable solution for hypothesis testing in the realm of big data. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025
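The abstract's strategy of maximizing a bound on the non-centrality parameter rests on a standard fact: for a chi-square test, power increases with the non-centrality parameter of the limiting distribution. The following is a minimal illustrative sketch of that relationship only, not the paper's test statistic or subsampling scheme. It assumes the one-degree-of-freedom case so that the power has a closed form in terms of the standard normal distribution; the function name `chi2_power_df1` and the example non-centrality values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def chi2_power_df1(noncentrality: float, alpha: float = 0.05) -> float:
    """Power of a level-alpha chi-square test with 1 degree of freedom.

    Assumption for this sketch: the statistic is distributed as (Z + s)^2,
    where Z is standard normal and s = sqrt(noncentrality). This is the
    non-central chi-square distribution with 1 degree of freedom.
    """
    if noncentrality < 0:
        raise ValueError("noncentrality must be non-negative")
    # The chi-square(1) critical value equals the squared normal quantile.
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = sqrt(noncentrality)
    # (Z + shift)^2 > z_crit^2  <=>  Z > z_crit - shift  or  Z < -z_crit - shift
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Hypothetical non-centrality values, chosen only for illustration.
    for nc in (0.0, 1.0, 4.0, 9.0):
        print(f"noncentrality={nc:4.1f}  power={chi2_power_df1(nc):.4f}")
```

At zero non-centrality the power equals the nominal level, and it rises toward one as the non-centrality parameter grows. This is why a subsampling distribution that yields a larger non-centrality parameter gives a more powerful test.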
| Original language | English |
|---|---|
| Article number | 28 |
| Journal | Statistics and Computing |
| Volume | 35 |
| Issue number | 2 |
| Online published | 12 Jan 2025 |
| DOIs | |
| Publication status | Published - Apr 2025 |
| Externally published | Yes |
Research Keywords
- Generalized linear models
- Non-central chi-square distribution
- Pitman’s local alternatives
Fingerprint
Dive into the research topics of 'Power enhancing probability subsampling using side information'. Together they form a unique fingerprint.