
Power enhancing probability subsampling using side information

  • Junzhuo Gao (Co-first Author)
  • Lei Wang (Co-first Author)
  • Haiying Wang* (Co-first Author)

*Corresponding author for this work

Research output: Journal Publications and Reviews, RGC 21 - Publication in refereed journal, peer-review

Abstract

In this paper, we study the subsampling technique for hypothesis testing in generalized linear models with large-scale datasets, focusing on testing simple null hypotheses against composite linear alternatives. We propose a subsample-based test statistic and show that it converges to non-central chi-square distributions under Pitman’s local alternatives. The optimal subsampling distribution that maximizes power requires iterative calculations on the full data, which is computationally infeasible. Furthermore, it depends on the true parameter, which cannot be consistently estimated under Pitman’s local alternatives. We maximize a lower bound of the non-central parameter to define the power enhancing probability and utilize side information under the alternative to replace the true parameter. Extensive simulations and an application to a real dataset on flight delays and cancellations show that the proposed method offers a computationally viable solution for hypothesis testing in the realm of big data.

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025
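
The abstract ties power to the non-centrality parameter of the limiting chi-square distribution. As a rough illustration of why a larger non-centrality parameter means higher power, the sketch below computes the asymptotic power of a level-alpha chi-square test using only the Python standard library. It is not the authors' implementation: the function names, the degrees of freedom `df`, the non-centrality values `lam`, and the level `alpha` are all assumptions chosen for illustration, and the series used here are intended for moderate values of `lam` only.

```python
import math

def reg_lower_gamma(a, x, max_terms=1000):
    """Regularized lower incomplete gamma P(a, x), from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-17 * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_cdf(x, df):
    """CDF of the central chi-square distribution with df degrees of freedom."""
    return reg_lower_gamma(df / 2.0, x / 2.0)

def chi2_quantile(p, df):
    """Quantile of the central chi-square distribution, found by bisection."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ncx2_cdf(x, df, lam, max_terms=500):
    """Non-central chi-square CDF as a Poisson(lam/2) mixture of central CDFs."""
    weight = math.exp(-lam / 2.0)  # Poisson probability of j = 0
    total = 0.0
    for j in range(max_terms):
        total += weight * chi2_cdf(x, df + 2 * j)
        weight *= (lam / 2.0) / (j + 1)  # move to Poisson probability of j + 1
    return total

def power(df, lam, alpha=0.05):
    """Asymptotic power: P(chi2_df(lam) > central chi2_df critical value)."""
    crit = chi2_quantile(1.0 - alpha, df)
    return 1.0 - ncx2_cdf(crit, df, lam)

if __name__ == "__main__":
    # Illustrative values only: power rises with the non-centrality parameter.
    for lam in (0.0, 2.0, 5.0, 10.0):
        print(f"df=3, lam={lam:>4}: power={power(3, lam):.4f}")
```

With `lam = 0` the function returns the nominal level `alpha`, and for fixed `df` the power increases with `lam`, which is the reason a subsampling distribution that enlarges the non-centrality parameter (or a lower bound of it) enhances power.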
Original language: English
Article number: 28
Journal: Statistics and Computing
Volume: 35
Issue number: 2
Online published: 12 Jan 2025
Publication status: Published - Apr 2025
Externally published: Yes

Research Keywords

  • Generalized linear models
  • Non-central chi-square distribution
  • Pitman’s local alternatives
