TY - JOUR
T1 - Distributed Mallows Model Averaging for Ridge Regressions
AU - Zhang, Haili
AU - Wan, Alan T. K.
AU - You, Kang
AU - Zou, Guohua
PY - 2025/2
Y1 - 2025/2
AB - Ridge regression is an effective tool for handling multicollinearity in regression. It is also an essential shrinkage and regularization method and is widely used in big data and distributed data applications. The divide-and-conquer approach, which combines the estimators from each subset with equal weights, is commonly applied to distributed data. To overcome multicollinearity and improve estimation accuracy in the presence of distributed data, we propose a Mallows-type model averaging method for ridge regressions, which combines estimators from all subsets. Our method is proved to be asymptotically optimal while allowing the number of subsets and the dimension of the variables to diverge. We also establish the consistency of the resulting weight estimators, which converge to the theoretically optimal weights. Furthermore, the asymptotic normality of the model averaging estimator is demonstrated. Our simulation study and real data analysis show that the proposed model averaging method often performs better than commonly used model selection and model averaging methods in distributed data cases. © Springer-Verlag GmbH Germany & The Editorial Office of AMS 2025.
KW - 62F12
KW - 62H10
KW - 62J07
KW - asymptotic normality
KW - asymptotic optimality
KW - consistency
KW - distributed data
KW - Mallows model averaging
KW - ridge regression
UR - http://www.scopus.com/inward/record.url?scp=85218698414&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85218698414&origin=recordpage
U2 - 10.1007/s10114-025-3409-x
DO - 10.1007/s10114-025-3409-x
M3 - RGC 21 - Publication in refereed journal
SN - 1439-8516
VL - 41
SP - 780
EP - 826
JO - Acta Mathematica Sinica, English Series
JF - Acta Mathematica Sinica, English Series
IS - 2
ER -