Distributed Algorithms for U-statistics-based Empirical Risk Minimization

Lanjue Chen, Alan T.K. Wan, Shuyi Zhang, Yong Zhou

Research output: Journal Publications and ReviewsRGC 21 - Publication in refereed journalpeer-review

8 Citations (Scopus)
27 Downloads (CityUHK Scholars)

Abstract

Empirical risk minimization, where the underlying loss function depends on a pair of data points, covers a wide range of application areas in statistics including pairwise ranking and survival analysis. The common empirical risk estimator obtained by averaging values of a loss function over all possible pairs of observations is essentially a U-statistic. One well-known problem with minimizing U-statistic type empirical risks, is that the computational complexity of U-statistics increases quadratically with the sample size. When faced with big data, this poses computational challenges as the colossal number of observation pairs virtually prohibits centralized computing to be performed on a single machine. This paper addresses this problem by developing two computationally and statistically efficient methods based on the divide-and-conquer strategy on a decentralized computing system, whereby the data are distributed among machines to perform the tasks. One of these methods is based on a surrogate of the empirical risk, while the other method extends the one-step updating scheme in classical M-estimation to the case of pairwise loss. We show that the proposed estimators are as asymptotically efficient as the benchmark global U-estimator obtained under centralized computing. As well, we introduce two distributed iterative algorithms to facilitate the implementation of the proposed methods, and conduct extensive numerical experiments to demonstrate their merit. ©2023 Lanjue Chen, Alan T.K. Wan, Shuyi Zhang, and Yong Zhou.
Original languageEnglish
Article number263
JournalJournal of Machine Learning Research
Volume24
Online publishedSept 2023
Publication statusPublished - 2023

Funding

The authors wish to acknowledge financial support from the following funding bodies: National Natural Science Foundation of China (State Key Program: 71931004 (Zhou)), National Key R&D Program of China: 2021YFA1000100 (Zhou & Zhang), the Hong Kong Research Grant Council (General Research Fund: 11501522 (Wan)), National Natural Science Foundation of China (General Program: 72273129 (Wan), State Key Program: 72331005 (Zhang), Young Scientist Fund: 72201101 (Zhang)), Ministry of Education of Humanities and Social Sciences (Youth Fund: 22YJC910013 (Zhang)), and Shanghai Pujiang Program: 21PJC034 (Zhang). The authorship is in alphabetical order. The authors thank the editor, associate editor, and four referees for their comments and suggestions on an earlier version of the paper. Any errors that remain are the sole responsibility of the authors.

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/

Fingerprint

Dive into the research topics of 'Distributed Algorithms for U-statistics-based Empirical Risk Minimization'. Together they form a unique fingerprint.

Cite this