Robust Regression and its Distributed Learning

Student thesis: Doctoral Thesis

Abstract

Robust regression has been extensively studied in the statistics and machine learning communities because of its robustness against outliers and heavy-tailed noise. One typical tool is Huber regression, which focuses on modeling the conditional mean and serves as a robust alternative to least squares regression. Another valid tool is quantile regression; unlike Huber-type methods, it models the entire conditional distribution of the response variable. Moreover, statistical learning now faces additional difficulties arising from large-scale and decentralized data. On the one hand, storing all data on a single machine is impractical due to privacy concerns, limited storage, or communication costs. On the other hand, it is often more efficient to exploit the computing power of all local machines. Despite rapid advances, distributed learning for robust regression in the big data regime has not been thoroughly explored. In this thesis, we provide a comprehensive analysis of two problems, distributed learning for Huber matrix regression and for quantile regression, which are the focus of Chapters 2-3 below.
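As a minimal illustration (not taken from the thesis), the two loss functions contrasted above can be written down directly: the Huber loss is quadratic near zero and linear in the tails, while the check (pinball) loss of quantile regression is an asymmetric absolute loss. The threshold `tau` and quantile level `q` below are illustrative choices.

```python
import numpy as np

def huber_loss(r, tau=1.345):
    # Quadratic for small residuals, linear for large ones:
    # this caps the influence of outliers on the fitted mean.
    return np.where(np.abs(r) <= tau,
                    0.5 * r**2,
                    tau * np.abs(r) - 0.5 * tau**2)

def check_loss(r, q=0.5):
    # Pinball/check loss: weights positive residuals by q and
    # negative residuals by (1 - q); its minimizer is the q-th
    # conditional quantile.
    return r * (q - (r < 0).astype(float))

r = np.array([-2.0, -0.5, 0.0, 0.5, 10.0])
print(huber_loss(r))   # the outlier r = 10 contributes only linearly
print(check_loss(r))   # q = 0.5 recovers (half of) the absolute loss
```

Note how a residual of 10 incurs a Huber loss of about 12.5 rather than the squared loss 50, which is the sense in which Huber-type methods are insensitive to heavy tails.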

In Chapter 2, we propose adaptive Huber matrix regression with a nuclear norm penalty, which is insensitive to heavy-tailed noise without sacrificing statistical accuracy. To further enhance scalability in massive-data applications, we employ the communication-efficient surrogate likelihood framework to develop distributed robust matrix regression, which can be implemented efficiently via the alternating direction method of multipliers (ADMM).
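A sketch of the key ADMM ingredient for the nuclear norm penalty (a standard construction, not code from the thesis): the proximal operator of lam * ||.||_* is singular-value soft-thresholding, which typically forms one ADMM subproblem in nuclear-norm-penalized matrix regression.

```python
import numpy as np

def prox_nuclear(A, lam):
    # Proximal operator of lam * nuclear norm:
    # soft-threshold the singular values of A by lam.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_thresh = np.maximum(s - lam, 0.0)
    return (U * s_thresh) @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
B = prox_nuclear(A, 0.5)
# Singular values of B are those of A shrunk toward zero by 0.5,
# so the penalty promotes (approximately) low-rank coefficient matrices.
```

The shrinkage kills small singular values entirely, which is why the nuclear norm plays the role of an l1 penalty on the spectrum.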

In Chapter 3, we develop decentralized learning for quantile regression. The optimization problem associated with parameter estimation in quantile regression is neither smooth nor strongly convex, which in general yields at best sublinear convergence. Although this suggests slow convergence, we show that, if the local sample size is sufficiently large relative to the parameter dimension and network size, distributed estimation in quantile regression in fact exhibits linear convergence up to the statistical precision.
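To see why the quantile objective is nonsmooth yet still well-behaved, a one-dimensional illustration (my own, not from the thesis) helps: the average check loss is piecewise linear in the location parameter, and its minimizer over the data is exactly the empirical q-th quantile.

```python
import numpy as np

def check_loss(r, q):
    # Pinball/check loss of quantile regression.
    return r * (q - (r < 0).astype(float))

# Minimize the average check loss over candidate locations taken
# from the sample itself; with n*q non-integer the minimizer is
# unique and equals the ceil(n*q)-th order statistic.
rng = np.random.default_rng(1)
y = rng.standard_normal(101)
q = 0.25
losses = [check_loss(y - c, q).mean() for c in y]
c_star = y[int(np.argmin(losses))]
# c_star coincides with the empirical 0.25-quantile of y.
```

The kinks of this piecewise-linear objective are what rule out the usual smooth, strongly convex analysis and make the linear-convergence result of Chapter 3 nontrivial.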
Date of Award: 15 Aug 2024
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Heng LIAN
