Some Studies on Distributed Optimization and Learning


Student thesis: Doctoral Thesis





Award date: 5 Jul 2021


In recent years, distributed optimization over multi-agent networks has played an important role in both theory and practice. Early works on distributed optimization focused mainly on minimizing a smooth function known to several agents. Over the past decade, research has turned to the problem of minimizing a sum of locally known convex objective functions distributed over a time-varying directed network via the cooperation of agents. The problem appears frequently in diverse areas of science and engineering, such as source localization in sensor networks, parameter estimation and detection, resource allocation in wireless cellular networks, and utility maximization. To solve the aforementioned problems, different classes of distributed algorithms have been developed. In contrast to classical centralized optimization algorithms, distributed optimization algorithms are significantly more flexible and effective when handling networks in which the information communication graph is not fully connected. Historically, the seminal distributed method adopted a distributed subgradient approach. Subsequently, several methods applying distributed stochastic techniques to convex optimization gradually took shape, including the distributed push-sum algorithm, the distributed ADMM algorithm, the distributed mirror descent algorithm, and the distributed gradient tracking algorithm, among others, which will be surveyed in this thesis.
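The distributed subgradient idea mentioned above can be illustrated with a toy sketch (the objective, network, and all parameters below are invented for illustration, not taken from the thesis): each agent mixes the current iterates through a doubly stochastic weight matrix and then takes a local gradient step with a diminishing step size.

```python
import numpy as np

# Toy sketch of the distributed subgradient method: n agents jointly
# minimize f(x) = sum_i f_i(x), where f_i(x) = 0.5 * (x - b_i)^2 is known
# only to agent i. Each iteration mixes the agents' iterates with a doubly
# stochastic matrix W, then takes a local gradient step.
n = 5
b = np.arange(n, dtype=float)     # local data; the global optimum is b.mean()
W = np.full((n, n), 1.0 / n)      # complete-graph averaging weights
x = np.zeros(n)                   # one scalar iterate per agent

for k in range(1, 1001):
    step = 1.0 / np.sqrt(k)       # diminishing step size
    x = W @ x - step * (x - b)    # consensus mixing + local gradient step
```

After the loop, the agents' iterates agree approximately on the minimizer of the sum, `b.mean()`; the residual disagreement is of the order of the final step size, reflecting the slow sublinear behavior characteristic of the subgradient approach.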

In distributed optimization, consensus among agents is one of the most essential requirements for ensuring convergence of an algorithm. Moreover, consensus always relies on the topology of the network and the structure of the distributed algorithm. In this thesis, we mainly consider a class of distributed optimization problems in which each local objective function is known only to its associated agent; an agent cannot directly access the information of other agents. The global objective function is assumed to be the sum of these local objective functions. Against this background, some new consensus-based distributed algorithms over a class of time-varying networks will be studied. In both centralized and distributed optimization, the convergence rate is an important quantity for describing the main features of an algorithm, since the rate in terms of the total number of iterations directly reflects the algorithm's effectiveness. Hence, establishing convergence rates for different classes of distributed algorithms is meaningful and remains an active research topic. In this thesis, convergence rates will be studied systematically for different classes of distributed algorithms.
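The dependence of consensus on topology can be sketched in isolation (the graph sequence and initial values below are invented for illustration): agents repeatedly average with neighbors under a time-varying sequence of doubly stochastic gossip steps whose union graph is connected, and the disagreement decays geometrically to the network-wide average.

```python
import numpy as np

# Sketch of consensus over a time-varying network: five agents alternate
# between two gossip rounds. Each round averages disjoint pairs of agents;
# each implied mixing matrix is doubly stochastic, and the union of the two
# graphs is the connected path 0-1-2-3-4, so the iterates converge
# geometrically to the average of the initial values.
def pair_average(x, pairs):
    x = x.copy()
    for i, j in pairs:
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x

x0 = np.array([4.0, -1.0, 7.0, 2.0, 3.0])
x = x0.copy()
for k in range(200):
    pairs = [(0, 1), (2, 3)] if k % 2 == 0 else [(1, 2), (3, 4)]
    x = pair_average(x, pairs)

disagreement = np.max(np.abs(x - x0.mean()))
```

Because every gossip step preserves the sum of the iterates, the consensus value is exactly the initial average, and the geometric decay rate of `disagreement` is governed by the second-largest eigenvalue modulus of the product of the two mixing matrices, illustrating why convergence rates hinge on network topology.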

As a counterpart of (distributed) optimization and a functional optimization scheme, (distributed) learning theory has also attracted much research interest due to its broad applications in areas related to data science. In the real world, with the development of data mining, large-scale data are collected in a variety of application domains, including financial engineering, medicine, business analysis, personal social networks, sensor networks, and monitoring. In these applications, sensitive data such as personal data are often processed by machine learning methods for different requirements; hence, it is extremely important to protect data privacy. In recent years, distributed learning has been shown to be a powerful strategy for tackling privacy-preserving problems. Note that the unprecedented size and complexity of distribution samples raise computational difficulties for distribution regression approaches. Large-scale data impose unpredictable storage and memory burdens on a single machine; meanwhile, it would take a huge amount of time for a single machine to carry out the scientific computing required by some regression processes. Hence, it is highly desirable to develop a distributed learning scheme to overcome the aforementioned difficulties in handling data. In this thesis, we mainly focus on one class of data: distribution data. Some novel distribution regression methods will be developed via integral operator techniques.

In this thesis, the proposed learning algorithms aim at regressing to real-valued outputs from probability measures. The theoretical analysis of distribution regression is far from mature and quite challenging, since only second-stage samples are observable in practical settings. We first present some new distribution regression algorithms that capture more features of distribution regression, and derive optimal learning rates for them. Then, based on a divide-and-conquer approach, we design a new distributed learning algorithm for distribution regression. The learning rates of these algorithms will be studied systematically.
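The divide-and-conquer principle can be sketched on a simplified stand-in problem (scalar inputs rather than probability measures, with a kernel, bandwidth, and regularization invented for the sketch; the thesis's algorithms instead operate on second-stage samples of distributions): data are split across machines, each machine fits a local kernel ridge regressor, and the global estimator averages the local ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def krr_fit(X, y, lam, gamma):
    # Gaussian-kernel ridge regression on one local chunk of the data.
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return X, alpha

def krr_predict(model, Xtest, gamma):
    X, alpha = model
    Ktest = np.exp(-gamma * (Xtest[:, None] - X[None, :]) ** 2)
    return Ktest @ alpha

# Synthetic 1-D regression data, split across m "machines".
n, m = 600, 4
X = rng.uniform(-1, 1, n)
y = np.sin(np.pi * X) + 0.1 * rng.standard_normal(n)

# Divide: each machine fits KRR on its own chunk only.
models = [krr_fit(Xc, yc, lam=1e-3, gamma=10.0)
          for Xc, yc in zip(np.array_split(X, m), np.array_split(y, m))]

# Conquer: the global estimator is the average of the local estimators.
Xtest = np.linspace(-0.9, 0.9, 50)
pred = np.mean([krr_predict(mo, Xtest, 10.0) for mo in models], axis=0)
err = np.max(np.abs(pred - np.sin(np.pi * Xtest)))
```

Each machine only ever sees its own chunk, which is what makes the scheme attractive for both scalability and privacy; the averaging step keeps the learning rate of the combined estimator comparable to that of a single machine trained on all the data, provided the number of machines is not too large.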

Finally, based on Nesterov’s zeroth-order technique, we develop some block coordinate-type optimization and learning algorithms for settings in which first-order gradient information cannot be accessed easily. The algorithms mainly comprise block coordinate-type methods based on two classes of classical algorithms: mirror descent algorithms and conditional gradient algorithms. The studies in this part are conducted within the framework of composite optimization.
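A minimal sketch of the two-point zeroth-order idea in a block coordinate update (the objective and all constants are invented for illustration, and the mirror map is taken as Euclidean so the mirror descent step reduces to a plain gradient step): the partial derivative along a randomly selected coordinate is estimated from two function evaluations only, then used to update that coordinate.

```python
import numpy as np

rng = np.random.default_rng(1)

c = np.array([1.0, -2.0, 3.0, 0.5])   # hidden minimizer of the toy objective

def f(x):
    # Smooth objective queried only as a black box (no gradients available).
    return np.sum((x - c) ** 2)

x = np.zeros(4)
mu, step = 1e-4, 0.25                 # smoothing radius and step size

for _ in range(400):
    i = rng.integers(len(x))          # randomly select one coordinate block
    e = np.zeros(len(x))
    e[i] = 1.0
    # Two-point (central difference) zeroth-order gradient estimate.
    g = (f(x + mu * e) - f(x - mu * e)) / (2 * mu)
    x[i] -= step * g                  # block coordinate descent step
```

Only two function queries are spent per iteration, which is the appeal of the zeroth-order approach when gradients are inaccessible; the price, in general, is a dimension-dependent slowdown in the convergence rate relative to first-order block coordinate methods.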