Distributed Estimation with Random Projection in Reproducing Kernel Hilbert Spaces
Description
This research project will develop computationally efficient algorithms, with a focus on their theoretical properties, for centralized and decentralized distributed computing environments. Unlike many existing studies, which treat parametric models, the setting we investigate is nonparametric regression in a reproducing kernel Hilbert space. In such estimation problems, data are distributed across multiple machines for various reasons, for example to protect data privacy or simply to exploit the computational power of multiple machines. First, in distributed estimation with a central machine that aggregates the local estimators computed by each machine from its local data, it is well known that the number of machines that can be used is limited, so the local data sets can still be large. We propose to apply random projection on each machine to reduce the computational cost. Establishing theoretically that this combination does not significantly degrade performance in the distributed setting is a nontrivial problem. Second, we consider a decentralized system with no central machine that communicates with all others. Instead, each machine regularly communicates with its neighbors to exchange intermediate results. For this setting, we propose a gradient method in the function space combined with random projection. Here too, we aim to establish optimal convergence rates that depend on the network topology, the mini-batch size of the gradient method, the number of machines, and the approximation error from random projection. Finally, we will consider extensions to general loss functions beyond least squares, and we will investigate numerical efficiency through extensive simulation studies on large-scale estimation problems.
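To make the centralized setting concrete, the following is a minimal sketch, not the project's actual algorithm. It assumes a Gaussian kernel, a Gaussian random projection matrix on each machine, a least-squares loss with ridge penalty, and plain averaging of the local predictions on the central machine. All function names, the bandwidth, the sketch dimension, and the regularization level are hypothetical choices made only for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=0.1):
    # Pairwise Gaussian kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def fit_local_sketched_krr(X, y, sketch_dim, lam, rng):
    # Kernel ridge regression restricted to functions of the form
    # f(x) = k(x, X) @ S.T @ alpha, where S is a (sketch_dim x n) random projection.
    n = X.shape[0]
    K = gaussian_kernel(X, X)
    S = rng.standard_normal((sketch_dim, n)) / np.sqrt(sketch_dim)  # assumed Gaussian sketch
    SK = S @ K
    # Normal equations of (1/n)||y - K S^T a||^2 + lam * a^T S K S^T a:
    # (S K K S^T + n*lam * S K S^T) a = S K y
    A = SK @ SK.T + n * lam * (SK @ S.T)
    alpha = np.linalg.lstsq(A, SK @ y, rcond=None)[0]  # lstsq tolerates ill-conditioning
    return X, S.T @ alpha  # coefficients on k(., x_i), i = 1..n

def predict_local(model, X_new):
    X_train, coef = model
    return gaussian_kernel(X_new, X_train) @ coef

def distributed_sketched_krr(X, y, X_new, n_machines, sketch_dim, lam, seed=0):
    # Split the data at random, fit one sketched estimator per "machine",
    # and let the central machine average the local predictions.
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(X.shape[0]), n_machines)
    preds = [
        predict_local(fit_local_sketched_krr(X[idx], y[idx], sketch_dim, lam, rng), X_new)
        for idx in parts
    ]
    return np.mean(preds, axis=0)

# Small synthetic demo (hypothetical data): noisy observations of sin(2*pi*x).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(800, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(800)
X_new = np.linspace(0.0, 1.0, 200)[:, None]
y_hat = distributed_sketched_krr(X, y, X_new, n_machines=4, sketch_dim=20, lam=1e-3)
mse = float(np.mean((y_hat - np.sin(2 * np.pi * X_new[:, 0])) ** 2))
print("MSE against the true function:", mse)
```

In this sketch each machine solves a linear system of size `sketch_dim` instead of one whose size equals its local sample count, which is where the computational saving from random projection would come from. How small the sketch dimension can be without degrading the averaged estimator is the kind of question the project describes.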
Effective start/end date: 1/01/22 → …