AC-SGD: Adaptively Compressed SGD for Communication-Efficient Distributed Learning

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) · Publication in refereed journal · peer-review

2 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 2678-2693
Journal / Publication: IEEE Journal on Selected Areas in Communications
Volume: 40
Issue number: 9
Online published: 20 Jul 2022
Publication status: Published - Sept 2022

Abstract

Gradient compression (e.g., gradient quantization and gradient sparsification) is a core technique for reducing communication costs in distributed learning systems. A recent trend in gradient compression is to use a varying number of bits across iterations; however, existing approaches rely on empirical observations or engineering heuristics without a systematic treatment and analysis. To the best of our knowledge, a general dynamic gradient compression scheme that leverages both quantization and sparsification techniques is still far from well understood. This paper proposes a novel Adaptively-Compressed Stochastic Gradient Descent (AC-SGD) strategy that adjusts the number of quantization bits and the sparsification size with respect to the norm of the gradients, the communication budget, and the remaining number of iterations. In particular, we derive an upper bound, tight in some cases, on the convergence error for an arbitrary dynamic compression strategy. We then consider communication budget constraints and propose an optimization formulation, denoted as the Adaptive Compression Problem (ACP), for minimizing the deep model's convergence error under such constraints. By solving the ACP, we obtain an enhanced compression algorithm that significantly improves model accuracy under given communication budget constraints. Finally, through extensive experiments on computer vision and natural language processing tasks using the MNIST, CIFAR-10, CIFAR-100, and AG-News datasets, we demonstrate that our compression scheme significantly outperforms state-of-the-art gradient compression methods in terms of mitigating communication costs.
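As a rough illustration of the building blocks described in the abstract, the sketch below combines top-k sparsification with unbiased stochastic quantization and varies the number of bits and the sparsification size across iterations according to the remaining communication budget and remaining iterations. The bit-allocation rule here is a hypothetical placeholder, not the ACP solution derived in the paper; the function names and parameters are illustrative assumptions.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a 1-D gradient; zero out the rest."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse

def stochastic_quantize(grad, num_bits):
    """Unbiased stochastic quantization onto a uniform grid with 2^b - 1 levels."""
    levels = 2 ** num_bits - 1
    scale = np.max(np.abs(grad))
    if scale == 0:
        return grad
    scaled = np.abs(grad) / scale * levels
    lower = np.floor(scaled)
    prob = scaled - lower                      # round up with this probability
    quantized = lower + (np.random.rand(*grad.shape) < prob)
    return np.sign(grad) * quantized / levels * scale

def adaptive_compress(grad, remaining_budget_bits, remaining_iters,
                      min_bits=2, max_bits=8, base_keep_ratio=0.1):
    """Hypothetical adaptive rule: split the remaining budget evenly over the
    remaining iterations and spend it on bits per retained coordinate
    (a placeholder for the ACP-based allocation in the paper)."""
    per_iter_budget = remaining_budget_bits / max(remaining_iters, 1)
    k = max(1, int(base_keep_ratio * grad.size))
    bits = int(np.clip(per_iter_budget // k, min_bits, max_bits))
    compressed = stochastic_quantize(topk_sparsify(grad, k), bits)
    spent = k * (bits + int(np.ceil(np.log2(grad.size))))  # values + index overhead
    return compressed, spent
```

In a training loop, each worker would call adaptive_compress on its local gradient before communication and subtract the reported cost from the remaining budget, so later iterations are compressed more or less aggressively depending on what has already been spent.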

Research Area(s)

  • Adaptation models, Communication-efficient, Convergence, Distributed Learning, Optimization, Quantization, Quantization (signal), Servers, Sparsification, Stochastic processes, Training