
Approximate Gradient Coding and Data Assignment in Distributed Computing Systems

  • Haojun Li (Co-first Author)
  • Yi Chen* (Co-first Author)
  • Kenneth W. Shum*
  • Chi Wan Sung

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

In the deployment of distributed gradient descent algorithms in a network, the computational time and response time of a worker node are affected by various factors, such as processor speed, memory, network delay, and congestion. Straggler feedback delays can severely set back the effectiveness of distributed learning. To leverage the advantage of parallel processing in model training, gradient coding is designed to mitigate the effect of stragglers. This paper investigates gradient coding for heterogeneous workers with varying computational capabilities. We formulate the problem of approximating the gradient vector by minimizing the average error of the recovered gradient vector and propose a solution that leverages fractional repetition codes for data assignment to mitigate the impact of stragglers. © 2025 IEEE.
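As a rough illustration of the idea summarized in the abstract, the following minimal sketch shows how a fractional repetition assignment lets a master node approximate the full gradient when some workers straggle. It is not the scheme proposed in the paper: it assumes homogeneous workers, a simple round-robin replication of blocks, and a squared-error measure, and every name in it (`frc_assignment`, `split_into_blocks`, `recover_gradient`, `squared_error`) is hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def frc_assignment(num_workers: int, num_blocks: int) -> List[int]:
    """Map each worker to one block index (round-robin).

    Workers that share a block hold identical data, so any one of them
    can stand in for the others. Assumption: homogeneous workers.
    """
    if num_blocks <= 0 or num_workers < num_blocks:
        raise ValueError("need at least one worker per block")
    return [w % num_blocks for w in range(num_workers)]


def split_into_blocks(num_partitions: int, num_blocks: int) -> List[List[int]]:
    """Split partition indices into disjoint blocks."""
    return [list(range(b, num_partitions, num_blocks)) for b in range(num_blocks)]


def recover_gradient(
    partial_grads: Sequence[Sequence[float]],
    blocks: Sequence[Sequence[int]],
    worker_block: Sequence[int],
    responders: Set[int],
) -> Tuple[List[float], List[int]]:
    """Sum the gradients of every block covered by a responding worker.

    In a real system each worker would send its block sum; here the sum
    is computed directly to keep the sketch short. Blocks whose replicas
    all straggle are left out, which makes the result approximate.
    """
    dim = len(partial_grads[0])
    covered = {worker_block[w] for w in responders}
    estimate = [0.0] * dim
    for b in sorted(covered):
        for p in blocks[b]:
            for j in range(dim):
                estimate[j] += partial_grads[p][j]
    missing = [b for b in range(len(blocks)) if b not in covered]
    return estimate, missing


def squared_error(full: Sequence[float], estimate: Sequence[float]) -> float:
    """Squared Euclidean distance between the full and recovered gradients."""
    return sum((f - e) ** 2 for f, e in zip(full, estimate))
```

With six workers and three blocks, for example, each block is held by two workers, so a block is lost only if both of its replicas straggle. A heterogeneous setting such as the one the paper studies would need an assignment that accounts for each worker's speed, which this sketch does not attempt.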
Original language: English
Number of pages: 15
Journal: IEEE Transactions on Vehicular Technology
Publication status: Online published - 2 Dec 2025

Funding

The work in this paper was presented in part at the IEEE Int. Symp. on Information Theory, 2023. This work was supported in part by the National Key R&D Program of China under Grant 2022YFA1005000, the Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen-HK S&T Cooperation Zone, the Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001), and the Shenzhen Key Laboratory of Big Data and Artificial Intelligence (Grant No. SYSPG20241211173853027).

Research Keywords

  • Computational offloading
  • distributed learning
  • fractional repetition code
  • gradient coding
  • straggler
