Abstract
When distributed gradient descent algorithms are deployed in a network, the computation time and response time of each worker node depend on factors such as processor speed, memory, network delay, and congestion. Delayed feedback from stragglers can severely reduce the effectiveness of distributed learning. Gradient coding is designed to mitigate the effect of stragglers so that model training can still benefit from parallel processing. This paper investigates gradient coding for heterogeneous workers with varying computational capabilities. We formulate the problem of approximating the gradient vector as one of minimizing the average error of the recovered gradient vector, and we propose a solution that uses fractional repetition codes for data assignment to mitigate the impact of stragglers. © 2025 IEEE.
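As a rough illustration of the idea behind fractional repetition codes, the sketch below shows the classic homogeneous case only: workers are split into equal groups, every worker in a group holds the same block of data partitions, and the master adds one returned message per group. It is not the heterogeneous scheme proposed in the paper. All function names, the equal group sizes, and the toy gradients are assumptions made for this example. When every worker in a group straggles, that group's partitions are missing and the recovered gradient is only approximate.

```python
def frc_assignment(num_workers, num_partitions, replication):
    """Assign data partitions to workers with a fractional repetition layout.

    Assumption for this sketch: equal-size groups and equal-size blocks.
    """
    if num_workers % replication != 0:
        raise ValueError("replication must divide num_workers")
    num_groups = num_workers // replication
    if num_partitions % num_groups != 0:
        raise ValueError("number of groups must divide num_partitions")
    block = num_partitions // num_groups
    assignment = {}
    for worker in range(num_workers):
        group = worker // replication
        # Every worker in the same group gets the same block of partitions.
        assignment[worker] = list(range(group * block, (group + 1) * block))
    return assignment


def worker_message(worker, assignment, partial_grads):
    """Sum of the partial gradients a worker computes on its partitions."""
    total = [0.0] * len(partial_grads[0])
    for p in assignment[worker]:
        total = [t + g for t, g in zip(total, partial_grads[p])]
    return total


def recover_gradient(assignment, partial_grads, stragglers, replication):
    """Add one message per group, skipping stragglers.

    A group whose workers all straggle contributes nothing, so the result
    is then an approximation of the full gradient.
    """
    recovered = [0.0] * len(partial_grads[0])
    covered_groups = set()
    for worker in sorted(assignment):
        group = worker // replication
        if worker in stragglers or group in covered_groups:
            continue
        message = worker_message(worker, assignment, partial_grads)
        recovered = [r + m for r, m in zip(recovered, message)]
        covered_groups.add(group)
    return recovered


def squared_error(a, b):
    """Squared Euclidean distance between two gradient vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy setup (hypothetical values): 6 workers, 6 partitions, replication 2.
assignment = frc_assignment(num_workers=6, num_partitions=6, replication=2)
partial_grads = [[float(p), 1.0] for p in range(6)]
full_gradient = [sum(col) for col in zip(*partial_grads)]

# One straggler in each of two groups: the gradient is recovered exactly.
exact = recover_gradient(assignment, partial_grads, {0, 3}, replication=2)
# A whole group straggles: partitions 0 and 1 are lost.
approx = recover_gradient(assignment, partial_grads, {0, 1}, replication=2)
```

In this toy setup, `exact` equals `full_gradient`, while `approx` omits the contribution of the first group. Averaging `squared_error(approx, full_gradient)` over a straggler distribution would give one possible notion of average recovery error; the paper's precise error measure is not stated in the abstract.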
| Original language | English |
|---|---|
| Number of pages | 15 |
| Journal | IEEE Transactions on Vehicular Technology |
| Publication status | Online published - 2 Dec 2025 |
Funding
The work in this paper was presented in part at the IEEE International Symposium on Information Theory, 2023. This work was supported in part by the National Key R&D Program of China under Grant 2022YFA1005000, the Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen-HK S&T Cooperation Zone, the Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001), and the Shenzhen Key Laboratory of Big Data and Artificial Intelligence (Grant No. SYSPG20241211173853027).
Research Keywords
- Computational offloading
- distributed learning
- fractional repetition code
- gradient coding
- straggler
Fingerprint
Dive into the research topics of 'Approximate Gradient Coding and Data Assignment in Distributed Computing Systems'. Together they form a unique fingerprint.