Abstract
Federated learning (FL) is a machine learning paradigm that enables collaborative model training while protecting data privacy. It allows multiple edge devices (e.g., mobile or IoT devices) to train a shared model by exchanging gradients with a central server or among themselves. Because raw data never leaves the devices, FL preserves privacy and avoids the data-transfer bottleneck of centralized learning, making it suitable for applications with distributed data and high privacy needs such as healthcare, finance, and edge computing.

However, FL confronts several crucial challenges. The first is communication efficiency during training: gradient transmission between devices and the server demands significant network resources, especially for large-scale models and datasets. Although gradient compression techniques have been proposed to relieve this, they frequently introduce compression errors that impair learning performance. In wireless FL scenarios such as over-the-air FL (OTA-FL), non-uniform channel fading and noise interference further complicate communication and make reliable model updates hard to achieve. The second challenge is data heterogeneity across devices. In real applications, data is often non-i.i.d., with significant variation in distribution from device to device. This can cause slow convergence or even divergence of training, since the aggregated gradients may not accurately represent the overall data distribution. Traditional algorithms such as FedAvg and FedProx struggle to handle such heterogeneity effectively.
To address these challenges, this thesis presents three innovative contributions. First, we propose an adaptive Top-K in SGD framework for distributed learning. This method adaptively adjusts the sparsification degree of gradients based on the characteristics of the gradients and the communication budget. Through theoretical convergence analysis and extensive experiments on image classification and object detection tasks, we demonstrate that it achieves better convergence rates compared to existing gradient compression methods, both with and without error compensation.
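The core mechanics of Top-K sparsification with error compensation can be sketched as follows. This is a minimal illustration, not the thesis's exact method: the adaptation rule below (growing K when the accumulated compression error is large, within hypothetical bounds `k_min` and `k_max`) is an assumption for demonstration purposes only.

```python
import numpy as np

def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad; zero the rest."""
    sparse = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse[idx] = grad[idx]
    return sparse

class AdaptiveTopKWorker:
    """Illustrative worker combining Top-K compression with error
    compensation (a running memory of the dropped gradient mass).
    The adaptive rule scaling k with the residual norm is an assumed
    policy for illustration, not the one derived in the thesis."""

    def __init__(self, dim, k_min, k_max):
        self.residual = np.zeros(dim)
        self.k_min, self.k_max = k_min, k_max

    def compress(self, grad):
        corrected = grad + self.residual            # error compensation
        # Adapt k: send more coordinates when the accumulated error
        # dominates the corrected gradient.
        ratio = np.linalg.norm(self.residual) / (np.linalg.norm(corrected) + 1e-12)
        k = int(np.clip(self.k_min + ratio * (self.k_max - self.k_min),
                        self.k_min, self.k_max))
        sparse = top_k_sparsify(corrected, k)
        self.residual = corrected - sparse          # remember dropped mass
        return sparse
```

In use, each worker transmits only the sparse vector's nonzero entries and their indices; the dropped coordinates are re-injected into the next round's gradient rather than lost.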
Second, we introduce a novel power control strategy for OTA-FL with gradient compression. By deriving the optimality gap of the loss function under different power control policies and using linear approximations to handle complex terms, we optimize the transmit power of each device to minimize the impact of channel fading and noise. Numerical results show that our strategy significantly outperforms traditional power control methods in terms of convergence speed and prediction accuracy.
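To make the setting concrete, the following sketch simulates over-the-air aggregation under a simple truncated channel-inversion policy, where each device scales its transmission by `min(1/h_i, sqrt(p_max))` to align updates after fading. This baseline policy and all parameter names are assumptions for illustration; the thesis's optimized power control strategy is derived from the loss-function optimality gap and differs from this baseline.

```python
import numpy as np

def ota_aggregate(grads, h, noise_std, p_max, rng):
    """Simulate over-the-air aggregation with truncated channel inversion.

    grads: (num_devices, dim) local gradients.
    h:     per-device channel fading coefficients.
    Each device i transmits b_i * grads[i] with b_i = min(1/h_i, sqrt(p_max)),
    so the faded signals h_i * b_i * grads[i] superpose coherently at the
    server, which then sees additive receiver noise.
    """
    n, dim = grads.shape
    received = np.zeros(dim)
    for i in range(n):
        b = min(1.0 / h[i], np.sqrt(p_max))      # transmit power scaling
        received += h[i] * b * grads[i]          # channel applies fading
    received += rng.normal(0.0, noise_std, dim)  # additive channel noise
    return received / n                          # estimate of the mean gradient
```

When every channel is strong enough for full inversion and the noise is zero, the server recovers the exact average gradient; weak channels or tight power budgets distort the estimate, which is precisely the effect the power control strategy is designed to mitigate.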
Finally, we design a decoupled gradient algorithm for federated learning, inspired by generalization theory. The algorithm separates spurious and invariant data in the parameter space by analyzing gradient variances across mini-batches, partitioning gradient dimensions accordingly. It then assigns distinct learning rates to these dimensions: a larger one for high-variance dimensions to reduce the impact of spurious data, and a conservative one for low-variance dimensions to capture invariant relationships.
This algorithm acts as a general-purpose plugin for classic FL algorithms like FedAvg, FedProx, and FedOpt. Through experiments on benchmark datasets, we verify its efficacy in enhancing FL system training efficiency. It notably accelerates convergence in heterogeneous scenarios and improves the model's generalization ability, outperforming traditional methods in cross-domain and heterogeneous settings.
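A minimal sketch of the variance-based decoupling step might look like the following. The quantile threshold `q` and the two learning rates are hypothetical parameters, and the split rule (a single variance quantile) is an assumed simplification of the thesis's partitioning criterion.

```python
import numpy as np

def decoupled_update(params, minibatch_grads, lr_high, lr_low, q=0.8):
    """Apply a per-dimension learning rate based on gradient variance.

    minibatch_grads: (num_batches, dim) gradients from several mini-batches.
    Dimensions whose gradient variance exceeds the q-quantile are treated
    as spurious-sensitive and receive lr_high; the remaining dimensions
    receive the conservative lr_low. Both the quantile split and the rate
    values are illustrative assumptions.
    """
    var = minibatch_grads.var(axis=0)            # per-dimension variance
    high = var > np.quantile(var, q)             # spurious-sensitive dims
    mean_grad = minibatch_grads.mean(axis=0)
    lr = np.where(high, lr_high, lr_low)         # decoupled learning rates
    return params - lr * mean_grad
```

Because the step only rewrites the local update rule, it can wrap around the local-training phase of FedAvg-style algorithms without altering their aggregation logic, which is what makes it usable as a plugin.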
In summary, this thesis makes significant contributions to the field of FL by addressing key challenges related to communication efficiency and data heterogeneity. Our proposed methods provide practical solutions to enhance the performance and applicability of FL in various real-world applications, paving the way for more efficient and privacy-preserving distributed machine learning.
| Date of Award | 4 Jul 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Weitao XU (Supervisor) |