Exploring Generalization in Neural Networks: From Centralized Training to Federated Learning

Student thesis: Doctoral Thesis

Abstract

The deployment of machine learning models in heterogeneous data environments has necessitated models with robust generalization capabilities. This thesis presents a comprehensive exploration of model generalization under different machine learning constraints: neuron coverage for enhancing generalization to unseen Out-of-Distribution (OOD) data, and gradient alignment in Federated Learning for generalization under data privacy constraints. Furthermore, we propose FedSGC, a novel federated learning framework designed to address the challenges of model generalization while catering to the computational and communication constraints of edge computing environments. The collective insights from these studies offer a multifaceted approach to advancing machine learning in both centralized and distributed settings and to improving model generalization ability and robustness.

The first study delves into the concept of neuron coverage, a principle derived from software testing that emphasizes the importance of activating a broad spectrum of neurons during the training of deep neural networks (DNNs). By treating each neuron as a functional point within the DNN’s architecture, we propose a training regimen that maximizes neuron coverage through a coverage loss term, which is formulated by aggregating neuron activations and incorporating gradient similarity regularization. This method, termed Neuron Coverage-Guided Domain Generalization (NCDG), is shown to enhance DNN generalization by optimizing decision behavior and minimizing the risk of misclassification on OOD samples. The effectiveness of NCDG is demonstrated through extensive experiments across various domain generalization tasks, outperforming state-of-the-art methods.
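The coverage loss term described above can be illustrated with a minimal sketch. This is not NCDG's exact formulation: the smooth-indicator form and the `threshold` and `temperature` parameters are illustrative assumptions, and the gradient-similarity regularizer is omitted.

```python
import numpy as np

def coverage_loss(activations, threshold=0.1, temperature=0.01):
    """Illustrative neuron-coverage loss (hypothetical form, not NCDG's
    exact definition): smoothly count the fraction of (sample, neuron)
    pairs whose activation exceeds a threshold, and penalize low coverage.

    activations: (batch, num_neurons) array of post-activation values.
    """
    # Soft indicator of "neuron is activated", so the term stays differentiable.
    active = 1.0 / (1.0 + np.exp(-(activations - threshold) / temperature))
    coverage = active.mean()   # fraction of activated (sample, neuron) pairs
    return 1.0 - coverage      # minimizing this maximizes neuron coverage
```

In training, a term of this kind would be added to the task loss, so that gradient descent simultaneously fits the labels and drives a broader spectrum of neurons to activate.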

The second study addresses the challenge of privacy preservation in the context of domain generalization. It introduces a method that improves DNN generalization without compromising data privacy, a critical consideration for applications such as medical imaging classification. The proposed approach leverages gradient alignment within a centralized server to aggregate information from distributed datasets, aligning distributions across domains without direct data sharing. This method is underpinned by the Maximum Mean Discrepancy (MMD), a measure of distribution distance, and is shown to achieve superior cross-domain generalization compared to existing federated learning methods.
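The alignment objective above rests on MMD. A standard (biased) Gaussian-kernel estimator of squared MMD between two sample sets can be sketched as follows; the kernel choice and `sigma` value are assumptions for illustration, not the thesis's exact configuration.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix between rows of x and y."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimator of squared Maximum Mean Discrepancy between
    samples x ~ P and y ~ Q (each of shape (n, d))."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy
```

A server minimizing such a discrepancy between per-domain statistics pulls the domains' distributions together without ever seeing the raw data, which is the privacy-preserving property the method relies on.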

The third study introduces FedSGC, a novel federated learning framework that integrates dynamic sparse training with gradient congruity inspection. FedSGC is designed to overcome the limitations of traditional federated learning, which can be hindered by high computational and communication costs, especially in resource-constrained edge computing environments. By pruning neurons with conflicting gradients and prioritizing the growth of those with consistent gradients, FedSGC significantly reduces local computation and communication overheads while preserving accuracy. The framework’s efficacy is demonstrated through evaluations on challenging non-i.i.d. (OOD) datasets, where it achieves competitive accuracy with minimal computation and communication costs.
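The prune-and-grow rule can be sketched as follows. This is a simplified illustration assuming per-neuron gradient rows and hypothetical helper names (`gradient_congruity`, `adjust_mask`); FedSGC's actual scoring and scheduling may differ.

```python
import numpy as np

def gradient_congruity(local_grad, global_grad, eps=1e-12):
    """Per-neuron cosine similarity between a client's gradient rows and
    the server-aggregated gradient rows (rows index neurons).
    Negative scores indicate conflicting update directions."""
    num = (local_grad * global_grad).sum(axis=1)
    denom = (np.linalg.norm(local_grad, axis=1)
             * np.linalg.norm(global_grad, axis=1) + eps)
    return num / denom

def adjust_mask(mask, scores, k=1):
    """One dynamic-sparse-training step: prune the k active neurons with
    the most conflicting gradients, grow the k inactive neurons with the
    most consistent ones. mask is a boolean array over neurons."""
    mask = mask.copy()
    active, inactive = np.where(mask)[0], np.where(~mask)[0]
    if active.size:
        mask[active[np.argsort(scores[active])[:k]]] = False   # prune
    if inactive.size:
        mask[inactive[np.argsort(scores[inactive])[-k:]]] = True  # grow
    return mask
```

Because pruned neurons drop out of both local computation and the parameter exchange, keeping only gradient-consistent neurons is what yields the reduced computation and communication costs described above.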

In all, this thesis contributes to improving the generalization ability and robustness of deep neural network models from the following three aspects: First, we present a neuron coverage maximization approach for improving DNN generalization, offering a new perspective on training robust models. Second, we introduce a privacy-preserving domain generalization method that aligns with the MMD framework, providing a solution for enhancing model generalization without data sharing. Third, we propose FedSGC, a federated learning framework that addresses the challenges of data heterogeneity and resource constraints in edge computing, setting a new standard for efficient and privacy-preserving machine learning.

The implications of this work extend to various sectors where data privacy and model performance are paramount. By providing a framework that enhances model generalization while preserving privacy, this thesis contributes to the advancement of machine learning methodologies suitable for sensitive applications. Our proposed methods are grounded in theoretical innovation and empirical validation, offering a solid foundation for future research and practical application in domains such as healthcare, finance, and autonomous systems.
Date of Award: 17 Jul 2024
Original language: English
Awarding Institution:
  • City University of Hong Kong
Supervisor: Shiqi WANG
