Unveiling and Learning Generalized Neural Networks through Explainable Methods

Student thesis: Doctoral Thesis

Abstract

Recent years have witnessed the rapid development of machine learning systems, bringing remarkable progress in generative techniques, explainable AI, and generalized neural networks. Despite these achievements, critical challenges persist in ensuring the reliability of these high-performing systems, especially under distributional shifts caused by complex and dynamic environments. For example, machine learning models often fail to generalize to unseen test data, such as real-world scenarios with varying sensor inputs, lighting changes, or human manipulations. Additionally, the lack of faithful and interpretable explanations for model decisions further limits their deployment in high-stakes applications where trust and transparency are paramount. This thesis tackles these challenges through novel methodologies and advanced explainable techniques, contributing to the development of more robust and interpretable machine learning systems.

The first part of this thesis focuses on unveiling the faithfulness of existing explanation approaches, particularly in prevalent attention-based models such as transformers. Through a broad experimental sweep, we identify a key limitation of current explanation methods: their inability to capture the polarity of feature importance. To address this, we propose the faithfulness violation test, a diagnostic tool for re-evaluating explanation reliability. Extensive experiments demonstrate that existing explanation methods often fail to reflect the model's true reasoning process, and our framework provides actionable insights for improving faithfulness and for the future adoption of explanation methods in attention-based architectures.
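As a rough illustration of the idea behind such a polarity check, and not the exact protocol developed in the thesis, the sketch below masks one feature at a time and flags a violation whenever the sign of the attributed importance disagrees with the observed change in the target-class probability. The names `model`, `x`, and `attribution` are hypothetical placeholders for any classifier, input, and explanation output.

```python
# Minimal sketch of a polarity-based faithfulness check (illustrative only).
import numpy as np

def faithfulness_violation_rate(model, x, attribution, target_class, baseline=0.0):
    """Fraction of features whose attributed polarity disagrees with the
    actual effect of removing that feature on the target-class probability."""
    p_orig = model(x)[target_class]
    violations = 0
    for i in range(len(x)):
        x_masked = x.copy()
        x_masked[i] = baseline                           # remove feature i
        delta = p_orig - model(x_masked)[target_class]   # >0: feature supported the prediction
        # Violation: the explanation's sign disagrees with the observed effect.
        if delta != 0 and np.sign(attribution[i]) != np.sign(delta):
            violations += 1
    return violations / len(x)
```

A high violation rate under this kind of check would indicate that the explanation method assigns supportive (positive) importance to features that in fact suppress the prediction, or vice versa.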

In the second part, we tackle the challenge of learning generalized models under distributional shifts. We show that element-wise alignment strategies can cause feature representations to collapse. To address this, we introduce Concept Contrast (CoCo), a plug-and-play approach that alleviates overfitting in invariant feature learning through explainable means. By focusing on high-level concepts encoded in neurons rather than on individual feature elements, CoCo significantly improves feature diversity and generalization capability. Evaluations across multiple benchmarks further highlight the effectiveness of CoCo in enhancing robustness and adaptability in generalization tasks.
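To make the contrast between element-wise and concept-level alignment concrete, the following is a minimal PyTorch-style sketch, under the assumption that each feature channel acts as one "concept" and that features from two domains are available as `feats_a` and `feats_b`; the loss form and temperature value are illustrative assumptions, not the exact CoCo formulation.

```python
# Illustrative concept-level (channel-wise) contrastive alignment across domains.
import torch
import torch.nn.functional as F

def concept_contrast_loss(feats_a, feats_b, temperature=0.1):
    """Pull the same concept (channel) together across two domains and push
    different concepts apart, instead of aligning individual feature elements.
    feats_a, feats_b: tensors of shape (batch, channels)."""
    za = F.normalize(feats_a.t(), dim=1)   # (C, B): one row per concept
    zb = F.normalize(feats_b.t(), dim=1)   # (C, B)
    logits = za @ zb.t() / temperature     # (C, C) concept-to-concept similarity
    targets = torch.arange(za.size(0), device=za.device)
    # Matching concepts across domains are positives; other concepts are negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Because the objective operates on whole channels rather than single feature entries, it leaves room for diverse element-level responses, which is the intuition behind avoiding representation collapse.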

The third part of this thesis explores neuron-level interpretability in out-of-distribution (OOD) scenarios. Going beyond the traditional view of neuron activity, we first reformulate neuron activation states to account for both a neuron's output and its influence on model decisions. To characterize the relationship between neurons and OOD issues, we then propose neuron activation coverage (NAC), a simple explanation tool that quantifies neuron behaviors under distribution shifts. NAC offers a unified framework for OOD detection and generalization, demonstrating strong correlations with model robustness and enabling the identification of unexpected neuron behaviors. Our experiments further reveal two key findings: 1) OOD data can be identified from neuron behavior, which significantly eases the OOD detection problem; our approach outperforms 21 previous methods across three benchmarks (CIFAR-10, CIFAR-100, and ImageNet-1K). 2) A positive correlation between NAC and model generalization ability holds consistently across architectures and datasets, enabling an NAC-based criterion for evaluating model generalization.
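A coverage-style score of this kind could be sketched as follows, assuming each neuron state already combines the neuron's output with its influence on the decision; the function names, the histogram binning, and the averaging scheme are assumptions for illustration, not the exact NAC definition.

```python
# Illustrative NumPy sketch of a coverage-based score over neuron states.
import numpy as np

def build_coverage(train_states, n_bins=50):
    """Record which regions of each neuron's state range are visited by
    in-distribution data. train_states: (num_samples, num_neurons)."""
    lo, hi = train_states.min(0), train_states.max(0)
    coverage = np.zeros((train_states.shape[1], n_bins))
    for j in range(train_states.shape[1]):
        hist, _ = np.histogram(train_states[:, j], bins=n_bins, range=(lo[j], hi[j]))
        coverage[j] = np.minimum(hist, 1)   # 1 if the bin was ever visited
    return coverage, lo, hi

def coverage_score(test_states, coverage, lo, hi):
    """Higher score = neuron states fall in well-covered regions (likely in-distribution)."""
    n_bins = coverage.shape[1]
    bins = np.clip(((test_states - lo) / (hi - lo + 1e-8) * n_bins).astype(int), 0, n_bins - 1)
    return np.array([coverage[np.arange(coverage.shape[1]), b].mean() for b in bins])
```

Under this sketch, inputs whose neuron states fall largely outside the regions covered during training receive low scores and can be flagged as OOD, while models whose test-time states stay within covered regions would be expected to generalize better.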

Overall, this thesis contributes to the fields of explainable AI, robust generalization, and neuron-level interpretability by introducing three key advancements. First, it unveils the limitations of existing explanation methods and proposes the Faithfulness Violation Test to improve explanation reliability in attention-based models. Second, it addresses the challenge of distributional shifts through Concept Contrast (CoCo), a novel framework that enhances feature diversity and model generalization capability. Third, it introduces Neuron Activation Coverage (NAC), a powerful yet explainable tool for OOD detection and generalization by modeling neuron behaviors. These contributions pave the way for building interpretable, adaptable, and reliable machine learning systems that thrive in diverse and challenging real-world environments.
Date of Award: 24 Apr 2025
Original language: English
Awarding Institution: City University of Hong Kong
Supervisor: Shiqi WANG

Keywords

  • Explainable AI
  • robust generalization
  • neuron-level interpretability
  • distribution shifts
  • out-of-distribution detection and generalization
  • contrastive learning
