Enhancing the Reliability of Intelligent Software Systems Through Novel Automated Testing Approaches

Student thesis: Doctoral Thesis

Abstract

The integration of deep learning (DL) techniques has catalyzed a paradigm shift in software development, moving systems from rule-based logic to data-driven intelligence. This is not merely an incremental improvement but a breakthrough that enables new classes of applications. For instance, DL-based models achieve outstanding performance in diverse domains, including computer vision, medical imaging, and financial fraud prediction. While DL-based systems have exhibited remarkable success, they remain susceptible to out-of-distribution (OOD) input data, including adversarial samples, corrupted samples, and data with natural domain shifts. Understanding the challenges inherent in the use of DL is crucial for developing and commercializing reliable and robust intelligent software systems. These challenges stem mainly from the black-box characteristics of Deep Neural Networks (DNNs), which make it difficult to explain and localize the root cause of misbehaviors, and thus harder to repair them. One mainstream practice to alleviate these challenges is to generate representative and, at the same time, valid test cases (in this thesis, we use the terms test inputs and test cases interchangeably) that effectively reveal diverse faults in DNNs and use them to enhance the DNN under test.

In this thesis, we embark on a comprehensive exploration of four pivotal aspects of deep learning testing, each tackling crucial challenges in testing intelligent software systems: (1) adversarial test input generation for autonomous driving systems (ADSs), (2) transferable test input generation for ADSs, (3) valid test input generation for image data, and (4) cross-domain ensemble-based test input selection. We propose three novel DL-based approaches (UniAda, validity testing, and ensemble-based testing) and conduct empirical studies to address ongoing research challenges.

Specifically, recent research on testing intelligent systems has predominantly focused on generating adversarial test cases to trigger system misbehavior. In autonomous driving research, existing adversarial attack methods on end-to-end (E2E) ADSs have centered predominantly on steering-angle misbehaviors, overlooking speed-related controls and the imperceptibility of perturbations. To address these challenges, we introduce UniAda, a multi-objective white-box attack technique whose core function is to craft an image-agnostic adversarial perturbation capable of simultaneously influencing both steering and speed controls. UniAda capitalizes on an intricately designed multi-objective optimization function with an adaptive weighting scheme (AWS), enabling the concurrent optimization of diverse objectives. Validated on both simulated and real-world driving data, UniAda outperforms five benchmarks across two metrics. This systematic approach establishes UniAda as a proven testing technique for enhancing the reliability of DL-based E2E ADSs.
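The idea of an image-agnostic perturbation optimized against two driving objectives at once can be sketched as follows. This is a minimal illustration only, not UniAda itself: the steering and speed heads are stand-in linear models, the names (`universal_perturbation`, `W_steer`, `W_speed`) are hypothetical, and the adaptive weighting shown (boosting whichever objective currently lags) is one simple realization of an AWS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for the steering and speed heads of an E2E driving
# model (hypothetical; the real technique attacks deep networks via gradients).
W_steer = rng.normal(size=16)
W_speed = rng.normal(size=16)

def predict(x):
    return W_steer @ x, W_speed @ x

def universal_perturbation(images, steps=200, lr=0.05, eps=0.3):
    """Craft one image-agnostic perturbation that simultaneously drives the
    steering and speed outputs away from their clean values, balancing the
    two objectives with an adaptive weighting scheme."""
    delta = 0.001 * (W_steer + W_speed)          # small nonzero start
    for _ in range(steps):
        loss_s = loss_v = 0.0
        g_s = np.zeros(16)
        g_v = np.zeros(16)
        for x in images:
            s0, v0 = predict(x)
            s1, v1 = predict(x + delta)
            loss_s += (s1 - s0) ** 2
            loss_v += (v1 - v0) ** 2
            g_s += 2 * (s1 - s0) * W_steer       # d(loss_s)/d(delta)
            g_v += 2 * (v1 - v0) * W_speed       # d(loss_v)/d(delta)
        # Adaptive weighting: the objective that currently lags receives the
        # larger weight, so neither steering nor speed is neglected.
        total = loss_s + loss_v + 1e-12
        w_s, w_v = 1 - loss_s / total, 1 - loss_v / total
        delta += lr * (w_s * g_s + w_v * g_v) / len(images)  # gradient ascent
        delta = np.clip(delta, -eps, eps)        # keep the perturbation small
    return delta

images = [rng.normal(size=16) for _ in range(8)]
delta = universal_perturbation(images)
steer_dev = np.mean([abs(predict(x + delta)[0] - predict(x)[0]) for x in images])
speed_dev = np.mean([abs(predict(x + delta)[1] - predict(x)[1]) for x in images])
```

The clipping step corresponds to the usual norm bound that keeps the perturbation imperceptible; one `delta` deceives the model on every image in the batch.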

To ensure the generalizability of the generated adversarial test cases for ADSs, it is important to assess their transferability. A variety of transferability-enhancement approaches have been proposed, categorized into input transformation enhancements and attack objective enhancements. However, most of them have been validated only on image classification tasks; there is limited research exploring methods to enhance attack transferability and to assess the effectiveness of transferability-enhancement techniques on E2E ADSs. This part of the thesis fills this gap in autonomous driving research by conducting an empirical study of different transferability-enhancement techniques for steering models and identifying the most effective ones.
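The input-transformation family of enhancements can be illustrated with a toy sketch: instead of following the gradient at the exact surrogate input, the attack averages gradients over randomly transformed copies, which discourages overfitting to the surrogate. Everything here is hypothetical scaffolding; a toy `tanh` steering model replaces a real DNN, and random jitter stands in for the resizing/translation transforms used on images.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=16)                 # hypothetical surrogate steering model

def predict(x):
    return np.tanh(W @ x)               # toy nonlinear steering prediction

def grad_deviation(x, delta, y0):
    """Gradient of the squared steering deviation w.r.t. delta."""
    d = predict(x + delta) - y0
    return 2 * d * (1 - predict(x + delta) ** 2) * W

def transform(x):
    """Input-transformation enhancement: random jitter standing in for the
    image resizing/translation/scaling transforms used in practice."""
    return x + rng.normal(scale=0.1, size=x.shape)

def transferable_perturbation(x, n_copies=10, steps=50, lr=0.05, eps=0.3):
    y0 = predict(x)
    delta = 0.001 * W                   # small nonzero start
    for _ in range(steps):
        # Averaging gradients over transformed copies keeps the perturbation
        # from overfitting to the surrogate's exact input, which is what
        # makes it more likely to transfer to unseen target models.
        g = np.mean([grad_deviation(transform(x), delta, y0)
                     for _ in range(n_copies)], axis=0)
        delta = np.clip(delta + lr * g, -eps, eps)
    return delta

x = 0.1 * rng.normal(size=16)
delta = transferable_perturbation(x)
deviation = abs(predict(x + delta) - predict(x))
```

The empirical study compares enhancements of this kind (and attack-objective variants) by how well perturbations crafted on one steering model carry over to others.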

Numerous proposed automated test generation techniques are capable of automatically generating thousands of test cases. However, many of them lack a validity checking mechanism and have been shown to generate invalid inputs. Valid test input generation is pivotal in reliably enhancing the DNN under test, since the DNN is expected to generalize only within the valid input distribution; feeding it invalid test inputs may lead to unreliable testing. To mitigate this, we propose a novel framework for validity testing, which improves on existing testing frameworks by incorporating distribution awareness through joint optimization. By introducing distribution-based objectives, our proposed framework excels in generating valid test inputs, validated by both automated and human assessors, which establishes our method as an effective technique for valid test input generation.
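The joint-optimization idea can be sketched in miniature: a fault-revealing objective is optimized together with a distribution term that penalizes drifting away from the valid input distribution. This is a sketch under strong simplifying assumptions, not the thesis framework: the validity model is a diagonal Gaussian fitted to toy data (real frameworks use learned generative models), the model under test is a toy `tanh` regressor, and names such as `generate_valid_test` and `ood_score` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fit a simple density model of the valid input distribution
# (diagonal Gaussian here, purely for illustration).
train = rng.normal(loc=0.0, scale=0.5, size=(500, 8))
mu, sigma = train.mean(axis=0), train.std(axis=0)

def ood_score(x):
    """Mean squared z-score: small for in-distribution inputs, large otherwise."""
    return float(np.mean(((x - mu) / sigma) ** 2))

W = rng.normal(size=8)                  # toy model under test (hypothetical)
def predict(x):
    return np.tanh(W @ x)

def generate_valid_test(x0, lam=1.0, steps=300, lr=0.02):
    """Joint optimization: push the prediction away from the seed's output
    (fault-revealing term) while the distribution term keeps the input valid."""
    x = x0.copy()
    y0 = predict(x0)
    for _ in range(steps):
        d = predict(x) - y0
        g_fault = 2 * d * (1 - predict(x) ** 2) * W     # ascend the deviation
        g_dist = 2 * (x - mu) / (sigma ** 2) / len(x)   # descend the OOD score
        x = x + lr * (g_fault - lam * g_dist)
    return x

x0 = train[0]
x_new = generate_valid_test(x0)
dev = abs(predict(x_new) - predict(x0))
```

Without the `lam * g_dist` term this reduces to an unconstrained attack, which is exactly the setting in which generated inputs tend to leave the valid distribution.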

Another challenge associated with a large number of unlabeled generated test cases lies in selecting a representative subset for manual labeling, which can then be used to enhance the performance of DNNs. Efficient test input selection and prioritization have emerged as promising approaches to ensure the quality of the selected test suite in supporting the improvement of intelligent DL systems. Identifying test inputs that are likely to reveal diverse DNN faults can significantly reduce manual labeling efforts and expedite the DNN repair process. However, when applying existing test input selection techniques in practice, we have observed three limitations: (1) lack of consideration for the consensus of results from multiple independent techniques (i.e., selecting inputs based on a score fused from multiple techniques); (2) insufficient evaluation of the effectiveness of these techniques; and (3) a limited scope of tasks and datasets. In this part of the thesis, we address these three limitations by proposing an ensemble-based testing technique that considers the consensus decision, evaluating the effectiveness of these techniques with comprehensive criteria, and assessing performance in both the autonomous driving and malware detection domains. By applying our proposed test selection technique, researchers can significantly improve the DNN repair process.
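Consensus-based selection can be sketched as follows: several independent uncertainty metrics each score every unlabeled input, the scores are fused (here by rank averaging, one simple fusion choice), and the top-k inputs are sent for labeling. The function names are hypothetical; the Gini score follows the standard impurity formula used by test prioritization metrics such as DeepGini.

```python
import numpy as np

def entropy_score(probs):
    """Predictive entropy: higher means a more suspicious input."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def margin_score(probs):
    """Margin uncertainty: a small top-2 gap means a suspicious input."""
    part = np.sort(probs, axis=1)
    return 1.0 - (part[:, -1] - part[:, -2])

def gini_score(probs):
    """Gini impurity of the softmax output (1 - sum of squared probabilities)."""
    return 1.0 - np.sum(probs ** 2, axis=1)

def ensemble_select(probs, k):
    """Fuse independent scores by averaging their ranks, then pick the top-k."""
    scores = [f(probs) for f in (entropy_score, margin_score, gini_score)]
    ranks = [s.argsort().argsort() for s in scores]   # rank 0 = least suspicious
    fused = np.mean(ranks, axis=0)                    # consensus score
    return np.argsort(fused)[::-1][:k]                # indices of the top-k inputs

# Toy softmax outputs for 6 unlabeled inputs over 3 classes.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident
    [0.40, 0.35, 0.25],   # uncertain
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],   # most uncertain
    [0.70, 0.20, 0.10],
    [0.95, 0.03, 0.02],
])
picked = ensemble_select(probs, k=2)
```

Rank averaging makes the fusion robust to the metrics living on different scales; only the inputs that several metrics independently flag rise to the top of the labeling budget.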

In summary, this thesis introduces novel approaches and empirical studies geared toward enhancing automated testing approaches for intelligent software systems. Through our proposed approaches and empirical insights, we aim to make meaningful contributions to the continual advancement of robust and effective techniques within these domains.
Date of Award: 28 Aug 2025
Original language: English
Awarding Institution:
  • City University of Hong Kong
Supervisor: Jacky Wai Keung

Keywords

  • Deep Learning Testing
  • Multi-objective Testing
  • Test Input Transferability
  • Test Input Validation
  • Ensemble-based Testing
