On the Threats and Countermeasures of Adversarial Inputs for Machine Learning Systems
機器學習系統對抗性輸入的威脅與對策研究
Student thesis: Doctoral Thesis
Detail(s)
Award date: 30 Jan 2024
Link(s)
Permanent Link: https://scholars.cityu.edu.hk/en/theses/theses(ab35539f-f4c6-43ab-a1b5-9590b5cf7f12).html
Abstract
Machine learning systems are having an ever-growing societal impact at scale, as they are increasingly integrated into critical applications such as autonomous driving, FinTech, and medical diagnosis. In such settings, ensuring the reliability and security of machine learning systems is crucial. However, there is growing recognition that machine learning systems are susceptible to adversarial inputs, which can result in misdecisions, privacy breaches, and intellectual property theft. The situation is further exacerbated by the fact that most data come from open and untrusted sources. It is therefore essential to address the limitations of machine learning systems against adversarial inputs in order to build more dependable and trustworthy applications.
Using computer vision as an example, this dissertation presents a holistic study of adversarial input threats throughout the entire lifecycle of modern machine learning systems, covering data preparation, model training, and model deployment. First, we demonstrate a novel content disguising attack, the scaling camouflage attack, which targets the data preparation phase. This attack exploits the standard image scaling algorithms used in data preprocessing and causes the content extracted by the machine to differ dramatically from the original. Unlike the well-known adversarial example attack, our attack happens in the data preprocessing stage and is therefore not tied to any specific machine learning model. Second, we present a new data poisoning attack, the membership poisoning attack, which targets the model training phase. This attack amplifies the membership exposure of benign samples, revealing a connection between data integrity and data confidentiality that remains underexplored in prior work. Third, we investigate the teacher model exposure threat in the transfer learning context and propose a teacher model fingerprinting attack, targeting the model deployment phase, that efficiently infers the origin of a student model. Unlike existing model reverse engineering approaches, our fingerprinting method relies on neither fine-grained model outputs (e.g., posteriors) nor auxiliary information about the model architecture or training dataset.
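To give a rough feel for why image scaling can be abused as described above, the following is a minimal, self-contained sketch of the underlying idea. It is not the dissertation's algorithm: it assumes a plain nearest-neighbour downscale (real preprocessing pipelines typically use bilinear or bicubic interpolation, and a practical attack would be formulated as an optimization over the scaling operator), and the names `make_attack_image`, `src`, and `target` are illustrative placeholders. The sketch only shows that downscaling samples a sparse subset of pixels, so overwriting just those pixels yields an image that still looks like the original at full resolution but becomes a different image after preprocessing.

```python
# Hypothetical sketch of the core idea behind image-scaling ("camouflage") attacks:
# nearest-neighbour downscaling keeps only a sparse grid of pixels, so overwriting
# just those pixels embeds hidden content that appears after preprocessing.
# All names here are illustrative, not taken from the thesis.
import numpy as np

def nearest_indices(big: int, small: int) -> np.ndarray:
    # Row/column indices sampled by a simple nearest-neighbour resize from `big` to `small`.
    return (np.arange(small) * (big / small)).astype(int)

def make_attack_image(src: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return a copy of `src` whose nearest-neighbour downscale equals `target`."""
    attack = src.copy()
    rows = nearest_indices(src.shape[0], target.shape[0])
    cols = nearest_indices(src.shape[1], target.shape[1])
    attack[np.ix_(rows, cols)] = target  # overwrite only the pixels the scaler keeps
    return attack

def nn_downscale(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    rows = nearest_indices(img.shape[0], out_h)
    cols = nearest_indices(img.shape[1], out_w)
    return img[np.ix_(rows, cols)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)    # "benign" full-size image
    target = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)   # hidden content
    attack = make_attack_image(src, target)
    print("fraction of pixels changed:", np.mean(attack != src))
    print("downscale == target:", np.array_equal(nn_downscale(attack, 64, 64), target))
```

Running the script shows that fewer than 2% of the pixels are modified while the downscaled output matches the hidden target exactly; attacks on interpolating scalers follow the same principle but must spread the perturbation over the pixels with non-zero interpolation weights.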
For each adversarial attack, we propose potential countermeasures. Through these adversarial input attacks, this dissertation aims to highlight the challenges and advance the research frontier in tackling adversarial inputs that threaten machine learning systems, particularly when they operate in increasingly open and even adversarial environments.