TY - GEN
T1 - Hijacking Attacks against Neural Networks by Analyzing Training Data
AU - Ge, Yunjie
AU - Wang, Qian
AU - Huang, Huayang
AU - Li, Qi
AU - Wang, Cong
AU - Shen, Chao
AU - Zhao, Lingchen
AU - Jiang, Peipei
AU - Fang, Zheng
AU - Zhang, Shenyi
PY - 2024/8
Y1 - 2024/8
N2 - Backdoors and adversarial examples are the two primary threats currently faced by deep neural networks (DNNs). Both attacks attempt to hijack model behavior, producing unintended outputs, by introducing (small) perturbations to the inputs. However, neither attack is without limitations in practice. Backdoor attacks, despite their high success rates, often rest on the strong assumption that the adversary can tamper with the training data or code of the target model, which is not always easy to achieve in reality. Adversarial example attacks, which place relatively weaker assumptions on attackers, often demand high computational resources, yet do not always yield satisfactory success rates against mainstream black-box models in the real world. These limitations motivate the following research question: can model hijacking be achieved in a simpler way, with more satisfactory attack performance and more reasonable attack assumptions? In this paper, we provide a positive answer with CleanSheet, a new model hijacking attack that attains the high performance of backdoor attacks without requiring the adversary to tamper with the model training process. CleanSheet exploits vulnerabilities in DNNs stemming from the training data. Specifically, our key idea is to treat part of the clean training data of the target model as “poisoned data” and to capture the characteristics of these data to which the model is more sensitive (typically called robust features) to construct “triggers”. These triggers can be added to any input example to mislead the target model, similar to backdoor attacks. We validate the effectiveness of CleanSheet through extensive experiments on five datasets, 79 normally trained models, 68 pruned models, and 39 defensive models. Results show that CleanSheet performs comparably to state-of-the-art backdoor attacks, achieving average attack success rates (ASRs) of 97.5% on CIFAR-100 and 92.4% on GTSRB. Furthermore, CleanSheet consistently maintains a high ASR, with most ASRs surpassing 80%, when confronted with various mainstream backdoor defense mechanisms.
UR - http://www.scopus.com/inward/record.url?scp=85205009479&partnerID=8YFLogxK
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9781939133441
T3 - Proceedings of the USENIX Security Symposium
SP - 6867
EP - 6884
BT - Proceedings of the 33rd USENIX Security Symposium
PB - USENIX Association
T2 - 33rd USENIX Security Symposium (USENIX Security '24)
Y2 - 14 August 2024 through 16 August 2024
ER -