Hijacking Attacks against Neural Network by Analyzing Training Data

Yunjie Ge, Qian Wang, Huayang Huang, Qi Li, Cong Wang, Chao Shen, Lingchen Zhao*, Peipei Jiang, Zheng Fang, Shenyi Zhang

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

1 Citation (Scopus)

Abstract

Backdoors and adversarial examples are the two primary threats currently faced by deep neural networks (DNNs). Both attacks attempt to hijack the model's behavior and produce unintended outputs by introducing (small) perturbations to the inputs. However, neither attack is without limitations in practice. Backdoor attacks, despite their high success rates, often require the strong assumption that the adversary can tamper with the training data or code of the target model, which is not always easy to achieve in reality. Adversarial example attacks, which place weaker assumptions on the attacker, often demand high computational resources, yet do not always yield satisfactory success rates when attacking mainstream black-box models in the real world. These limitations motivate the following research question: can model hijacking be achieved in a simpler way, with better attack performance and more reasonable attack assumptions? In this paper, we provide a positive answer with CleanSheet, a new model hijacking attack that obtains the high performance of backdoor attacks without requiring the adversary to tamper with the model training process. CleanSheet exploits vulnerabilities in DNNs stemming from the training data. Specifically, our key idea is to treat part of the clean training data of the target model as “poisoned data”, and capture the characteristics of these data that are more sensitive to the model (typically called robust features) to construct “triggers”. These triggers can be added to any input example to mislead the target model, similar to backdoor attacks. We validate the effectiveness of CleanSheet through extensive experiments on five datasets, 79 normally trained models, 68 pruned models, and 39 defensive models. Results show that CleanSheet exhibits performance comparable to state-of-the-art backdoor attacks, achieving an average attack success rate (ASR) of 97.5% on CIFAR-100 and 92.4% on GTSRB. Furthermore, CleanSheet consistently maintains a high ASR, with most ASRs surpassing 80%, when confronted with various mainstream backdoor defense mechanisms. © USENIX Security Symposium 2024. All rights reserved.
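
The key idea described in the abstract (learning a universal trigger from a model's clean training data so that it captures class-discriminative, robust features) can be illustrated with a minimal sketch. This is not the authors' released implementation; it assumes a PyTorch setup, and the surrogate classifier `surrogate`, the loader `target_class_loader`, and all hyperparameters are hypothetical placeholders.

# Illustrative sketch only: optimize a small additive "trigger" on clean images
# of a chosen target class so that a surrogate model strongly associates the
# trigger with that class, then stamp the trigger onto arbitrary inputs.

import torch
import torch.nn.functional as F

def learn_trigger(surrogate, target_class_loader, target_class,
                  epsilon=8 / 255, epochs=30, lr=0.01, device="cpu"):
    """Learn a bounded universal perturbation from clean target-class data."""
    surrogate = surrogate.to(device).eval()
    # Infer the input shape from one batch of clean target-class images.
    sample, _ = next(iter(target_class_loader))
    trigger = torch.zeros_like(sample[:1], device=device, requires_grad=True)
    optimizer = torch.optim.Adam([trigger], lr=lr)

    for _ in range(epochs):
        for images, _ in target_class_loader:
            images = images.to(device)
            # Stamp the current trigger onto clean images and push the
            # surrogate's prediction toward the target class.
            perturbed = torch.clamp(images + trigger, 0.0, 1.0)
            logits = surrogate(perturbed)
            labels = torch.full((images.size(0),), target_class,
                                dtype=torch.long, device=device)
            loss = F.cross_entropy(logits, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Keep the trigger small, mirroring the "(small) perturbations"
            # mentioned in the abstract.
            with torch.no_grad():
                trigger.clamp_(-epsilon, epsilon)
    return trigger.detach()

def apply_trigger(x, trigger):
    """Add the learned trigger to any input, as a backdoor-style trigger would be."""
    return torch.clamp(x + trigger, 0.0, 1.0)

In this sketch the trigger plays the role the abstract assigns to the extracted robust features: once learned from clean data alone, it can be added to any input to steer the (surrogate) model toward the target class.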
Original language: English
Title of host publication: Proceedings of the 33rd USENIX Security Symposium
Publisher: USENIX Association
Pages: 6867-6884
ISBN (Print): 9781939133441
Publication status: Published - Aug 2024
Event: 33rd USENIX Security Symposium (USENIX Security '24) - Philadelphia Marriott Downtown, Philadelphia, United States
Duration: 14 Aug 2024 - 16 Aug 2024
https://www.usenix.org/conference/usenixsecurity24

Publication series

Name: Proceedings of the USENIX Security Symposium

Conference

Conference: 33rd USENIX Security Symposium (USENIX Security '24)
Place: United States
City: Philadelphia
Period: 14/08/24 - 16/08/24
Internet address: https://www.usenix.org/conference/usenixsecurity24

Funding

We thank the anonymous reviewers for their helpful and valuable feedback. This work was partially supported by the National Key R&D Program of China under Grant 2020AAA0107702, the NSFC under Grants U20B2049, U21B2018, 62302344, 62132011, 62161160337, 61822309, 61773310, U1736205, and 61802166, the HK RGC under Grants R6021-20F and N_CityU139/21, and the Shaanxi Province Key Industry Innovation Program under Grant 2021ZDLGY01-02.
