Defeating Misclassification Attacks Against Transfer Learning
Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review
Author(s)
Wu, Bang; Wang, Shuo; Yuan, Xingliang; Wang, Cong; Rudolph, Carsten; Yang, Xiangwen
Detail(s)
Original language | English |
---|---|
Journal / Publication | IEEE Transactions on Dependable and Secure Computing |
Online published | 25 Jan 2022 |
Publication status | Online published - 25 Jan 2022 |
Link(s)
DOI | DOI |
---|---|
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85124092159&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(b63cc4be-e199-4b99-b4b9-1ac2148e5937).html |
Abstract
Transfer learning is a prevalent technique for efficiently generating new models (Student models) from the knowledge transferred by a pre-trained model (Teacher model). However, Teacher models are often publicly shared and reused, which inevitably introduces vulnerabilities that enable severe attacks against transfer learning systems. In this paper, we take a first step towards mitigating one of the most advanced misclassification attacks in transfer learning. We design a distilled *differentiator* via activation-based network pruning to weaken attack transferability while retaining accuracy. We adopt an ensemble of variant differentiators to improve defence robustness. To avoid a bloated ensemble size during inference, we propose a two-phase defence: inference with the Student model is first performed to narrow down the candidate differentiators, and then only a small, fixed number of them is chosen to validate clean inputs or reject adversarial ones. Our comprehensive evaluations on both large and small image recognition tasks confirm that Student models with our defence of only 5 differentiators reject over 90% of adversarial inputs with an accuracy loss of less than 10%. Our comparison also demonstrates that our design outperforms prior defences.
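The two mechanisms named in the abstract can be sketched roughly. The following minimal NumPy sketch is illustrative only, not the authors' implementation: `activation_prune_mask` shows the general idea of activation-based channel pruning (mask out the channels least active on clean calibration data), and `two_phase_vote` shows a simple agreement vote among a small, fixed set of differentiators. All function names, the pruning ratio, and the agreement threshold are assumptions.

```python
import numpy as np

def activation_prune_mask(activations, prune_ratio=0.25):
    """Build a binary channel mask that zeroes the least-active channels.

    activations: array of shape (num_samples, num_channels) recorded on
    clean calibration inputs. Channels with the lowest mean absolute
    activation are pruned, yielding a "differentiator" variant.
    """
    mean_act = np.abs(activations).mean(axis=0)
    n_prune = int(len(mean_act) * prune_ratio)
    prune_idx = np.argsort(mean_act)[:n_prune]  # least-active channels
    mask = np.ones(len(mean_act))
    mask[prune_idx] = 0.0
    return mask

def two_phase_vote(student_label, differentiator_labels, min_agree=3):
    """Phase two of the defence sketch: a small, fixed set of selected
    differentiators either validates the Student's prediction (clean)
    or rejects the input as adversarial when agreement is too low."""
    agree = sum(lbl == student_label for lbl in differentiator_labels)
    return "clean" if agree >= min_agree else "reject"
```

In this sketch, a clean input keeps the Student's label because most pruned differentiators still agree with it, while an adversarial perturbation crafted against the Teacher tends not to transfer to the pruned variants, so agreement drops and the input is rejected.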
Research Area(s)
- Computational modeling, Data models, Deep neural network, Defence against adversarial examples, Mathematical models, Perturbation methods, Pre-trained model, Task analysis, Training, Transfer learning
Citation Format(s)
Defeating Misclassification Attacks Against Transfer Learning. / Wu, Bang; Wang, Shuo; Yuan, Xingliang; Wang, Cong; Rudolph, Carsten; Yang, Xiangwen.
In: IEEE Transactions on Dependable and Secure Computing, 25.01.2022.