Mitigating Backdoor Attacks in Pre-trained Encoders via Self-supervised Knowledge Distillation
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
| Original language | English |
| --- | --- |
| Pages (from-to) | 2613-2625 |
| Number of pages | 13 |
| Journal / Publication | IEEE Transactions on Services Computing |
| Volume | 17 |
| Issue number | 5 |
| Online published | 1 Jul 2024 |
| Publication status | Published - Sept 2024 |
Abstract
Pre-trained encoders in computer vision have recently received great attention from both research and industry communities. Among others, a promising paradigm is to utilize self-supervised learning (SSL) to train image encoders with massive unlabeled samples, thereby endowing encoders with the capability to embed abundant knowledge into the feature representations. Backdoor attacks on SSL disrupt the encoder's feature extraction capabilities, causing downstream classifiers to inherit backdoor behavior and leading to misclassification. Existing backdoor defense methods primarily focus on supervised learning scenarios and cannot be effectively migrated to SSL pre-trained encoders. In this article, we present a backdoor defense scheme based on self-supervised knowledge distillation. Our approach aims to eliminate backdoors while preserving the feature extraction capability using the downstream dataset. We incorporate the benefits of contrastive and non-contrastive SSL methods for knowledge distillation, ensuring differentiation between the representations of various classes and the consistency of representations within the same class. Consequently, the extraction capability of pre-trained encoders is preserved. Extensive experiments against multiple attacks demonstrate that the proposed scheme outperforms the state-of-the-art solutions. © 2024 IEEE.
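The combined objective described in the abstract — a contrastive term that differentiates representations across classes plus a non-contrastive term that keeps matched representations consistent — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the InfoNCE-style contrastive term, cosine-consistency term, `temperature`, and `alpha` weighting are all assumptions.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_feats, teacher_feats, temperature=0.5, alpha=0.5):
    """Hypothetical combined distillation loss (illustrative, not the
    paper's exact form).

    Contrastive term: InfoNCE between student and teacher embeddings of
    the same batch, pushing apart embeddings of different samples
    (differentiation between classes). Non-contrastive term: cosine
    consistency between matched student/teacher pairs (consistency of
    representations for the same input).
    """
    s = F.normalize(student_feats, dim=1)   # (N, D) student embeddings
    t = F.normalize(teacher_feats, dim=1)   # (N, D) teacher embeddings

    # Contrastive: each student embedding should match its own teacher
    # embedding against all other teacher embeddings in the batch.
    logits = s @ t.T / temperature          # (N, N) similarity matrix
    targets = torch.arange(s.size(0))
    contrastive = F.cross_entropy(logits, targets)

    # Non-contrastive: maximize cosine similarity of matched pairs only.
    consistency = 1 - (s * t).sum(dim=1).mean()

    return alpha * contrastive + (1 - alpha) * consistency
```

In a defense of this flavor, `teacher_feats` would come from the (possibly backdoored) pre-trained encoder and `student_feats` from the encoder being distilled on the clean downstream dataset; the weighting between the two terms governs the trade-off between preserving extraction capability and suppressing trigger-specific features.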
Research Area(s)
- Backdoor defense, Computational modeling, Feature extraction, Machine learning security, Security, Self-supervised learning, Supervised learning, Task analysis, Training
Citation Format(s)
Mitigating Backdoor Attacks in Pre-trained Encoders via Self-supervised Knowledge Distillation. / Bie, Rongfang; Jiang, Jinxiu; Xie, Hongcheng et al.
In: IEEE Transactions on Services Computing, Vol. 17, No. 5, 09.2024, p. 2613-2625.