Mitigating Backdoor Attacks in Pre-trained Encoders via Self-supervised Knowledge Distillation

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

1 Scopus Citation

Author(s)

  • Rongfang Bie
  • Jinxiu Jiang
  • Yu Guo
  • Yinbin Miao


Detail(s)

Original language: English
Pages (from-to): 2613-2625
Number of pages: 13
Journal / Publication: IEEE Transactions on Services Computing
Volume: 17
Issue number: 5
Online published: 1 Jul 2024
Publication status: Published - Sept 2024

Abstract

Pre-trained encoders in computer vision have recently received great attention from both the research and industry communities. A promising paradigm is to use self-supervised learning (SSL) to train image encoders on massive unlabeled samples, endowing the encoders with feature representations that embed rich knowledge. Backdoor attacks on SSL corrupt the encoder's feature extraction capability, causing downstream classifiers to inherit the backdoor behavior and misclassify triggered inputs. Existing backdoor defenses focus primarily on supervised learning and do not migrate effectively to SSL pre-trained encoders. In this article, we present a backdoor defense scheme based on self-supervised knowledge distillation. Our approach eliminates backdoors while preserving the encoder's feature extraction capability, using only the downstream dataset. We combine the benefits of contrastive and non-contrastive SSL methods for knowledge distillation, ensuring that representations of different classes remain distinguishable and that representations within the same class remain consistent. Consequently, the extraction capability of pre-trained encoders is preserved. Extensive experiments against multiple attacks demonstrate that the proposed scheme outperforms state-of-the-art solutions. © 2024 IEEE.
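The abstract does not give implementation details, but the idea of combining a contrastive term (keeping representations of different samples apart) with a non-contrastive consistency term (keeping each student embedding aligned with its teacher embedding) can be sketched as a single distillation objective. The following NumPy sketch is an illustration under stated assumptions only: the InfoNCE-style contrastive term, the cosine consistency term, and the parameters `tau` and `lam` are generic choices, not the paper's actual loss.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Project embeddings onto the unit sphere (standard in SSL losses)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def distillation_loss(student, teacher, tau=0.1, lam=0.5):
    """Hypothetical combined objective for self-supervised distillation.

    - Contrastive term (InfoNCE-style): each student embedding should be
      closer to its own teacher embedding than to other samples' teacher
      embeddings, keeping different samples' representations separated.
    - Non-contrastive term: 1 - cosine similarity between matched
      student/teacher pairs, enforcing per-sample consistency.
    """
    s = l2_normalize(student)
    t = l2_normalize(teacher)
    logits = s @ t.T / tau                        # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    contrastive = -np.mean(np.diag(log_probs))    # positives on the diagonal
    consistency = 1.0 - np.mean(np.sum(s * t, axis=1))
    return lam * contrastive + (1.0 - lam) * consistency

# Toy check: a student nearly aligned with the teacher should incur a
# lower loss than a randomly initialized one.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
aligned_student = teacher + 0.01 * rng.normal(size=(8, 16))
random_student = rng.normal(size=(8, 16))
loss_aligned = distillation_loss(aligned_student, teacher)
loss_random = distillation_loss(random_student, teacher)
```

In the paper's setting the "teacher" would be the (possibly backdoored) pre-trained encoder and the "student" a distilled copy trained on the downstream dataset; since the trigger pattern never appears in clean downstream samples, the student inherits the benign feature extraction behavior without the backdoor shortcut.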

Research Area(s)

  • Backdoor defense, Computational modeling, Feature extraction, Machine learning security, Security, Self-supervised learning, Supervised learning, Task analysis, Training