
Towards Black-Box Adversarial Attacks on Interpretable Deep Learning Systems

Yike Zhan, Baolin Zheng, Qian Wang*, Ningping Mou, Binqing Guo, Qi Li, Chao Shen, Cong Wang

*Corresponding author for this work

Research output: Refereed conference paper with host publication (RGC 32) · peer-review

Abstract

Recent works have empirically shown that neural network interpretability is susceptible to malicious manipulation. However, existing attacks against Interpretable Deep Learning Systems (IDLSes) all focus on the white-box setting, which is impractical in real-world scenarios. In this paper, we make the first attempt to attack IDLSes in the decision-based black-box setting. We propose a new framework called Dual Black-box Adversarial Attack (DBAA), which generates adversarial examples that are misclassified as the target class yet retain interpretations very similar to those of their benign counterparts. We conduct comprehensive experiments on different combinations of classifiers and interpreters to illustrate the effectiveness of DBAA. Empirical results show that in all cases, DBAA achieves high attack success rates and Intersection over Union (IoU) scores.
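The IoU score mentioned in the abstract measures how much an adversarial example's interpretation map overlaps with the benign one. The paper's exact evaluation protocol is not reproduced here, but a common way to compute such a score is to binarize each saliency map by keeping its top-k most salient pixels and then take the intersection over union of the two masks. The following is a minimal sketch under that assumption (the function name, `top_k` default, and binarization scheme are illustrative, not taken from the paper):

```python
import numpy as np

def interpretation_iou(map_a, map_b, top_k=0.1):
    """IoU between two saliency maps, after binarizing each map
    by keeping its top-k fraction of most salient pixels.

    NOTE: illustrative metric only; the paper's exact protocol
    may differ (e.g. in thresholding or map normalization).
    """
    def top_mask(m, k):
        flat = m.ravel()
        n = max(1, int(k * flat.size))
        # Threshold at the n-th largest value.
        thresh = np.partition(flat, -n)[-n]
        return m >= thresh

    a = top_mask(np.asarray(map_a, dtype=float), top_k)
    b = top_mask(np.asarray(map_b, dtype=float), top_k)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0
```

Identical maps yield an IoU of 1.0, while maps whose salient regions do not overlap yield 0.0; a successful attack in the paper's sense would keep this score high while flipping the classifier's decision.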
Original language: English
Title of host publication: IEEE ICME - IEEE International Conference on Multimedia and Expo 2022
Subtitle of host publication: ICME 2022 - Conference Proceedings
Publisher: IEEE Computer Society
Number of pages: 6
ISBN (Electronic): 9781665485630
ISBN (Print): 9781665485647
DOIs
Publication status: Published - 2022
Event: 2022 IEEE International Conference on Multimedia and Expo (ICME 2022) - Hybrid, Taipei, Taiwan, China
Duration: 18 Jul 2022 - 22 Jul 2022
https://2022.ieeeicme.org/

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2022-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2022 IEEE International Conference on Multimedia and Expo (ICME 2022)
Abbreviated title: IEEE ICME 2022
Place: Taiwan, China
City: Taipei
Period: 18/07/22 - 22/07/22
Internet address: https://2022.ieeeicme.org/

Funding

This work was supported in part by the National Key R&D Program of China (2020AAA0107701), NSFC under Grants U20B2049, U21B2018, 62161160337 and 62132011, Shaanxi Province Key Industry Innovation Program under Grant 2021ZDLGY01-02, Research Grants Council of Hong Kong under Grants N CityU139/21 and R6021-20F, and in part by BNRist under Grant BNR2020RC01013.

Research Keywords

  • adversarial examples
  • black-box attacks
  • interpretable deep learning systems

RGC Funding Information

  • RGC-funded

