
Random Entangled Tokens for Adversarially Robust Vision Transformer

Huihui Gong, Minjing Dong, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs) in the realm of computer vision, showcasing tremendous potential. However, recent research has unveiled a susceptibility of ViTs to adversarial attacks, akin to their CNN counterparts. Adversarial training and randomization are two representative effective defenses for CNNs. Some researchers have attempted to apply adversarial training to ViTs and achieved comparable robustness to CNNs, while it is not easy to directly apply randomization to ViTs because of the architecture difference between CNNs and ViTs. In this paper, we delve into the structural intricacies of ViTs and propose a novel defense mechanism termed Random entangled image Transformer (ReiT), which seamlessly integrates adversarial training and randomization to bolster the adversarial robustness of ViTs. Recognizing the challenge posed by the structural disparities between ViTs and CNNs, we introduce a novel module, input-independent random entangled self-attention (II-ReSA). This module optimizes random entangled tokens that lead to 'dissimilar' self-attention outputs by leveraging model parameters and the sampled random tokens, thereby synthesizing the self-attention module outputs and random entangled tokens to diminish adversarial similarity. ReiT incorporates two distinct random entangled tokens and employs dual randomization, offering an effective countermeasure against adversarial examples while ensuring comprehensive deduction guarantees. Through extensive experiments conducted on various ViT variants and benchmarks, we substantiate the superiority of our proposed method in enhancing the adversarial robustness of Vision Transformers. © 2024 IEEE.
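To make the randomization idea in the abstract concrete, the sketch below shows standard single-head self-attention over ViT tokens with one freshly sampled random token concatenated into the sequence, so the attention output varies across forward passes. This is a minimal illustration only: the paper's II-ReSA module additionally *optimizes* the entangled tokens against the model parameters and uses two tokens with dual randomization, none of which is reproduced here. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_random_token(tokens, Wq, Wk, Wv, rng):
    """Single-head self-attention over `tokens` plus one sampled random token.

    tokens: (n, d) patch embeddings; Wq/Wk/Wv: (d, d) projection matrices.
    A random token is drawn fresh on every call, so the output is stochastic,
    which is the randomization ingredient (greatly simplified) of the defense.
    """
    d = tokens.shape[1]
    r = rng.standard_normal((1, d))           # hypothetical random token
    x = np.concatenate([r, tokens], axis=0)   # "entangle" by concatenation
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)             # scaled dot-product attention
    out = softmax(scores) @ v
    return out[1:]                            # keep only the real tokens' outputs
```

A quick usage check: with four 8-dimensional tokens the result keeps shape (4, 8), while two calls with different random draws generally disagree.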
Original language: English
Title of host publication: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Publisher: IEEE
Pages: 24554-24563
Number of pages: 10
ISBN (Electronic): 979-8-3503-5300-6
ISBN (Print): 979-8-3503-5301-3
DOIs
Publication status: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), Seattle Convention Center, Seattle, United States
Duration: 17 Jun 2024 - 21 Jun 2024
https://cvpr.thecvf.com/Conferences/2024
https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings
https://cvpr.thecvf.com/virtual/2024/index.html

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher: IEEE Computer Society
ISSN (Print): 1063-6919
ISSN (Electronic): 2575-7075

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
Place: United States
City: Seattle
Period: 17/06/24 - 21/06/24

Research Keywords

  • Adversarial Robustness
  • Randomized Defence
  • Self-Attention Mechanism
  • Vision Transformers
