Abstract
Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs) in the realm of computer vision, showcasing tremendous potential. However, recent research has unveiled a susceptibility of ViTs to adversarial attacks, akin to their CNN counterparts. Adversarial training and randomization are two representative effective defenses for CNNs. Some researchers have attempted to apply adversarial training to ViTs and achieved comparable robustness to CNNs, while it is not easy to directly apply randomization to ViTs because of the architecture difference between CNNs and ViTs. In this paper, we delve into the structural intricacies of ViTs and propose a novel defense mechanism termed Random entangled image Transformer (ReiT), which seamlessly integrates adversarial training and randomization to bolster the adversarial robustness of ViTs. Recognizing the challenge posed by the structural disparities between ViTs and CNNs, we introduce a novel module, input-independent random entangled self-attention (II-ReSA). This module optimizes random entangled tokens that lead to 'dissimilar' self-attention outputs by leveraging model parameters and the sampled random tokens, thereby synthesizing the self-attention module outputs and random entangled tokens to diminish adversarial similarity. ReiT incorporates two distinct random entangled tokens and employs dual randomization, offering an effective countermeasure against adversarial examples while ensuring comprehensive deduction guarantees. Through extensive experiments conducted on various ViT variants and benchmarks, we substantiate the superiority of our proposed method in enhancing the adversarial robustness of Vision Transformers. © 2024 IEEE.
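The abstract does not spell out how II-ReSA is computed, so the following is only a minimal sketch of the general idea of randomizing self-attention, not the paper's method. It assumes that freshly sampled Gaussian tokens are appended to the keys and values of a plain scaled dot-product attention, so that the output for a fixed input varies with the sampled tokens. The names `attention` and `randomized_attention`, the Gaussian sampling, and the omission of learned projections, token optimization, and adversarial training are all illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (no learned projections)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def randomized_attention(tokens, num_random, rng):
    """Hypothetical helper: attend over the input tokens plus sampled random tokens.

    Assumption for illustration only: random tokens are drawn from a standard
    Gaussian and appended to the keys/values. The paper's II-ReSA optimizes its
    random entangled tokens, which this sketch does not attempt.
    """
    dim = len(tokens[0])
    random_tokens = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_random)
    ]
    context = tokens + random_tokens
    return attention(tokens, context, context)


# Usage: the same input yields different outputs under different random draws.
tokens = [[1.0, 0.0], [0.0, 1.0]]
out_a = randomized_attention(tokens, num_random=2, rng=random.Random(0))
out_b = randomized_attention(tokens, num_random=2, rng=random.Random(1))
```

In this sketch, an attacker who crafts a perturbation against one draw of the random tokens faces a different attention output at inference time. That is the intuition behind randomized defenses in general, stated here without any claim about the robustness of this toy version.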
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition |
| Publisher | IEEE |
| Pages | 24554-24563 |
| Number of pages | 10 |
| ISBN (Electronic) | 979-8-3503-5300-6 |
| ISBN (Print) | 979-8-3503-5301-3 |
| Publication status | Published - 2024 |
| Event | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) - Seattle Convention Center, Seattle, United States. Duration: 17 Jun 2024 → 21 Jun 2024. https://cvpr.thecvf.com/Conferences/2024 https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings https://cvpr.thecvf.com/virtual/2024/index.html |
Publication series
| Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
|---|---|
| Publisher | IEEE Computer Society |
| ISSN (Print) | 1063-6919 |
| ISSN (Electronic) | 2575-7075 |
Conference
| Conference | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) |
|---|---|
| Place | United States |
| City | Seattle |
| Period | 17/06/24 → 21/06/24 |
Research Keywords
- Adversarial Robustness
- Randomized Defence
- Self-Attention Mechanism
- Vision Transformers
Fingerprint
Dive into the research topics of 'Random Entangled Tokens for Adversarially Robust Vision Transformer'. Together they form a unique fingerprint.