Abstract
Vision Transformers (ViTs) are increasingly used in computer vision due to their high performance, but their vulnerability to adversarial attacks is a concern. Existing methods lack a solid theoretical basis, focusing mainly on empirical training adjustments. This study introduces SpecFormer, tailored to fortify ViTs against adversarial attacks, with theoretical underpinnings. We establish local Lipschitz bounds for the self-attention layer and propose the Maximum Singular Value Penalization (MSVP) to precisely manage these bounds. By incorporating MSVP into ViTs’ attention layers, we enhance the model’s robustness without compromising training efficiency. SpecFormer, the resulting model, outperforms other state-of-the-art models in defending against adversarial attacks, as shown by experiments on CIFAR and ImageNet datasets. Code is released at https://github.com/microsoft/robustlearn. © The Author(s),
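The abstract names the penalty but does not give its form. As a rough illustration only, the sketch below estimates the largest singular value of a weight matrix by power iteration and sums the squared estimates into a regularization term. The squared-sum form, the choice of query/key/value projection matrices, the coefficient `lam`, and the function names `max_singular_value` and `msvp_penalty` are all assumptions made for this example, not the authors' released implementation.

```python
import math
import random


def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    """Return the transpose of A as a list of rows."""
    return [list(col) for col in zip(*A)]


def max_singular_value(A, iters=100, seed=0):
    """Estimate the largest singular value of A by power iteration on A^T A."""
    rng = random.Random(seed)
    At = transpose(A)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
    for _ in range(iters):
        w = matvec(At, matvec(A, v))          # one step of (A^T A) v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:                       # A maps v to zero, e.g. A is all zeros
            return 0.0
        v = [c / norm for c in w]             # renormalize to unit length
    Av = matvec(A, v)
    return math.sqrt(sum(c * c for c in Av))  # ||A v|| with ||v|| = 1


def msvp_penalty(weights, lam=1e-3):
    """Hypothetical penalty: lam times the sum of squared top singular values."""
    return lam * sum(max_singular_value(W) ** 2 for W in weights)


# Toy stand-ins for one attention layer's projection matrices (assumed names).
W_q = [[3.0, 0.0], [0.0, 1.0]]
W_k = [[0.0, 2.0], [1.0, 0.0]]
W_v = [[1.0, 1.0], [1.0, 1.0]]
penalty = msvp_penalty([W_q, W_k, W_v], lam=0.1)
```

The toy matrices have largest singular values 3, 2, and 2, so the penalty here is about 0.1 × (9 + 4 + 4) = 1.7. In actual training such a term would be added to the task loss and differentiated by an autograd framework; this sketch only shows the quantity being penalized.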
| Original language | English |
|---|---|
| Title of host publication | Computer Vision |
| Subtitle of host publication | ECCV 2024 |
| Editors | Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol |
| Publisher | Springer, Cham |
| Pages | 345–362 |
| Number of pages | 18 |
| ISBN (Electronic) | 978-3-031-72949-2 |
| ISBN (Print) | 978-3-031-72948-5 |
| Publication status | Published - 31 Oct 2024 |
| Event | 18th European Conference on Computer Vision (ECCV 2024), MiCo Milano, Milan, Italy. Duration: 29 Sept 2024 → 4 Oct 2024. https://eccv.ecva.net/ |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Publisher | Springer |
| Volume | 15112 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 18th European Conference on Computer Vision (ECCV 2024) |
|---|---|
| Abbreviated title | ECCV2024 |
| Place | Italy |
| City | Milan |
| Period | 29/09/24 → 4/10/24 |
Bibliographical note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Funding
Qi WU acknowledges the support from The CityU-JD Digits Joint Laboratory in Financial Technology and Engineering and The Hong Kong Research Grants Council [General Research Fund 11219420/9043008]. The work described in this paper was partially supported by the InnoHK initiative, the Government of the HKSAR, and the Laboratory for AI-Powered Financial Technologies.
Research Keywords
- Vision Transformer
- Adversarial Robustness
- Lipschitz Continuity
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization'. Together they form a unique fingerprint.
Projects
- 1 Finished
- GRF: Generative Models of Multivariate Dependence for Asset Returns
  WU, Q. (Principal Investigator / Project Coordinator)
  1/01/21 → 29/12/25
  Project: Research
Student theses
- Towards More Adaptive and Reliable Artificial Intelligence: Exploration in Sequential Data, Images and Large Language Models
  HU, X. (Author), WU, Q. (Supervisor) & ZHANG, Q. (Co-supervisor), 16 Jul 2024
  Student thesis: Doctoral Thesis