
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

  • Xixu Hu
  • Runkai Zheng
  • Jindong Wang*
  • Cheuk Hang Leung
  • Qi Wu*
  • Xing Xie

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Vision Transformers (ViTs) are increasingly used in computer vision due to their high performance, but their vulnerability to adversarial attacks is a concern. Existing methods lack a solid theoretical basis, focusing mainly on empirical training adjustments. This study introduces SpecFormer, tailored to fortify ViTs against adversarial attacks, with theoretical underpinnings. We establish local Lipschitz bounds for the self-attention layer and propose the Maximum Singular Value Penalization (MSVP) to precisely manage these bounds. By incorporating MSVP into ViTs’ attention layers, we enhance the model’s robustness without compromising training efficiency. SpecFormer, the resulting model, outperforms other state-of-the-art models in defending against adversarial attacks, as proven by experiments on CIFAR and ImageNet datasets. Code is released at https://github.com/microsoft/robustlearn. © The Author(s)
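To make the idea of penalizing a maximum singular value concrete, the following is a minimal sketch and not the authors' implementation (that is in the repository linked above). It assumes the penalty is a coefficient times the sum of the largest singular values of an attention layer's query, key, and value projection matrices, and it estimates each singular value by power iteration. The function names, the `coeff` parameter, and the toy matrices are all hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    """Return the transpose of matrix m as a list of rows."""
    return [list(col) for col in zip(*m)]


def max_singular_value(w, iters=200, seed=0):
    """Estimate the largest singular value of w by power iteration on w^T w."""
    rng = random.Random(seed)
    wt = transpose(w)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(w[0]))]
    for _ in range(iters):
        v = matvec(wt, matvec(w, v))
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0  # w maps the current vector to zero
        v = [x / norm for x in v]
    # With v close to the top right-singular vector, ||w v|| approximates sigma_max.
    u = matvec(w, v)
    return math.sqrt(sum(x * x for x in u))


def msvp_penalty(weights, coeff=1e-3):
    """Hypothetical penalty: coeff times the sum of largest singular values."""
    return coeff * sum(max_singular_value(w) for w in weights)


# Toy stand-ins for query, key, and value projections (assumed shapes and values).
w_q = [[3.0, 0.0], [0.0, 1.0]]
w_k = [[0.0, 2.0], [2.0, 0.0]]
w_v = [[1.0, 1.0], [1.0, 1.0]]
penalty = msvp_penalty([w_q, w_k, w_v], coeff=0.1)
```

The three toy matrices have largest singular values 3, 2, and 2, so the penalty is close to 0.1 × 7. In training, a term of this kind would be added to the task loss; how the paper weights and schedules it is not stated in this record.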
Original language: English
Title of host publication: Computer Vision
Subtitle of host publication: ECCV 2024
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer, Cham
Pages: 345–362
Number of pages: 18
ISBN (Electronic): 978-3-031-72949-2
ISBN (Print): 978-3-031-72948-5
Publication status: Published - 31 Oct 2024
Event: 18th European Conference on Computer Vision (ECCV 2024) - MiCo Milano, Milan, Italy
Duration: 29 Sept 2024 to 4 Oct 2024
https://eccv.ecva.net/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 15112
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Computer Vision (ECCV 2024)
Abbreviated title: ECCV2024
Place: Italy
City: Milan
Period: 29/09/24 to 4/10/24

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Funding

Qi WU acknowledges the support from The CityU-JD Digits Joint Laboratory in Financial Technology and Engineering and The Hong Kong Research Grants Council [General Research Fund 11219420/9043008]. The work described in this paper was partially supported by the InnoHK initiative, the Government of the HKSAR, and the Laboratory for AI-Powered Financial Technologies.

Research Keywords

  • Vision Transformer
  • Adversarial Robustness
  • Lipschitz Continuity

RGC Funding Information

  • RGC-funded
