VEATTACK: DOWNSTREAM-AGNOSTIC VISION ENCODER ATTACK AGAINST LARGE VISION LANGUAGE MODELS

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Large Vision-Language Models (LVLMs) have demonstrated capabilities in multimodal understanding, yet their vulnerability to adversarial attacks raises significant concerns. To make attacks practical, this paper aims at efficient and transferable untargeted attacks under limited perturbation sizes. Under this objective, white-box attacks require full-model gradients and task-specific labels, so their cost scales with the number of tasks, while black-box attacks rely on proxy models and typically require large perturbation sizes and elaborate transfer strategies. Given the centrality and widespread reuse of the vision encoder in LVLMs, we adopt a gray-box setting that targets the vision encoder alone for efficient yet effective attacks. We theoretically establish the feasibility of vision-encoder-only attacks, laying the foundation for our gray-box setting. Based on this analysis, we propose perturbing patch tokens rather than the class token, informed by both theoretical and empirical insights. We generate adversarial examples by minimizing the cosine similarity between clean and perturbed visual features, without accessing the subsequent models, tasks, or labels. This significantly reduces computational overhead while eliminating the dependence on tasks and labels. VEAttack achieves a performance degradation of 94.5% on the image captioning task and 75.7% on the visual question answering task. We also report key observations that provide insights into LVLM attack and defense: 1) hidden layer variations of the LLM, 2) token attention differential, 3) a Möbius band in transfer attacks, 4) low sensitivity to attack steps. The code is available at https://github.com/hefeimei06/VEAttack-LVLM.
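
The objective described in the abstract (lowering the cosine similarity between clean and perturbed visual features under a bounded perturbation) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear map `W` is a hypothetical toy stand-in for a vision encoder's patch-token features, the function names are invented for illustration, and the values of `epsilon`, `alpha`, and `steps` are assumed defaults rather than settings taken from the paper. A real attack would backpropagate through the actual encoder with an autodiff framework; here the gradient is written out by hand so the example needs only NumPy.

```python
import numpy as np

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / ((np.linalg.norm(a) + eps) * (np.linalg.norm(b) + eps)))

def cosine_grad_wrt_b(a, b, eps=1e-12):
    """Analytic gradient of cos(a, b) with respect to b."""
    na = np.linalg.norm(a) + eps
    nb = np.linalg.norm(b) + eps
    return a / (na * nb) - (a @ b) * b / (na * nb ** 3)

def pgd_feature_attack(W, x, epsilon=8 / 255, alpha=2 / 255, steps=20, seed=0):
    """Projected sign-gradient descent on cos(W x, W (x + delta)).

    W is a toy linear encoder (assumption); x is a flattened image in [0, 1].
    The perturbation is kept inside an L-infinity ball of radius epsilon.
    """
    rng = np.random.default_rng(seed)
    clean = W @ x
    # Random start: at delta = 0 the cosine is exactly 1 and its gradient is zero.
    delta = rng.uniform(-epsilon, epsilon, size=x.shape)
    delta = np.clip(x + delta, 0.0, 1.0) - x
    for _ in range(steps):
        feat = W @ (x + delta)
        # Chain rule through the linear encoder.
        grad = W.T @ cosine_grad_wrt_b(clean, feat)
        # Step against the gradient to reduce similarity.
        delta = delta - alpha * np.sign(grad)
        # Project back into the epsilon-ball and the valid pixel range.
        delta = np.clip(delta, -epsilon, epsilon)
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return x + delta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.standard_normal((16, 64))      # hypothetical encoder weights
    x = rng.uniform(0.2, 0.8, size=64)     # hypothetical flattened image
    x_adv = pgd_feature_attack(W, x)
    print("similarity after attack:", cosine(W @ x, W @ x_adv))
```

Note that the loop never touches a language model, a task, or a label, which mirrors the downstream-agnostic property claimed in the abstract; only the encoder's output features enter the loss.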
Original language: English
Title of host publication: The Fourteenth International Conference on Learning Representations
Publication status: Published - 23 Apr 2026
Event: 14th International Conference on Learning Representations (ICLR 2026) - Riocentro Convention and Event Center, Rio de Janeiro, Brazil
Duration: 23 Apr 2026 → 27 Apr 2026
https://iclr.cc/Conferences/2026

Conference

Conference: 14th International Conference on Learning Representations (ICLR 2026)
Abbreviated title: ICLR 2026
Place: Brazil
City: Rio de Janeiro
Period: 23/04/26 → 27/04/26
Internet address

Bibliographical note

Since this conference is yet to commence, the information for this record is subject to revision.

Funding

This work was supported in part by the Young Scientist Fund (No. 62406265) of NSFC, a Start-up Grant (No. 9610680) of the City University of Hong Kong, and the Australian Research Council under Projects DP240101848 and FT230100549.

Research Keywords

  • adversarial attack
  • vision-encoder-only
  • large vision language models
  • downstream-agnostic
