Abstract
Large Vision-Language Models (LVLMs) have demonstrated strong capabilities in multimodal understanding, yet their vulnerability to adversarial attacks raises significant concerns. To enable practical attacks, this paper targets efficient and transferable untargeted attacks under limited perturbation sizes. Under this objective, white-box attacks require full-model gradients and task-specific labels, so their cost scales with the number of tasks, while black-box attacks rely on proxy models and typically require large perturbation sizes and elaborate transfer strategies. Given the central role and widespread reuse of the vision encoder in LVLMs, we adopt a gray-box setting that targets the vision encoder alone, making the attack efficient yet effective. We theoretically establish the feasibility of vision-encoder-only attacks, which lays the foundation for this gray-box setting. Building on this analysis, and informed by both theoretical and empirical insights, we propose perturbing patch tokens rather than the class token. We generate adversarial examples by minimizing the cosine similarity between clean and perturbed visual features, without accessing the downstream models, tasks, or labels. This significantly reduces computational overhead and removes the dependence on tasks and labels. The resulting method, VEAttack, degrades performance by 94.5% on the image captioning task and by 75.7% on the visual question answering task. We also report several key observations that offer insights into LVLM attack and defense: 1) hidden-layer variations in the LLM, 2) differences in token attention, 3) a Möbius band in transfer attacks, and 4) low sensitivity to the number of attack steps. The code is available at https://github.com/hefeimei06/VEAttack-LVLM.
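The core optimization described above drives down the cosine similarity between clean and perturbed patch-token features under a bounded perturbation, and it uses only the vision encoder. A minimal NumPy sketch can illustrate that idea. It is not the authors' implementation; the repository linked above contains that. Every concrete choice below is an assumption. The "encoder" is a hypothetical toy map `tanh(x @ w)` applied to pre-flattened patches, so it has no class token, and the loss is computed on per-patch features only. The optimizer is standard L∞ projected gradient descent with sign steps and a random start, using ε = 8/255, a step size of 1/255 and 10 steps. The helper names `encode`, `mean_patch_cosine`, `cosine_grad` and `patch_token_attack` are all hypothetical.

```python
import numpy as np

# Hypothetical toy setup: NOT the VEAttack code, only an illustration of the objective.

def encode(x, w):
    """Hypothetical toy 'vision encoder': each row of x is one flattened patch,
    mapped to a patch-token feature by tanh(x @ w)."""
    return np.tanh(x @ w)

def mean_patch_cosine(f_clean, f_adv, eps=1e-12):
    """Mean cosine similarity between corresponding clean/perturbed patch tokens."""
    num = np.sum(f_clean * f_adv, axis=1)
    den = np.linalg.norm(f_clean, axis=1) * np.linalg.norm(f_adv, axis=1) + eps
    return float(np.mean(num / den))

def cosine_grad(a, b, eps=1e-12):
    """Gradient of mean_i cos(a_i, b_i) w.r.t. b (rows are tokens),
    up to the small eps stabilizer: a/(|a||b|) - cos * b/|b|^2, averaged over rows."""
    na = np.linalg.norm(a, axis=1, keepdims=True) + eps
    nb = np.linalg.norm(b, axis=1, keepdims=True) + eps
    cos = np.sum(a * b, axis=1, keepdims=True) / (na * nb)
    return (a / (na * nb) - cos * b / nb**2) / a.shape[0]

def patch_token_attack(x, w, eps=8 / 255, alpha=1 / 255, steps=10, seed=0):
    """Hypothetical L_inf PGD sketch that MINIMIZES patch-token cosine similarity.
    Only the encoder is queried: no language model, task, or label is used."""
    rng = np.random.default_rng(seed)
    f_clean = encode(x, w)
    # Random start: at x_adv == x the cosine is maximal (1) and its gradient is zero.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        f_adv = encode(x_adv, w)
        g_f = cosine_grad(f_clean, f_adv)          # d(loss)/d(features)
        g_x = (g_f * (1.0 - f_adv**2)) @ w.T       # chain rule through tanh and x @ w
        x_adv = x_adv - alpha * np.sign(g_x)       # descend: lower the similarity
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid pixel range
    return x_adv

# Usage with random data standing in for an image split into 16 patches of 48 values.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=(16, 48))
w = rng.normal(0, 0.1, size=(48, 32))
x_adv = patch_token_attack(x, w)
print(mean_patch_cosine(encode(x, w), encode(x_adv, w)))
```

The random start matters for this objective. The cosine similarity peaks exactly at the clean input, so a gradient step taken from `x` itself would be zero. In a real LVLM, `encode` would be replaced by the vision encoder's patch-token outputs with gradients from autodiff. That substitution is not shown here.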
| Original language | English |
|---|---|
| Title of host publication | The Fourteenth International Conference on Learning Representations |
| Publication status | Published - 23 Apr 2026 |
| Event | 14th International Conference on Learning Representations (ICLR 2026), Riocentro Convention and Event Center, Rio de Janeiro, Brazil. Duration: 23 Apr 2026 → 27 Apr 2026. https://iclr.cc/Conferences/2026 |
Conference
| Conference | 14th International Conference on Learning Representations (ICLR 2026) |
|---|---|
| Abbreviated title | ICLR 2026 |
| Place | Brazil |
| City | Rio de Janeiro |
| Period | 23/04/26 → 27/04/26 |
| Internet address | https://iclr.cc/Conferences/2026 |
Bibliographical note
Since this conference is yet to commence, the information for this record is subject to revision.
Funding
This work was supported in part by Young Scientist Fund (No. 62406265) of NSFC, Start-up Grant (No. 9610680) of the City University of Hong Kong, and the Australian Research Council under Projects DP240101848 and FT230100549.
Research Keywords
- adversarial attack
- vision-encoder-only
- large vision language models
- downstream-agnostic
Fingerprint
Dive into the research topics of 'VEATTACK: DOWNSTREAM-AGNOSTIC VISION ENCODER ATTACK AGAINST LARGE VISION LANGUAGE MODELS'. Together they form a unique fingerprint.