Learning Where to Edit Vision Transformers

Yunqiao Yang, Long-Kai Huang*, Shengzhuang Chen, Kede Ma, Ying Wei*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Model editing aims to data-efficiently correct predictive errors of large pre-trained models while ensuring generalization to neighboring failures and locality to minimize unintended effects on unrelated examples. While significant progress has been made in editing Transformer-based large language models, effective strategies for editing vision Transformers (ViTs) in computer vision remain largely untapped. In this paper, we take initial steps towards correcting predictive errors of ViTs, particularly those arising from subpopulation shifts. Taking a locate-then-edit approach, we first address the “where-to-edit” challenge by meta-learning a hypernetwork on CutMix-augmented data generated for editing reliability. This trained hypernetwork produces generalizable binary masks that identify a sparse subset of structured model parameters, responsive to real-world failure samples. Afterward, we solve the “how-to-edit” problem by simply fine-tuning the identified parameters using a variant of gradient descent to achieve successful edits. To validate our method, we construct an editing benchmark that introduces subpopulation shifts towards natural underrepresented images and AI-generated images, thereby revealing the limitations of pre-trained ViTs for object recognition. Our approach not only achieves superior performance on the proposed benchmark but also allows for adjustable trade-offs between generalization and locality. Our code is available at https://github.com/hustyyq/Where-to-Edit. © 2024 Neural information processing systems foundation. All rights reserved.
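The abstract's locate-then-edit recipe can be illustrated with a minimal numpy sketch: a binary mask picks out a sparse subset of parameters ("where to edit"), and a gradient step updates only those entries ("how to edit"), leaving unmasked parameters untouched for locality. The magnitude-based saliency and all function names below are illustrative assumptions, not the paper's meta-learned hypernetwork.

```python
import numpy as np

def locate(grad, sparsity=0.1):
    """Binary mask keeping the top-`sparsity` fraction of entries by |grad|.
    (Hypothetical stand-in for the paper's hypernetwork-produced mask.)"""
    k = max(1, int(sparsity * grad.size))
    thresh = np.sort(np.abs(grad).ravel())[-k]  # k-th largest magnitude
    return (np.abs(grad) >= thresh).astype(grad.dtype)

def edit(params, grad, mask, lr=0.1):
    """Gradient-descent step restricted to the masked parameters."""
    return params - lr * mask * grad

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))   # toy parameter matrix
g = rng.normal(size=(4, 4))   # toy gradient from a failure sample
m = locate(g, sparsity=0.25)  # selects 4 of the 16 entries
W_new = edit(W, g, m)
# Unmasked entries of W_new equal W exactly (locality);
# only the masked subset moves (edit reliability).
```

In the paper itself, the mask is produced by a hypernetwork meta-trained on CutMix-augmented pseudo-failures, rather than by raw gradient magnitude as in this toy version.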
Original language: English
Title of host publication: NeurIPS Proceedings
Subtitle of host publication: Advances in Neural Information Processing Systems 37 (NeurIPS 2024)
Editors: A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, C. Zhang
Publisher: Neural Information Processing Systems (NeurIPS)
Publication status: Published - 2024
Event: 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) - Vancouver Convention Center, Vancouver, Canada
Duration: 10 Dec 2024 - 15 Dec 2024
https://neurips.cc/
https://proceedings.neurips.cc/

Publication series

Name: Advances in Neural Information Processing Systems
Publisher: Neural Information Processing Systems Foundation
ISSN (Print): 1049-5258

Conference

Conference: 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
Abbreviated title: NeurIPS 2024
Place: Canada
City: Vancouver
Period: 10/12/24 - 15/12/24

Bibliographical note

Research Unit(s) information for this publication is provided by the author(s) concerned.
