Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review


Author(s)

  • Shengzhuang Chen
  • Jihoon Tack
  • Yunqiao Yang
  • Yee Whye Teh
  • Jonathan Richard Schwarz
  • Ying Wei

Related Research Unit(s)

Detail(s)

Original language: English
Title of host publication: Proceedings of the 41st International Conference on Machine Learning
Pages: 7280-7297
Publication status: Published - Jul 2024

Publication series

Name: Proceedings of Machine Learning Research
Volume: 235
ISSN (Print): 2640-3498

Conference

Title: 41st International Conference on Machine Learning (ICML 2024)
Location: Messe Wien Exhibition Congress Center
Place: Austria
City: Vienna
Period: 21 - 27 July 2024

Abstract

Recent successes suggest that parameter-efficient fine-tuning of foundation models is becoming the state-of-the-art method for transfer learning in vision, gradually replacing the rich literature of alternatives such as meta-learning. In trying to harness the best of both worlds, meta-tuning introduces a subsequent optimization stage of foundation models but has so far only shown limited success and crucially tends to underperform on out-of-distribution (OOD) tasks. In this paper, we introduce Sparse MetA-Tuning (SMAT), a method inspired by sparse mixture-of-experts approaches and trained to isolate subsets of pre-trained parameters automatically for meta-tuning on each task. SMAT successfully overcomes OOD sensitivity and delivers on the promise of enhancing the transfer abilities of vision foundation models beyond parameter-efficient fine-tuning. We establish new state-of-the-art results on a challenging combination of Meta-Dataset augmented with additional OOD tasks in both zero-shot and gradient-based adaptation settings. In addition, we provide a thorough analysis of the superiority of learned over hand-designed sparsity patterns for sparse expert methods and the pivotal importance of the sparsity level in balancing between in-distribution and out-of-distribution generalization. Our code and models are publicly available.
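To make the core idea concrete, the sketch below illustrates one plausible reading of "sparse interpolated experts": each expert owns a learnable sparse mask and a delta over a shared frozen pre-trained weight, and a task-conditioned gate interpolates the masked expert deltas into that weight. This is a minimal, hypothetical PyTorch sketch for illustration only, not the authors' released SMAT implementation; all names (SparseInterpolatedLayer, n_experts, sparsity, the hard top-k masking) are assumptions introduced here.

```python
# Hypothetical sketch of sparse interpolated experts -- NOT the authors' SMAT code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseInterpolatedLayer(nn.Module):
    def __init__(self, pretrained_weight: torch.Tensor, n_experts: int = 4, sparsity: float = 0.9):
        super().__init__()
        # Frozen pre-trained backbone weight.
        self.register_buffer("w0", pretrained_weight.clone())
        self.sparsity = sparsity
        # Per-expert deltas and real-valued mask logits (sparsified at merge time).
        self.deltas = nn.Parameter(torch.zeros(n_experts, *pretrained_weight.shape))
        self.mask_logits = nn.Parameter(torch.randn(n_experts, *pretrained_weight.shape) * 0.01)

    def merged_weight(self, gate: torch.Tensor) -> torch.Tensor:
        """gate: (n_experts,) task-conditioned mixture weights summing to 1."""
        k = max(1, int((1.0 - self.sparsity) * self.mask_logits[0].numel()))
        merged = self.w0.clone()
        for i, g in enumerate(gate):
            # Keep only the top-k mask entries per expert (hard sparsity for
            # illustration; learning the masks would need a relaxed or
            # straight-through estimator).
            flat = self.mask_logits[i].flatten()
            thresh = flat.topk(k).values.min()
            mask = (self.mask_logits[i] >= thresh).to(merged.dtype)
            merged = merged + g * mask * self.deltas[i]
        return merged

    def forward(self, x: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # Run the layer with the task-specific interpolated weight.
        return F.linear(x, self.merged_weight(gate))


# Usage: mix two of four experts into a frozen 16x16 linear layer for one task.
layer = SparseInterpolatedLayer(torch.randn(16, 16), n_experts=4, sparsity=0.9)
gate = torch.tensor([0.6, 0.4, 0.0, 0.0])
out = layer(torch.randn(2, 16), gate)
print(out.shape)  # torch.Size([2, 16])
```

In this reading, the sparsity level controls how much of the pre-trained weight each expert is allowed to perturb, which is the knob the abstract identifies as pivotal for balancing in-distribution and out-of-distribution generalization.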

Bibliographic Note

Research Unit(s) information for this publication is provided by the author(s) concerned.

Citation Format(s)

Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts. / Chen, Shengzhuang; Tack, Jihoon; Yang, Yunqiao et al.
Proceedings of the 41st International Conference on Machine Learning. 2024. p. 7280-7297 (Proceedings of Machine Learning Research; Vol. 235).
