Source-Free Elastic Model Adaptation for Vision-and-Language Navigation

Mingkui Tan*, Peihao Chen*, Hongyan Zhi, Jiajie Mai, Benjamin Rosman, Dongyu Ji, Runhao Zeng

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions to navigate. Despite significant progress, a model trained on seen environments suffers a performance drop in unseen environments due to distribution shift. To improve generalization, an existing method applies test-time adaptation to VLN; however, it needs access to the training data and all testing data to update the model before inference. This setting is unsuitable for real applications, because an agent deployed in a new environment can rarely access the training data or the full test set. In this paper, we consider a more practical setting: source-free, online-inference test-time adaptation, in which the model can access only one testing sample at a time for adaptation. In this setting, the model may suffer from catastrophic forgetting of learned knowledge and unstable parameter updates. To address these challenges, we propose an elastic adaptation model (EAM) that consists of an auxiliary decision model and a sample replay mechanism. We use the online testing samples to adapt the auxiliary decision model to new environments, and it cooperates with the frozen original model to make better action decisions. The sample replay mechanism stores historical testing samples to stabilize the adaptation process. Our method is model-agnostic and can be applied to most existing methods with little effort. Experimental results show that our method achieves stable performance improvements over three existing methods on three VLN benchmark datasets.
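The abstract's key idea (a frozen original model combined with a small auxiliary decision model adapted online, stabilized by replaying stored test samples) can be illustrated with a minimal sketch. This is not the paper's implementation: the linear auxiliary head, the entropy-minimization objective, the FIFO buffer, and all names (`ElasticAdapter`, `frozen_fn`, `step`) are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over a logit vector
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

class ElasticAdapter:
    """Hypothetical sketch of source-free online test-time adaptation:
    a frozen base policy plus a small auxiliary head adapted per sample
    by entropy minimization, with a replay buffer for stability."""

    def __init__(self, frozen_fn, feat_dim, n_actions, lr=0.05, buffer_size=8):
        self.frozen_fn = frozen_fn                 # frozen original model, never updated
        self.W = np.zeros((n_actions, feat_dim))   # auxiliary decision head (adapted online)
        self.lr = lr
        self.buffer = []                           # sample replay mechanism
        self.buffer_size = buffer_size

    def _adapt_on(self, x):
        z = self.frozen_fn(x) + self.W @ x
        p = softmax(z)
        H = entropy(p)
        # analytic gradient of entropy w.r.t. logits: dH/dz_k = -p_k (log p_k + H)
        dz = -p * (np.log(p + 1e-12) + H)
        # gradient-descent step on the auxiliary head only
        self.W -= self.lr * np.outer(dz, x)

    def step(self, x):
        # store the current test sample, adapt on current + replayed history, then act
        self.buffer.append(x)
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)
        for sample in self.buffer:
            self._adapt_on(sample)
        z = self.frozen_fn(x) + self.W @ x
        return int(np.argmax(z))
```

Keeping the original model frozen and confining updates to the small auxiliary head mirrors the abstract's remedy for catastrophic forgetting, while replaying buffered samples smooths the single-sample updates.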
Original language: English
Pages (from-to): 3953-3965
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Volume: 27
Online published: 28 Jan 2025
DOIs
Publication status: Published - 2025

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Research Keywords

  • Multi-Modal
  • Test-Time Adaptation
  • Vision-and-Language Navigation
