
Learning multimodal adaptive relation graph and action boost memory for visual navigation

Jian Luo, Bo Cai*, Yaoxiang Yu, Aihua Ke, Kang Zhou, Jian Zhang

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

The task of visual navigation (VN) is to steer an agent to find a target object using only visual perceptions. Previous works largely exploit multimodal information (e.g., visual observations and training memory) to improve the agent's environmental perception, while making less effort to leverage the information exchanged between modalities. In addition, multimodal fusion tends to ignore data dependencies (favoring part of the modal data) as well as the supervision of the action.

In this work, we present a novel multimodal graph learning (MGL) structure for VN, which consists of three parts. (1) Multimodal fusion exploits rich spatial, RGB, and depth information about objects' locations, as well as semantic information about their categories. (2) An adaptive relation graph (ARG) is dynamically built using object detectors; it encodes the multimodal fusion and adapts to novel environments. It embeds the agent's navigation history and other useful task-oriented structural information, giving the agent the ability to associate and make informed decisions. (3) An action boost module (ABM) assists the agent in making intelligent decisions, predicting more accurate actions from beneficial training experience. Our agent can foresee what the goal state may look like and how to get closer to that state. This combination of the "what" and the "how" allows the agent to navigate to the target object effectively. We validate our approach on the AI2-THOR dataset, where it reports 24.2% and 23.7% increases in SPL (Success weighted by Path Length) and SR (Success Rate) over baselines, respectively. Code and datasets can be found at https://github.com/luosword/ABM_VN.
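The two ideas the abstract combines — fusing per-modality features into one embedding, and boosting action predictions with stored training experience — can be illustrated with a minimal sketch. This is not the authors' implementation (which is linked above); the names `fuse_features` and `ActionBoostMemory`, the naive concatenation fusion, and the nearest-neighbor vote are all illustrative assumptions.

```python
import numpy as np

def fuse_features(rgb_feat, depth_feat, sem_feat):
    """Naive late fusion: concatenate RGB, depth, and semantic
    feature vectors into a single state embedding (illustrative only)."""
    return np.concatenate([rgb_feat, depth_feat, sem_feat])

class ActionBoostMemory:
    """Toy episodic memory: stores (state embedding, action) pairs from
    successful past trajectories, then nudges the policy's action logits
    toward the actions taken in the k most similar stored states."""

    def __init__(self, k=3):
        self.states, self.actions, self.k = [], [], k

    def add(self, state, action):
        # Record a state embedding and the action that proved beneficial there.
        self.states.append(state)
        self.actions.append(action)

    def boost(self, state, base_logits):
        # With an empty memory, fall back to the policy's own logits.
        if not self.states:
            return base_logits
        # Euclidean distance from the query state to every stored state.
        dists = np.linalg.norm(np.stack(self.states) - state, axis=1)
        nearest = np.argsort(dists)[: self.k]
        # Each nearby stored experience casts one vote for its action.
        boost = np.zeros_like(base_logits)
        for i in nearest:
            boost[self.actions[i]] += 1.0
        return base_logits + boost / self.k

# Usage: memory pulls the prediction toward the action seen near this state.
memory = ActionBoostMemory(k=1)
memory.add(np.array([0.0, 0.0]), 0)  # near the origin, action 0 worked
memory.add(np.array([5.0, 5.0]), 1)  # far away, action 1 worked
boosted = memory.boost(np.array([0.1, 0.0]), np.zeros(2))
```

In the paper the "memory" is learned jointly with the navigation policy rather than queried by raw distance; the lookup-and-vote above only conveys the intuition of reusing beneficial training experience at decision time.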

© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Original language: English
Article number: 102678
Journal: Advanced Engineering Informatics
Volume: 62
Issue number: Part B
Online published: 16 Jul 2024
DOIs
Publication status: Published - Oct 2024

Research Keywords

  • Action boost memory
  • Knowledge graph
  • Reinforcement learning
  • Visual navigation
  • Visual transformer network
