Abstract
Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate 3D environments based on visual observations and natural language instructions. Existing approaches, which focus on topological and semantic maps, often struggle to understand and adapt to complex or previously unseen environments, largely because their maps are constructed statically and offline. To address these challenges, this paper proposes OVL-MAP, an algorithm comprising three key modules: an online vision-and-language map construction module, a waypoint prediction module, and an action decision module. The online map construction module leverages robust open-vocabulary semantic segmentation to dynamically enhance the agent's scene understanding. The waypoint prediction module processes natural language instructions to identify task-relevant regions, predict sub-goal locations, and guide trajectory planning. The action decision module uses the DD-PPO strategy for effective navigation. Evaluations on the Robo-VLN and R2R-CE datasets demonstrate that OVL-MAP significantly improves navigation performance and exhibits stronger generalization in unknown environments. © 2025 IEEE.
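To make the three-module structure described in the abstract concrete, the following is a minimal toy sketch of the control loop on a 2D grid. It is not the authors' implementation: the names `OnlineSemanticMap`, `predict_waypoint`, `decide_action`, and `navigate` are hypothetical, the observation is assumed to arrive already labeled (standing in for open-vocabulary segmentation), waypoint selection is simple keyword matching, and the greedy step is only a placeholder for the learned DD-PPO policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class OnlineSemanticMap:
    """Toy stand-in for the online vision-and-language map (cell -> label)."""
    labels: Dict[Cell, str] = field(default_factory=dict)

    def update(self, observation: Dict[Cell, str]) -> None:
        # Assumption: labels are given directly; the paper obtains them
        # from open-vocabulary semantic segmentation.
        self.labels.update(observation)


def predict_waypoint(instruction: str, sem_map: OnlineSemanticMap,
                     pos: Cell) -> Optional[Cell]:
    """Return the nearest mapped cell whose label occurs in the instruction."""
    words = set(instruction.lower().split())
    candidates = [c for c, label in sem_map.labels.items() if label in words]
    if not candidates:
        return None
    return min(candidates, key=lambda c: manhattan(c, pos))


def decide_action(pos: Cell, waypoint: Cell) -> Cell:
    """Greedy one-cell step; a placeholder for a learned policy such as DD-PPO."""
    dx = (waypoint[0] > pos[0]) - (waypoint[0] < pos[0])
    if dx != 0:
        return (pos[0] + dx, pos[1])
    dy = (waypoint[1] > pos[1]) - (waypoint[1] < pos[1])
    return (pos[0], pos[1] + dy)


def navigate(instruction: str, world: Dict[Cell, str], start: Cell,
             sensor_range: int = 2, max_steps: int = 50) -> List[Cell]:
    """Interleave map update, waypoint prediction, and action selection."""
    sem_map = OnlineSemanticMap()
    pos = start
    path = [pos]
    for _ in range(max_steps):
        # 1. Online map construction from what is currently visible.
        visible = {c: l for c, l in world.items()
                   if manhattan(c, pos) <= sensor_range}
        sem_map.update(visible)
        # 2. Waypoint prediction from the instruction and the map so far.
        waypoint = predict_waypoint(instruction, sem_map, pos)
        if waypoint == pos:
            break  # sub-goal reached
        # 3. Action decision (or a fixed exploratory step if nothing matches yet).
        pos = decide_action(pos, waypoint) if waypoint else (pos[0] + 1, pos[1])
        path.append(pos)
    return path
```

For instance, calling `navigate("walk to the sofa", {(4, 0): "sofa", (0, 3): "table"}, (0, 0))` explores until the sofa enters sensor range and then steps toward it. The point of the sketch is only the ordering: the map is updated at every step rather than built once offline, which is the contrast the abstract draws with static map construction.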
| Original language | English |
|---|---|
| Pages (from-to) | 3294-3301 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 10 |
| Issue number | 4 |
| Online published | 11 Feb 2025 |
| Publication status | Published - Apr 2025 |
Research Keywords
- embodied intelligence
- multimodal perception
- navigation maps
- vision-based navigation
Title: OVL-MAP: An Online Visual Language Map Approach for Vision-and-Language Navigation in Continuous Environments