Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation

Zhijie Yan, Shufei Li, Zuoxu Wang*, Lixiu Wu, Han Wang, Jun Zhu, Lijiang Chen, Jihong Liu

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Enabling mobile robots to perform long-term tasks in dynamic real-world environments is a formidable challenge, especially when the environment changes frequently due to human-robot interactions or the robot's own actions. Traditional methods typically assume static scenes, which limits their applicability in the continuously changing real world. To overcome these limitations, we present DovSG, a novel mobile manipulation framework that leverages dynamic open-vocabulary 3D scene graphs and a language-guided task planning module for long-term task execution. DovSG takes RGB-D sequences as input and utilizes vision-language models (VLMs) for object detection to obtain high-level object semantic features. Based on the segmented objects, a structured 3D scene graph is generated for low-level spatial relationships. Furthermore, an efficient mechanism for locally updating the scene graph allows the robot to adjust parts of the graph dynamically during interactions without the need for full scene reconstruction. This mechanism is particularly valuable in dynamic environments, enabling the robot to continually adapt to scene changes and effectively support the execution of long-term tasks. We validated our system in real-world environments with varying degrees of manual modifications, demonstrating its effectiveness and superior performance in long-term tasks.
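
The local-update idea in the abstract can be illustrated with a toy sketch. This is not DovSG's implementation: the `SceneGraph` class, the `local_update` function, the spherical update region, and the centroid-based "on" rule are all hypothetical simplifications chosen for illustration. The sketch only shows the general pattern of refreshing the nodes and edges inside an observed region while leaving the rest of the graph untouched.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Hypothetical toy structure: free-form (open-vocabulary) label -> (x, y, z) centroid in metres.
    nodes: dict = field(default_factory=dict)
    # Spatial relations stored as (child, "on", parent) triples.
    edges: set = field(default_factory=set)


def _is_on(child, parent, xy_tol=0.3, z_gap=0.5):
    """Assumed toy rule: child is 'on' parent if horizontally close and slightly above it."""
    dx, dy = child[0] - parent[0], child[1] - parent[1]
    dz = child[2] - parent[2]
    return math.hypot(dx, dy) <= xy_tol and 0.0 < dz <= z_gap


def local_update(graph, center, radius, observed):
    """Refresh only the part of the graph inside a sphere around `center`.

    `observed` maps label -> centroid for the objects detected in the latest view
    of that region. Returns the set of labels whose state was re-examined.
    """
    # 1. Nodes that were previously recorded inside the observed region.
    stale = {name for name, pos in graph.nodes.items()
             if math.dist(pos, center) <= radius}

    # 2. Drop nodes that should be visible but were not re-detected,
    #    then insert or move the nodes that were detected.
    for name in stale - set(observed):
        del graph.nodes[name]
    graph.nodes.update(observed)
    touched = stale | set(observed)

    # 3. Discard edges involving touched nodes and recompute only those edges.
    graph.edges = {e for e in graph.edges
                   if e[0] not in touched and e[2] not in touched}
    for a in touched & set(graph.nodes):
        for b in graph.nodes:
            if a == b:
                continue
            if _is_on(graph.nodes[a], graph.nodes[b]):
                graph.edges.add((a, "on", b))
            if _is_on(graph.nodes[b], graph.nodes[a]):
                graph.edges.add((b, "on", a))
    return touched
```

In this sketch the cost of an update scales with the number of touched nodes rather than with the size of the whole scene, which is the property the abstract attributes to local updating. A real system would also have to handle object association across views, occlusion, and pose uncertainty, none of which are modelled here.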

© 2025 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission.
Original language: English
Pages (from-to): 4252-4259
Journal: IEEE Robotics and Automation Letters
Volume: 10
Issue number: 5
Online published: 3 Mar 2025
Publication status: Published - May 2025

Research Keywords

  • 3D scene graph
  • Long-term Tasks
  • Mobile Manipulation
  • Open vocabulary
