We introduce DualMap, an online open-vocabulary mapping system that enables robots to understand and navigate dynamically changing environments through natural language queries. Designed for efficient semantic mapping and adaptability to changing environments, DualMap meets the essential requirements for real-world robot navigation applications. Our proposed hybrid segmentation frontend and object-level status check eliminate the costly 3D object merging required by prior methods, enabling efficient online scene mapping. The dual-map representation combines a global abstract map for high-level candidate selection with a local concrete map for precise goal-reaching, effectively managing and updating dynamic changes in the environment. Through extensive experiments in both simulation and real-world scenarios, we demonstrate state-of-the-art performance in 3D open-vocabulary segmentation, efficient scene mapping, and online language-guided navigation.
a) A detailed 3D semantic concrete map \( \mathcal{M}_c \) is built from online observations of posed RGB-D frames; b) an anchor-based abstract map \( \mathcal{M}_a \) is derived from \( \mathcal{M}_c \), retaining the global layout and static objects; c) given a natural language query \( Q \), the agent retrieves a global candidate \( a^* \) from \( \mathcal{M}_a \) and starts navigation. During execution, it incrementally builds a local concrete map \( \mathcal{M}_c^{\text{local}} \), checks whether the target object is present, and updates the abstract map \( \mathcal{M}_a \) accordingly. If the target is not found near \( a^* \), a new navigation attempt is made using the updated map \( \mathcal{M}_a' \). This loop continues until the target is found (navigation success) or the attempt limit is reached (navigation failure).
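The query-and-retry loop in step c) can be illustrated with a minimal Python sketch. It is a toy model, not the DualMap implementation: the abstract map is reduced to a dictionary from anchor names to object labels, candidate selection is a plain label lookup rather than open-vocabulary matching, and `AbstractMap`, `best_candidate`, `navigate`, and the `observe_local` callback (standing in for building \( \mathcal{M}_c^{\text{local}} \)) are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class AbstractMap:
    """Toy stand-in for M_a: anchor name -> labels believed to be near it."""
    anchors: Dict[str, Set[str]] = field(default_factory=dict)

    def best_candidate(self, target: str, visited: Set[str]) -> Optional[str]:
        # Prefer an unvisited anchor believed to hold the target; otherwise
        # fall back to the first unvisited anchor (an assumed tie-break).
        unvisited = [a for a in self.anchors if a not in visited]
        if not unvisited:
            return None
        return max(unvisited, key=lambda a: target in self.anchors[a])

    def update(self, anchor: str, observed: Set[str]) -> None:
        # Overwrite the stale belief with what was just observed locally.
        self.anchors[anchor] = set(observed)


def navigate(
    target: str,
    abstract_map: AbstractMap,
    observe_local: Callable[[str], Set[str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Return (anchor where the target was found, or None; attempts used)."""
    visited: Set[str] = set()
    attempts = 0
    while attempts < max_attempts:
        anchor = abstract_map.best_candidate(target, visited)
        if anchor is None:
            break  # every anchor has been checked
        attempts += 1
        local_objects = observe_local(anchor)       # hypothetical local map
        abstract_map.update(anchor, local_objects)  # M_a -> M_a'
        if target in local_objects:
            return anchor, attempts                 # navigation success
        visited.add(anchor)
    return None, attempts                           # navigation failure
```

In this sketch a stale belief (for example, a mug recorded at the table that has since moved to the sofa) costs one failed attempt, after which the corrected map drives the next candidate selection.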
To increase object diversity in HM3D scenes, we randomly place various YCB objects and simulate dynamic changes by altering their locations via our tools. We show two examples of navigation under cross-anchor changes, with the corresponding abstract map in the bottom-right.
Our system also supports real-time open-vocabulary semantic mapping using an iPhone. The generated map can be queried with natural language and further utilized for downstream tasks such as robot navigation.
@article{jiang2025dualmap,
title={DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes},
author={Jiang, Jiajun and Zhu, Yiming and Wu, Zirui and Song, Jie},
journal={arXiv preprint arXiv:2506.01950},
year={2025}
}