DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes

¹The Hong Kong University of Science and Technology (Guangzhou)  ²The Hong Kong University of Science and Technology
*Corresponding Author
arXiv 2025

DualMap is a novel online open-vocabulary mapping system that enables robots to understand and navigate dynamically changing environments through natural language queries.

Abstract

We introduce DualMap, an online open-vocabulary mapping system that enables robots to understand and navigate dynamically changing environments through natural language queries. Designed for efficient semantic mapping and adaptability to changing environments, DualMap meets the essential requirements for real-world robot navigation applications. Our proposed hybrid segmentation frontend and object-level status check eliminate the costly 3D object merging required by prior methods, enabling efficient online scene mapping. The dual-map representation combines a global abstract map for high-level candidate selection with a local concrete map for precise goal-reaching, effectively managing and updating dynamic changes in the environment. Through extensive experiments in both simulation and real-world scenarios, we demonstrate state-of-the-art performance in 3D open-vocabulary segmentation, efficient scene mapping, and online language-guided navigation.

System Pipeline

a) A detailed 3D semantic concrete map \( \mathcal{M}_c \) is built online from posed RGB-D frames.

b) An anchor-based abstract map \( \mathcal{M}_a \) is derived from \( \mathcal{M}_c \), retaining the global layout and static objects.

c) Given a natural language query \( Q \), the agent retrieves a global candidate \( a^* \) from \( \mathcal{M}_a \) and starts navigating toward it. During execution, it incrementally builds a local concrete map \( \mathcal{M}_c^{\text{local}} \), checks whether the target object is present, and updates the abstract map \( \mathcal{M}_a \) accordingly. If the target is not found near \( a^* \), a new navigation attempt is made using the updated map \( \mathcal{M}_a' \). This loop continues until the target is found (navigation success) or the attempt limit is reached (navigation failure).
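The retry loop in step c) can be illustrated with a minimal Python sketch. This is not the DualMap implementation: `AbstractMap`, `best_candidate`, `update`, `navigate`, and `observe_local` are hypothetical names, the abstract map is reduced to a dictionary from anchor names to object labels, and exact label matching stands in for open-vocabulary matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class AbstractMap:
    """Toy stand-in for M_a: anchor name -> labels believed to be at that anchor."""
    anchors: Dict[str, Set[str]] = field(default_factory=dict)

    def best_candidate(self, target: str, visited: Set[str]) -> Optional[str]:
        # Prefer an unvisited anchor believed to hold the target;
        # otherwise fall back to any unvisited anchor (assumed policy).
        unvisited = [a for a in self.anchors if a not in visited]
        for anchor in unvisited:
            if target in self.anchors[anchor]:
                return anchor
        return unvisited[0] if unvisited else None

    def update(self, anchor: str, observed: Set[str]) -> None:
        # Overwrite the stale belief for this anchor with the fresh observation.
        self.anchors[anchor] = set(observed)


def navigate(
    target: str,
    abstract_map: AbstractMap,
    observe_local: Callable[[str], Set[str]],
    max_attempts: int = 3,
) -> Tuple[bool, Optional[str], int]:
    """Return (success, anchor where the target was found, attempts used)."""
    visited: Set[str] = set()
    for attempt in range(1, max_attempts + 1):
        anchor = abstract_map.best_candidate(target, visited)  # a*
        if anchor is None:
            break  # no candidates left to try
        visited.add(anchor)
        local = observe_local(anchor)       # stands in for M_c^local
        abstract_map.update(anchor, local)  # M_a -> M_a'
        if target in local:
            return True, anchor, attempt
    return False, None, len(visited)


# The map still believes the plate is on the table, but it was moved to the sofa.
world = {"table": {"cup"}, "sofa": {"red plate"}, "shelf": set()}
stale_map = AbstractMap({"table": {"red plate", "cup"}, "sofa": set(), "shelf": set()})
result = navigate("red plate", stale_map, lambda anchor: world[anchor])
print(result)
```

In this toy run the first attempt visits the table, fails to find the plate, and corrects the map; the second attempt finds it at the sofa. The fallback to "any unvisited anchor" is an assumption made for the sketch, not a policy stated in the text above.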

Results

Navigation in Dynamic Simulation Scenes

To increase object diversity in HM3D scenes, we randomly place various YCB objects and simulate dynamic changes by altering their locations via our tools. We show two examples of navigation under cross-anchor changes, with the corresponding abstract map in the bottom-right.

Query: "Find the red plate."

Query: "Can you look for the bowl?"

Open-Vocabulary Mapping on the Go — with Just an iPhone!

Our system also supports real-time open-vocabulary semantic mapping using an iPhone. The generated map can be queried with natural language and further utilized for downstream tasks such as robot navigation.
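One common way to query an open-vocabulary map with natural language is to compare a text embedding of the query against per-object feature vectors using cosine similarity. The sketch below assumes that approach purely for illustration; the text above does not specify it. The hand-written 3-D vectors are placeholders for embeddings that a vision-language model such as CLIP would produce, and `query_map` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def query_map(
    query_vec: Sequence[float],
    object_feats: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Rank mapped objects by similarity to the query embedding."""
    scored = [(name, cosine(query_vec, feat)) for name, feat in object_feats.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Placeholder features; a real map would store model-produced embeddings.
object_feats = {
    "red plate": [0.9, 0.1, 0.0],
    "bowl": [0.1, 0.9, 0.0],
    "chair": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "Find the red plate."
matches = query_map(query_vec, object_feats, top_k=2)
print(matches[0][0])
```

The top-ranked object could then be handed to a downstream task such as navigation.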

More Quantitative Results

Table 1: Semantic Segmentation Results with Different CLIP Backbones

Table 2: System Time Decomposition

BibTeX

@article{jiang2025dualmap,
  title={DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes},
  author={Jiang, Jiajun and Zhu, Yiming and Wu, Zirui and Song, Jie},
  journal={arXiv preprint arXiv:2506.01950},
  year={2025}
}