Tracking objects that move within dynamic environments is a core challenge in robotics. Recent research has advanced this topic significantly; however, many existing approaches remain inefficient due to their reliance on heavy foundation models. To address this limitation, we propose LOST-3DSG, a lightweight open-vocabulary 3D scene graph designed to track dynamic objects in real-world environments. Our method adopts a semantic approach to entity tracking based on word2vec and sentence embeddings, enabling an open-vocabulary representation while avoiding the need to store dense CLIP visual features. As a result, LOST-3DSG outperforms approaches that rely on high-dimensional visual embeddings. We evaluate our method through qualitative and quantitative experiments conducted in a real 3D environment using a TIAGo robot. The results demonstrate the effectiveness and efficiency of LOST-3DSG in dynamic object tracking.
LOST-3DSG consists of two main components: the Perception Module and the Scene Update Module.
The system processes RGB-D observations to extract object labels, colors, materials, and fine-grained descriptions using a Vision-Language Model. These semantic attributes are then encoded using lightweight embeddings and used for temporal association, enabling the robot to recognize when objects move, disappear, or reappear in the scene.
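The temporal-association step described above can be sketched as follows. This is a minimal, self-contained illustration of matching new detections to tracked objects by semantic similarity; it uses a toy bag-of-words vector and cosine similarity as a stand-in for the paper's word2vec / sentence embeddings, and the `associate` function, its greedy matching policy, and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector: a stand-in for the lightweight word2vec /
    # sentence embeddings used in the paper (hypothetical simplification).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def associate(tracked, detections, threshold=0.5):
    # Greedily match each new detection to the most semantically similar
    # tracked object; detections below the threshold are treated as new
    # or reappearing objects. Threshold value is an illustrative choice.
    matches, unmatched, used = {}, [], set()
    for det in detections:
        best_id, best_sim = None, threshold
        for obj_id, desc in tracked.items():
            if obj_id in used:
                continue
            sim = cosine(embed(desc), embed(det))
            if sim > best_sim:
                best_id, best_sim = obj_id, sim
        if best_id is None:
            unmatched.append(det)
        else:
            matches[best_id] = det
            used.add(best_id)
    return matches, unmatched

# Example: one detection re-associates with a tracked object,
# the other is flagged as unmatched (a new or moved object).
tracked = {1: "red ceramic mug", 2: "wooden chair"}
detections = ["red mug on table", "green plant"]
matches, unmatched = associate(tracked, detections)
# matches → {1: "red mug on table"}, unmatched → ["green plant"]
```

In the full system the comparison would operate on the semantic attributes (label, color, material, description) extracted by the Vision-Language Model, rather than on raw strings as in this sketch.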
@misc{ferraina2026lost3dsglightweightopenvocabulary3d,
title={LOST-3DSG: Lightweight Open-Vocabulary 3D Scene Graphs with Semantic Tracking in Dynamic Environments},
author={Sara Micol Ferraina and Michele Brienza and Francesco Argenziano and Emanuele Musumeci and Vincenzo Suriani and Domenico D. Bloisi and Daniele Nardi},
year={2026},
eprint={2601.02905},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2601.02905},
}