[2025-12-16] 🔥🔥🔥 We release RoboTracer on arXiv and launch the project page. It retains all features of RoboRefer (the previous version) while further supporting multi-step, metric-grounded spatial tracing with explicit reasoning.
We introduce RoboTracer, the first 3D-aware reasoning VLM for multi-step, metric-grounded spatial tracing with explicit reasoning.
We present TraceSpatial, a dataset that enables general VLMs to adapt to spatial tracing tasks, with 4.5M data samples (~30M QA pairs) drawn from 2D/3D/video sources, spanning outdoor, indoor, and tabletop scenes and containing complex reasoning processes (up to 9 steps).
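The official inference code has not been released yet (see the roadmap below), so the snippet here is only a minimal, hypothetical sketch of how a RoboTracer checkpoint might be queried through the standard Hugging Face `transformers` interface. The model ID `BAAI/RoboTracer-2B`, the prompt format, and the processor behavior are all assumptions, not the released API.

```python
# Hypothetical sketch: querying a future RoboTracer checkpoint via the
# generic Hugging Face interface. The repo ID, prompt, and processor
# behavior below are assumptions; the official inference code is not
# released yet and its real entry point may differ.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "BAAI/RoboTracer-2B"  # placeholder, not a confirmed repo ID

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, device_map="auto"
)

image = Image.open("tabletop_scene.jpg")
prompt = (
    "Trace a path for the gripper from the red mug to the tray, "
    "reporting each intermediate waypoint in metric coordinates."
)

# Pack the image-text pair and generate a reasoning trace.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

Since the repository builds on the NVILA codebase, the actual entry point may follow NVILA's own interface instead; treat this purely as a placeholder until the inference code ships.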
- Release TraceSpatial-Bench evaluation code (About 2 weeks).
- Release the SFT-trained 2B RoboTracer model and inference code (About 1 month).
- Release the SFT-trained 8B RoboTracer model (About 2 months).
- Release the TraceSpatial Dataset and SFT training code (About 2 months).
- Release the RFT-trained RoboTracer model and training code (Maybe 2 months or more).
- Release the Dataset Generation Pipeline (Maybe 2 months or more).
If you have any questions about the code or the paper, feel free to email Enshen (zhouenshen@buaa.edu.cn), Yibo (leeibo@buaa.edu.cn), and Jingkun (anjingkun02@gmail.com).
- This repository is built upon the codebases of NVILA, RoboRefer, MapAnything, and R1-V.
- We acknowledge OpenImage, CA-1M, ScanNet, DROID, AgiBot-Beta, and RoboTwin 2.0 for their data and assets.
If you find RoboTracer, TraceSpatial, and TraceSpatial-Bench useful for your research, please cite using this BibTeX:
@article{zhou2025robotracer,
  title={RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics},
  author={Zhou, Enshen and Chi, Cheng and Li, Yibo and An, Jingkun and Zhang, Jiayuan and Rong, Shanyu and Han, Yi and Ji, Yuheng and Liu, Mengzhen and Wang, Pengwei and others},
  journal={arXiv preprint arXiv:2512.13660},
  year={2025}
}