Stars
Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"
RealSee3D: A multi-view RGB-D dataset combining real-world captures and procedurally generated scenes, with extensible annotations for diverse 3D vision research.
[NeurIPS 2025] Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
Synthetic VQA data generation code for SpatialReasoner.
Training recipe for SpatialReasoner
Code for the paper "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use"
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
Official implementation of “Towards Cross-View Point Correspondence in Vision-Language Models”.
[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
Official implementation of DepthLM
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[CVPR'25] Official repository for "Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration"
RynnEC: Bringing MLLMs into Embodied World
[ICCV 2023 Oral] ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes
[ICCV'23 Workshop] SAM3D: Segment Anything in 3D Scenes
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
[ICCV 2025 Oral] SceneSplat - Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
Code for π^3: Permutation-Equivariant Visual Geometry Learning
[SIGGRAPH Asia 2025 (ACM TOG)] AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views