Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"

Python 18 Updated Dec 18, 2025

RealSee3D: A multi-view RGB-D dataset combining real-world captures and procedurally generated scenes, with extensible annotations for diverse 3D vision research.

Python 204 8 Updated Dec 18, 2025

[NeurIPS 2025] Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"

Python 116 4 Updated Nov 4, 2025

Open-source unified multimodal model

Python 5,500 481 Updated Oct 27, 2025

Synthetic VQA data generation code for SpatialReasoner.

Python 12 1 Updated Nov 25, 2025

Training recipe for SpatialReasoner

Python 26 1 Updated Sep 21, 2025
Python 449 46 Updated Dec 22, 2025

Code for the paper "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use"

Python 144 3 Updated Aug 10, 2025

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

Python 123 4 Updated Dec 9, 2025

Code for the Molmo Vision-Language Model

Python 841 80 Updated Dec 12, 2024

Official implementation of “Towards Cross-View Point Correspondence in Vision-Language Models”.

Python 10 Updated Dec 8, 2025

[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"

Python 214 6 Updated Dec 16, 2025
Python 4,463 434 Updated Sep 14, 2025

Official implementation of DepthLM

Python 276 12 Updated Oct 7, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,220 39 Updated Dec 23, 2025

[CVPR'25] Official repository for "Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration"

Python 79 4 Updated Jun 10, 2025

RynnEC: Bringing MLLMs into Embodied World

Jupyter Notebook 383 17 Updated Oct 29, 2025

Fast Segment Anything

Python 8,204 746 Updated Jul 30, 2024
Python 224 19 Updated Nov 5, 2025

[ICCV 2023 Oral] ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes

Python 340 31 Updated Dec 19, 2025

[ICCV'23 Workshop] SAM3D: Segment Anything in 3D Scenes

Python 1,286 79 Updated Apr 21, 2024

[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Python 603 31 Updated May 7, 2025

[ICCV 2025 Oral] SceneSplat - Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

Python 279 14 Updated Dec 11, 2025
Python 122 4 Updated Nov 30, 2025

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

Python 314 23 Updated Sep 1, 2025

Code of π^3: Permutation-Equivariant Visual Geometry Learning

Python 1,489 78 Updated Dec 20, 2025
Python 3 Updated Jul 16, 2025

[SIGGRAPH Asia 2025 (ACM TOG)] AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views

Python 645 30 Updated Dec 22, 2025