PhD student, Wuhan University
- Wuhan
Starred repositories
CoTracker is a model for tracking any point (pixel) on a video.
GLUEMAP: Global Structure-from-Motion Meets Feedforward Reconstruction
Official implementation of paper "VLM³: Vision Language Models Are Native 3D Learners".
UFM: A Unified Dense Image Correspondence Estimator for both Optical Flow & Wide Baseline Matching Tasks. Matches any pair of images. (NeurIPS 2025)
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
[ICLR 2026] PyTorch implementation of "The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images with Minimal 3D Knowledge".
[SIGGRAPH 2026 / TOG] Official code of the paper "UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors".
[ICML 2026] 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
This is a project about visual spatial reasoning.
A paper list for spatial reasoning
[ICLR 2026] Official implementation of the paper "📷 On the Generalization Capacities of MLLMs for Spatial Intelligence"
PyTorch code and models for VJEPA2 self-supervised learning from video.
[CVPR 2026 Oral] "MARCO: Navigating the Unseen Space of Semantic Correspondence"
[CVPR 2026] Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"
Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"
[ICCV '25 Highlight] CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
[ECCV 2026] Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
[CVPR 2026] "E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training" official implementation.
[ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning
[CVPR 2026] Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
[ICLR'26] This repository is the implementation of "3D Aware Region Prompted Vision Language Model"
Open-source, self-hosted note-taking tool built for quick capture. Markdown-native, lightweight, and fully yours.
[ICCV 2025 Highlight] No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views