- The University of Hong Kong
- Hong Kong SAR
Stars
(NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
(ICCV 2025) How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
[ICLR 2026] LongLive: Real-time Interactive Long Video Generation
Implementation of the paper “GV-VAD: Exploring Video Generation for Weakly-Supervised Video Anomaly Detection”
Wan: Open and Advanced Large-Scale Video Generative Models
(ICCV 2025) Holistic Tokenizer for Autoregressive Image Generation
Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
Official code for the CVPR 2025 paper "Navigation World Models".
The official implementation of the paper "UrbanWorld: An Urban World Model for 3D City Generation"
[NeurIPS 2025] Official repository for "Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought".
Official Implementation of "VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning".
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
OpenEQA: Embodied Question Answering in the Era of Foundation Models
Universal Monocular Metric Depth Estimation
[CVPR 2023 Highlight] Perspective Fields for Single Image Camera Calibration
[NeurIPS 2024] Geometry-Aware Large Reconstruction Model for Efficient and High-Quality 3D Generation
[ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
[CVPR 2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
[ECCV 2024] SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
[CVPR'24 Highlight & Best Demo Award] Gaussian Splatting SLAM
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, realistic, and adaptive scene generation for applications in…