Stars
Cosmos-Reason2 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
A simulation platform for versatile Embodied AI research and development.
Claude Code CLI integration for Unreal Engine 5.7 - Get AI coding assistance with built-in UE5.7 documentation context directly in the editor.
Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation
[CVPR 2025 Highlight] Official implementation of the solvers and estimators proposed in the paper "Relative Pose Estimation through Affine Corrections of Monocular Depth Priors"
VideoGPA is a self-supervised framework that enhances 3D consistency in Video Diffusion Models.
ViPE: Video Pose Engine for Geometric 3D Perception
Masked Depth Modeling for Spatial Perception
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
[ECCV'20] Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
[CVPR 2026] G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
Python package for the evaluation of odometry and SLAM
[ICCV 2023] Official code for the paper "Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives". Official weights and demos provided.
MAGI-1: Autoregressive Video Generation at Scale
[ICLR 2026] Official Repo for Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
[ICLR 2026] LongLive: Real-time Interactive Long Video Generation
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
Public code for XFactor, which introduces the first geometry-free model to achieve true self-supervised, pose-free Novel View Synthesis (NVS) by learning transferable latent camera pose representations.
Native Multimodal Models are World Learners
ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation (NeurIPS 2023 Spotlight)
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
[ICLR 2026 Oral (top 1.2%)] Official implementation of DepthLM