Starred repositories
A Comprehensive Survey of Interactive Video World Models
AI-IO: An Aerodynamics-Inspired Real-Time Inertial Odometry for Quadrotors (ICRA 2026)
An open source library designed to provide community examples of Joint Embedding Predictive Architectures (JEPAs). It contains code and examples for learning representations from images, video, and…
[ICML 2026] Official Code for Rectified LpJEPA: Joint-Embedding Predictive Architectures with Sparse and Maximum-Entropy Representations
Official implementation of our paper "CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture"
Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals
[AAAI 2026] SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries
PyTorch code and models for V-JEPA self-supervised learning from video.
GaussianAD: Gaussian-Centric End-to-End Autonomous Driving
Efficient vision foundation models for high-resolution generation and perception.
Code for "OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments".
[ICCV 2025] GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
[ICCV 2025] Detect Anything 3D in the Wild
Reference PyTorch implementation and models for DINOv3
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding.
[CVPR 2025] Gaussian World Model for Streaming 3D Occupancy Prediction
[ICCV 2025] TeRA: Rethinking Text-guided Realistic 3D Avatar Generation
[ICCV 2025] Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?
[ECCV 2024] SQD-MapNet: Stream Query Denoising for Vectorized HD-Map Construction
HE-Drive: Human-Like End-to-End Driving with Vision Language Models
This is the official project repository for "FASTopo: Fast-Slow Lane Segment Topology Reasoning with Latent World Models"