Stars
LongLive: Real-time Interactive Long Video Generation
Actuated Version of the Universal Manipulation Interface Gripper
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
Benchmarking Knowledge Transfer in Lifelong Robot Learning
Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models
UT-Austin-RPL / mimicdroid-robocasa
Forked from robocasa/robocasa
MimicDroid: In-Context Learning for Humanoid Robot Manipulation from Human Play Videos
A fast and differentiable model predictive control (MPC) solver for PyTorch.
PhysX: Physical-Grounded 3D Asset Generation (NeurIPS 2025, Spotlight)
A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites.
Reference PyTorch implementation and models for DINOv3
Open source repo for Locate 3D Model, 3D-JEPA and Locate 3D Dataset
[CVPR 2024✨Highlight] Official repository for HOLD, the first method that jointly reconstructs articulated hands and objects from monocular videos without assuming a pre-scanned object template and…
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
This is the code repository for IntPhys 2, a video benchmark designed to evaluate the intuitive physics understanding of deep learning models.
Training VLM agents with multi-turn reinforcement learning
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
PyTorch code and models for VJEPA2 self-supervised learning from video.
A Modular Toolkit for Robot Kinematic Optimization
moojink / openvla-oft
Forked from openvla/openvla
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
ICCV 2025 | TesserAct: Learning 4D Embodied World Models