Stars
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
A Curated List of Vision-Language-Action (VLA) and World Action Models (WAM) Research and Beyond
LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion
Stable-Baselines3 implementation of DSRL
Official implementation of DSRL: Steering Your Diffusion Policy with Latent Space Reinforcement Learning (CoRL 2025)
A Survey on Reinforcement Learning of Vision-Language-Action Models for Robotic Manipulation
A simple and well-styled PPO implementation. Based on my Medium series: https://medium.com/@eyyu/coding-ppo-from-scratch-with-pytorch-part-1-4-613dfc1b14c8
[ICLR 2026] The official implementation of "Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model"
Real-Time VLAs via Future-state-aware Asynchronous Inference.
RynnVLA-002: A Unified Vision-Language-Action and World Model
PyTorch code and models for VJEPA2 self-supervised learning from video.
[NeurIPS'25] Generalizable Reasoning through Compositional Energy Minimization
Open-source code of the paper: Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions.
Cambrian-S: Towards Spatial Supersensing in Video
Muon is an optimizer for hidden layers in neural networks
Reference PyTorch implementation and models for DINOv3
PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning
The official repo for the paper "VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers" (ICCV 2025)
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Official code for the CVPR 2025 paper "Navigation World Models".
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer