Stars
BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx
starVLA: A Lego-like Codebase for Vision-Language-Action Model Development
[IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
The world's first open-source multimodal creative assistant: a privacy-first alternative to Canva and Manus that can run locally.
Building General-Purpose Robots Based on Embodied Foundation Model
Re-implementation of the pi0 vision-language-action (VLA) model from Physical Intelligence
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
[CVPR 2025] Official implementation of "GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation"
An all-in-one robot manipulation learning suite for policy model training and evaluation across various datasets and benchmarks.
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
A comprehensive collection of resources on robot manipulation, including papers, code, and related websites.
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer