Stars
End-to-end pipeline converting generative videos (Veo, Sora) to humanoid robot motions
Zxy-MLlab / LIBERO-PRO
Forked from Lifelong-Robot-Learning/LIBERO. Official repository of LIBERO-PRO, an evaluation extension of the original LIBERO benchmark.
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Code for "Novel Object 6D Pose Estimation with a Single Reference View".
Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning
6D Cartesian-space hybrid force-velocity control using a positional inner loop and a wrist-mounted FT sensor.
Official implementation for Compliant Residual DAgger
🔥 SpatialVLA: a spatially enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025.
WoW (World-Omniscient World Model) is a generative world model trained on 2 million robotic interaction trajectories, designed to imagine, reason, and act in the physical world. Unlike passive vide…
Code for PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies
Official implementation of ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver.
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
[AAAI 2026] Official code for MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation
moojink / openvla-oft
Forked from openvla/openvla. Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Repo for running various baselines with Behavior-1K
Team Comet's 2025 BEHAVIOR Challenge Codebase
Official code for the benchmark of the paper "VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning" (ICLR 2025)
Extract frames and motion vectors from H.264 and MPEG-4 encoded video.
HiF-VLA: An efficient Vision-Language-Action model with bidirectional spatiotemporal expansion
Egocentric humanoid manipulation benchmark
[ICLR 25] Code for "Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning"
Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views (with visual imitation learning for robots)
Learning Dexterous Manipulation Skills from Imperfect Simulations
Official Implementation of "Real-world RL for Active Perception Behaviors"
[CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Official Release of "Mixture of Horizons in Action Chunking"
MM-ACT: Learn from Multimodal Parallel Generation to Act