Starred repositories
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides [EMNLP 2025]
Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection (AAAI'22)
OpenMMLab's next-generation platform for general 3D object detection.
[ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
The official implementation of [NeurIPS 2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
EvoVLA: Self-Evolving Vision-Language-Action Model
[CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
A course in reinforcement learning in the wild
Scalable toolkit for efficient model reinforcement
FinRL®: Financial Reinforcement Learning. 🔥
The Robotics Library (RL) is a self-contained C++ library for rigid body kinematics and dynamics, motion planning, and control.
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
Re-implementation of the pi0 vision-language-action (VLA) model from Physical Intelligence
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
OpenVLA: An open-source vision-language-action model for robotic manipulation. (Forked from TRI-ML/prismatic-vlms)
[ICLR'25] MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions