SJTU
RL
Scalable toolkit for efficient model reinforcement
An Open-source RL System from ByteDance Seed and Tsinghua AIR
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
Official Implementation of Paper: WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
verl: Volcano Engine Reinforcement Learning for LLMs