- Bytedance Seed
- https://sites.google.com/view/haibinlin/
- @eric_haibin_lin
Stars
TPU inference for vLLM, with unified JAX and PyTorch support.
100M tokens. Infinite compute. Lowest val loss wins.
An interface library for RL post training with environments.
A set of examples based on verl for end-to-end RL training recipes.
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
MiroRL is an MCP-first reinforcement learning framework for deep research agents.
tmlr-group / Co-rewarding
Forked from resistzzz/Co-rewarding
[ICLR 2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"
[ICLR 2026] SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
A repo for open research on building large reasoning models
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
[ICLR'26] RM-R1: Unleashing the Reasoning Potential of Reward Models
Repository for the paper "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners"
The absolute trainer to light up AI agents.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo