Stars
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!
The official paper for EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL.
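Multi-channel task-completion notifications for AI CLIs (Claude Code / Codex / OpenCode / Gemini). Supports duration thresholds, desktop and command-line use, generic webhooks (Feishu/DingTalk/WeCom), Telegram, email, and desktop/sound alerts, with automatic log monitoring, AI summaries, and more.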
An interface library for RL post training with environments.
The official implementation of "EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis".
Agentic RL on Any Harness at Scale
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your Claude Code/Codex/Gemini agent will be an AI research agent with full horsepowe…
Scalable and extensible reinforcement learning for LM agents.
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
Build and run agents you can see, understand and trust.
A repo for open research on building large reasoning models
This code can be used to generate simulated NIRCam, NIRISS, or FGS data
We introduce BabyVision, a benchmark revealing the infancy of AI vision.
Google Research
SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
A Survey of Reinforcement Learning for Large Reasoning Models
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / verl / LLaMA Factory / ms-swift / U…
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends