🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems.
Updated May 15, 2026 - Python
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Run more RL experiments. Wait less for GPUs.
[CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
Claw-R1: Empowering OpenClaw with Advanced Agentic RL.
[ACL 2026 Findings] Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
RL study guide — foundations through RLHF, DPO, GRPO, RLVR, agentic RL, and offline RL. Hand-written CS294 notes, 19 lecture drafts, 5 tested exercises, citations that resolve.
DART-GUI: Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Curated, opinionated index of post-R1 LLM × Reinforcement Learning. Many deep-dive blog posts cross-linked to many papers — GRPO, DAPO, DPO, PPO, RLHF, GSPO, CISPO, VAPO, Reward Modeling, MoE RL stability, Verifier-Free RL, Training-Free RL, Agentic RL, DeepSeek-R1 reproduction.
Proximity-based Multi-turn Optimization (ProxMO) - Official Implementation
SGLang model provider for Strands Agents for on-policy agentic RL training.
[ACL 2026] AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading.
Standardizing environment infrastructure with Strands Agents — step, observe, reward.
This is the official repository for our paper "Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning", published at ICLR 2026.
Official implementation for the paper "Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe"
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
Official Code of Paper: MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization
Fast, Multi-Cloud Sandboxes for AI Agents.
MolMem: Memory-Augmented Agentic Reinforcement Learning for Sample-Efficient Molecular Optimization
Automate digital forensics and incident response tasks using an autonomous agent aligned with MITRE ATT&CK frameworks.