Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!

Python 10,151 906 Updated Jun 21, 2026

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,182 2,099 Updated Jun 9, 2026

Visualize-ML / Book5_Essentials-of-Probability-and-Statistics

Book_5_"Statistics Made Simple" (《统计至简》) | Iris Book series: from basic arithmetic to machine learning; now on sale!

Jupyter Notebook 3,675 753 Updated May 1, 2026

yongliu20 / Awesome-Unified-Understanding-and-Generation

53 Updated Aug 22, 2025

bytedance / UI-TARS

Pioneering Automated GUI Interaction with Native Agents

Python 11,023 832 Updated Jan 27, 2026

TapXWorld / ChinaTextbook

All PDF textbooks for primary school, middle school, high school, and university.

Roff 74,452 16,675 Updated Oct 18, 2025

samkhur006 / awesome-llm-planning-reasoning

A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning materials.

321 18 Updated Feb 28, 2025

GAIR-NLP / OctoThinker

Revisiting Mid-training in the Era of Reinforcement Learning Scaling

Jupyter Notebook 188 14 Updated Jul 23, 2025

policy-gradient / GRPO-Zero

Implementing DeepSeek R1's GRPO algorithm from scratch

Python 1,866 94 Updated Apr 18, 2025

a2aproject / A2A

Agent2Agent (A2A) is an open protocol enabling communication and interoperability between opaque agentic applications.

Shell 24,400 2,472 Updated Jun 22, 2026

camel-ai / loong

🐉 Loong: Synthesize Long CoTs at Scale through Verifiers.

Python 503 42 Updated Jun 12, 2026

BytedTsinghua-SIA / DAPO

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,829 84 Updated May 11, 2025

AgentR1 / Agent-R1

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Python 1,490 102 Updated Jun 22, 2026

Wan-Video / Wan2.1

Wan: Open and Advanced Large-Scale Video Generative Models

Python 16,306 2,892 Updated Mar 5, 2026

ScienceOne-AI / DeepSeek-671B-SFT-Guide

An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.…

Python 809 98 Updated Mar 13, 2025