Starred repositories
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
An Open-Source Large-Scale Reinforcement Learning Project for Search Agents
MiroThinker is a deep research agent optimized for complex research and prediction tasks. Our latest model, MiroThinker-1.7, achieves 74.0 and 75.3 on BrowseComp and BrowseComp-ZH, respectively.
A Tree Search Library with Flexible API for LLM Inference-Time Scaling
OpenClaw-RL: Train any agent simply by talking
The official repository of "A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications".
Self-referential self-improving agents that can optimize for any computable task
Curated academic CV templates and guidelines for PhD students, researchers, and faculty job applicants.
AI agents running research on single-GPU nanochat training automatically
LLM Chess - evaluating Large Language Models' reasoning and instruction-following abilities by simulating chess games
A collection of LLM pruning implementations, training code for GPUs and TPUs, and evaluation scripts.
CATArena is an engineering-level tournament evaluation platform for LLM-driven code agents, based on an iterative competitive peer-learning framework.
AI-Trader: 100% Fully-Automated Agent-Native Trading
Synthetic data curation for post-training and structured data extraction
Benchmark LLM reasoning capability by solving chess puzzles.
Training VLM agents with multi-turn reinforcement learning
Harsh Jhamtani*, Varun Gangal*, Eduard Hovy, Graham Neubig, Taylor Berg-Kirkpatrick. Learning to Generate Move-by-Move Commentary for Chess Games from Large-Scale Social Forum Data. ACL 2018
Open source neural network chess engine with GPU acceleration and broad hardware support.
A Text-Based Environment for Interactive Debugging
This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models".
Fully open reproduction of DeepSeek-R1
[ICLR 2026] Learning to Reason without External Rewards
A library for generative social simulation
AI paper trading project inspired by nof1 Alpha Arena, using ccxt for market quotes.
Procgen Benchmark: Procedurally-Generated Game-Like Gym-Environments
Defeating the Training-Inference Mismatch via FP16
Natural Language Reinforcement Learning