Stars
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
HPA-HLE is an open-source framework for Humanity's Last Exam (HLE) using multi-agent collaboration, dynamic routing, and entropy-reducing evaluation. It achieved 27.5% accuracy across multiple tests withou…
Fully open data curation for reasoning models
[ACL 2025] A novel complex reasoning enhancement method that uses widely available algorithmic questions and their code to generate logical reasoning data.
LogicBench is a natural language question-answering dataset consisting of 25 different reasoning patterns spanning propositional, first-order, and non-monotonic logics.
Train your agent model with our easy and efficient framework
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training"
SkyRL: A Modular Full-stack RL Library for LLMs
Training VLM agents with multi-turn reinforcement learning
Live-streamed development of RL tuning for LLM agents
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
[ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
LLM/VLM gaming agents and model evaluation through games.
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Multiple datasets for ARC (Abstraction and Reasoning Corpus)
☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models
Code and Data for ACL 2025 Paper "Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework".
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
Understanding R1-Zero-Like Training: A Critical Perspective
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards