  • Zhejiang University


From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.

Python 23 1 Updated Oct 7, 2025

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Jupyter Notebook 143 7 Updated Nov 13, 2025

HPA-HLE is an open-source framework for Humanity's Last Exam (HLE) that uses multi-agent collaboration, dynamic routing, and entropy-reducing evaluation. It achieved 27.5% accuracy across multiple tests withou…

Python 2 1 Updated Jun 13, 2025

Resources for the Enigmata Project.

Python 74 4 Updated Aug 13, 2025

Fully open data curation for reasoning models

Python 2,173 182 Updated Dec 2, 2025

[ACL2025] A novel complex reasoning enhancement method that utilizes widely available algorithmic questions and their codes to generate logical reasoning data.

8 Updated Aug 4, 2025

LogicBench is a natural language question-answering dataset consisting of 25 different reasoning patterns spanning propositional, first-order, and non-monotonic logics.

33 3 Updated May 2, 2024

Train your Agent model via our easy and efficient framework

Python 1,668 156 Updated Dec 5, 2025
Python 332 24 Updated Aug 29, 2025

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"

Python 1,305 117 Updated Dec 11, 2025

SkyRL: A Modular Full-stack RL Library for LLMs

Python 1,397 204 Updated Dec 23, 2025

Training VLM agents with multi-turn reinforcement learning

Python 350 42 Updated Dec 1, 2025

A live stream development of RL tuning for LLM agents

Python 3,686 514 Updated Oct 8, 2025

RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.

Jupyter Notebook 2,448 195 Updated Dec 3, 2025

[ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Python 565 32 Updated May 6, 2025
4 Updated Aug 30, 2024

LLM/VLM gaming agents and model evaluation through games.

Python 833 88 Updated Nov 16, 2025

A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning

Python 330 77 Updated Oct 29, 2025

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

TypeScript 20,074 1,911 Updated Dec 15, 2025

Multiple datasets for ARC (Abstraction and Reasoning Corpus)

Python 85 15 Updated Mar 28, 2025

☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models

Jupyter Notebook 19 Updated Jun 4, 2025

Code and Data for ACL 2025 Paper "Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework".

Python 22 4 Updated Oct 3, 2025

[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Python 214 41 Updated Jul 13, 2025

Materials for ConceptARC paper

109 9 Updated Nov 6, 2024

Understanding R1-Zero-Like Training: A Critical Perspective

Python 1,177 53 Updated Aug 27, 2025

The Abstraction and Reasoning Corpus

JavaScript 4,677 700 Updated Apr 4, 2025

Logical puzzles generator

Python 1 Updated Dec 7, 2024

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,283 106 Updated Dec 15, 2025