pengzju
  • Zhejiang University


From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.

Python 25 1 Updated Oct 7, 2025

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Jupyter Notebook 146 7 Updated Nov 13, 2025

HPA-HLE is an open-source framework for Humanity's Last Exam using multi-agent collaboration, dynamic routing, and entropy-reducing evaluation. It achieved 27.5% accuracy across multiple tests withou…

Python 2 1 Updated Jun 13, 2025

Resources for the Enigmata Project.

Python 81 7 Updated Aug 13, 2025

Fully open data curation for reasoning models

Python 2,233 186 Updated Dec 2, 2025

[ACL 2025] A novel complex reasoning enhancement method that utilizes widely available algorithmic questions and their code to generate logical reasoning data.

8 Updated Aug 4, 2025

LogicBench is a natural language question-answering dataset consisting of 25 different reasoning patterns spanning propositional, first-order, and non-monotonic logics.

36 5 Updated May 2, 2024

Train your Agent model via our easy and efficient framework

Python 1,725 162 Updated Dec 5, 2025
Python 341 25 Updated Aug 29, 2025

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"

Python 1,732 157 Updated Feb 27, 2026

SkyRL: A Modular Full-stack RL Library for LLMs

Python 1,714 285 Updated Mar 26, 2026

Training VLM agents with multi-turn reinforcement learning

Python 433 50 Updated Mar 25, 2026

A live stream development of RL tuning for LLM agents

Python 3,962 544 Updated Oct 8, 2025

RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.

Python 2,563 210 Updated Mar 26, 2026

[ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Python 567 32 Updated May 6, 2025
4 Updated Aug 30, 2024

[ICLR 2026] LLM/VLM gaming agents and model evaluation through games.

Python 897 96 Updated Nov 16, 2025

A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning

Python 363 86 Updated Mar 26, 2026

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

TypeScript 29,094 2,852 Updated Mar 10, 2026

Multiple datasets for ARC (Abstraction and Reasoning Corpus)

Python 86 15 Updated Mar 28, 2025

☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models

Jupyter Notebook 19 1 Updated Jun 4, 2025

Code and Data for ACL 2025 Paper "Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework".

Python 25 5 Updated Oct 3, 2025

[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Python 255 56 Updated Jul 13, 2025

Materials for ConceptARC paper

117 9 Updated Feb 10, 2026

Understanding R1-Zero-Like Training: A Critical Perspective

Python 1,236 57 Updated Aug 27, 2025

The Abstraction and Reasoning Corpus

JavaScript 4,736 704 Updated Apr 4, 2025

Logical puzzles generator

Python 1 Updated Dec 7, 2024

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,373 114 Updated Mar 25, 2026