pengzju
  • Zhejiang University

200 results for source starred repositories

From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.

Python 24 1 Updated Oct 7, 2025

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Jupyter Notebook 143 7 Updated Nov 13, 2025

HPA-HLE is an open-source framework for Humanity's Last Exam (HLE) using multi-agent collaboration, dynamic routing, and entropy-reducing evaluation. It achieved 27.5% accuracy across multiple tests withou…

Python 2 1 Updated Jun 13, 2025

Resources for the Enigmata Project.

Python 77 5 Updated Aug 13, 2025

Fully open data curation for reasoning models

Python 2,205 185 Updated Dec 2, 2025

[ACL 2025] A novel complex reasoning enhancement method that uses widely available algorithmic questions and their code to generate logical reasoning data.

8 Updated Aug 4, 2025

LogicBench is a natural language question-answering dataset consisting of 25 different reasoning patterns spanning propositional, first-order, and non-monotonic logics.

36 4 Updated May 2, 2024

Train your Agent model via our easy and efficient framework

Python 1,701 159 Updated Dec 5, 2025
Python 333 25 Updated Aug 29, 2025

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"

Python 1,492 130 Updated Jan 30, 2026

SkyRL: A Modular Full-stack RL Library for LLMs

Python 1,537 247 Updated Feb 4, 2026

Training VLM agents with multi-turn reinforcement learning

Python 390 43 Updated Feb 1, 2026

A live stream development of RL tuning for LLM agents

Python 3,889 533 Updated Oct 8, 2025

RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.

Python 2,504 205 Updated Jan 25, 2026

[ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Python 566 32 Updated May 6, 2025
4 Updated Aug 30, 2024

[ICLR 2026] LLM/VLM gaming agents and model evaluation through games.

Python 857 91 Updated Nov 16, 2025

A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning

Python 348 81 Updated Feb 3, 2026

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

TypeScript 25,553 2,476 Updated Jan 14, 2026

Multiple datasets for ARC (Abstraction and Reasoning Corpus)

Python 87 15 Updated Mar 28, 2025

☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models

Jupyter Notebook 19 Updated Jun 4, 2025

Code and Data for ACL 2025 Paper "Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework".

Python 23 5 Updated Oct 3, 2025

[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Python 230 49 Updated Jul 13, 2025

Materials for ConceptARC paper

112 9 Updated Nov 6, 2024

Understanding R1-Zero-Like Training: A Critical Perspective

Python 1,205 56 Updated Aug 27, 2025

The Abstraction and Reasoning Corpus

JavaScript 4,716 700 Updated Apr 4, 2025

Logical puzzles generator

Python 1 Updated Dec 7, 2024

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,329 112 Updated Jan 16, 2026