Stars
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
HPA-HLE is an open-source framework for Humanity's Last Exam (HLE) using multi-agent collaboration, dynamic routing, and entropy-reducing evaluation. It achieved 27.5% accuracy across multiple tests withou…
Fully open data curation for reasoning models
[ACL 2025] A novel complex reasoning enhancement method that uses widely available algorithmic questions and their code to generate logical reasoning data.
LogicBench is a natural language question-answering dataset consisting of 25 different reasoning patterns spanning propositional, first-order, and non-monotonic logics.
Train your agent model with our easy and efficient framework
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training"
SkyRL: A Modular Full-stack RL Library for LLMs
Training VLM agents with multi-turn reinforcement learning
Live-streamed development of RL tuning for LLM agents
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
[ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
LLM/VLM gaming agents and model evaluation through games.
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Multiple datasets for ARC (Abstraction and Reasoning Corpus)
☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models
Code and Data for ACL 2025 Paper "Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework".
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
Understanding R1-Zero-Like Training: A Critical Perspective
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards