A dedicated scratchpad for power users

JavaScript 5,176 258 Updated Feb 11, 2026

Compress2Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents

Python 7 Updated Jan 21, 2026

FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation

Python 22 Updated Feb 6, 2026

A simple code sandbox supporting Jupyter-notebook-style code execution, used for agent training.

Python 21 2 Updated Dec 5, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,791 2,037 Updated Jan 13, 2026
Python 522 51 Updated Jan 28, 2026

Super basic implementation (gist-like) of RLMs with REPL environments.

Python 679 108 Updated Jan 7, 2026

Next paradigm for LLM Agent. Unify plan and action through recursive code generation for adaptive, human-like decision-making.

Python 536 62 Updated Dec 1, 2025

PuzzleClone: An SMT-Powered Framework for Synthesizing Verified Mathematical Reasoning Data

Python 5 Updated Jan 9, 2026

General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.

Python 40 5 Updated Feb 11, 2026

The official code of PuzzleClone (submitted to ACL'26)

Python 2 Updated Jan 12, 2026

🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning

Python 318 21 Updated Jan 3, 2026
Python 2,578 745 Updated Aug 6, 2025

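An open Python implementation framework for Chan theory (缠论), supporting morphological/dynamic buy and sell point analysis, joint analysis of multi-level K-lines (candlesticks), interval-nesting strategies, visualization and plotting, multiple data sources, strategy development, and trading system integration.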

Python 1,582 627 Updated Dec 26, 2025

A Business-Driven Real-World Financial Benchmark for Evaluating LLMs

Python 224 9 Updated Jan 9, 2026

deepResearch

Python 87 12 Updated Apr 23, 2025

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,733 80 Updated May 11, 2025

MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning

Python 769 31 Updated Sep 7, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 19,259 3,257 Updated Feb 18, 2026

Witness the aha moment of VLM with less than $3.

Python 4,033 285 Updated May 19, 2025

MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka.

Python 324 11 Updated Jun 21, 2025

Fully open reproduction of DeepSeek-R1

Python 25,884 2,413 Updated Nov 24, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,435 164 Updated Mar 20, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 12,761 1,555 Updated Apr 24, 2025

Simple RL training for reasoning

Python 3,827 283 Updated Dec 23, 2025

A bibliography and survey of the papers surrounding o1

TeX 1,211 51 Updated Nov 16, 2024

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

Python 9,003 879 Updated Feb 6, 2026

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 12,679 1,203 Updated Feb 18, 2026

Scalable RL solution for advanced reasoning of language models

Python 1,805 103 Updated Mar 18, 2025