zlngan

3 followers · 0 following

Achievements

Stars

heyman / heynote

A dedicated scratchpad for power users

JavaScript 5,176 258 Updated Feb 11, 2026

HiThink-Research / CCPO

Compress2Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents

Python 7 Updated Jan 21, 2026

HiThink-Research / FinMTM

FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation

Python 22 Updated Feb 6, 2026

ChenShawn / MultiModal-Jupyter-Sandbox

Simple code sandbox supporting jupyter notebook style code execution. Used for agent training

Python 21 2 Updated Dec 5, 2025

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,791 2,037 Updated Jan 13, 2026

Visual-Agent / DeepEyesV2

Python 522 51 Updated Jan 28, 2026

alexzhang13 / rlm-minimal

Super basic implementation (gist-like) of RLMs with REPL environments.

Python 679 108 Updated Jan 7, 2026

FoundationAgents / ReCode

Next paradigm for LLM Agent. Unify plan and action through recursive code generation for adaptive, human-like decision-making.

Python 536 62 Updated Dec 1, 2025

HiThink-Research / PuzzleClone

PuzzleClone: An SMT-Powered Framework for Synthesizing Verified Mathematical Reasoning Data

Python 5 Updated Jan 9, 2026

HiThink-Research / GAGE

General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.

Python 40 5 Updated Feb 11, 2026

puzzleclone / PuzzleClone

The official code of PuzzleClone (submitted to ACL'26)

Python 2 Updated Jan 12, 2026

RUC-NLPIR / Tool-Star

🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning

Python 318 21 Updated Jan 3, 2026

HuaYaoAI / FinGenius

Python 2,578 745 Updated Aug 6, 2025

Vespa314 / chan.py

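An open Python framework implementing Chan theory (缠论), supporting morphological and dynamic buy/sell point analysis and calculation, joint multi-level K-line analysis, interval-nesting strategies, visualization and plotting, multiple data sources, strategy development, and trading system integration.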

Python 1,582 627 Updated Dec 26, 2025

HiThink-Research / BizFinBench

A Business-Driven Real-World Financial Benchmark for Evaluating LLMs

Python 224 9 Updated Jan 9, 2026

zhoujx4 / python-node-deepresearch

deepResearch

Python 87 12 Updated Apr 23, 2025

BytedTsinghua-SIA / DAPO

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,733 80 Updated May 11, 2025

ModalMinds / MM-EUREKA

MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning

Python 769 31 Updated Sep 7, 2025

verl-project / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 19,259 3,257 Updated Feb 18, 2026

StarsfieldAI / R1-V

Witness the aha moment of VLM with less than $3.

Python 4,033 285 Updated May 19, 2025

FanqingM / MM-Eureka-V0

MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka.

Python 324 11 Updated Jun 21, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 25,884 2,413 Updated Nov 24, 2025

Unakar / Logic-RL

Reproduce R1 Zero on Logic Puzzle

Python 2,435 164 Updated Mar 20, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python 12,761 1,555 Updated Apr 24, 2025

hkust-nlp / simpleRL-reason

Simple RL training for reasoning

Python 3,827 283 Updated Dec 23, 2025

srush / awesome-o1

A bibliography and survey of the papers surrounding o1

TeX 1,211 51 Updated Nov 16, 2024

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

Python 9,003 879 Updated Feb 6, 2026

deepseek-ai / DeepSeek-R1

91,842 11,773 Updated Jun 27, 2025

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 12,679 1,203 Updated Feb 18, 2026

PRIME-RL / PRIME

Scalable RL solution for advanced reasoning of language models

Python 1,805 103 Updated Mar 18, 2025