Stars
Compress2Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents
FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation
Simple code sandbox supporting Jupyter-notebook-style code execution. Used for agent training.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Super basic implementation (gist-like) of RLMs with REPL environments.
Next paradigm for LLM agents: unifies planning and action through recursive code generation for adaptive, human-like decision-making.
PuzzleClone: An SMT-Powered Framework for Synthesizing Verified Mathematical Reasoning Data
General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.
The official code of PuzzleClone (submitted to ACL'26)
🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
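An open Python framework implementing Chan Theory (缠论), supporting morphology- and dynamics-based buy/sell point analysis, joint multi-level K-line (candlestick) analysis, interval-nesting strategies, visualization and plotting, multiple data sources, strategy development, and trading-system integration.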
A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
An Open-source RL System from ByteDance Seed and Tsinghua AIR
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
verl: Volcano Engine Reinforcement Learning for LLMs
Witness the aha moment of a VLM for less than $3.
MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka.
Fully open reproduction of DeepSeek-R1
Minimal reproduction of DeepSeek R1-Zero
A bibliography and survey of the papers surrounding o1
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Scalable RL solution for advanced reasoning of language models