-
Seed-Prover Public
Forked from ByteDance-Seed/Seed-Prover
Lean Apache License 2.0 Updated Dec 19, 2025 -
torchforge Public
Forked from meta-pytorch/torchforge
PyTorch-native post-training at scale
Python BSD 3-Clause "New" or "Revised" License Updated Nov 5, 2025 -
verl_megatron_practice Public
Forked from ISEEKYAN/verl_megatron_practice
(best/better) practices of megatron on veRL and tuning guide
Shell Apache License 2.0 Updated Sep 26, 2025 -
RLinf Public
Forked from RLinf/RLinf
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
Python Apache License 2.0 Updated Sep 22, 2025 -
Verlog Public
Forked from WentseChen/Verlog
Verlog: A Multi-turn RL framework for LLM agents
Python Apache License 2.0 Updated Aug 16, 2025 -
IRL-VLA Public
Forked from IRL-VLA/IRL-VLA
Official repo for IRL-VLA
Apache License 2.0 Updated Aug 13, 2025 -
Agent_Foundation_Models Public
Forked from OPPO-PersonalAI/Agent_Foundation_Models
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.
Python Apache License 2.0 Updated Aug 11, 2025 -
MiroRL Public
Forked from MiroMindAI/MiroRL
MiroRL is an MCP-first reinforcement learning framework for deep research agent.
Python Apache License 2.0 Updated Aug 8, 2025 -
JAxtar Public
Forked from tinker495/JAxtar
JAxtar is a project with a JAX-native implementation of parallelizable A* & Q* solver for neural heuristic search research.
Python MIT License Updated Aug 7, 2025 -
terminal-bench-rl Public
Forked from Danau5tin/terminal-bench-rl
GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.
-
vsag Public
Forked from antgroup/vsag
vsag is a vector indexing library used for similarity search.
C++ Apache License 2.0 Updated Jul 29, 2025 -
pipelining-sft Public
Forked from character-ai/pipelining-sft
Simple and efficient DeepSeek V3 SFT using pipeline parallel and expert parallel, with both FP8 and BF16 trainings
Python MIT License Updated Jul 27, 2025 -
DeepResearchAgent Public
Forked from SkyworkAI/DeepResearchAgent
JavaScript MIT License Updated Jul 25, 2025 -
mini-swe-agent Public
Forked from SWE-agent/mini-swe-agent
The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no crazy configs, no giant monorepo, but scores 65% on SWE-bench verified!
Python MIT License Updated Jul 25, 2025 -
agent-lightning Public
Forked from microsoft/agent-lightning
Python MIT License Updated Jul 24, 2025 -
llm-sandbox Public
Forked from vndee/llm-sandbox
Lightweight and portable LLM sandbox runtime (code interpreter) Python library.
Python MIT License Updated Jul 23, 2025 -
FinGenius Public
Forked from HuaYaoAI/FinGenius
Python GNU General Public License v3.0 Updated Jul 22, 2025 -
HRM Public
Forked from sapientinc/HRM
Hierarchical Reasoning Model Official Release
Python Apache License 2.0 Updated Jul 21, 2025 -
Awesome-ML-SYS-Tutorial Public
Forked from zhaochenyang20/Awesome-ML-SYS-Tutorial
My learning notes/codes for ML SYS.
Python Apache License 2.0 Updated Jul 21, 2025 -
Awesome-Uncertainty-based-Reinforcement-Learning Public
Forked from falonss703/Awesome-Uncertainty-based-Reinforcement-Learning
🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL
Updated Jun 20, 2025 -
slime Public
Forked from THUDM/slime
slime is an LLM post-training framework aiming at scaling RL.
Python Apache License 2.0 Updated Jun 20, 2025 -
TreeRL Public
Forked from THUDM/TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25
Python Apache License 2.0 Updated Jun 16, 2025 -
Meta-rater Public
Forked from opendatalab/Meta-rater
[ACL 2025] This is the official implementation for the paper: "Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models"
-
llm-reasoners Public
Forked from maitrix-org/llm-reasoners
A library for advanced large language model reasoning
Python Apache License 2.0 Updated Jun 10, 2025