-
tinker-cookbook Public
Forked from thinking-machines-lab/tinker-cookbook
Post-training with Tinker
Python Apache License 2.0 Updated Oct 29, 2025 -
MMaDA Public
Forked from Gen-Verse/MMaDA
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
Python MIT License Updated Sep 9, 2025 -
VLM-R1 Public
Forked from om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
Python Apache License 2.0 Updated Aug 29, 2025 -
mixture_of_recursions Public
Forked from raymin0223/mixture_of_recursions
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Python Apache License 2.0 Updated Aug 5, 2025 -
autogen Public
Forked from microsoft/autogen
A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
Python Creative Commons Attribution 4.0 International Updated Aug 2, 2025 -
TokenSkip Public
Forked from hemingkx/TokenSkip
[EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs
Python Apache License 2.0 Updated Jun 28, 2025 -
inksight Public
Forked from google-research/inksight
Jupyter Notebook Apache License 2.0 Updated Jun 24, 2025 -
LC-R1 Public
Forked from zxiangx/LC-R1
Code for paper: Optimizing Length Compression in Large Reasoning Models
Python Updated Jun 24, 2025 -
ThinkPrune Public
Forked from UCSB-NLP-Chang/ThinkPrune
Python Apache License 2.0 Updated Apr 16, 2025 -
LightThinker Public
Forked from zjunlp/LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
Python MIT License Updated Apr 12, 2025 -
O1-Pruner Public
Forked from StarDewXXX/O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Python MIT License Updated Feb 21, 2025 -
unlock-deepseek Public
Forked from datawhalechina/unlock-deepseek
Interpretation, extension, and reproduction of the DeepSeek series of work.
Python Updated Feb 15, 2025 -
verifiers Public
Forked from PrimeIntellect-ai/verifiers
Verifiers for LLM Reinforcement Learning
Python Updated Feb 15, 2025 -
PRIME Public
Forked from PRIME-RL/PRIME
Scalable RL solution for the advanced reasoning of language models
Python Apache License 2.0 Updated Feb 14, 2025 -
MM-self-improve-qwen2vl Public
Forked from Liac-li/MM-self-improve-qwen2vl
Python Apache License 2.0 Updated Feb 14, 2025 -
openr Public
Forked from openreasoner/openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Python MIT License Updated Feb 14, 2025 -
simpleRL-reason Public
Forked from hkust-nlp/simpleRL-reason
This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Python MIT License Updated Feb 7, 2025 -
OpenRLHF Public
Forked from OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Python Apache License 2.0 Updated Jan 20, 2025 -
trl Public
Forked from huggingface/trl
Train transformer language models with reinforcement learning.
Python Apache License 2.0 Updated Nov 14, 2024 -
LLM-Dojo Public
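Forked from mst272/LLM-Dojo
Welcome to LLM-Dojo, an open-source place for learning about large models. It uses concise, readable code to build a model training framework (supporting mainstream models such as Qwen, Llama, and GLM), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩🎓👨🎓
Python Updated Nov 8, 2024 -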
LLaVA-MoD Public
Forked from shufangxun/LLaVA-MoD
Making LLaVA Tiny via MoE-Knowledge Distillation
Python Apache License 2.0 Updated Oct 24, 2024 -
Self-Correcting-LLM--Reinforcement-Learning- Public
Forked from sanowl/Self-Correcting-LLM--Reinforcement-Learning-
This is my attempt to create a self-correcting LLM based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by Google
Python Updated Oct 16, 2024 -
TinyLLaVA_Factory Public
Forked from TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
Python Apache License 2.0 Updated Oct 16, 2024 -
SuperCorrect-llm Public
Forked from YangLing0818/SuperCorrect-llm
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Python Updated Oct 14, 2024 -
g1 Public
Forked from build-with-groq/g1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
Python MIT License Updated Oct 7, 2024 -
Google_SCoRe Public
Forked from daje0601/Google_SCoRe
Paper reproduction of Google SCoRe (Training Language Models to Self-Correct via Reinforcement Learning)
Jupyter Notebook Apache License 2.0 Updated Sep 21, 2024 -
GOT-OCR2.0 Public
Forked from Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Python Updated Sep 19, 2024