Stars
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
Implement a reasoning LLM in PyTorch from scratch, step by step
An Open-source RL System from ByteDance Seed and Tsinghua AIR
Minimal yet performant LLM examples in pure JAX
This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"
[NeurIPS 2025 Spotlight] ReasonFlux (long-CoT), ReasonFlux-PRM (process reward model) and ReasonFlux-Coder (code generation)
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
Janus-Series: Unified Multimodal Understanding and Generation Models
Minimal reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek LLM: Let there be answers
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
DeepSeek R1 distilled into smaller OSS models
A highly capable, lightweight 2.4B LLM using only 1T pre-training data, with all details provided.
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making
Scalable RL solution for advanced reasoning of language models
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
verl: Volcano Engine Reinforcement Learning for LLMs
Let your Claude be able to think