Stars
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
A PyTorch native platform for training generative AI models
Post-training with Tinker
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
slime is an LLM post-training framework for RL Scaling.
SGLang is a high-performance serving framework for large language models and multimodal models.
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while control…
Our library for RL environments + evals
Lightweight coding agent that runs in your terminal
Renderer for the harmony response format to be used with gpt-oss
Copilot Chat extension for VS Code
Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework
This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"
Model Context Protocol Servers
andyl98 / trl
Forked from huggingface/trl: Train transformer language models with reinforcement learning.
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
verl: Volcano Engine Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
⏩ Source-controlled AI checks, enforceable in CI. Powered by the open-source Continue CLI
An easy-to-use, scalable, and high-performance agentic RL framework based on Ray (PPO, DAPO, REINFORCE++, TIS, vLLM, async RL)
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
The Open Cookbook for Top-Tier Code Large Language Model