Stars
DFlash: Block Diffusion for Flash Speculative Decoding
Post-training framework for large models, from new objectives to new rollout systems.
slime is an LLM post-training framework for RL Scaling.
Use agents to learn agents: a skeleton course on how to design, build, and operate production AI agents
Code for ProactBench: Beyond What The User Asked For — measuring conversational proactivity in multi-turn LLM dialogues.
Code for Latent Speech-Text Transformer (LST)
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
ATLAS by General Intelligence Capital — Self-improving AI trading agents using Karpathy-style autoresearch
Our library for RL environments + evals
[NeurIPS 2025] Thinkless: LLM Learns When to Think
Open-core workflow engine powering Bubble Lab — and fully runnable, hostable, and extensible on its own.
An Agentic Framework for Reflective PowerPoint Generation
Tile primitives for speedy kernels
The Frontend Stack for Agents & Generative UI. React, Angular, Mobile, Slack, and more. Makers of the AG-UI Protocol
AG-UI: the Agent-User Interaction Protocol. Bring Agents into Frontend Applications.
A conversational, AI device + software framework for companionship, entertainment, education, healthcare, IoT applications, and DIY robotics. Built with Python, NextJS, Arduino, ESP32, LLMs (GPT-4o…
A Datacenter Scale Distributed Inference Serving Framework
Model Context Protocol Servers
Cost-efficient and pluggable Infrastructure components for GenAI inference
A script for creating your very own AI-Powered stock screener
Build reliable customer-facing AI agents with Parlant: an interaction control harness optimized for controlled, consistent, and predictable LLM interactions.
FlashInfer: Kernel Library for LLM Serving
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Verilog Ethernet components for FPGA implementation
CLI tool that uses the GitHub GraphQL API to rank users by number of contributions, plus a corresponding static website.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration