Stars
fmchisel: Efficient Compression and Training Algorithms for Foundation Models
Efficient Triton Kernels for LLM Training
The official Rust SDK for the Model Context Protocol
Build resilient language agents as graphs.
A Datacenter Scale Distributed Inference Serving Framework
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.
A command-line interface tool for serving LLMs using vLLM.
Genesis is a lightweight deep learning framework written from scratch in Python, with Triton as its backend for high-performance computing.
A minimal, easy-to-read PyTorch reimplementation of Qwen3 and Qwen2.5-VL with a fancy CLI
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Examples and guides for using the OpenAI API
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
The simplest, fastest repository for training/finetuning small-sized VLMs.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
verl: Volcano Engine Reinforcement Learning for LLMs
A Java library to use the OpenAI API in the simplest possible way.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation