Stars
Large Language Model (LLM) Systems Paper List
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure
ykcombat / sglang
Forked from sgl-project/sglang. SGLang is a fast serving framework for large language models and vision language models.
Supercharge Your LLM with the Fastest KV Cache Layer
A Python package that extends the official PyTorch to easily obtain better performance on Intel platforms
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
❤️ 1000+ Hand-Crafted Go Examples, Exercises, and Quizzes. 🚀 Learn Go by fixing 1000+ tiny programs.
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
Running large language models on a single GPU for throughput-oriented scenarios.
Text-audio foundation model from Boson AI
SGLang is a high-performance serving framework for large language models and multimodal models.
Hackable and optimized Transformers building blocks, supporting a composable construction.
A high-throughput and memory-efficient inference and serving engine for LLMs
Reference implementations of MLPerf® inference benchmarks
FlashInfer: Kernel Library for LLM Serving
Flash Attention in ~100 lines of CUDA (forward pass only)
Fast and memory-efficient exact attention
A Datacenter Scale Distributed Inference Serving Framework
Fast CUDA matrix multiplication from scratch
GoogleTest - Google Testing and Mocking Framework
CUDA Matrix Multiplication Optimization