Stars
HAMi-core compiles libvgpu.so, which enforces hard GPU resource limits inside containers
A workload for deploying LLM inference services on Kubernetes
SGLang is a fast serving framework for large language models and vision language models.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
verl: Volcano Engine Reinforcement Learning for LLMs
My learning notes and code for ML SYS.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Production-Grade Container Scheduling and Management
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
Wan: Open and Advanced Large-Scale Video Generative Models
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Chat2Graph: Graph Native Agentic System.
Cost-efficient and pluggable Infrastructure components for GenAI inference
FlashInfer: Kernel Library for LLM Serving
KCL Programming Language (CNCF Sandbox Project). https://kcl-lang.io
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
Heterogeneous AI Computing Virtualization Middleware (project under CNCF)
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
FlashMLA: Efficient Multi-head Latent Attention Kernels
HugeSCM - A next-generation cloud-based version control system
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Efficient and easy multi-instance LLM serving
The Triton TensorRT-LLM Backend
AI fundamentals - GPU architecture, CUDA programming, large language model basics, and AI Agent topics