Stars
Production-Grade Container Scheduling and Management
Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs (usage sketched after this list)
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
A workload for deploying LLM inference services on Kubernetes
TensorRT-LLM provides users with an easy-to-use Python API (sketched after this list) to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it …
Distributed reliable key-value store for the most critical data of a distributed system
FlashInfer: Kernel Library for LLM Serving
Manage Kubernetes resources effectively while keeping risk under control.
verl: Volcano Engine Reinforcement Learning for LLMs
The Triton TensorRT-LLM Backend
Heterogeneous AI Computing Virtualization Middleware (project under CNCF)
CUDA Templates and Python DSLs for High-Performance Linear Algebra
My learning notes and code for ML systems (MLSys).
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Declarative, Intent-Driven Platform Orchestrator for Internal Developer Platforms (IDPs).
Cost-efficient and pluggable infrastructure components for GenAI inference
HAMi-core builds libvgpu.so, which enforces hard GPU resource limits inside containers
HugeSCM - A next-generation cloud-based version control system
AI-Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Multi-Cluster application progressive delivery controller
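As a quick illustration of the vLLM entry above, here is a minimal offline-inference sketch against its public Python API; the model name is an arbitrary example, not a recommendation:

```python
# Minimal vLLM offline-inference sketch (model name is an arbitrary example).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any supported Hugging Face causal LM
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# generate() returns one RequestOutput per prompt.
outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```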
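The TensorRT-LLM entry above mentions its Python API. The sketch below assumes a recent release that ships the high-level `tensorrt_llm.LLM` API; the model name is again an arbitrary example, and the TensorRT engine is built on first load:

```python
# Sketch of TensorRT-LLM's high-level Python LLM API (assumes a recent release).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # arbitrary example model
params = SamplingParams(temperature=0.8, top_p=0.95)

# Same prompt-in, RequestOutput-out pattern as vLLM's offline API.
for out in llm.generate(["Hello, my name is"], params):
    print(out.outputs[0].text)
```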