Stars
Production-Grade Container Scheduling and Management
A high-throughput and memory-efficient inference and serving engine for LLMs
Distributed reliable key-value store for the most critical data of a distributed system
Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
This project shares the technical principles behind large language models along with practical experience (LLM engineering and deploying LLM applications in production).
SGLang is a fast serving framework for large language models and vision language models.
AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents
NVIDIA Linux open GPU kernel module source
verl: Volcano Engine Reinforcement Learning for LLMs
Wan: Open and Advanced Large-Scale Video Generative Models
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
The official GitHub page for the survey paper "A Survey of Large Language Models".
FlashMLA: Efficient Multi-head Latent Attention Kernels
vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it …
CUDA Templates and Python DSLs for High-Performance Linear Algebra
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Automated management of large-scale applications on Kubernetes (incubating project under CNCF)
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
Cost-efficient and pluggable Infrastructure components for GenAI inference
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
My learning notes and code for ML systems (ML SYS).
FlashInfer: Kernel Library for LLM Serving