Starred repositories
SkyRL: A Modular Full-stack RL Library for LLMs
LLMPerf is a library for validating and benchmarking LLMs
Manages unified access to generative AI services, built on Envoy Gateway
Inference server benchmarking tool
Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)
Using CRDs to manage GPU resources in Kubernetes.
Simple, scalable AI model deployment on GPU clusters
A framework for serving and evaluating LLM routers: reduce LLM costs without compromising quality
A toolkit to run Ray applications on Kubernetes
Heterogeneous AI Computing Virtualization Middleware (Project under CNCF)
AI on GKE is a collection of examples, best practices, and prebuilt solutions to help build, deploy, and scale AI platforms on Google Kubernetes Engine
Cloud Native Benchmarking of Foundation Models
Gateway API Inference Extension
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Open source AI coding agent, designed for large projects and real-world tasks.
A CLI inspector for the Model Context Protocol
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
A Datacenter Scale Distributed Inference Serving Framework
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Awesome-LLM-KV-Cache: A curated list of 📙 awesome LLM KV cache papers with code.
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
My learning notes and code for ML systems.