🎉 I may be slow to respond.
  • Beijing, China

Organizations

@nsqio @kubernetes

Starred repositories

SkyRL: A Modular Full-stack RL Library for LLMs

Python · 994 stars · 129 forks · Updated Oct 9, 2025

LLMPerf is a library for validating and benchmarking LLMs

Python · 1,021 stars · 191 forks · Updated Dec 9, 2024

Manages Unified Access to Generative AI Services built on Envoy Gateway

Go · 1,103 stars · 108 forks · Updated Oct 9, 2025

C++ · 507 stars · 41 forks · Updated Sep 12, 2025

Inference server benchmarking tool

Rust · 114 stars · 19 forks · Updated Oct 2, 2025

Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)

Go · 1,825 stars · 1,128 forks · Updated Oct 9, 2025

Using CRDs to manage GPU resources in Kubernetes.

Go · 209 stars · 28 forks · Updated Nov 21, 2022

Simple, scalable AI model deployment on GPU clusters

Python · 3,823 stars · 385 forks · Updated Oct 9, 2025

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality

Python · 4,320 stars · 337 forks · Updated Aug 10, 2024

A toolkit to run Ray applications on Kubernetes

Go · 2,073 stars · 627 forks · Updated Oct 9, 2025

Heterogeneous AI Computing Virtualization Middleware (Project under CNCF)

Go · 2,225 stars · 384 forks · Updated Oct 9, 2025

AI on GKE is a collection of examples, best practices, and prebuilt solutions to help build, deploy, and scale AI platforms on Google Kubernetes Engine

Jupyter Notebook · 324 stars · 243 forks · Updated Jun 23, 2025

Cloud Native Benchmarking of Foundation Models

Python · 44 stars · 20 forks · Updated Jul 31, 2025

Gateway API Inference Extension

Go · 494 stars · 178 forks · Updated Oct 9, 2025

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ · 1,232 stars · 172 forks · Updated Aug 19, 2025

LLM KV cache compression made easy

Python · 648 stars · 66 forks · Updated Oct 9, 2025

Open source AI coding agent. Designed for large projects and real world tasks.

Go · 14,520 stars · 1,022 forks · Updated Oct 3, 2025

A CLI inspector for the Model Context Protocol

JavaScript · 384 stars · 31 forks · Updated Aug 13, 2025

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

Python · 615 stars · 87 forks · Updated Oct 9, 2025

Serving multiple LoRA finetuned LLM as one

Python · 1,096 stars · 53 forks · Updated May 8, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python · 1,855 stars · 112 forks · Updated Jan 21, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python · 3,639 stars · 280 forks · Updated Oct 9, 2025

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python · 3,452 stars · 270 forks · Updated May 21, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust · 5,262 stars · 632 forks · Updated Oct 9, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python · 1,293 stars · 84 forks · Updated Oct 7, 2025

Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes.

370 stars · 22 forks · Updated Mar 3, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

553 stars · 13 forks · Updated Sep 30, 2025

My learning notes/codes for ML SYS.

Python · 3,823 stars · 232 forks · Updated Oct 6, 2025