andyxning

🎉

I may be slow to respond.

Ning Xie andyxning

Infrastructure || LLM Inference @kubernetes Member @nsqio Member

240 followers · 56 following

Beijing, China

Achievements

x2 x2 x3

Starred repositories

NovaSky-AI / SkyRL

SkyRL: A Modular Full-stack RL Library for LLMs

Python 994 129 Updated Oct 9, 2025

ray-project / llmperf

LLMPerf is a library for validating and benchmarking LLMs

Python 1,021 191 Updated Dec 9, 2024

envoyproxy / ai-gateway

Manages Unified Access to Generative AI Services built on Envoy Gateway

Go 1,103 108 Updated Oct 9, 2025

Tencent / KsanaLLM

C++ 507 41 Updated Sep 12, 2025

huggingface / inference-benchmarker

Inference server benchmarking tool

Rust 114 19 Updated Oct 2, 2025

fluid-cloudnative / fluid

Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)

Go 1,825 1,128 Updated Oct 9, 2025

elastic-ai / elastic-gpu

Using CRDs to manage GPU resources in Kubernetes.

Go 209 28 Updated Nov 21, 2022

gpustack / gpustack

Simple, scalable AI model deployment on GPU clusters

Python 3,823 385 Updated Oct 9, 2025

lm-sys / RouteLLM

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality

Python 4,320 337 Updated Aug 10, 2024

ray-project / kuberay

A toolkit to run Ray applications on Kubernetes

Go 2,073 627 Updated Oct 9, 2025

Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware (Project under CNCF)

Go 2,225 384 Updated Oct 9, 2025

GoogleCloudPlatform / ai-on-gke

AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kubernetes Engine

Jupyter Notebook 324 243 Updated Jun 23, 2025

openshift-psap / llm-load-test

Python 49 29 Updated Aug 1, 2025

triton-inference-server / perf_analyzer

Python 112 32 Updated Sep 9, 2025

fmperf-project / fmperf

Cloud Native Benchmarking of Foundation Models

Python 44 20 Updated Jul 31, 2025

kubernetes-sigs / gateway-api-inference-extension

Gateway API Inference Extension

Go 494 178 Updated Oct 9, 2025

NVIDIA / gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,232 172 Updated Aug 19, 2025

NVIDIA / kvpress

LLM KV cache compression made easy

Python 648 66 Updated Oct 9, 2025

plandex-ai / plandex

Open source AI coding agent. Designed for large projects and real world tasks.

Go 14,520 1,022 Updated Oct 3, 2025

wong2 / mcp-cli

A CLI inspector for the Model Context Protocol

JavaScript 384 31 Updated Aug 13, 2025

vllm-project / guidellm

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

Python 615 87 Updated Oct 9, 2025

punica-ai / punica

Serving multiple LoRA finetuned LLM as one

Python 1,096 53 Updated May 8, 2024

S-LoRA / S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,855 112 Updated Jan 21, 2024

ModelTC / LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,639 280 Updated Oct 9, 2025