👋

Highlights

  • Pro

Organizations

@DaoCloud @kubernetes @istio @kubernetes-sigs @karmada-io @lfapac-open-source-evangelist @InftyAI @Project-HAMi @d-run

Starred repositories

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python · 2,448 stars · 331 forks · Updated Dec 19, 2025

Claude Code superpowers: core skills library

Shell · 10,833 stars · 899 forks · Updated Dec 18, 2025

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

Python · 7,987 stars · 881 forks · Updated Dec 11, 2025

A high-performance and light-weight router for vLLM large scale deployment

Rust · 60 stars · 11 forks · Updated Dec 17, 2025

Free Codex and Claude Code if you have GitHub Copilot!

TypeScript · 14 stars · 3 forks · Updated Dec 10, 2025

Swagger Online is a lightweight React tool that aggregates multiple Swagger/OpenAPI specs into one unified, searchable, and comparable interface.

JavaScript · 1 star · Updated Dec 12, 2025

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Go · 41,842 stars · 3,733 forks · Updated Dec 22, 2025

A containerd snapshot-quota NRI plugin: users can set an ephemeral-storage limit for each container, and when that ephemeral storage fills up, the pod is not restarted.

Go · 4 stars · 1 fork · Updated Jun 24, 2025

A framework for efficient model inference with omni-modality models

Python · 1,234 stars · 165 forks · Updated Dec 22, 2025

ArkSphere website

TypeScript · 4 stars · Updated Dec 13, 2025

ArkSphere Community

5 stars · 1 fork · Updated Dec 1, 2025

Discover ingress-nginx usage and auto-generate Gateway API migration plans before ingress-nginx reaches end-of-life (March 2026).

Go · 13 stars · Updated Nov 26, 2025

Persist and reuse the KV cache to speed up your LLM.

Python · 215 stars · 54 forks · Updated Dec 22, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 12,446 stars · 1,970 forks · Updated Dec 22, 2025

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python · 472 stars · 27 forks · Updated Nov 19, 2025

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

Python · 1,887 stars · 156 forks · Updated Dec 22, 2025

Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Go · 1,407 stars · 145 forks · Updated Dec 22, 2025

NVIDIA Inference Xfer Library (NIXL)

C++ · 775 stars · 209 forks · Updated Dec 22, 2025

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

C++ · 11,891 stars · 1,865 forks · Updated Jul 6, 2025

A high-performance inference system for large language models, designed for production environments.

C++ · 489 stars · 37 forks · Updated Dec 19, 2025

KV cache store for distributed LLM inference

C++ · 377 stars · 32 forks · Updated Nov 13, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda · 4,325 stars · 613 forks · Updated Dec 22, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python · 1,356 stars · 91 forks · Updated Dec 20, 2025

Repository for out-of-tree scheduler plugins based on scheduler framework.

Go · 1,259 stars · 582 forks · Updated Dec 5, 2025

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell · 2,206 stars · 267 forks · Updated Dec 19, 2025

The Intelligent Inference Scheduler for Large-scale Inference Services.

Go · 43 stars · 9 forks · Updated Nov 25, 2025

⚒️ AlphaTrion is an open-source framework to help build GenAI applications, including experiment tracking, adaptive model routing, prompt optimization and performance evaluation.

Python · 11 stars · 4 forks · Updated Dec 19, 2025

A containerd sandbox runtime using VMs

Go · 66 stars · 11 forks · Updated Dec 18, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python · 4,854 stars · 329 forks · Updated Nov 28, 2025

RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderboard.

Python · 55 stars · 5 forks · Updated Dec 8, 2025