samzong

Focused on AI & Cloud Native at @DaoCloud

121 followers · 60 following

@DaoCloud
in/samzong

Starred repositories

vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2,448 331 Updated Dec 19, 2025

obra / superpowers

Claude Code superpowers: core skills library

Shell 10,833 899 Updated Dec 18, 2025

ComposioHQ / awesome-claude-skills

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

Python 7,987 881 Updated Dec 11, 2025

vllm-project / router

A high-performance and light-weight router for vLLM large scale deployment

Rust 60 11 Updated Dec 17, 2025

xianml / copilot-api-pro

Free Codex and Claude Code if you have GitHub Copilot!

TypeScript 14 3 Updated Dec 10, 2025

samzong / swagger-online

Swagger Online is a lightweight React tool that aggregates multiple Swagger/OpenAPI specs into one unified, searchable, and comparable interface.

JavaScript 1 Updated Dec 12, 2025

milvus-io / milvus

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Go 41,842 3,733 Updated Dec 22, 2025

lengrongfu / snapshots-quota

Containerd snapshots quota NRI plugin. Users can set an ephemeral storage quota for each container, and the pod will not restart when that ephemeral storage is full.

Go 4 1 Updated Jun 24, 2025

vllm-project / vllm-omni

A framework for efficient model inference with omni-modality models

Python 1,234 165 Updated Dec 22, 2025

arksphere / website

ArkSphere website

TypeScript 4 Updated Dec 13, 2025

arksphere / community

ArkSphere Community

5 1 Updated Dec 1, 2025

ubermorgenland / ingress-migration-kit

Discover ingress-nginx usage and auto-generate Gateway API migration plans before ingress-nginx reaches end-of-life (March 2026).

Go 13 Updated Nov 26, 2025

ModelEngine-Group / unified-cache-management

Persist and reuse KV Cache to speed up your LLM.

Python 215 54 Updated Dec 22, 2025

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,446 1,970 Updated Dec 22, 2025

deepseek-ai / LPLB

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python 472 27 Updated Nov 19, 2025

lemonade-sdk / lemonade

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

Python 1,887 156 Updated Dec 22, 2025

maximhq / bifrost

Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Go 1,407 145 Updated Dec 22, 2025

ai-dynamo / nixl

NVIDIA Inference Xfer Library (NIXL)

C++ 775 209 Updated Dec 22, 2025

cameron314 / concurrentqueue

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

C++ 11,891 1,865 Updated Jul 6, 2025

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

C++ 489 37 Updated Dec 19, 2025

bytedance / InfiniStore

KV cache store for distributed LLM inference

C++ 377 32 Updated Nov 13, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 4,325 613 Updated Dec 22, 2025

thu-pacman / chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 1,356 91 Updated Dec 20, 2025

kubernetes-sigs / scheduler-plugins

Repository for out-of-tree scheduler plugins based on scheduler framework.

Go 1,259 582 Updated Dec 5, 2025

llm-d / llm-d

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 2,206 267 Updated Dec 19, 2025

aigw-project / aigw

The Intelligent Inference Scheduler for Large-scale Inference Services.

Go 43 9 Updated Nov 25, 2025

InftyAI / alphatrion

⚒️ AlphaTrion is an open-source framework to help build GenAI applications, including experiment tracking, adaptive model routing, prompt optimization and performance evaluation.

Python 11 4 Updated Dec 19, 2025