sglang

Here are 106 public repositories matching this topic...

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

inference rdma disaggregation llm vllm sglang kvcache

Updated Apr 27, 2026
C++

sakthismarther / matrixhub

🔗 Accelerate AI inference with MatrixHub, a self-hosted model registry that ensures zero-wait distribution and secure private access for enterprise workloads.

kubernetes self-hosted mlops model-registry huggingface llm vllm llm-inference sglang

Updated Apr 27, 2026

SemiAnalysisAI / InferenceX

Open-source continuous inference benchmarking of Qwen3.5, DeepSeek, and GPTOSS: GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100, with TPUv6e/v7 and Trainium2/3 coming soon.

benchmark ai amd cuda pytorch nvidia rocm llm vllm sglang gb200

Updated Apr 27, 2026
Python

bogdannadev / inference-ops

Production Docker Compose configs for RAG pipelines

inference rag llamacpp vllm qwen sglang

Updated Apr 27, 2026

KnightLordHUN / private-ai-setup-dream-guide

🤖 Automate local private AI setups for demos that showcase models on diverse tasks such as coding, image generation, and business planning.

Updated Apr 27, 2026
Shell

matrixhub-ai / matrixhub

An open-source, self-hosted AI model hub with Hugging Face compatibility that accelerates vLLM/SGLang performance.

kubernetes self-hosted artificial-intelligence mlops model-registry huggingface llm vllm llm-inference sglang

Updated Apr 27, 2026
Go

thushan / olla

High-performance lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover and unified model discovery across local and remote inference backends.

Updated Apr 27, 2026
Go

carlosfundora / sglang-1-bit-turbo

AMD ROCm (gfx1030) inference fork with RotorQuant/TurboQuant KV compression, PHANTOM-X zero-copy draft speculation, EAGLE3 speculative decoding, 12 RDNA2 crash fixes, and PrismML Bonsai Q1_0_G128 1-bit GGUF support.

triton hip bonsai rocm amd-gpu gguf speculative-decoding sglang rdna2 eagle3 turboquant prismml gfx1030 p-eagle radix-cache

Updated Apr 27, 2026
Python

sczhengyabin / llm-wrapper

A lightweight OpenAI & Anthropic protocol aggregation wrapper, similar to LiteLLM but with a more streamlined feature set.

docker wrapper openai-api llm vllm ollama sglang anthropic-api

Updated Apr 27, 2026
HTML

gpustack / gpustack

A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

cuda inference openai llama maas rocm ascend llm llm-serving vllm genai llm-inference qwen deepseek sglang distributed-inference high-performance-inference mindie

Updated Apr 27, 2026
Python

intel / auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

transformers rounding quantization int4 llms vllm gguf vlms sglang mxfp4 nvfp4

Updated Apr 27, 2026
Python

OpenDCAI / One-Eval

Automated system for LLM evaluation via agents.

agent data-science data benchmark evaluation data-analysis agents llm llms vllm sglang

Updated Apr 27, 2026
Python

ModelCloud / GPTQModel

LLM quantization (compression) toolkit with hardware acceleration support for Nvidia, AMD, and Intel GPUs and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.

transformers quantization optimum peft vllm gptq sglang

Updated Apr 27, 2026
Python

lightseekorg / TorchSpec

A PyTorch native library for training speculative decoding models

pytorch mooncake llm vllm fsdp sglang eagle3 lightseek

Updated Apr 27, 2026
Python

lightseekorg / smg

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.