sglang
Here are 106 public repositories matching this topic...
🔗 Accelerate AI inference with MatrixHub, a self-hosted model registry that ensures zero-wait distribution and secure private access for enterprise workloads.
Updated Apr 27, 2026
🤖 Automate local, private AI setups for demos that showcase models on diverse tasks such as coding, image generation, and business planning.
Updated Apr 27, 2026 - Shell
An open-source, self-hosted AI model hub with Hugging Face compatibility, accelerating vLLM/SGLang performance.
Updated Apr 27, 2026 - Go
High-performance lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover and unified model discovery across local and remote inference backends.
Updated Apr 27, 2026 - Go
AMD ROCm (gfx1030) inference fork with RotorQuant/TurboQuant KV compression, PHANTOM-X zero-copy draft speculation, EAGLE3 speculative decoding, 12 RDNA2 crash fixes, and PrismML Bonsai Q1_0_G128 1-bit GGUF support.
Updated Apr 27, 2026 - Python
A lightweight OpenAI & Anthropic protocol aggregation wrapper, similar to LiteLLM but with a more streamlined feature set.
Updated Apr 27, 2026 - HTML
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
Updated Apr 27, 2026 - Python
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
Updated Apr 27, 2026 - Python
Automated system for LLM evaluation via agents.
Updated Apr 27, 2026 - Python
LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA, AMD, and Intel GPUs and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
Updated Apr 27, 2026 - Python
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.
Updated Apr 27, 2026 - Rust
Train and customize OpenClaw agents using reinforcement learning with simple language feedback and fully asynchronous optimization.
Updated Apr 27, 2026 - JavaScript
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Updated Apr 26, 2026 - Python