OpenClaw-RL: Train any agent simply by talking
Updated Apr 18, 2026 - Python
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references.
LLM quantization (compression) toolkit with hardware-accelerated support for NVIDIA, AMD, and Intel GPUs and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
MOVA: Towards Scalable and Synchronized Video–Audio Generation
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
High-quality Chinese speech synthesis and voice cloning services, built on models such as SparkTTS and OrpheusTTS.
Efficient LLM inference on Slurm clusters.
A tool for benchmarking LLMs on Modal
SGLang model provider for Strands Agents, enabling on-policy agentic RL training.
Automated system for LLM evaluation via agents.
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks