Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a high-performance serving framework for large language models and multimodal models.
LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantization, MXFP4, NVFP4, GGUF, and adaptive schemes.
An optimized quantization and inference library for running LLMs locally on modern consumer-grade GPUs.
ChromaDB-powered local indexing support for Cursor, exposed as an MCP server