Repositories starred by avtc: 8 written in Python

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 69,666 stars · 13,250 forks · Updated Feb 6, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python · 23,401 stars · 4,347 forks · Updated Feb 6, 2026

Develop software autonomously.

Python · 2,203 stars · 218 forks · Updated Jan 30, 2026

LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.

Python · 1,010 stars · 157 forks · Updated Feb 6, 2026

🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantization, MXFP4, NVFP4, GGUF, and adaptive schemes.

Python · 845 stars · 78 forks · Updated Feb 6, 2026

An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs

Python · 626 stars · 69 forks · Updated Jan 26, 2026

ChromaDB-powered local indexing support for Cursor, exposed as an MCP server

Python · 33 stars · 11 forks · Updated Mar 20, 2025

Python · 18 stars · 1 fork · Updated May 10, 2025