SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
Native Windows build of vLLM 0.19.0 — no WSL, no Docker. Pre-built wheels + 33-file Windows patch + Multi-TurboQuant KV cache compression (6 methods, 2x cache capacity). PyTorch 2.10 + CUDA 12.6 + Triton + Flash-Attention 2.
Compress Any LLM Up to 6x in One Command. Unified CLI for GGUF, GPTQ, and AWQ quantization.
Research Test: REAP expert pruning + AWQ quantization of Qwen3-Coder-Next MoE model
[ICLR 2026] When Reasoning Meets Compression: Understanding the Effects of LLM Compression on Large Reasoning Models. Supports interpretation of Qwen, Llama, etc.
Quantize LLM using AWQ
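For orientation, a minimal AWQ quantization sketch using the AutoAWQ library; the model ID, output path, and settings below are illustrative defaults, not necessarily this repo's configuration:

```python
# Minimal AWQ quantization sketch with AutoAWQ (illustrative settings).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder model, not this repo's target
quant_path = "mistral-7b-instruct-awq"

# 4-bit weights with group size 128 are common AWQ defaults.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # runs activation-aware calibration
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```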
Dockerized vLLM serving for Kimi-Linear-48B-A3B (AWQ-4bit), from 128K to 1M context.
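As a reference point, loading an AWQ checkpoint with vLLM's offline Python API generally looks like the sketch below; the checkpoint name and context length are placeholders, not this repo's Docker setup:

```python
# Minimal vLLM sketch: load an AWQ-quantized model and generate.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # placeholder AWQ checkpoint
    quantization="awq",
    max_model_len=8192,  # illustrative; the repo above targets 128K-1M contexts
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain AWQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```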
Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
Self-hosted LLM chat client with streaming UI for vLLM servers. Run Mistral-24B locally on RTX 4090/3090. Privacy-focused ChatGPT alternative for homelab/gaming PCs. Python/Rich terminal UI.
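Streaming against a vLLM server uses the standard OpenAI-compatible API; a minimal client sketch, assuming a local server on port 8000 (the model name is a placeholder):

```python
# Streaming chat against a local vLLM OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key

stream = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```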
Quantization quality analyzer - benchmark GGUF/GPTQ/AWQ quantization accuracy.
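One common way to measure quantization accuracy is to compare perplexity between a quantized checkpoint and its full-precision baseline; a self-contained sketch (model IDs and sample text are placeholders, and this is not necessarily this repo's method):

```python
# Sketch: compare perplexity of a quantized model vs. its FP16 baseline
# (placeholder model IDs; lower perplexity means less quality loss).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str) -> float:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token-level cross-entropy
    return torch.exp(loss).item()

sample = "The quick brown fox jumps over the lazy dog. " * 50
for mid in ["mistralai/Mistral-7B-v0.1", "TheBloke/Mistral-7B-v0.1-AWQ"]:
    print(mid, perplexity(mid, sample))
```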
Artificial Personality is a text2text AI chatbot that can use character cards
AWQ Quantization of Microsoft/Phi-4-Reasoning
Pure Gleam tensor library with quantization (INT8, NF4, AWQ), Flash Attention, and 2:4 sparsity for up to 7.5x memory savings
Network-specific model quantization benchmarks — GPTQ, AWQ, GGUF on infrastructure NLP tasks
Production-grade vLLM serving with an OpenAI-compatible API, per-request LoRA routing, KEDA autoscaling on Prometheus metrics, Grafana/OTel observability, and a benchmark comparing AWQ vs GPTQ vs GGUF.
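For context, vLLM's OpenAI-compatible server supports per-request LoRA routing when started with --enable-lora and --lora-modules name=path: the request's model field selects the adapter. A client-side sketch with placeholder adapter names:

```python
# Per-request LoRA routing: the `model` field selects a LoRA adapter that
# was registered at server startup via --lora-modules name=path.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

for adapter in ["sql-lora", "support-lora"]:  # placeholder adapter names
    resp = client.chat.completions.create(
        model=adapter,  # vLLM applies this adapter on top of the base model
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=16,
    )
    print(adapter, "->", resp.choices[0].message.content)
```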
White paper & reproducible benchmark suite for LLM inference optimization on AMD MI300X using ROCm 6.1