awq

Here are 14 public repositories matching this topic...

intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

sparsity pruning quantization knowledge-distillation auto-tuning int8 low-precision quantization-aware-training post-training-quantization awq int4 large-language-models gptq smoothquant sparsegpt fp4 mxformat

Updated Nov 12, 2025
Python

ModelTC / LightCompress

Star

[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLM, VLM, and video generation models.

benchmark deployment tool evaluation pruning quantization wan awq large-language-models llm token-pruning vllm smoothquant token-reduction mixtral internlm2 token-merging deepseek-v3

Updated Nov 10, 2025
Python

neosun100 / kimi-linear-vllm-docker-serve

Star

Dockerized vLLM serving for Kimi-Linear-48B-A3B (AWQ-4bit), from 128K to 1M context.

docker awq long-context llm-serving vllm kimi-linear

Updated Oct 31, 2025
Python

harleyszhang / harleyszhang.github.io

Star

🧗‍♂️ harleyszhang 的个人博客

blog awq llm llm-inference

Updated Oct 28, 2025
HTML

psunlpgroup / Compression-Effects

Star

Interpretation code for analyzing LLMs compression effects for the paper "When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models"

pruning quantization distillation awq llm gptq llm-compression

Updated Oct 8, 2025
Python

ronantakizawa / phi4-reasoning-awq

Star

AWQ Quantization of Microsoft/Phi-4-Reasoning

quantization awq phi-4

Updated Oct 6, 2025
Jupyter Notebook

ai-art-dev99 / vLLM-efficient-serving-stack

Star

Production-grade vLLM serving with an OpenAI-compatible API, per-request LoRA routing, KEDA autoscaling on Prometheus metrics, Grafana/OTel observability, and a benchmark comparing AWQ vs GPTQ vs GGUF.

grafana openai-api keda-scalers awq large-language-models vllm low-rank-adaptation vllm-serve

Updated Aug 30, 2025
Python

hcd233 / Aris-AI-Model-Server

Star

An OpenAI Compatible API which integrates LLM, Embedding and Reranker. 一个集成 LLM、Embedding 和 Reranker 的 OpenAI 兼容 API

ai embedding mlx reranker rag fastapi sentence-transformers awq llm vllm gptq openai-compatible-api

Updated Aug 21, 2025
Python

lpalbou / model-quantizer

Star

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

python nlp machine-learning cross-platform optimization transformers inference pytorch quantization model-compression huggingface awq llm gptq bitsandbytes cpu-compatible

Updated Mar 15, 2025
Python

RajVenkat20 / LLM-Optimizations-QLoRA-AWQ

Star

This project takes the Flan-T5 LLM and applies QLoRA and AWQ quantization techniques

python3 kaggle huggingface-transformers awq flan-t5 llm-inference qlora

Updated Dec 13, 2024
Python

FireStrike1010 / artificial_personality

Star

Artificial Personality is text2text AI chatbot that can use character cards

ai chatbot transformers neural-networks chatbot-framework awq tavernai

Updated May 28, 2024
Python

GURPREETKAURJETHRA / Quantize-LLM-using-AWQ

Star

Quantize LLM using AWQ

quantize awq large-language-models llms generative-ai llm-training

Updated Apr 26, 2024
Jupyter Notebook

vpgits / sdgp-ml

Star

This repository contains notebooks and resources related to the Software Development Group Project (SDGP) machine learning component. Specifically, it includes two notebooks used for creating a dataset and fine-tuning a Mistral-7B-v0.1-Instruct model.

machine-learning transformers pytorch peft awq qlora autoawq

Updated Mar 21, 2024
Jupyter Notebook

glurp / rfilter

Star

programmable filter, as posix awq, with ruby syntaxe and embeddable function

ruby bash filter plotting awq

Updated Apr 25, 2022
Ruby

Improve this page

Add a description, image, and links to the awq topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the awq topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

awq

Here are 14 public repositories matching this topic...

intel / neural-compressor

ModelTC / LightCompress

neosun100 / kimi-linear-vllm-docker-serve

harleyszhang / harleyszhang.github.io

psunlpgroup / Compression-Effects

ronantakizawa / phi4-reasoning-awq

ai-art-dev99 / vLLM-efficient-serving-stack

hcd233 / Aris-AI-Model-Server

lpalbou / model-quantizer

RajVenkat20 / LLM-Optimizations-QLoRA-AWQ

FireStrike1010 / artificial_personality

GURPREETKAURJETHRA / Quantize-LLM-using-AWQ

vpgits / sdgp-ml

glurp / rfilter

Improve this page

Add this topic to your repo