Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU.
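This description matches Intel's auto-round project. A minimal sketch of driving such a recipe from Python, modeled on its README; the model name, bit width, and output path are illustrative, and the exact API may shift between releases:

```python
# Sketch of 4-bit weight-only quantization with auto-round; the model choice
# and settings below are illustrative placeholders, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tune the rounding offsets, then export INT4 weights.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./opt-125m-int4", format="auto_round")
```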
🤖 Build AI agents that combine OpenAI's orchestration and Claude's execution for effective production solutions.
📊 Transform documents into a smart knowledge base using Neo4j and Azure AI for efficient, intelligent searching and answer generation.
LLM quantization (compression) toolkit with hardware-acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
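For a sense of the workflow, a hedged sketch of the load → quantize → save flow that GPTQModel documents; the model id, toy calibration text, and output path are placeholders:

```python
# Sketch of GPTQ INT4 quantization with GPTQModel; real calibration should use
# a representative text corpus rather than this repeated toy sentence.
from gptqmodel import GPTQModel, QuantizeConfig

calibration = ["The quick brown fox jumps over the lazy dog."] * 256  # placeholder data

config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", config)
model.quantize(calibration, batch_size=2)  # runs calibration forward passes
model.save("./llama-3.2-1b-gptq-int4")
```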
PyTorch native quantization and sparsity for training and inference
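This tagline matches torchao. A minimal sketch of its weight-only INT8 path on a toy module; note that recent releases rename the config helper to Int8WeightOnlyConfig, so check the installed version:

```python
# Weight-only INT8 post-training quantization with torchao's quantize_ API.
# The tiny Sequential model is illustrative only.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
quantize_(model, int8_weight_only())  # swaps Linear weights to int8 tensor subclasses in place

with torch.no_grad():
    logits = model(torch.randn(2, 1024))
```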
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
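This matches Intel Neural Compressor. A hedged sketch of its 2.x post-training quantization entry point, assuming a toy model and random calibration data; the 3.x releases restructure this interface:

```python
# Post-training static quantization with Intel Neural Compressor's 2.x `fit` API.
# Model and calibration loader are placeholders for real workloads.
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

fp32_model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
calib_loader = DataLoader(
    TensorDataset(torch.randn(128, 64), torch.zeros(128)), batch_size=16
)

q_model = fit(model=fp32_model, conf=PostTrainingQuantConfig(), calib_dataloader=calib_loader)
q_model.save("./int8_model")
```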
🚀 Simplify running, sharing, and shipping Hugging Face models with autopack; it quantizes and exports to multiple formats effortlessly.
Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
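A hedged sketch of that PyTorch → Core ML pipeline with coremltools: trace, convert to a float16 mlpackage, then linearly quantize weights to int8. The tiny stand-in model, tensor names, and shapes are illustrative, not TinyLlama itself, and ct.target.iOS18 requires a recent coremltools:

```python
# Trace a PyTorch model, convert to Core ML at float16, then int8-quantize weights.
import numpy as np
import torch
import coremltools as ct
from coremltools.optimize.coreml import (
    OpLinearQuantizerConfig, OptimizationConfig, linear_quantize_weights
)

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
example = torch.randn(1, 64)  # placeholder input; an LLM would take token ids
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape, dtype=np.float32)],
    compute_precision=ct.precision.FLOAT16,
    minimum_deployment_target=ct.target.iOS18,
)

config = OptimizationConfig(global_config=OpLinearQuantizerConfig(mode="linear_symmetric"))
mlmodel_int8 = linear_quantize_weights(mlmodel, config=config)
mlmodel_int8.save("Model.mlpackage")
```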
FMS Model Optimizer is a framework for developing reduced precision neural network models.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
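This matches vLLM's llm-compressor. A hedged sketch of its one-shot W4A16 GPTQ flow; the model, dataset, and sample counts are illustrative, and newer releases import oneshot from the top-level package:

```python
# One-shot GPTQ compression with llm-compressor; output loads directly in vLLM.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative model choice
    dataset="open_platypus",
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```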
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A friendly CLI tool for converting and uploading transformers for CTranslate2.
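Under the hood, such a tool wraps CTranslate2's converter (the ct2-transformers-converter CLI). A minimal sketch of the Python equivalent, with an illustrative model name and output directory:

```python
# Convert a Hugging Face Transformers model to CTranslate2 format with
# int8 weight quantization; requires transformers to be installed.
import ctranslate2.converters

converter = ctranslate2.converters.TransformersConverter("facebook/nllb-200-distilled-600M")
converter.convert("nllb-200-ct2-int8", quantization="int8", force=True)
```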
A survey of modern quantization formats (e.g., MXFP8, NVFP4) and inference optimization tools (e.g., TorchAO, GemLite), illustrated through the example of Llama-3.1 inference.
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
A project demonstrating how to deploy AI models with significant performance improvements inside containerized environments using Cog; ideal for reproducible, scalable, and hardware-efficient inference.
Trustworthy onboard satellite AI via a PyTorch→ONNX→INT8 pipeline, with calibration, telemetry, and a PhiSat-2 EO tile-filter demo.
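A hedged sketch of the ONNX → INT8 step using ONNX Runtime's static quantization; the input name "input", tile shape, and random calibration batches are placeholders for a real preprocessed EO tile set:

```python
# Static INT8 quantization with onnxruntime; calibration data here is random
# and stands in for real earth-observation tiles.
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class TileReader(CalibrationDataReader):
    """Feeds calibration batches; real code would stream preprocessed EO tiles."""
    def __init__(self, n=32):
        self._batches = iter(
            {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)} for _ in range(n)
        )

    def get_next(self):
        return next(self._batches, None)

quantize_static(
    "tile_filter.onnx",
    "tile_filter.int8.onnx",
    calibration_data_reader=TileReader(),
    weight_type=QuantType.QInt8,
)
```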
Neural Network Compression Framework for enhanced OpenVINO™ inference
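This is NNCF's tagline. A minimal sketch of its post-training INT8 path on a PyTorch model; the MobileNet and random calibration batches are illustrative, and the result is typically exported through OpenVINO model conversion for deployment:

```python
# Post-training INT8 quantization of a PyTorch model with NNCF.
import nncf
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(weights=None).eval()

# A few random batches stand in for a real calibration set.
calib_data = [torch.randn(1, 3, 224, 224) for _ in range(32)]

quantized_model = nncf.quantize(model, nncf.Dataset(calib_data))
```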