Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
Faster Whisper transcription with CTranslate2
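A minimal usage sketch for faster-whisper, following its README pattern (model size, device, compute type, and audio path are illustrative placeholders):

```python
from faster_whisper import WhisperModel

# "small" on CPU with int8 weights; all three choices are illustrative
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)  # placeholder path
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```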
Accessible large language models via k-bit quantization for PyTorch.
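A minimal sketch of loading a causal LM in 4-bit NF4 through bitsandbytes' transformers integration (the model id and the specific config values are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 weights with bfloat16 compute; values are illustrative choices
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder: any causal LM on the Hub
    quantization_config=bnb_config,
    device_map="auto",
)
```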
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
Base pretrained models and datasets in PyTorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
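For 🤗 Optimum, a minimal sketch of its ONNX Runtime backend, which exports a Hub checkpoint to ONNX at load time (the checkpoint is illustrative):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# the exported model drops into a regular transformers pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Quantized, accelerated inference with ONNX Runtime."))
```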
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
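A sketch of AutoGPTQ's quantize-and-save flow, modeled on its README (model id, calibration sentence, and output directory are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weights, group size 128: common but illustrative settings
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ calibrates on pre-tokenized examples
examples = [tokenizer("GPTQ calibration text goes here.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit")  # placeholder output directory
```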
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, reg…
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
PyTorch native quantization and sparsity for training and inference
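A minimal torchao sketch, assuming a recent release where quantize_ and int8_weight_only are exported from torchao.quantization (the toy model is illustrative; newer versions expose config-object equivalents):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# toy Linear-only model; torchao targets Linear-heavy networks
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16)

# replace Linear weights with int8 weight-only quantized tensors, in place
quantize_(model, int8_weight_only())
```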
Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.
PaddleSlim is an open-source library for deep model compression and architecture search.
A toolkit to optimize Keras and TensorFlow ML models for deployment, including quantization and pruning.
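For the TensorFlow Model Optimization Toolkit, a minimal quantization-aware-training sketch using its Keras API (the toy model and shapes are illustrative):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# toy model for illustration
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10),
])

# wrap the model with fake-quant ops so training simulates int8 inference
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```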
A Python package that extends official PyTorch to deliver additional performance on Intel platforms
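For Intel Extension for PyTorch, a minimal inference sketch (the toy model is illustrative; ipex.optimize applies Intel-specific operator and memory-layout optimizations):

```python
import torch
import intel_extension_for_pytorch as ipex

# toy model for illustration
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU()).eval()

# returns an optimized copy; bfloat16 is an illustrative dtype choice
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    out = model(torch.randn(1, 256))
```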
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
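For Intel Neural Compressor, a post-training quantization sketch assuming the 2.x fit API (the toy model, calibration loader, and default config are all illustrative; the 3.x API differs):

```python
import torch
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# toy FP32 model and (input, label) calibration loader for illustration
model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU()).eval()
calib_loader = torch.utils.data.DataLoader(
    [(torch.randn(32), 0) for _ in range(16)], batch_size=4
)

# the default config performs int8 post-training static quantization
q_model = fit(model=model, conf=PostTrainingQuantConfig(), calib_dataloader=calib_loader)
```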
Dataflow compiler for QNN inference on FPGAs