Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
Faster Whisper transcription with CTranslate2
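A minimal usage sketch for faster-whisper, following its README pattern (model size, device, compute type, and audio path are illustrative placeholders):

```python
from faster_whisper import WhisperModel

# "small" on CPU with int8 weights; all three choices are illustrative
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)  # placeholder path
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```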
Accessible large language models via k-bit quantization for PyTorch.
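A minimal sketch of loading a causal LM in 4-bit NF4 through bitsandbytes' transformers integration (the model id and the specific config values are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 weights with bfloat16 compute; values are illustrative choices
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder: any causal LM on the Hub
    quantization_config=bnb_config,
    device_map="auto",
)
```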
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
Base pretrained models and datasets in PyTorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
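For 🤗 Optimum, a minimal sketch of its ONNX Runtime backend, which exports a Hub checkpoint to ONNX at load time (the checkpoint is illustrative):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# the exported model drops into a regular transformers pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Quantized, accelerated inference with ONNX Runtime."))
```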
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
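A sketch of AutoGPTQ's quantize-and-save flow, modeled on its README (model id, calibration sentence, and output directory are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weights, group size 128: common but illustrative settings
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ calibrates on pre-tokenized examples
examples = [tokenizer("GPTQ calibration text goes here.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit")  # placeholder output directory
```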
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, reg…
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
PyTorch native quantization and sparsity for training and inference
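A minimal torchao sketch, assuming a recent release where quantize_ and int8_weight_only are exported from torchao.quantization (the toy model is illustrative; newer versions expose config-object equivalents):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# toy Linear-only model; torchao targets Linear-heavy networks
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16)

# replace Linear weights with int8 weight-only quantized tensors, in place
quantize_(model, int8_weight_only())
```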
Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.
PaddleSlim is an open-source library for deep model compression and architecture search.
A toolkit to optimize Keras and TensorFlow ML models for deployment, including quantization and pruning.
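For the TensorFlow Model Optimization Toolkit, a minimal quantization-aware-training sketch using its Keras API (the toy model and shapes are illustrative):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# toy model for illustration
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10),
])

# wrap the model with fake-quant ops so training simulates int8 inference
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```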
A Python package that extends official PyTorch to deliver additional performance on Intel platforms
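For Intel Extension for PyTorch, a minimal inference sketch (the toy model is illustrative; ipex.optimize applies Intel-specific operator and memory-layout optimizations):

```python
import torch
import intel_extension_for_pytorch as ipex

# toy model for illustration
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU()).eval()

# returns an optimized copy; bfloat16 is an illustrative dtype choice
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    out = model(torch.randn(1, 256))
```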
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
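For Intel Neural Compressor, a post-training quantization sketch assuming the 2.x fit API (the toy model, calibration loader, and default config are all illustrative; the 3.x API differs):

```python
import torch
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# toy FP32 model and (input, label) calibration loader for illustration
model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU()).eval()
calib_loader = torch.utils.data.DataLoader(
    [(torch.randn(32), 0) for _ in range(16)], batch_size=4
)

# the default config performs int8 post-training static quantization
q_model = fit(model=model, conf=PostTrainingQuantConfig(), calib_dataloader=calib_loader)
```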
Dataflow compiler for QNN inference on FPGAs