Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU.
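This description matches Intel's auto-round project. A minimal sketch of driving such a recipe from Python, modeled on its README; the model name, bit width, and output path are illustrative, and the exact API may shift between releases:

```python
# Sketch of 4-bit weight-only quantization with auto-round; the model choice
# and settings below are illustrative placeholders, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tune the rounding offsets, then export INT4 weights.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./opt-125m-int4", format="auto_round")
```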
🤖 Build AI agents that combine OpenAI's orchestration and Claude's execution for effective production solutions.
📊 Transform documents into a smart knowledge base using Neo4j and Azure AI for efficient, intelligent searching and answer generation.
LLM quantization (compression) toolkit with hardware-acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
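For a sense of the workflow, a hedged sketch of the load → quantize → save flow that GPTQModel documents; the model id, toy calibration text, and output path are placeholders:

```python
# Sketch of GPTQ INT4 quantization with GPTQModel; real calibration should use
# a representative text corpus rather than this repeated toy sentence.
from gptqmodel import GPTQModel, QuantizeConfig

calibration = ["The quick brown fox jumps over the lazy dog."] * 256  # placeholder data

config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", config)
model.quantize(calibration, batch_size=2)  # runs calibration forward passes
model.save("./llama-3.2-1b-gptq-int4")
```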
PyTorch native quantization and sparsity for training and inference
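This tagline matches torchao. A minimal sketch of its weight-only INT8 path on a toy module; note that recent releases rename the config helper to Int8WeightOnlyConfig, so check the installed version:

```python
# Weight-only INT8 post-training quantization with torchao's quantize_ API.
# The tiny Sequential model is illustrative only.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
quantize_(model, int8_weight_only())  # swaps Linear weights to int8 tensor subclasses in place

with torch.no_grad():
    logits = model(torch.randn(2, 1024))
```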
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
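This matches Intel Neural Compressor. A hedged sketch of its 2.x post-training quantization entry point, assuming a toy model and random calibration data; the 3.x releases restructure this interface:

```python
# Post-training static quantization with Intel Neural Compressor's 2.x `fit` API.
# Model and calibration loader are placeholders for real workloads.
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

fp32_model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
calib_loader = DataLoader(
    TensorDataset(torch.randn(128, 64), torch.zeros(128)), batch_size=16
)

q_model = fit(model=fp32_model, conf=PostTrainingQuantConfig(), calib_dataloader=calib_loader)
q_model.save("./int8_model")
```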
🚀 Simplify running, sharing, and shipping Hugging Face models with autopack; it quantizes and exports to multiple formats effortlessly.
Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
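A hedged sketch of that PyTorch → Core ML pipeline with coremltools: trace, convert to a float16 mlpackage, then linearly quantize weights to int8. The tiny stand-in model, tensor names, and shapes are illustrative, not TinyLlama itself, and ct.target.iOS18 requires a recent coremltools:

```python
# Trace a PyTorch model, convert to Core ML at float16, then int8-quantize weights.
import numpy as np
import torch
import coremltools as ct
from coremltools.optimize.coreml import (
    OpLinearQuantizerConfig, OptimizationConfig, linear_quantize_weights
)

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
example = torch.randn(1, 64)  # placeholder input; an LLM would take token ids
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape, dtype=np.float32)],
    compute_precision=ct.precision.FLOAT16,
    minimum_deployment_target=ct.target.iOS18,
)

config = OptimizationConfig(global_config=OpLinearQuantizerConfig(mode="linear_symmetric"))
mlmodel_int8 = linear_quantize_weights(mlmodel, config=config)
mlmodel_int8.save("Model.mlpackage")
```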
FMS Model Optimizer is a framework for developing reduced precision neural network models.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
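This matches vLLM's llm-compressor. A hedged sketch of its one-shot W4A16 GPTQ flow; the model, dataset, and sample counts are illustrative, and newer releases import oneshot from the top-level package:

```python
# One-shot GPTQ compression with llm-compressor; output loads directly in vLLM.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative model choice
    dataset="open_platypus",
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```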
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A friendly CLI tool for converting and uploading transformers for CTranslate2.
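Under the hood, such a tool wraps CTranslate2's converter (the ct2-transformers-converter CLI). A minimal sketch of the Python equivalent, with an illustrative model name and output directory:

```python
# Convert a Hugging Face Transformers model to CTranslate2 format with
# int8 weight quantization; requires transformers to be installed.
import ctranslate2.converters

converter = ctranslate2.converters.TransformersConverter("facebook/nllb-200-distilled-600M")
converter.convert("nllb-200-ct2-int8", quantization="int8", force=True)
```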
A survey of modern quantization formats (e.g., MXFP8, NVFP4) and inference optimization tools (e.g., TorchAO, GemLite), illustrated through the example of Llama-3.1 inference.
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
A project demonstrating how to deploy AI models with significant performance improvements inside containerized environments using Cog; ideal for reproducible, scalable, and hardware-efficient inference.
Trustworthy onboard satellite AI via a PyTorch→ONNX→INT8 pipeline, with calibration, telemetry, and a PhiSat-2 EO tile-filter demo.
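A hedged sketch of the ONNX → INT8 step using ONNX Runtime's static quantization; the input name "input", tile shape, and random calibration batches are placeholders for a real preprocessed EO tile set:

```python
# Static INT8 quantization with onnxruntime; calibration data here is random
# and stands in for real earth-observation tiles.
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class TileReader(CalibrationDataReader):
    """Feeds calibration batches; real code would stream preprocessed EO tiles."""
    def __init__(self, n=32):
        self._batches = iter(
            {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)} for _ in range(n)
        )

    def get_next(self):
        return next(self._batches, None)

quantize_static(
    "tile_filter.onnx",
    "tile_filter.int8.onnx",
    calibration_data_reader=TileReader(),
    weight_type=QuantType.QInt8,
)
```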
Neural Network Compression Framework for enhanced OpenVINO™ inference
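This is NNCF's tagline. A minimal sketch of its post-training INT8 path on a PyTorch model; the MobileNet and random calibration batches are illustrative, and the result is typically exported through OpenVINO model conversion for deployment:

```python
# Post-training INT8 quantization of a PyTorch model with NNCF.
import nncf
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(weights=None).eval()

# A few random batches stand in for a real calibration set.
calib_data = [torch.randn(1, 3, 224, 224) for _ in range(32)]

quantized_model = nncf.quantize(model, nncf.Dataset(calib_data))
```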