A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …

Python 2,433 344 Updated Apr 11, 2026

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 13,338 2,271 Updated Apr 11, 2026

huggingface / optimum-quanto

A PyTorch quantization backend for Optimum

Python 1,035 85 Updated Apr 2, 2026

dropbox / hqq

Official implementation of Half-Quadratic Quantization (HQQ)

Python 926 90 Updated Feb 26, 2026

casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Python 2,324 298 Updated May 11, 2025

Vahe1994 / AQLM

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python 1,314 195 Updated Feb 26, 2026

google-research / arxiv-latex-cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 6,789 392 Updated Mar 27, 2026

casszhao / PruneHall

Codebase, data and models for hallucination of pruned models

Python 16 Updated Jan 11, 2025

IST-DASLab / sparsegpt

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Python 877 120 Updated Aug 20, 2024

Vahe1994 / SpQR

Python 553 43 Updated Feb 8, 2026

locuslab / wanda

A simple and effective LLM pruning approach.

Python 861 128 Updated Aug 9, 2024

AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python 5,047 536 Updated Apr 11, 2025

huggingface / safetensors

Simple, safe way to store and distribute tensors

Python 3,694 307 Updated Apr 2, 2026

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python 12,112 3,177 Updated Apr 8, 2026

Niko-Group / paper_writing_info

106 7 Updated Oct 16, 2025

huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 10,615 1,069 Updated Apr 11, 2026

huggingface / datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Python 21,387 3,165 Updated Apr 10, 2026

google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 11,749 1,336 Updated Apr 11, 2026

google / seqio

Task-based datasets, preprocessing, and evaluation for sequence models.

Python 593 60 Updated Mar 27, 2026

pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 99,029 27,460 Updated Apr 11, 2026

huggingface / transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 159,197 32,833 Updated Apr 11, 2026

jax-ml / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 35,362 3,517 Updated Apr 11, 2026

google-research / t5x

Python 2,962 342 Updated Apr 2, 2026

Miles Williams mlsw

Stars

huggingface / lighteval

sgl-project / sglang

vllm-project / vllm

pytorch / ao

cshaib / diversity

NVIDIA / Model-Optimizer