Starred repositories
Official code for the paper Q-resafe (https://www.arxiv.org/abs/2506.20251)
Repository for the paper https://arxiv.org/abs/2504.13837
(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
An agent framework based on the tutorial hello-agents
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral)
[NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.
A selective knowledge distillation algorithm for efficient speculative decoders
Official implementation for BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
Train transformer language models with reinforcement learning.
Code for data-aware compression of DeepSeek models
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …)
This repository contains the training code for ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
[ICML 2025] Official code for the paper "RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models"
Official inference framework for 1-bit LLMs
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
Welcome to the official repository of SINQ! A novel, fast, and high-quality quantization method designed to make any large language model smaller while preserving accuracy.
Awesome list for LLM quantization