Starred repositories

Official code for the paper Q-resafe (https://www.arxiv.org/abs/2506.20251)

Python · 15 stars · 3 forks · Updated Jun 28, 2025

repo for paper https://arxiv.org/abs/2504.13837

Python · 303 stars · 17 forks · Updated Dec 17, 2025

(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models

Python · 10 stars · Updated Nov 17, 2025

An agent framework based on the tutorial hello-agents

Python · 245 stars · 67 forks · Updated Dec 4, 2025

QeRL enables RL for 32B LLMs on a single H100 GPU.

Python · 470 stars · 46 forks · Updated Nov 27, 2025

[ACL 2024] A novel QAT-with-self-distillation framework for enhancing ultra-low-bit LLMs.

Python · 132 stars · 17 forks · Updated May 16, 2024

Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"

Python · 20 stars · 1 fork · Updated Jun 11, 2025

Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral)

Python · 54 stars · 7 forks · Updated Jun 27, 2025

[NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.

Python · 41 stars · Updated Oct 29, 2025

A selective knowledge distillation algorithm for efficient speculative decoders

31 stars · 3 forks · Updated Nov 27, 2025

Official implementation for BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Python · 97 stars · 5 forks · Updated Jul 18, 2025

Train transformer language models with reinforcement learning.

Python · 16,736 stars · 2,371 forks · Updated Dec 22, 2025
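
As a quick orientation for this library: a minimal supervised fine-tuning sketch with TRL's SFTTrainer, following its documented quick-start; the model id, dataset, and output directory below are illustrative placeholders, not part of this listing.

    # Minimal SFT sketch with TRL; model id and dataset are illustrative placeholders.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("trl-lib/Capybara", split="train")  # small example dataset
    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-0.5B",                             # placeholder base model
        train_dataset=dataset,
        args=SFTConfig(output_dir="sft-out", max_steps=100),   # short run for illustration
    )
    trainer.train()

TRL also provides DPO/PPO/GRPO trainers with the same Trainer-style interface, which is why it shows up alongside the RL-for-LLMs entries above.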

Code for data-aware compression of DeepSeek models

Python · 66 stars · 10 forks · Updated Dec 11, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python · 11,779 stars · 1,076 forks · Updated Dec 22, 2025
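
Since this entry (and several others here) builds on PEFT-style adapter tuning, a minimal sketch of attaching LoRA adapters with the peft library; the base model and target module names are assumptions chosen for illustration.

    # Minimal LoRA sketch with PEFT; base model and target modules are assumptions.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # small placeholder model
    lora_config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # attention projections in this architecture
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()         # only the adapter weights are trainable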

Official PyTorch implementation of QA-LoRA

Python · 145 stars · 11 forks · Updated Mar 13, 2024

This repository contains the training code for ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization"

Python · 116 stars · 8 forks · Updated Oct 15, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 64,327 stars · 7,793 forks · Updated Dec 21, 2025

[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"

Python · 61 stars · 5 forks · Updated Jul 8, 2025

[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Python · 319 stars · 25 forks · Updated Nov 26, 2025

Python · 2 stars · 1 fork · Updated May 16, 2025

Python · 5 stars · 1 fork · Updated Jun 6, 2025

Python · 11 stars · 1 fork · Updated Apr 7, 2025

Python · 5 stars · Updated Nov 28, 2025

[ICML 2025] Official code for the paper "RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models"

Python · 6 stars · Updated May 29, 2025

Official inference framework for 1-bit LLMs

Python · 24,460 stars · 1,913 forks · Updated Jun 3, 2025

LLM quantization (compression) toolkit with hardware-acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.

Python · 940 stars · 141 forks · Updated Dec 22, 2025
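
A hedged sketch of GPTQ-style post-training quantization in the spirit of this toolkit's quick-start; the call names (GPTQModel.load, quantize, save), the model id, and the calibration texts are assumptions and may differ between versions.

    # Hedged GPTQ-style quantization sketch; API names and model id are assumptions.
    from gptqmodel import GPTQModel, QuantizeConfig

    calibration = ["The quick brown fox jumps over the lazy dog."] * 256  # placeholder calibration texts
    quant_config = QuantizeConfig(bits=4, group_size=128)                 # 4-bit weights, group size 128

    model = GPTQModel.load("Qwen/Qwen2.5-0.5B", quant_config)  # placeholder model id
    model.quantize(calibration)                                # calibrate and quantize weights
    model.save("qwen2.5-0.5b-4bit-gptq")                       # write the quantized checkpoint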

Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)

Python · 79 stars · 1 fork · Updated Jul 28, 2025

Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.

Python · 585 stars · 50 forks · Updated Dec 19, 2025

Awesome list for LLM quantization

Python · 371 stars · 20 forks · Updated Oct 11, 2025