FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
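At small batch sizes, decoding is bound by weight memory traffic, so storing weights in INT4 instead of FP16 cuts the bytes read per matmul by ~4x, which is where the near-ideal speedup comes from until compute becomes the bottleneck. Below is a minimal NumPy sketch of the data layout such a kernel consumes; the group size, shapes, and function names are illustrative, and a real kernel fuses the dequantization into the GPU GEMM rather than materializing FP16 weights.

```python
# Sketch of weight-only INT4 quantization with per-group FP16 scales,
# followed by on-the-fly dequantization for an FP16 matmul.
import numpy as np

def quantize_int4(w, group_size=128):
    """Symmetric 4-bit quantization, one scale per group of input features."""
    out_f, in_f = w.shape
    w_g = w.reshape(out_f, in_f // group_size, group_size)
    scales = np.abs(w_g).max(axis=-1, keepdims=True) / 7.0  # INT4 range is [-8, 7]
    q = np.clip(np.round(w_g / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float16)

def fp16_int4_matmul(x, q, scales):
    """Dequantize INT4 weights to FP16, then multiply with FP16 activations."""
    w = (q.astype(np.float16) * scales).reshape(q.shape[0], -1)
    return x.astype(np.float16) @ w.T

x = np.random.randn(16, 4096)    # a batch of 16 tokens
w = np.random.randn(4096, 4096)  # one linear layer's weights
q, s = quantize_int4(w)
y = fp16_int4_matmul(x, q, s)    # approximates x @ w.T
```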
QuantLLM is a Python library designed for developers, researchers, and teams who want to fine-tune and deploy large language models (LLMs) efficiently using 4-bit and 8-bit quantization techniques.
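QuantLLM's own API is not shown here; as a generic point of reference for the kind of workflow such libraries target, this is how 4-bit model loading looks with Hugging Face transformers plus bitsandbytes (a different, widely used stack), with the model id as a placeholder:

```python
# Generic 4-bit (NF4) model loading with transformers + bitsandbytes.
# NOT QuantLLM's API; purely illustrative of 4-bit deployment.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normal-float 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # compute in FP16
)
model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",                 # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```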
Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization"
Mixed-precision (16/8/4-bit) quantization scheme for the Wan2.2-Animate-14B model. Compresses the original 35 GB base model to 17 GB, balancing inference performance against model size.
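The roughly 2x size reduction is consistent with halving the average bits per weight. A back-of-the-envelope check under an assumed precision split (the 20/40/40 mix below is purely illustrative; the repo's actual per-layer breakdown is not shown here):

```python
# Back-of-the-envelope size check with a hypothetical precision split.
base_gb = 35.0                              # 16-bit base checkpoint
avg_bits = 0.20 * 16 + 0.40 * 8 + 0.40 * 4  # = 8.0 bits/weight on average
print(base_gb * avg_bits / 16)              # ~17.5 GB, close to the reported 17 GB
```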
A 4-bit TTL computer.
ACE-Step 1.5 XL optimized fork: 4-bit (INT4) Windows support plus RTX 2080 Ti (Turing architecture) stability fixes. Runs XL SFT on 11 GB of VRAM without OOM.