Advanced quantization toolkit for LLMs and VLMs. Support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration with Transformers, vLLM, SGLang, and llm-compressor
Updated Dec 23, 2025 - Python
LLM fine-tuning with LoRA + NVFP4/MXFP8 on NVIDIA DGX Spark (Blackwell GB10)
ARCQuant: Boosting Fine-Grained Quantization with Augmented Residual Channels for LLMs