cudnn_frontend provides a C++ wrapper for the cuDNN backend API, along with samples showing how to use it.
Updated May 20, 2026 - Python
LLM fine-tuning with LoRA + NVFP4/MXFP8 on NVIDIA DGX Spark (Blackwell GB10)
🔧 Fine-tune large language models efficiently on NVIDIA DGX Spark using LoRA adapters and optimized quantization.
Patches and a recipe to deploy festr2/MiMo-V2.5-Pro-NVFP4-MXFP8-attn-TP8 on an 8-node DGX Spark cluster (sm_121) using Ray + vLLM with TP=8. Fixes the fused-QKV loader bug that mis-slotted Q values as K/V on 7 of the 8 ranks.