fp4

Here are 2 public repositories matching this topic...

From-scratch C++/CUDA LLM inference engine for the NVIDIA RTX 5090 (sm_120a). The fastest single-user inference on the 5090: faster decode than llama.cpp, at-or-ahead of vLLM on NVFP4, and the only engine running native NVFP4 on consumer Blackwell. 100% written by Claude Code.

  • Updated Jun 13, 2026
  • Cuda
