LLM inference for AMD RDNA GPUs. Rust + HIP. Single binary. No Python in the hot path. Ollama-style UX.
```sh
hipfire pull qwen3.5:9b
hipfire run qwen3.5:9b "What is the capital of France?"
hipfire serve -d    # background daemon, OpenAI-compatible API on :11435
```

Current release: v0.1.20 (engine modularization). See CHANGELOG.md.
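A minimal smoke test against the daemon. The `/v1/chat/completions` path is the conventional OpenAI-compatible route and is an assumption here; the authoritative API surface is documented in docs/SERVE.md:

```sh
# Query the background daemon started by `hipfire serve -d`.
# Port 11435 is the default noted above; the route is assumed from
# the OpenAI-compatible claim -- see docs/SERVE.md for the exact API.
curl -s http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.5:9b",
        "messages": [{"role": "user", "content": "What is the capital of France?"}]
      }'
```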
Discord: https://discord.gg/F3BaywB8Rs
llama.cpp + ROCm works on RDNA but is painful: upstream ROCm
officially supports only a handful of datacenter cards; consumer RDNA
is a second-class citizen. hipfire targets the entire RDNA family
(RDNA1 → RDNA4, consumer + pro + APU) with a single Rust binary that
ships pre-compiled kernel blobs when possible and JIT-compiles the
rest through HIP. No Python, no PyTorch, no ROCm userspace stack at
runtime.
Decode tok/s, default config (asym3 KV, FlashAttention auto):
| Model | hipfire decode (tok/s) | hipfire prefill peak (tok/s) | vs ollama Q4_K_M |
|---|---|---|---|
| Qwen 3.5 0.8B | 391 | 7383 | 2.10× decode |
| Qwen 3.5 4B | 180 | 2487 | 1.78× decode |
| Qwen 3.5 9B | 132 | 1663 | 1.71× decode |
| Qwen 3.5 27B | 47 | 478 | — |
DFlash speculative decode lifts code prompts further: 218 tok/s peak on 27B HumanEval/53 (4.45× over AR), 372 tok/s peak on 9B. DFlash speedup is genre-conditional — see docs/BENCHMARKS.md for the full per-genre table and the cross-arch matrix (RDNA1 / RDNA2 / APU / MI300X).
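A back-of-envelope check that the DFlash and AR numbers agree, using only the figures quoted here:

$$\frac{218~\text{tok/s (27B DFlash peak)}}{4.45} \approx 49~\text{tok/s},$$

which lines up with the 47 tok/s AR decode figure for the 27B row in the table above.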
Linux with ROCm 6+:

```sh
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash
```

For Windows, source builds, and verifying the install: docs/GETTING_STARTED.md.
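A quick post-install sanity check. `hipconfig` ships with ROCm; the `hipfire --version` flag is our assumption, not something the docs confirm (the documented verification steps live in docs/GETTING_STARTED.md):

```sh
hipconfig --version    # ROCm/HIP version; expect 6.x or newer
hipfire --version      # assumed flag -- confirms the binary is on PATH
```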
First-class support via a Nix flake. See docs/NIXOS.md.

```sh
nix develop github:Kaden-Schutt/hipfire   # dev shell with Rust + ROCm + bun
nix build github:Kaden-Schutt/hipfire     # build package
```

NixOS module:

```nix
{
  inputs.hipfire.url = "github:Kaden-Schutt/hipfire";
  # then in configuration.nix:
  services.hipfire.enable = true;
  services.hipfire.gpuTargets = [ "gfx1100" ];
}
```
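`gpuTargets` takes the GPU's gfx ISA name. A quick way to find yours, assuming the stock ROCm `rocminfo` tool is installed (it is not part of hipfire):

```sh
# Print this machine's gfx target for services.hipfire.gpuTargets,
# e.g. "gfx1100" on an RX 7900 XTX. rocminfo ships with ROCm.
rocminfo | grep -o 'gfx[0-9a-f]*' | head -n1
```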
hipfire's DFlash work was substantially shaped by Davide Ciffa's Lucebox DFlash on ggml, a standalone C++/ggml/CUDA DFlash implementation for Qwen 3.5-27B on a single RTX 3090. Different stack, different vendor, but Lucebox's blog gave us concrete published numbers to target, an n_gen-aware bench methodology, and pointers at where the fat is. A cached snapshot lives at .research-cache/lucebox-dflash27b.html for forensic reproducibility.
hipfire's gfx906 prefill MMQ kernel and AR-decode optimizations were
shaped by two community forks of llama.cpp that target Vega 20:
- iacopPBK/llama.cpp-gfx906: the original fork that ported and tuned gfx906-specific code paths (warp-cooperative GEMV via half-wave split, Y-tile prefetch via inline-asm global_load_dword, __builtin_amdgcn_readfirstlane-based SGPR hoisting, and a separate HBM-load → register-cache → LDS-store pipeline in the MMQ body). The "2602.01 version" commit eec153c086df6a9e7a69499bea3639597c085fff was the canonical reference we audited against.
- skyne98/llama.cpp-gfx906: a fork of the above that propagates iacop's optimizations (commit 42c298c, "port iacop optimizations") and tracks upstream more aggressively. The accompanying skyne98/wiki-gfx906 is the best public reference for gfx906 ISA quirks (LDS bank-conflict patterns at stride 32, dp4a issue-rate ceiling, Q8_1 activation layout); we used it as a sanity check for several PMC-driven redesign decisions.
And of course an extra shout-out to ggml-org/llama.cpp itself: the
templated mmq_x body in mul_mat_q.cu was the architectural scaffold
we ported to gfx906 (templated mmq_x ladder, per-thread accumulator
layout, MMQ_TILE_NE_K=32 sub-block factoring, Q8_1 quantize math). The
inner loop is gfx906-specific; the outer shape is a direct descendant.
A standalone gfx906 perf investigation log is at
docs/perf-checkpoints/2026-05-05-gfx906-decode-investigation.md;
the prefill MMQ redesign log is at
docs/perf-checkpoints/2026-05-05-gfx906-mmq-redesign-final.md.
| Page | Topic |
|---|---|
| GETTING_STARTED.md | Install, first run, what to read next |
| NIXOS.md | NixOS flake, module, dev shell |
| CLI.md | Every subcommand, flags, file locations |
| MODELS.md | Curated tags, BYO models, file extensions |
| QUANTIZE.md | hipfire quantize for HF / safetensors / GGUF |
| CONFIG.md | Every config key, env overrides |
| SERVE.md | OpenAI-compatible HTTP API |
| BENCHMARKS.md | Measured perf per arch, vs ollama |
| ARCHITECTURE.md | Engine layout, dispatch, two model paths |
| QUANTIZATION.md | MQ4 / HF4 design, asym KV cache, FWHT math |
| multi-gpu.md | Pipeline-parallel (pp≥2) — memory budget, deployment, refusals |
| methodology/perf-benchmarking.md | Bench protocol — read before claiming a perf win |
MIT — see LICENSE.
See CONTRIBUTING.md. Any change to kernels, quant
formats, dispatch, fusion, rotation, rmsnorm, or the spec-decode path
must pass ./scripts/coherence-gate-dflash.sh before commit. The
canonical correctness gate is the per-arch channel test; the speed gate
catches regressions on the baseline arch. Don't bypass either with
--no-verify; see methodology/perf-benchmarking.md.
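A minimal pre-commit hook sketch that wires the gate in; only scripts/coherence-gate-dflash.sh comes from the repo, the wrapper itself is illustrative glue:

```sh
#!/bin/sh
# .git/hooks/pre-commit -- abort the commit if the DFlash coherence gate
# fails. Make it executable with: chmod +x .git/hooks/pre-commit
set -e
cd "$(git rev-parse --show-toplevel)"
exec ./scripts/coherence-gate-dflash.sh
```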