Run large language models — now with Vision and MoE support — on AMD Ryzen™ AI NPUs in minutes.
No GPU required. Faster and over 10× more power-efficient. Supports context lengths up to 256k tokens. Ultra-Lightweight (14 MB). Installs within 20 seconds.
📦 The only out-of-box, NPU-first runtime built exclusively for Ryzen™ AI.
🤝 Think Ollama — but deeply optimized for NPUs.
✨ From Idle Silicon to Instant Power — FastFlowLM Makes Ryzen™ AI Shine.
FastFlowLM (FLM) supports all Ryzen™ AI Series chips with XDNA2 NPUs (Strix, Strix Halo, and Kraken).
🔽 Download | 📊 Benchmarks | 📦 Model List
📖 Docs | 📺 Demos | 🧪 Test Drive | 💬 Discord
A packaged FLM Windows installer is available here: flm-setup.exe. For more details, see the release notes.
⚠️ Ensure the NPU driver is version 32.0.203.258 or later (check via Task Manager → Performance → NPU, or via Device Manager) — Driver Download.
After installation, open PowerShell (`Win + X` → `I`). To run a model in the terminal (CLI Mode):
flm run llama3.2:1b
Notes:
- Internet access to HuggingFace is required to download the optimized model kernels.
- Sometimes downloads from HuggingFace may get corrupted. If this happens, run `flm pull <model_tag> --force` (e.g., `flm pull llama3.2:1b --force`) to re-download and fix them.
- By default, models are stored in `C:\Users\<USER>\Documents\flm\models\`.
- During installation, you can select a different base folder (e.g., if you choose `C:\Users\<USER>\flm`, models will be saved under `C:\Users\<USER>\flm\models\`).

⚠️ If HuggingFace is not accessible in your region, manually download the model (check this issue) and place it in the chosen directory.
🎉🚀 FastFlowLM (FLM) is ready — your NPU is unlocked and you can start chatting with models right away!
Open Task Manager (`Ctrl + Shift + Esc`), then go to the Performance tab → click NPU to monitor usage.
⚡ Quick Tips:
- Use `/verbose` during a session to turn on performance reporting (toggle it off with `/verbose` again).
- Type `/bye` to exit a conversation.
- Run `flm list` in PowerShell to show all available models.
To start the local server (Server Mode):
flm serve llama3.2:1b
The model tag (e.g., `llama3.2:1b`) sets the initial model and is optional. If another model is requested, FastFlowLM automatically switches to it. The local server listens on port 52625 by default.
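To sanity-check Server Mode from another process, you can send a chat request over HTTP. The sketch below is a minimal example, assuming the server exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the default port 52625 and needs no API key; adjust the path, port, and model tag to match your setup.

```python
# Minimal sketch: chat with a local FLM server over an OpenAI-compatible endpoint.
# Assumptions: endpoint path /v1/chat/completions, default port 52625, no API key required.
import requests

resp = requests.post(
    "http://localhost:52625/v1/chat/completions",
    json={
        "model": "llama3.2:1b",  # model tag; FLM switches models on request
        "messages": [
            {"role": "user", "content": "Give me a one-sentence summary of NPUs."}
        ],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```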
- 10/01/2025 🎉 FLM was integrated into AMD's Lemonade Server 🍋. Watch this short demo about using FLM in Lemonade.
FLM makes it easy to run cutting-edge LLMs (and now VLMs) locally with:
- ⚡ Fast and low power
- 🧰 Simple CLI and API (REST and OpenAI API; see the sketch below)
- 🔐 Fully private and offline
No model rewrites, no tuning — it just works.
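Because FLM speaks the OpenAI API, existing OpenAI-client code can often be redirected to the local server by changing only the base URL. Here is a minimal sketch with the official `openai` Python package, assuming the base URL `http://localhost:52625/v1` and that a placeholder API key is accepted:

```python
# Minimal sketch: reuse OpenAI-client code against a local FLM server.
# Assumptions: OpenAI-compatible base URL http://localhost:52625/v1; any placeholder key accepted.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:52625/v1", api_key="flm")

completion = client.chat.completions.create(
    model="llama3.2:1b",  # model tag served by `flm serve`
    messages=[{"role": "user", "content": "Why run LLMs on an NPU?"}],
)
print(completion.choices[0].message.content)
```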
- Runs fully on AMD Ryzen™ AI NPU — no GPU or CPU load
- Lightweight runtime (14 MB) — installs within 20 seconds, easy to integrate
- Developer-first flow — like Ollama, but optimized for NPU
- Support for long context windows — up to 256k tokens (e.g., Qwen3-4B-Thinking-2507)
- No low-level tuning required — you focus on your app, we handle the rest
- All orchestration code and CLI tools are open-source under the MIT License.
- NPU-accelerated kernels are proprietary binaries, free for non-commercial use only — see LICENSE_BINARY.txt and TERMS.md for details.
- Non-commercial users: Please acknowledge FastFlowLM in your README/project page:
Powered by [FastFlowLM](https://github.com/FastFlowLM/FastFlowLM)
For commercial use or licensing inquiries, email us: info@fastflowlm.com
💬 Have feedback, issues, or want early access to new releases? Open an issue or join our Discord community.
- Powered by the advanced AMD Ryzen™ AI NPU architecture
- Inspired by the widely adopted Ollama
- Tokenization accelerated with MLC-ai/tokenizers-cpp
- Chat formatting via Google/minja
- Low-level kernels optimized using the powerful IRON+AIE-MLIR