
FastFlowLM Logo

⚡ FastFlowLM (FLM) — Unlock Ryzen™ AI NPUs

Run large language models — now with Vision and MoE support — on AMD Ryzen™ AI NPUs in minutes.
No GPU required. Faster and over 10× more power-efficient. Supports context lengths up to 256k tokens. Ultra-lightweight (14 MB); installs in under 20 seconds.

📦 The only out-of-the-box, NPU-first runtime built exclusively for Ryzen™ AI.
🤝 Think Ollama — but deeply optimized for NPUs.
From Idle Silicon to Instant Power — FastFlowLM Makes Ryzen™ AI Shine.

FastFlowLM (FLM) supports all Ryzen™ AI Series chips with XDNA2 NPUs (Strix, Strix Halo, and Kraken).


🔗 Quick Links

🔽 Download | 📊 Benchmarks | 📦 Model List

📖 Docs | 📺 Demos | 🧪 Test Drive | 💬 Discord


🚀 Quick Start

A packaged FLM Windows installer is available here: flm-setup.exe. For more details, see the release notes.

📺 Watch the quick start video

⚠️ Ensure the NPU driver is version 32.0.203.258 or later (check via Task Manager → Performance → NPU, or in Device Manager); if it is older, update via Driver Download.

After installation, open PowerShell (Win + X → I). To run a model in the terminal (CLI Mode):

flm run llama3.2:1b

Notes:

  • Internet access to HuggingFace is required to download the optimized model kernels.
  • Downloads from HuggingFace can occasionally be corrupted. If this happens, run flm pull <model_tag> --force (e.g., flm pull llama3.2:1b --force) to re-download and repair the model.
  • By default, models are stored in: C:\Users\<USER>\Documents\flm\models\
  • During installation, you can select a different base folder (e.g., if you choose C:\Users\<USER>\flm, models will be saved under C:\Users\<USER>\flm\models\).
  • ⚠️ If HuggingFace is not accessible in your region, manually download the model (check this issue) and place it in the chosen directory.

🎉🚀 FastFlowLM (FLM) is ready — your NPU is unlocked and you can start chatting with models right away!

Open Task Manager (Ctrl + Shift + Esc). Go to the Performance tab → click NPU to monitor usage.

⚡ Quick Tips:

  • Use /verbose during a session to turn on performance reporting (toggle off with /verbose again).
  • Type /bye to exit a conversation.
  • Run flm list in PowerShell to show all available models.

To start the local server (Server Mode):

flm serve llama3.2:1b

The model tag (e.g., llama3.2:1b) is optional; it sets the initial model. If a different model is requested, FastFlowLM automatically switches to it. The local server listens on port 52625 by default.
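As a quick sanity check, you can hit the running server with a plain REST call. The sketch below is a minimal example, assuming the server exposes an OpenAI-compatible /v1/chat/completions route on the default port; check the FastFlowLM Docs for the exact API surface.

import json
import urllib.request

# Hypothetical request against the local FLM server (default port 52625).
# The /v1/chat/completions path is assumed from the OpenAI-compatible API;
# consult the FastFlowLM docs for the exact routes.
payload = {
    "model": "llama3.2:1b",
    "messages": [{"role": "user", "content": "Hello from the NPU!"}],
}
req = urllib.request.Request(
    "http://localhost:52625/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])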

FastFlowLM Docs


📰 In the News


🧠 Local AI on NPU

FLM makes it easy to run cutting-edge LLMs (and now VLMs) locally with:

  • ⚡ Fast and low power
  • 🧰 Simple CLI and API (REST and OpenAI-compatible; see the example below)
  • 🔐 Fully private and offline

No model rewrites, no tuning — it just works.
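Because the server speaks the OpenAI API, existing clients can simply be repointed at it. Here is a minimal sketch using the official openai Python package; the /v1 base path and the placeholder API key are assumptions, not confirmed defaults.

from openai import OpenAI

# Point the OpenAI client at the local FLM server (default port 52625).
# The /v1 base path is an assumption; the server runs locally, so the
# api_key is only a placeholder.
client = OpenAI(base_url="http://localhost:52625/v1", api_key="flm")

response = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "Explain what an NPU is in one sentence."}],
)
print(response.choices[0].message.content)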


✅ Highlights

  • Runs fully on AMD Ryzen™ AI NPU — no GPU or CPU load
  • Lightweight runtime (14 MB) — installs within 20 seconds, easy to integrate
  • Developer-first flow — like Ollama, but optimized for NPU
  • Support for long context windows — up to 256k tokens (e.g., Qwen3-4B-Thinking-2507)
  • No low-level tuning required: you focus on your app, we handle the rest

📄 License

  • All orchestration code and CLI tools are open-source under the MIT License.
  • NPU-accelerated kernels are proprietary binaries, free for non-commercial use only — see LICENSE_BINARY.txt and TERMS.md for details.
  • Non-commercial users: Please acknowledge FastFlowLM in your README/project page:
    Powered by [FastFlowLM](https://github.com/FastFlowLM/FastFlowLM)
    

For commercial use or licensing inquiries, email us: info@fastflowlm.com


💬 Have feedback, issues, or want early access to new releases? Open an issue or join our Discord community.


🙏 Acknowledgements
