We provide nightly builds of llama.cpp with AMD ROCm™ 7 acceleration, built on TheRock, so you always have the latest upstream changes. Our automated pipeline targets seamless integration with 🍋 Lemonade and similar AI applications that require high-performance GPU inference.
> [!IMPORTANT]
> **Contribution & Support Notice:** While this project currently focuses on integrating llama.cpp+ROCm in a specific production context, our broader goal is to contribute meaningfully to the llama.cpp+ROCm ecosystem. We're not set up to provide comprehensive technical support, but we welcome collaborations, idea exchanges, and contributions that help advance this space.
This build specifically targets the following GPU architectures:
- gfx1151 (Strix Halo APU) - Ryzen AI Max+ PRO 395
- gfx1150 (Strix Point APU) - Ryzen AI 300 series
- gfx120X (RDNA4 GPUs) - includes AMD Radeon RX 9070 XT/GRE/9070, RX 9060 XT/9060
- gfx110X (RDNA3 GPUs) - includes AMD Radeon PRO W7900/W7800/W7700/W7600, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT/7700, RX 7600 XT/7600
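Not sure which target matches your GPU? On Ubuntu, one quick check is `rocminfo` (a minimal sketch, assuming a system ROCm install that ships the tool; on Windows, compare your GPU model against the list above):

```bash
# Print the gfx target(s) (e.g. gfx1151) of each detected AMD GPU.
# Requires a system ROCm installation that provides rocminfo.
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
```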
All builds include ROCm™ 7 built-in - no separate ROCm™ installation required!
Our automated GitHub Actions workflow creates nightly builds for:
- Windows and Ubuntu operating systems
- Multiple GPU targets: gfx1151, gfx1150, gfx110X, gfx120X
- ROCm™ 7 built-in - complete runtime libraries included
| GPU Target | Ubuntu | Windows |
|---|---|---|
| gfx110X | ✅ | ✅ |
| gfx1150 | ✅ | ✅ |
| gfx1151 | ✅ | ✅ |
| gfx120X | ✅ | ✅ |
⚡ Ready to Run: All releases include complete ROCm™ 7 runtime libraries - just download and go!
To verify your download is working correctly:
- Download the appropriate build for your GPU target from our latest releases
- Extract the archive to your preferred directory
- Test with any GGUF model from Hugging Face:
```bash
llama-server -m YOUR_GGUF_MODEL_PATH -ngl 99
```

💡 **Tip:** Use `-ngl 99` to offload all layers to the GPU for maximum acceleration. The exact number of layers varies by model, but 99 ensures all available layers are offloaded.
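For a concrete end-to-end sketch on Ubuntu (the archive name and model path below are placeholders, not the actual release file names):

```bash
# Extract the release archive (placeholder file name).
unzip llama-cpp-rocm-gfx1151-ubuntu.zip -d llama-cpp-rocm
cd llama-cpp-rocm

# Start the server with all layers offloaded to the GPU.
./llama-server -m /path/to/model.gguf -ngl 99 --port 8080

# From another shell: llama-server exposes an OpenAI-compatible API.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hello in one sentence."}]}'
```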
🍋 Lemonade Integration: You can also test these builds directly with Lemonade for a seamless AI application experience (coming soon!)
This project relies on the following external software and tools:
- llama.cpp - Efficient, cross-platform inference engine for running GGUF models locally.
- ROCm SDK (TheRock) - AMD’s open-source platform for GPU-accelerated computing.
- HIP - C++ API for writing portable GPU code within the ROCm ecosystem.
- Visual Studio 2022 Build Tools - Microsoft C++ build tools
- CMake - Cross-platform build system (version 3.31.0)
- Ninja - Small build system with focus on speed
- Clang/Clang++ - C/C++ compiler (bundled with ROCm)
> [!NOTE]
> **Active Development:** This project is under active development. Code and artifact structure are subject to change as we continue to improve and expand functionality.
- `docs/` - Contains build documentation and setup guides
- `utils/` - Houses utility scripts for build automation and dependency management
- GitHub Actions Workflows - Located in `.github/workflows/` (automated build pipeline)
- Build Artifacts - Generated during CI/CD and published as releases
The build process is primarily handled through GitHub Actions, with the repository serving as the source for automated compilation and packaging of llama.cpp with ROCm™ 7 support.
For detailed manual build instructions, please see `docs/manual_instructions.md`.
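For orientation only, a minimal configure-and-build sketch in the spirit of llama.cpp's upstream HIP instructions (the gfx target and environment setup are assumptions; `docs/manual_instructions.md` remains the authoritative reference):

```bash
# Configure llama.cpp with the HIP (ROCm) backend for one GPU target.
# GGML_HIP and AMDGPU_TARGETS are llama.cpp's upstream HIP build flags;
# hipconfig must come from your ROCm/TheRock installation.
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build -G Ninja \
    -DGGML_HIP=ON \
    -DAMDGPU_TARGETS=gfx1151 \
    -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```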
This project is licensed under the MIT License - see the LICENSE file for details.