Local inference engine
- lemonade-router.exe - Core HTTP server executable that handles requests and orchestrates the LLM backends
- lemonade-server.exe - Command-line client for terminal users that manages the server lifecycle and executes commands via the server's HTTP API
Lemonade supports three backends:
| Backend | Model Format | Description |
|---|---|---|
| Llama.cpp | .GGUF | Uses llama.cpp's llama-server backend |
| ONNX Runtime GenAI (OGA) | .ONNX | Uses Lemonade's own ryzenai-server backend |
| FastFlowLM | .q4nx | Uses FLM's flm serve backend |
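The table above implies that each model format maps to exactly one backend. The sketch below illustrates that routing by model file extension; the function name, mapping structure, and error handling are illustrative assumptions, not Lemonade's actual routing code.

```python
from pathlib import Path

# Extension -> backend mapping from the table above.
# NOTE: illustrative sketch only; Lemonade's real router may select
# backends differently (e.g. by config or model metadata).
BACKEND_BY_EXTENSION = {
    ".gguf": "llama.cpp (llama-server)",
    ".onnx": "ONNX Runtime GenAI (ryzenai-server)",
    ".q4nx": "FastFlowLM (flm serve)",
}

def pick_backend(model_path: str) -> str:
    """Return the backend name for a model file, based on its extension."""
    ext = Path(model_path).suffix.lower()
    try:
        return BACKEND_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"unsupported model format: {ext or model_path!r}")

print(pick_backend("models/llama-3.2-1b.Q4_K_M.gguf"))
```

Note that `Path.suffix` returns only the final extension, so multi-part names like `model.Q4_K_M.gguf` still resolve to the `.gguf` backend.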
The llama.cpp backend works much like Ollama, which also builds on llama.cpp.
The AMD Ryzen SDK and at least 20 GB of disk space are required to build the ONNX backend.