A native macOS desktop app for managing and interacting with llama-server instances. Built with Tauri.
- Download models from HuggingFace -- live search, automatic split-file handling, download progress with ETA
- Run multiple servers simultaneously -- each model gets its own random port, manage them all from one place
- Auto-detects llama-server -- finds the binary on your PATH or common install locations
- Downloads to the standard HuggingFace cache (~/.cache/huggingface/hub/) -- shared with huggingface-cli, LM Studio, and other tools
- Open llama-server's built-in web UI in your browser with one click
- Configure server options -- context size, GPU layers, flash attention, parallel slots, and more
- Tune parameters with interactive sliders and presets (Creative, Balanced, Precise, Deterministic)
- Playground for completions and chat with performance metrics
- Live server log -- see model loading progress, layer offloading, and errors in real time
- Slot monitor with real-time polling
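
The auto-detection of llama-server described above can be pictured as a PATH lookup followed by a scan of a few fallback directories. The sketch below is an illustration only, not the app's actual logic: the function name `find_llama_server` and the fallback directories are assumptions.

```shell
#!/bin/sh
# Sketch: locate llama-server, first on PATH, then in fallback directories.
# The function name and the directories passed to it are hypothetical.
find_llama_server() {
    # 1. Prefer whatever the shell resolves on PATH.
    if found=$(command -v llama-server 2>/dev/null); then
        printf '%s\n' "$found"
        return 0
    fi
    # 2. Otherwise check each candidate directory given as an argument.
    for dir in "$@"; do
        if [ -x "$dir/llama-server" ]; then
            printf '%s\n' "$dir/llama-server"
            return 0
        fi
    done
    return 1
}

# Example call with typical Homebrew and user prefixes (assumed locations).
find_llama_server /opt/homebrew/bin /usr/local/bin "$HOME/.local/bin" \
    || echo "llama-server not found" >&2
```

Passing the fallback directories as arguments keeps the lookup order explicit and makes it easy to try other install locations.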
Search HuggingFace for GGUF models, download them, and start serving with a few clicks. Split models (multi-file GGUFs) are detected and downloaded as a bundle automatically.
- Live search -- type to search HuggingFace for GGUF models, see download counts and likes
- Smart file picker -- shows available quantizations with file sizes, groups split models into bundles
- Download progress -- real-time progress bar with ETA, visible from any tab
- Popular model suggestions -- quick-pick chips for Gemma, Qwen, Llama, Mistral, Phi, and more
- Multiple servers -- run several models at once, each on its own port. Connect, open in browser, or stop individually
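
As an illustration of how split models can be grouped into bundles, here is a minimal sketch. It assumes shards follow the `-00001-of-00003.gguf` suffix convention used by llama.cpp's gguf-split tool. The function name `group_gguf_bundles` is hypothetical, and the app's own grouping logic may differ.

```shell
#!/bin/sh
# Sketch: group GGUF file names into bundles and count the files in each.
# Assumes shard names end in "-NNNNN-of-MMMMM.gguf"; the function name is hypothetical.
group_gguf_bundles() {
    # Strip the shard suffix so every shard of one model shares a key,
    # count the files per key, then sort for stable output.
    awk '{
        sub(/-[0-9]+-of-[0-9]+\.gguf$/, ".gguf")
        count[$0]++
    }
    END {
        for (name in count) printf "%s\t%d\n", name, count[name]
    }' | sort
}

# Example: one single-file model and one two-shard model.
printf '%s\n' \
    'model-Q4_K_M.gguf' \
    'big-Q8_0-00001-of-00002.gguf' \
    'big-Q8_0-00002-of-00002.gguf' \
    | group_gguf_bundles
```

Each output line holds a bundle name and its file count, separated by a tab. To try it against the local cache, pipe in file names, for example `find ~/.cache/huggingface/hub -name '*.gguf' -exec basename {} \; | group_gguf_bundles`.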
Configure llama-server options from the Server tab. Settings apply when starting any model.
- Context & Memory -- context size, GPU layers, batch size, flash attention
- Slots & Parallelism -- parallel slots, slot monitoring, continuous batching
- Endpoints & API -- expose properties, enable metrics, listen host
- Server log -- live stderr output from the running server process
- Running servers list -- see all active servers with stop/open/connect controls
- OpenCode support -- OpenCode can connect directly to your llama-server instance via the integration panel, so you can manage models and run inference from within OpenCode
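
To make the server options above concrete, the sketch below shows one way such settings could map onto a llama-server command line. The variable and function names are hypothetical, and this is not necessarily how the app assembles its arguments. The flag names follow llama.cpp's server CLI as I understand it and should be checked against `llama-server --help` for your build. Flash attention is left out because its flag syntax may differ between llama.cpp versions.

```shell
#!/bin/sh
# Sketch: turn a handful of settings into a llama-server command line.
# Variable names (CTX_SIZE, GPU_LAYERS, ...) are hypothetical; verify the
# flags against `llama-server --help` before relying on them.
build_server_cmd() {
    model=$1
    port=$2
    set -- llama-server --model "$model" --host "${LISTEN_HOST:-127.0.0.1}" --port "$port"
    # Append each option only when the corresponding setting is non-empty.
    if [ -n "${CTX_SIZE:-}" ]; then set -- "$@" --ctx-size "$CTX_SIZE"; fi
    if [ -n "${GPU_LAYERS:-}" ]; then set -- "$@" --n-gpu-layers "$GPU_LAYERS"; fi
    if [ -n "${BATCH_SIZE:-}" ]; then set -- "$@" --batch-size "$BATCH_SIZE"; fi
    if [ -n "${PARALLEL:-}" ]; then set -- "$@" --parallel "$PARALLEL"; fi
    if [ -n "${ENABLE_METRICS:-}" ]; then set -- "$@" --metrics; fi
    if [ -n "${ENABLE_PROPS:-}" ]; then set -- "$@" --props; fi
    # Print the command for inspection instead of running it.
    printf '%s\n' "$*"
}

# Example: settings live in a subshell so they do not leak into the caller.
(
    CTX_SIZE=8192 GPU_LAYERS=99 PARALLEL=2
    build_server_cmd "$HOME/models/example.gguf" 8081
)
```

The function only prints the command, which makes it easy to review before launching a server. Paths containing spaces would need quoting before the printed line is pasted into a shell.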
brew tap AlexsJones/llama-panel
brew install llama-panel
This installs the .app bundle to /Applications and a llama-panel command on your PATH.
Grab the latest .tar.gz from Releases, extract it, and drag llama-panel.app to /Applications:
tar -xzf llama-panel-v*.tar.gz
mv llama-panel.app /Applications/
Building from source requires Rust and the Tauri CLI:
cargo install tauri-cli
cargo tauri build
The .app bundle will be in target/release/bundle/macos/.
Launch from Spotlight, the Applications folder, or the command line:
llama-panel
# Install Tauri CLI
cargo install tauri-cli
# Run in dev mode (hot-reload for the UI)
cargo tauri dev
The frontend is vanilla HTML/CSS/JS in ui/ -- no build step required.