# Add to PATH and reload shell:
export PATH="$HOME/.local/bin:$PATH"
source ~/.bashrc

🛡️ Windows Security Notice

ClaraCore may be flagged by antivirus software - this is a false positive. See the complete guide to antivirus warnings for details on why this is flagged.
# Quick fix - Add Windows Defender exclusion:
Add-MpPreference -ExclusionPath "$env:LOCALAPPDATA\ClaraCore"
# Or unblock the file:
Unblock-File "$env:LOCALAPPDATA\ClaraCore\claracore.exe"
# Or run troubleshooter:
curl -fsSL https://raw.githubusercontent.com/claraverse-space/ClaraCore/main/scripts/troubleshoot.ps1 | powershell

✅ Verify it's safe: Check the SHA256 hash against official releases or build from source.
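To check the hash on Windows, PowerShell's built-in `Get-FileHash` is enough. This is a minimal sketch that assumes the default install location; the expected checksum comes from the official releases page:

```powershell
# Compute the local SHA256 and compare it against the published checksum
Get-FileHash "$env:LOCALAPPDATA\ClaraCore\claracore.exe" -Algorithm SHA256
```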
Need help? See our Setup Guide or Container Guide
Auto-setup for llama.cpp - Point it at your GGUF models folder and get a complete AI inference server in seconds.
ClaraCore extends llama-swap with intelligent automation, bringing zero-configuration setup to llama.cpp deployments.
Linux/macOS (Recommended):
curl -fsSL https://raw.githubusercontent.com/claraverse-space/ClaraCore/main/scripts/install.sh | bash

Windows:
irm https://raw.githubusercontent.com/claraverse-space/ClaraCore/main/scripts/install.ps1 | iex

Then start immediately:
claracore --models-folder /path/to/your/gguf/models
# Visit: http://localhost:5800/ui/setup
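Once the server is running, a quick request to the models endpoint confirms the setup worked (ClaraCore listens on port 5800 by default):

```bash
# List the models ClaraCore discovered in your folder
curl http://localhost:5800/v1/models
```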
CUDA (NVIDIA):

docker run -d --gpus all -p 5800:5800 -v ./models:/models claracore:cuda --models-folder /models
ROCm (AMD):

docker run -d --device=/dev/kfd --device=/dev/dri -p 5800:5800 -v ./models:/models claracore:rocm --models-folder /models

📦 Optimized containers: 2-3GB vs 8-12GB full SDKs. See Container Guide
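If a container doesn't come up as expected, standard Docker commands show what happened during startup:

```bash
# Follow the logs of the most recently created container
docker logs -f $(docker ps -lq)
```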
✨ Features include: Auto-setup, hardware detection, binary management, and production configs!
While maintaining 100% compatibility with llama-swap, ClaraCore adds:
- 🎯 Auto-Setup Engine - Automatic GGUF detection and configuration
- 🔍 Smart Hardware Detection - CUDA/ROCm/Vulkan/Metal optimization
- ⬇️ Binary Management - Automatic llama-server downloads
- ⚙️ Intelligent Configs - Production-ready settings out of the box
- 🚀 Speculative Decoding - Automatic draft model pairing
# One command setup - that's it!
./claracore --models-folder /path/to/your/gguf/models --backend vulkan
# ClaraCore will:
# 1. Scan for GGUF files
# 2. Detect your hardware (GPUs, CUDA, etc.)
# 3. Download optimal binaries
# 4. Generate intelligent configuration
# 5. Start serving immediately

Linux/macOS:
curl -fsSL https://raw.githubusercontent.com/claraverse-space/ClaraCore/main/scripts/install.sh | bash

Windows:
irm https://raw.githubusercontent.com/claraverse-space/ClaraCore/main/scripts/install.ps1 | iex

The installer will:
- Download the latest binary for your platform
- Set up configuration files
- Add to system PATH
- Configure auto-start service (systemd/launchd/Windows Service)
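On Linux you can verify the auto-start service with systemd's usual tooling. This sketch assumes the unit is installed as `claracore`, which may not match your setup:

```bash
# Check the service state and follow its logs (unit name is an assumption)
systemctl status claracore
journalctl -u claracore -f
```

Prefer to skip the installer? Download a release binary for your platform instead: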
# Windows
curl -L -o claracore.exe https://github.com/claraverse-space/ClaraCore/releases/latest/download/claracore-windows-amd64.exe
# Linux
curl -L -o claracore https://github.com/claraverse-space/ClaraCore/releases/latest/download/claracore-linux-amd64
chmod +x claracore
# macOS Intel
curl -L -o claracore https://github.com/claraverse-space/ClaraCore/releases/latest/download/claracore-darwin-amd64
chmod +x claracore
# macOS Apple Silicon
curl -L -o claracore https://github.com/claraverse-space/ClaraCore/releases/latest/download/claracore-darwin-arm64
chmod +x claracore
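A manually downloaded binary isn't on your PATH yet. One common approach on Linux/macOS (the destination directory is a suggestion, not a requirement):

```bash
# Move the binary onto your PATH and confirm the shell resolves it
sudo mv claracore /usr/local/bin/
command -v claracore
```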
Build from source:

git clone https://github.com/claraverse-space/ClaraCore.git
cd ClaraCore
python build.py # Builds UI + Go backend with version info
# or: go build -o claracore .

All the power of llama-swap, plus intelligent automation:
- ✅ OpenAI API compatible endpoints
- ✅ Automatic model swapping on-demand
- ✅ Multiple models with `groups`
- ✅ Auto-unload with `ttl` (see the config sketch after these lists)
- ✅ Web UI for monitoring
- ✅ Docker/Podman support
- ✅ Full llama.cpp server control
- ✅ Zero-configuration setup
- ✅ Automatic binary downloads
- ✅ Hardware capability detection
- ✅ Intelligent resource allocation
- ✅ Speculative decoding setup
- ✅ Model metadata parsing
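To make `groups` and `ttl` concrete, here is a minimal sketch in the same YAML format as the generated config shown further below; the model name, path, and values are hypothetical:

```yaml
models:
  "qwen-7b":
    cmd: |
      binaries/llama-server/llama-server
      --model models/qwen-7b-q4.gguf
      --host 127.0.0.1 --port ${PORT}
    proxy: "http://127.0.0.1:${PORT}"
    ttl: 300          # auto-unload after 300 seconds of inactivity

groups:
  "small-models":
    swap: false       # members may stay loaded at the same time
    members: ["qwen-7b"]
```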
# Just point to your models
./claracore --models-folder ~/models

Manual Setup - for devices like Strix Halo, or for anyone who wants to customize the setup without relying on auto-detection:
# Create a config file
./claracore -ram 64 -vram 24 -backend cuda
# ClaraCore will download the binaries and create a config file automatically; the UI has all the features needed to manage models
# List models
curl http://localhost:5800/v1/models
# Chat completion
curl http://localhost:5800/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3-8b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
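Because the endpoints follow the OpenAI API, standard request options work unchanged; for example, the same chat request with streaming enabled (a sketch that relies only on OpenAI compatibility):

```bash
# Same chat request, streamed as it is generated
curl http://localhost:5800/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```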
ClaraCore generates configurations automatically, but you can customize everything:

# Auto-generated by ClaraCore
models:
"llama-3-70b":
cmd: |
binaries/llama-server/llama-server
--model models/llama-3-70b-q4.gguf
--host 127.0.0.1 --port ${PORT}
--flash-attn auto -ngl 99
draft: "llama-3-8b" # Automatic speculative decoding
proxy: "http://127.0.0.1:${PORT}"
groups:
"large-models":
swap: true
exclusive: true
members: ["llama-3-70b", "qwen-72b"]- Complete API Documentation - Full API reference with examples
- Quick API Reference - Concise API overview
- OpenAI-Compatible Endpoints: `/v1/chat/completions`, `/v1/embeddings`, `/v1/models`
- Configuration Management: `/api/config/*` - Manage models and settings
- Model Downloads: `/api/models/download` - Download from Hugging Face
- System Detection: `/api/system/detection` - Hardware and backend detection
- Real-time Events: `/api/events` - SSE stream for live updates
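The events endpoint can be watched directly from a terminal; curl's `-N` flag disables buffering so SSE messages appear as they arrive:

```bash
# Stream live server events
curl -N http://localhost:5800/api/events
```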
# Get all available models
curl http://localhost:5800/v1/models
# Update model parameters with restart prompt
curl -X POST http://localhost:5800/api/config/model/llama-3-8b \
-H "Content-Type: application/json" \
-d '{"temperature": 0.8, "max_tokens": 1024}'
# Smart configuration regeneration
curl -X POST http://localhost:5800/api/config/regenerate-from-db \
-H "Content-Type: application/json" \
-d '{"options": {"forceBackend": "vulkan", "preferredContext": 8192}}'
# Monitor setup progress
curl http://localhost:5800/api/setup/progress

- Setup Wizard: `http://localhost:5800/ui/setup` - Initial configuration
- Model Management: `http://localhost:5800/ui/models` - Chat and model controls
- Configuration: `http://localhost:5800/ui/configuration` - Edit settings
- Downloads: `http://localhost:5800/ui/downloads` - Model download manager
ClaraCore is built on llama-swap by @mostlygeek
This project extends llama-swap's excellent proxy architecture with automatic setup capabilities. Approximately 90% of the core functionality comes from the original llama-swap codebase. We're deeply grateful for @mostlygeek's work in creating such a solid foundation.
- @mostlygeek - Creator of llama-swap
- llama.cpp team - The inference engine
- Georgi Gerganov - Creator of llama.cpp
We welcome contributions, whether it's bug fixes, new features, or documentation improvements.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
For maintainers creating releases:
# Quick release (interactive)
./release.bat # Windows
./release.sh # Linux/macOS
# Manual release
python release.py --version v0.1.1 --token-file .github_token
# Draft release for testing
python release.py --version v0.1.1 --token-file .github_token --draft

See RELEASE_MANAGEMENT.md for detailed release procedures.
MIT License - Same as llama-swap. See LICENSE for details.
Built with ❤️ by the community, for the community
Standing on the shoulders of giants