One API endpoint. Any backend. Zero configuration.
Nexus is a distributed LLM orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway. Local first, cloud when needed.
- Auto-Discovery: Finds LLM backends on your network via mDNS
- Intelligent Routing: Routes by model capabilities, load, and latency
- Transparent Failover: Retries with fallback backends automatically
- OpenAI-Compatible: Works with any OpenAI API client
- Zero Config: Just run it; works out of the box with Ollama
- Privacy Zones: Structural enforcement prevents data from reaching cloud backends
- Budget Management: Token-aware cost tracking with automatic spend limits
- Real-time Dashboard: Monitor backends, models, and requests in your browser
- Quality Tracking: Profiles backend response quality to inform routing decisions
- Embeddings API: OpenAI-compatible `/v1/embeddings` with capability-aware routing
- Request Queuing: Holds requests when backends are busy, with priority support
- Model Lifecycle: Load, unload, and migrate models across backends via API
- Fleet Intelligence: Pattern analysis with pre-warming recommendations
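To picture how capability-aware routing, privacy zones, and transparent failover fit together, here is a minimal Python sketch. It is not Nexus's implementation: `Backend`, `candidates`, `route_with_failover`, the `local_only` flag, and the "fewest in-flight requests, then lowest latency" ranking are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    # Hypothetical view of one backend; not Nexus's real data model.
    name: str
    models: set[str]
    is_cloud: bool
    healthy: bool = True
    active_requests: int = 0
    avg_latency_ms: float = 0.0


def candidates(backends, model, local_only):
    """Keep healthy backends that serve `model` and respect the privacy zone,
    then rank them: fewest in-flight requests first, lower latency as tiebreak."""
    eligible = [
        b for b in backends
        if b.healthy and model in b.models and not (local_only and b.is_cloud)
    ]
    return sorted(eligible, key=lambda b: (b.active_requests, b.avg_latency_ms))


def route_with_failover(backends, model, send, local_only=False):
    """Try ranked candidates in order; fall back to the next one on failure."""
    last_error = None
    for backend in candidates(backends, model, local_only):
        try:
            return backend.name, send(backend)
        except ConnectionError as err:
            last_error = err  # remember the failure and try the next backend
    raise RuntimeError(f"no eligible backend could serve {model!r}") from last_error


fleet = [
    Backend("ollama-7b", {"llama3:8b"}, is_cloud=False, active_requests=2, avg_latency_ms=40),
    Backend("vllm-70b", {"llama3:70b"}, is_cloud=False, avg_latency_ms=120),
    Backend("exo-70b", {"llama3:70b"}, is_cloud=False, active_requests=1, avg_latency_ms=90),
    Backend("openai", {"gpt-4o"}, is_cloud=True, avg_latency_ms=300),
]


def flaky_send(backend):
    # Simulate the top-ranked backend being unreachable.
    if backend.name == "vllm-70b":
        raise ConnectionError("vllm-70b unreachable")
    return f"ok from {backend.name}"


print(route_with_failover(fleet, "llama3:70b", flaky_send))  # ('exo-70b', 'ok from exo-70b')
```

Sorting on a `(load, latency)` tuple is only one plausible policy; a real router could weigh these signals, and others such as quality profiles, differently. The privacy check is a hard filter rather than a score, so a `local_only` request can never reach a cloud backend even when no local backend is available.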
| Backend | Status | Discovery |
|---|---|---|
| Ollama | Supported | mDNS (auto) |
| LM Studio | Supported | Static config |
| vLLM | Supported | Static config |
| llama.cpp | Supported | Static config |
| exo | Supported | mDNS (auto) |
| OpenAI | Supported | Static config |
```bash
# Install from source
cargo install --path .

# Start with auto-discovery (zero config)
nexus serve

# Or with Docker
docker run -d -p 8000:8000 leocamello/nexus
```

Once running, send your first request:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:70b", "messages": [{"role": "user", "content": "Hello!"}]}'
```

Point any OpenAI-compatible client, such as Claude Code, Continue.dev, the OpenAI SDK, or plain curl, at `http://localhost:8000/v1`.
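The same chat request can also be built from a script with nothing but Python's standard library. This sketch assumes Nexus is listening on `localhost:8000`, as in the commands above, and that responses follow the OpenAI chat completion shape; `build_chat_request` is a hypothetical helper. The actual send is commented out because it needs a running server.

```python
import json
import urllib.request

NEXUS_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        NEXUS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("llama3:70b", "Hello!")

# Sending requires a running Nexus instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```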
→ Full setup guide: installation, configuration, CLI reference, and more.
```
┌─────────────────────────────────────────────┐
│              Nexus Orchestrator             │
│  - Discovers backends via mDNS              │
│  - Tracks model capabilities & quality      │
│  - Routes to best available backend         │
│  - Queues requests when backends are busy   │
│  - OpenAI-compatible API + Embeddings       │
└─────────────────────────────────────────────┘
    │           │           │           │
    ▼           ▼           ▼           ▼
┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
│ Ollama │  │  vLLM  │  │  exo   │  │ OpenAI │
│   7B   │  │  70B   │  │  32B   │  │ cloud  │
└────────┘  └────────┘  └────────┘  └────────┘
```
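The diagram notes that the orchestrator queues requests while backends are busy, and the feature list mentions priority support. A bounded priority queue is one way to get that behavior; the Python sketch below assumes "higher priority number dispatches first, FIFO within a priority" and a fixed capacity. `RequestQueue` and its methods are hypothetical, not Nexus's API.

```python
import heapq
import itertools


class RequestQueue:
    """Hypothetical bounded queue: highest priority first, FIFO within a priority."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._heap = []
        self._seq = itertools.count()  # insertion order breaks priority ties

    def push(self, request_id: str, priority: int = 0) -> bool:
        if len(self._heap) >= self.max_size:
            return False  # queue full; the caller decides how to reject
        # heapq is a min-heap, so store -priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), request_id))
        return True

    def pop(self):
        """Return the next request to dispatch when a backend frees up, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = RequestQueue(max_size=3)
queue.push("batch-job-1")
queue.push("interactive-chat", priority=10)
queue.push("batch-job-2")
print(queue.push("batch-job-3"))  # False: the queue is full
print([queue.pop() for _ in range(3)])  # ['interactive-chat', 'batch-job-1', 'batch-job-2']
```

The sequence counter in each heap entry keeps equal-priority requests in arrival order and ensures two entries never fall back to comparing request IDs.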
| Document | What you'll find |
|---|---|
| Getting Started | Installation, configuration, CLI, environment variables |
| REST API | HTTP endpoints, `X-Nexus-*` headers, error responses |
| WebSocket API | Real-time dashboard protocol |
| Architecture | System design, module structure, data flows |
| Roadmap | Feature index (F01–F23), version history, future plans |
| Troubleshooting | Common errors, debugging tips |
| FAQ | What Nexus is (and isn't), common questions |
| Contributing | Dev workflow, coding standards, PR guidelines |
| Changelog | Release history |
| Security | Vulnerability reporting |
Apache License 2.0. See LICENSE for details.