leocamello/nexus

Nexus

One API endpoint. Any backend. Zero configuration.

Nexus is a distributed LLM orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway. Local first, cloud when needed.

Features

  • ๐Ÿ” Auto-Discovery โ€” Finds LLM backends on your network via mDNS
  • ๐ŸŽฏ Intelligent Routing โ€” Routes by model capabilities, load, and latency
  • ๐Ÿ”„ Transparent Failover โ€” Retries with fallback backends automatically
  • ๐Ÿ”Œ OpenAI-Compatible โ€” Works with any OpenAI API client
  • โšก Zero Config โ€” Just run it โ€” works out of the box with Ollama
  • ๐Ÿ”’ Privacy Zones โ€” Structural enforcement prevents data from reaching cloud backends
  • ๐Ÿ’ฐ Budget Management โ€” Token-aware cost tracking with automatic spend limits
  • ๐Ÿ“Š Real-time Dashboard โ€” Monitor backends, models, and requests in your browser
  • ๐Ÿง  Quality Tracking โ€” Profiles backend response quality to inform routing decisions
  • ๐Ÿ“ Embeddings API โ€” OpenAI-compatible /v1/embeddings with capability-aware routing
  • ๐Ÿ“‹ Request Queuing โ€” Holds requests when backends are busy, with priority support
  • ๐Ÿ”ง Model Lifecycle โ€” Load, unload, and migrate models across backends via API
  • ๐Ÿ”ฎ Fleet Intelligence โ€” Pattern analysis with pre-warming recommendations

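The Embeddings API above accepts OpenAI-style /v1/embeddings requests. The following is a minimal shell sketch. It assumes Nexus listens on localhost:8000 and that some backend serves an embedding model named `nomic-embed-text`, which is a placeholder; substitute a model your backends actually provide. The `nexus_embed` helper and the `NEXUS_URL` variable are hypothetical conveniences, not part of Nexus.

```shell
#!/usr/bin/env bash
# Hypothetical: base URL of a running Nexus instance (assumed default port 8000).
NEXUS_URL="${NEXUS_URL:-http://localhost:8000}"

# Request an embedding for one input string through the OpenAI-compatible endpoint.
# Assumption: "nomic-embed-text" is served by at least one backend; replace as needed.
# $1 is spliced into the JSON body as-is, so inputs containing double quotes
# or backslashes must be escaped first.
nexus_embed() {
  curl -s "$NEXUS_URL/v1/embeddings" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"nomic-embed-text\", \"input\": \"$1\"}"
}
```

Usage: `nexus_embed "Local first, cloud when needed."` prints the raw JSON response, which follows the OpenAI embeddings response shape.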
Supported Backends

Backend     Status        Discovery
Ollama      ✅ Supported  mDNS (auto)
LM Studio   ✅ Supported  Static config
vLLM        ✅ Supported  Static config
llama.cpp   ✅ Supported  Static config
exo         ✅ Supported  mDNS (auto)
OpenAI      ✅ Supported  Static config
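Before pointing Nexus at a local backend, it can help to confirm the backend is actually up. The sketch below assumes each project's usual default port (Ollama 11434, LM Studio 1234, llama.cpp's `llama-server` 8080) and probes a model-listing endpoint on each. The `probe_backends` function is illustrative only. It is not part of Nexus and says nothing about Nexus's own config syntax, which is covered in the Getting Started guide.

```shell
#!/usr/bin/env bash
# "name url" pairs for each backend's usual default local endpoint
# (assumption: default ports unchanged). vLLM is omitted because its
# OpenAI-compatible server also defaults to port 8000, the port Nexus uses
# in this README; run one of them on another port if they share a host.
backends=(
  "ollama http://localhost:11434/api/tags"
  "lmstudio http://localhost:1234/v1/models"
  "llamacpp http://localhost:8080/v1/models"
)

# Print one line per backend saying whether its endpoint answered within 2 seconds.
probe_backends() {
  local entry name url
  for entry in "${backends[@]}"; do
    name="${entry%% *}"   # text before the first space
    url="${entry#* }"     # text after the first space
    if curl -sf --max-time 2 "$url" >/dev/null 2>&1; then
      echo "$name: reachable at $url"
    else
      echo "$name: not reachable at $url"
    fi
  done
}
```

Run `probe_backends` and add only the reachable endpoints to your configuration.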

Quick Start

# Install from source (run inside a clone of this repository)
cargo install --path .

# Start with auto-discovery (zero config)
nexus serve

# Or with Docker
docker run -d -p 8000:8000 leocamello/nexus

Once running, send your first request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:70b", "messages": [{"role": "user", "content": "Hello!"}]}'

Point any OpenAI-compatible client (Claude Code, Continue.dev, the OpenAI SDK, or plain curl) at http://localhost:8000/v1.
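Many tools built on the official OpenAI SDKs read the base URL and API key from environment variables, so pointing them at Nexus can be as simple as exporting two values. This is a minimal sketch that assumes Nexus runs on localhost:8000. `OPENAI_BASE_URL` is honored by the official Python and Node SDKs, but other clients may use their own setting. Whether Nexus validates the key is an assumption, so a placeholder is used when no key is set.

```shell
#!/usr/bin/env bash
# Route OpenAI SDK traffic through Nexus (assumes Nexus listens on localhost:8000).
export OPENAI_BASE_URL="http://localhost:8000/v1"

# The SDKs refuse to start without a key. Keep an existing key if one is set,
# otherwise use a placeholder (assumption: Nexus does not require a real key).
export OPENAI_API_KEY="${OPENAI_API_KEY:-placeholder}"
```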

→ Full setup guide: installation, configuration, CLI reference, and more.

Architecture

┌───────────────────────────────────────────────────┐
│              Nexus Orchestrator                   │
│  - Discovers backends via mDNS                    │
│  - Tracks model capabilities & quality            │
│  - Routes to best available backend               │
│  - Queues requests when backends are busy         │
│  - OpenAI-compatible API + Embeddings             │
└───────────────────────────────────────────────────┘
        │           │           │           │
        ▼           ▼           ▼           ▼
   ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
   │ Ollama │  │  vLLM  │  │  exo   │  │ OpenAI │
   │  7B    │  │  70B   │  │  32B   │  │ cloud  │
   └────────┘  └────────┘  └────────┘  └────────┘
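The router's choice is not visible in a normal chat response body. The REST API reference documents `X-Nexus-*` response headers, which may carry routing details. The sketch below sends the Quick Start request and prints only headers with that prefix. The exact header names are not listed in this README, so it filters on the prefix alone. `nexus_route_headers` is a hypothetical helper, and `llama3:70b` is simply the model from the Quick Start example.

```shell
#!/usr/bin/env bash
# Hypothetical: base URL of a running Nexus instance (assumed default port 8000).
NEXUS_URL="${NEXUS_URL:-http://localhost:8000}"

# Send a chat request and print only the X-Nexus-* response headers.
# -D - writes headers to stdout; -o /dev/null discards the body.
# grep -i is used because HTTP header names are case-insensitive.
nexus_route_headers() {
  local model="${1:-llama3:70b}"
  curl -s -D - -o /dev/null "$NEXUS_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}" \
    | grep -i '^x-nexus-'
}
```

Usage: `nexus_route_headers llama3:70b`. Repeating the call while backends come and go is one way to watch failover in action.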

Documentation

Document            What you'll find
🚀 Getting Started  Installation, configuration, CLI, environment variables
📖 REST API         HTTP endpoints, X-Nexus-* headers, error responses
🔌 WebSocket API    Real-time dashboard protocol
🏗️ Architecture     System design, module structure, data flows
🗺️ Roadmap          Feature index (F01–F23), version history, future plans
🔧 Troubleshooting  Common errors, debugging tips
❓ FAQ              What Nexus is (and isn't), common questions
🤝 Contributing     Dev workflow, coding standards, PR guidelines
📋 Changelog        Release history
🔒 Security         Vulnerability reporting

License

Apache License 2.0. See LICENSE for details.

Related Projects

  • exo: Distributed AI inference
  • LM Studio: Desktop app for local LLMs
  • Ollama: Easy local LLM serving
  • vLLM: High-throughput LLM serving
  • LiteLLM: Cloud LLM API router

About

Distributed LLM model serving orchestrator - unified API gateway for heterogeneous inference backends