A distributed-by-design AI API gateway with a separated control-plane (master) / data-plane (agent) architecture. Provides OpenAI/Claude-compatible /v1/* relay endpoints, built-in management APIs, Web UI, and single-binary distributed deployment.
- Control Plane Management — Users (groups), tokens, channels, models, and agents
- Data Plane Relay — OpenAI/Claude-compatible API endpoints (/v1/chat/completions, /v1/responses, /v1/messages, etc.) with automatic cross-protocol conversion
- Real-Time Config Sync — Master/agent incremental sync over WebSocket; lightweight distributed deployment with zero external dependencies
- Multi-Region Routing — Route requests from region A to agents in region B, enabling cross-region load balancing and bypassing regional restrictions
- Quota & Billing — Usage-based settlement and quota enforcement
- Model Routing — Aggregate multiple upstream models under one name with priority/weight load balancing and error retries
- Single Binary — Frontend static assets embedded; no separate web server needed
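
To make the Model Routing bullet concrete, here is a minimal sketch of priority/weight selection with retry on error. It is not the gateway's actual implementation: the `Channel` type, its fields, the `route` function, and the rule that higher priority values are tried first are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sort"
)

// Channel is a hypothetical upstream entry; the field names are illustrative.
type Channel struct {
	Name     string
	Priority int // assumption: higher values are tried first
	Weight   int // relative traffic share within one priority tier
}

var errUpstream = errors.New("upstream error")

// weightOf treats non-positive weights as 1 so the draw is always well defined.
func weightOf(c Channel) int {
	if c.Weight <= 0 {
		return 1
	}
	return c.Weight
}

// pickWeighted returns an index into tier, chosen in proportion to weight.
// intn must return a value in [0, n), like rand.Intn.
func pickWeighted(tier []Channel, intn func(int) int) int {
	total := 0
	for _, c := range tier {
		total += weightOf(c)
	}
	n := intn(total)
	for i, c := range tier {
		if n < weightOf(c) {
			return i
		}
		n -= weightOf(c)
	}
	return len(tier) - 1
}

// route walks priority tiers from highest to lowest. Within a tier it draws a
// channel by weight and drops it from the pool if the call fails, so every
// channel is tried at most once before the last error is returned.
func route(channels []Channel, intn func(int) int, call func(Channel) error) (string, error) {
	sorted := append([]Channel(nil), channels...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority > sorted[j].Priority })
	lastErr := errors.New("no channels configured")
	for start := 0; start < len(sorted); {
		end := start
		for end < len(sorted) && sorted[end].Priority == sorted[start].Priority {
			end++
		}
		tier := append([]Channel(nil), sorted[start:end]...)
		for len(tier) > 0 {
			i := pickWeighted(tier, intn)
			if err := call(tier[i]); err != nil {
				lastErr = err
				tier = append(tier[:i], tier[i+1:]...)
				continue
			}
			return tier[i].Name, nil
		}
		start = end
	}
	return "", lastErr
}

func main() {
	channels := []Channel{
		{Name: "primary-a", Priority: 10, Weight: 3},
		{Name: "primary-b", Priority: 10, Weight: 1},
		{Name: "fallback", Priority: 1, Weight: 1},
	}
	// Simulate both primary channels failing so the fallback tier is used.
	used, err := route(channels, rand.Intn, func(c Channel) error {
		if c.Priority == 10 {
			return errUpstream
		}
		return nil
	})
	fmt.Println(used, err) // prints "fallback <nil>"
}
```

Passing the random source in as a function keeps the selection logic deterministic under test; a real router would likely also distinguish retryable errors from permanent ones.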
┌─────────────────────────────────────────────────────┐
│               master (control plane)                │
│  ┌──────────┐  ┌──────────┐  ┌───────────────────┐  │
│  │ Admin API│  │  Web UI  │  │  Agent Sync Hub   │  │
│  │  & Auth  │  │ (embed)  │  │    (WebSocket)    │  │
│  └──────────┘  └──────────┘  └───────────────────┘  │
│  ┌───────────────────────────────────────────────┐  │
│  │          Billing & Quota Settlement           │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
                          │ WebSocket sync
                          ▼
┌─────────────────────────────────────────────────────┐
│                 agent (data plane)                  │
│  ┌──────────────┐  ┌────────────┐  ┌────────────┐   │
│  │ /v1/* Relay  │  │ Token/Chan │  │   Usage    │   │
│  │  Endpoints   │  │   Cache    │  │  Reporter  │   │
│  └──────────────┘  └────────────┘  └────────────┘   │
└─────────────────────────────────────────────────────┘
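
The diagram shows the master pushing configuration to an agent-side token/channel cache. One way an agent could apply incremental updates is sketched below. The wire format is not documented here, so the `Delta` and `Cache` types, the revision counter, and the gap-detection rule are all assumptions for illustration, not the project's protocol.

```go
package main

import "fmt"

// Delta is a hypothetical incremental update pushed by the master.
type Delta struct {
	Revision int64             // assumption: sequence number that increases by 1 per delta
	Upserts  map[string]string // token ID -> serialized token config
	Deletes  []string          // token IDs to evict
}

// Cache is the agent-side snapshot that relay handlers would read from.
type Cache struct {
	Revision int64
	Tokens   map[string]string
}

// Apply merges one delta into the cache. It returns false when a revision gap
// is detected, signalling that the agent should request a full snapshot.
func (c *Cache) Apply(d Delta) bool {
	if d.Revision <= c.Revision {
		return true // stale or duplicate delta: already applied, ignore it
	}
	if d.Revision != c.Revision+1 {
		return false // at least one delta was missed
	}
	for id, cfg := range d.Upserts {
		c.Tokens[id] = cfg
	}
	for _, id := range d.Deletes {
		delete(c.Tokens, id)
	}
	c.Revision = d.Revision
	return true
}

func main() {
	cache := &Cache{Tokens: map[string]string{}}
	ok1 := cache.Apply(Delta{Revision: 1, Upserts: map[string]string{"tok-a": "quota=100"}})
	ok2 := cache.Apply(Delta{Revision: 2, Deletes: []string{"tok-a"}})
	ok3 := cache.Apply(Delta{Revision: 5}) // gap: revisions 3 and 4 never arrived
	fmt.Println(ok1, ok2, ok3, cache.Revision, len(cache.Tokens)) // prints "true true false 2 0"
}
```

Rejecting a delta on a gap rather than applying it keeps the cache from silently drifting away from the master's state after a dropped connection.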
| Topology | Pros | Cons | Use Case |
|---|---|---|---|
| Single node (master + embedded agent) | Simplest setup; one container | Shared resources; single point of failure | PoC, testing, small production |
| Multi-node (master + external agents) | Horizontal scaling; fault isolation; geo-distribution | Higher ops complexity; enrollment lifecycle | Medium/large production, multi-region |
# 1. Prepare config
mkdir -p deploy data
cp config.example.yaml deploy/config.yaml
# Edit deploy/config.yaml — set jwt_secret and admin_password
# 2. Run with Docker Compose
export AI_GATEWAY_IMAGE=vaalacat/ai-gateway:latest
docker compose up -d
# 3. Access
# Web UI: http://localhost:8140
# Health: http://localhost:8140/ping

The configuration file accepts these top-level keys:
- log_level — Logging verbosity (debug, info, warn, error)
- master — Control plane settings (listen address, DB, JWT, admin credentials)
- agent — Data plane settings (listen address, master URL, enrollment)
- runtime — Optional advanced tuning (timeouts, heartbeat, retry)
See config.example.yaml for a complete template.
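
As an illustration of how the four documented log_level values might be checked at startup, here is a small sketch. The function name, the case-insensitive matching, and the fallback to "info" for an empty value are assumptions; the gateway's real loader may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// validLogLevels holds the four values documented for log_level.
var validLogLevels = map[string]bool{"debug": true, "info": true, "warn": true, "error": true}

// normalizeLogLevel trims and lower-cases the input. Defaulting an empty value
// to "info" and accepting mixed case are assumptions of this sketch.
func normalizeLogLevel(raw string) (string, error) {
	level := strings.ToLower(strings.TrimSpace(raw))
	if level == "" {
		return "info", nil
	}
	if !validLogLevels[level] {
		return "", fmt.Errorf("unsupported log_level %q", raw)
	}
	return level, nil
}

func main() {
	for _, raw := range []string{"debug", " WARN ", "verbose"} {
		level, err := normalizeLogLevel(raw)
		fmt.Printf("%q -> %q, err=%v\n", raw, level, err)
	}
}
```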
See the Quick Start section above. Full details in docker-compose.yml.
- Generate an enrollment token from master
- Configure agent with master_url and enrollment_token
- Start with docker compose -f docker-compose.yml -f docker-compose.agent.yml up -d
See docker-compose.agent.yml for the overlay template.
See docs/k8s-deployment.md for Kubernetes deployment guidance.
# Prerequisites: Go 1.25+, Node.js 20+, pnpm
# Build (frontend + backend)
CGO_ENABLED=0 bash ./build.sh
# Run tests
CGO_ENABLED=0 go test ./... -count=1 -timeout=120s
# Frontend dev server (port 8141, proxies to :8140)
cd web && pnpm install && pnpm dev

Releases are cut by pushing a v* git tag. GitHub Actions builds a multi-arch image (linux/amd64 + linux/arm64) and pushes it to Docker Hub.
# Stable release — also updates :latest
git tag v1.2.3
git push origin v1.2.3
# Pre-release — pushes :v1.2.3-rc1 only, does NOT update :latest
git tag v1.2.3-rc1
git push origin v1.2.3-rc1

The git tag is injected into the binary as internal/version.Version.
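
The two release rules above (a linker-injected version string, and pre-release tags not moving :latest) can be sketched in a few lines of Go. This is an illustration only: the variable lives in package main here rather than internal/version, and treating any tag containing "-" as a pre-release is an assumption about how the workflow classifies tags.

```go
package main

import (
	"fmt"
	"strings"
)

// Version is a package-level string that the linker can overwrite at build
// time, for example:
//
//	go build -ldflags "-X main.Version=v1.2.3"
//
// The "main." prefix is specific to this sketch; the project itself uses
// internal/version.Version, whose full import path is not shown here.
var Version = "dev"

// updatesLatest reports whether a tag looks like a stable release, meaning it
// starts with "v" and carries no pre-release suffix. Treating any "-" as a
// pre-release marker is an assumption of this sketch.
func updatesLatest(tag string) bool {
	if !strings.HasPrefix(tag, "v") {
		return false
	}
	return !strings.Contains(tag, "-")
}

func main() {
	fmt.Println("version:", Version)
	for _, tag := range []string{"v1.2.3", "v1.2.3-rc1"} {
		fmt.Printf("%s updates :latest: %v\n", tag, updatesLatest(tag))
	}
}
```

Built without the -ldflags override, the program reports "dev", which makes untagged local builds easy to recognise.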
See CONTRIBUTING.md for development setup, code style, and PR process.
The chat, responses, and messages protocols are implemented natively (entirely self-developed); all other protocols are handled through the new-api channel.
It builds upon the work of the following:
- new-api by @QuantumNous — the legacy channel adaptor, 50+ upstream provider constants, model-fetch protocols, and token-counting utilities are reused via github.com/QuantumNous/new-api. Without this prior work, out-of-the-box support for 50+ providers would not be feasible. Sincere thanks to the new-api maintainers and contributors.