AI Gateway

My Blog: https://vaala.cat/posts/vibe-ai-gateway-oss/

AI Gateway

A distributed-by-design AI API gateway with a separated control-plane (master) / data-plane (agent) architecture. Provides OpenAI/Claude-compatible /v1/* relay endpoints, built-in management APIs, Web UI, and single-binary distributed deployment.

中文文档

Features

Control Plane Management — Users (groups), tokens, channels, models, and agents
Data Plane Relay — OpenAI/Claude-compatible API endpoints (/v1/chat/completions, /v1/responses, /v1/messages, etc.) with automatic cross-protocol conversion
Real-Time Config Sync — Master/agent incremental sync over WebSocket; lightweight distributed deployment with zero external dependencies
Multi-Region Routing — Route requests from region A to agents in region B, enabling cross-region load balancing and bypassing regional restrictions
Quota & Billing — Usage-based settlement and quota enforcement
Model Routing — Aggregate multiple upstream models under one name with priority/weight load balancing and error retries
Single Binary — Frontend static assets embedded; no separate web server needed

Screenshots


_{Channels — upstream provider configuration}	_{Models — per-model pricing}
_{Model Routings — priority/weight aggregation}	_{Usage Logs — per-request audit trail}
_{Billing — daily rollups by token and channel}	_{Playground — in-browser chat tester}

See all 20 screenshots →

Architecture

┌─────────────────────────────────────────────────────┐
│                   master (control plane)             │
│  ┌──────────┐  ┌──────────┐  ┌───────────────────┐ │
│  │ Admin API│  │  Web UI  │  │ Agent Sync Hub    │ │
│  │ & Auth   │  │ (embed)  │  │ (WebSocket)       │ │
│  └──────────┘  └──────────┘  └───────────────────┘ │
│  ┌──────────────────────────────────────────────┐   │
│  │         Billing & Quota Settlement           │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘
          │ WebSocket sync
          ▼
┌─────────────────────────────────────────────────────┐
│                   agent (data plane)                 │
│  ┌──────────────┐  ┌────────────┐  ┌────────────┐  │
│  │ /v1/* Relay  │  │ Token/Chan │  │  Usage     │  │
│  │ Endpoints    │  │ Cache      │  │  Reporter  │  │
│  └──────────────┘  └────────────┘  └────────────┘  │
└─────────────────────────────────────────────────────┘

Deployment Topologies

Topology	Pros	Cons	Use Case
Single node (master + embedded agent)	Simplest setup; one container	Shared resources; single point of failure	PoC, testing, small production
Multi-node (master + external agents)	Horizontal scaling; fault isolation; geo-distribution	Higher ops complexity; enrollment lifecycle	Medium/large production, multi-region

Quick Start

# 1. Prepare config
mkdir -p deploy data
cp config.example.yaml deploy/config.yaml
# Edit deploy/config.yaml — set jwt_secret and admin_password

# 2. Run with Docker Compose
export AI_GATEWAY_IMAGE=vaalacat/ai-gateway:latest
docker compose up -d

# 3. Access
# Web UI: http://localhost:8140
# Health: http://localhost:8140/ping

Configuration

The configuration file accepts these top-level keys:

log_level — Logging verbosity (debug, info, warn, error)
master — Control plane settings (listen address, DB, JWT, admin credentials)
agent — Data plane settings (listen address, master URL, enrollment)
runtime — Optional advanced tuning (timeouts, heartbeat, retry)

See config.example.yaml for a complete template.

Deployment

Single Node (Docker Compose)

See the Quick Start section above. Full details in docker-compose.yml.

Multi-Node (External Agents)

Generate an enrollment token from master
Configure agent with master_url and enrollment_token
Start with docker compose -f docker-compose.yml -f docker-compose.agent.yml up -d

See docker-compose.agent.yml for the overlay template.

Kubernetes

See docs/k8s-deployment.md for Kubernetes deployment guidance.

Development

# Prerequisites: Go 1.25+, Node.js 20+, pnpm

# Build (frontend + backend)
CGO_ENABLED=0 bash ./build.sh

# Run tests
CGO_ENABLED=0 go test ./... -count=1 -timeout=120s

# Frontend dev server (port 8141, proxies to :8140)
cd web && pnpm install && pnpm dev

Releasing

Releases are cut by pushing a v* git tag. GitHub Actions builds a multi-arch image (linux/amd64 + linux/arm64) and pushes it to Dockerhub.

# Stable release — also updates :latest
git tag v1.2.3
git push origin v1.2.3

# Pre-release — pushes :v1.2.3-rc1 only, does NOT update :latest
git tag v1.2.3-rc1
git push origin v1.2.3-rc1

The git tag is injected into the binary as internal/version.Version.

Contributing

See CONTRIBUTING.md for development setup, code style, and PR process.

Acknowledgments

This project supports native code (purely self-developed, supporting chat, response, and messages protocols), while other protocols are supported by the new-api channel.

It builds upon the work of the following:

new-api by @QuantumNous — the legacy channel adaptor, 50+ upstream provider constants, model-fetch protocols, and token-counting utilities are reused via github.com/QuantumNous/new-api. Without this prior work, out-of-the-box support for 50+ providers would not be feasible. Sincere thanks to the new-api maintainers and contributors.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github/workflows		.github/workflows
cmd		cmd
docs		docs
internal		internal
test		test
web		web
.gitignore		.gitignore
.ko.yaml		.ko.yaml
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
README.zh.md		README.zh.md
build.sh		build.sh
config.example.yaml		config.example.yaml
docker-compose.agent.yml		docker-compose.agent.yml
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Gateway

Features

Screenshots

Architecture

Deployment Topologies

Quick Start

Configuration

Deployment

Single Node (Docker Compose)

Multi-Node (External Agents)

Kubernetes

Development

Releasing

Contributing

Acknowledgments

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI Gateway

Features

Screenshots

Architecture

Deployment Topologies

Quick Start

Configuration

Deployment

Single Node (Docker Compose)

Multi-Node (External Agents)

Kubernetes

Development

Releasing

Contributing

Acknowledgments

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages