Skip to content

VaalaCat/ai-gateway

Repository files navigation

My Blog: https://vaala.cat/posts/vibe-ai-gateway-oss/

AI Gateway

A distributed-by-design AI API gateway with a separated control-plane (master) / data-plane (agent) architecture. Provides OpenAI/Claude-compatible /v1/* relay endpoints, built-in management APIs, Web UI, and single-binary distributed deployment.

中文文档

Features

  • Control Plane Management — Users (groups), tokens, channels, models, and agents
  • Data Plane Relay — OpenAI/Claude-compatible API endpoints (/v1/chat/completions, /v1/responses, /v1/messages, etc.) with automatic cross-protocol conversion
  • Real-Time Config Sync — Master/agent incremental sync over WebSocket; lightweight distributed deployment with zero external dependencies
  • Multi-Region Routing — Route requests from region A to agents in region B, enabling cross-region load balancing and bypassing regional restrictions
  • Quota & Billing — Usage-based settlement and quota enforcement
  • Model Routing — Aggregate multiple upstream models under one name with priority/weight load balancing and error retries
  • Single Binary — Frontend static assets embedded; no separate web server needed

Screenshots

Dashboard
Channels
Channels — upstream provider configuration
Models
Models — per-model pricing
Model Routings
Model Routings — priority/weight aggregation
Usage Logs
Usage Logs — per-request audit trail
Billing
Billing — daily rollups by token and channel
Playground
Playground — in-browser chat tester

See all 20 screenshots →

Architecture

┌─────────────────────────────────────────────────────┐
│                   master (control plane)             │
│  ┌──────────┐  ┌──────────┐  ┌───────────────────┐ │
│  │ Admin API│  │  Web UI  │  │ Agent Sync Hub    │ │
│  │ & Auth   │  │ (embed)  │  │ (WebSocket)       │ │
│  └──────────┘  └──────────┘  └───────────────────┘ │
│  ┌──────────────────────────────────────────────┐   │
│  │         Billing & Quota Settlement           │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘
          │ WebSocket sync
          ▼
┌─────────────────────────────────────────────────────┐
│                   agent (data plane)                 │
│  ┌──────────────┐  ┌────────────┐  ┌────────────┐  │
│  │ /v1/* Relay  │  │ Token/Chan │  │  Usage     │  │
│  │ Endpoints    │  │ Cache      │  │  Reporter  │  │
│  └──────────────┘  └────────────┘  └────────────┘  │
└─────────────────────────────────────────────────────┘

Deployment Topologies

Topology Pros Cons Use Case
Single node (master + embedded agent) Simplest setup; one container Shared resources; single point of failure PoC, testing, small production
Multi-node (master + external agents) Horizontal scaling; fault isolation; geo-distribution Higher ops complexity; enrollment lifecycle Medium/large production, multi-region

Quick Start

# 1. Prepare config
mkdir -p deploy data
cp config.example.yaml deploy/config.yaml
# Edit deploy/config.yaml — set jwt_secret and admin_password

# 2. Run with Docker Compose
export AI_GATEWAY_IMAGE=vaalacat/ai-gateway:latest
docker compose up -d

# 3. Access
# Web UI: http://localhost:8140
# Health: http://localhost:8140/ping

Configuration

The configuration file accepts these top-level keys:

  • log_level — Logging verbosity (debug, info, warn, error)
  • master — Control plane settings (listen address, DB, JWT, admin credentials)
  • agent — Data plane settings (listen address, master URL, enrollment)
  • runtime — Optional advanced tuning (timeouts, heartbeat, retry)

See config.example.yaml for a complete template.

Deployment

Single Node (Docker Compose)

See the Quick Start section above. Full details in docker-compose.yml.

Multi-Node (External Agents)

  1. Generate an enrollment token from master
  2. Configure agent with master_url and enrollment_token
  3. Start with docker compose -f docker-compose.yml -f docker-compose.agent.yml up -d

See docker-compose.agent.yml for the overlay template.

Kubernetes

See docs/k8s-deployment.md for Kubernetes deployment guidance.

Development

# Prerequisites: Go 1.25+, Node.js 20+, pnpm

# Build (frontend + backend)
CGO_ENABLED=0 bash ./build.sh

# Run tests
CGO_ENABLED=0 go test ./... -count=1 -timeout=120s

# Frontend dev server (port 8141, proxies to :8140)
cd web && pnpm install && pnpm dev

Releasing

Releases are cut by pushing a v* git tag. GitHub Actions builds a multi-arch image (linux/amd64 + linux/arm64) and pushes it to Dockerhub.

# Stable release — also updates :latest
git tag v1.2.3
git push origin v1.2.3

# Pre-release — pushes :v1.2.3-rc1 only, does NOT update :latest
git tag v1.2.3-rc1
git push origin v1.2.3-rc1

The git tag is injected into the binary as internal/version.Version.

Contributing

See CONTRIBUTING.md for development setup, code style, and PR process.

Acknowledgments

This project supports native code (purely self-developed, supporting chat, response, and messages protocols), while other protocols are supported by the new-api channel.

It builds upon the work of the following:

  • new-api by @QuantumNous — the legacy channel adaptor, 50+ upstream provider constants, model-fetch protocols, and token-counting utilities are reused via github.com/QuantumNous/new-api. Without this prior work, out-of-the-box support for 50+ providers would not be feasible. Sincere thanks to the new-api maintainers and contributors.

License

MIT

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages