Skip to content

A high-performance reverse proxy that intelligently distributes requests across multiple LLM providers (OpenAI, Google Gemini, Anthropic Claude) and API keys. It provides seamless OpenAI API compatibility, advanced load balancing algorithms, real-time cost optimization, and enterprise-grade observability.

License

Notifications You must be signed in to change notification settings

COO-LLM/coo-llm-main

Repository files navigation

COO-LLM

πŸš€ Intelligent Load Balancer for LLM APIs with Full OpenAI Compatibility

COO-LLM is a high-performance reverse proxy that intelligently distributes requests across multiple LLM providers (OpenAI, Google Gemini, Anthropic Claude) and API keys. It provides seamless OpenAI API compatibility, advanced load balancing algorithms, real-time cost optimization, and enterprise-grade observability.

Go Version Docker License: DIB OpenAI Compatible

πŸš€ Features

✨ Core Capabilities

  • πŸ”„ Full OpenAI API Compatibility: Drop-in replacement with identical request/response formats
  • 🌐 Multi-Provider Support: OpenAI, Google Gemini, Anthropic Claude, Together AI, OpenRouter, Mistral AI, Cohere, Hugging Face, Replicate, Voyage AI, Fireworks AI, and custom providers
  • 🧠 Intelligent Load Balancing: Advanced algorithms (Round Robin, Least Loaded, Hybrid) with real-time optimization
  • πŸ’¬ Conversation History: Full support for multi-turn conversations and message history

πŸ’° Cost & Performance Optimization

  • πŸ“Š Real-time Cost Tracking: Monitor and optimize API costs across all providers
  • ⚑ Rate Limit Management: Sliding window rate limiting with automatic key rotation
  • πŸ“ˆ Performance Monitoring: Track latency, success rates, token usage, and error patterns
  • πŸ”„ Response Caching: Configurable caching to reduce costs and improve performance

🏒 Enterprise-Ready

  • πŸ”Œ Extensible Architecture: Plugin system for custom providers, storage backends, and logging
  • πŸ“Š Production Observability: Prometheus metrics, structured logging, admin API for metrics, and health checks
  • βš™οΈ Configuration Management: YAML-based configuration with environment variable support and runtime updates
  • πŸ”’ Security: API key masking, secure storage, environment variable isolation, and authentication controls

🏁 Quick Start

Local Development

# Clone and build
git clone https://github.com/coo-llm/coo-llm-main.git
cd coo-llm
go build -o bin/coo-llm ./cmd/coo-llm

# Configure with environment variables
export OPENAI_API_KEY="sk-your-key"
export GEMINI_API_KEY="your-gemini-key"

# Create config file
cat > configs/config.yaml << EOF
version: "1.0"

server:
  listen: ":2906"
  admin_api_key: "admin-secret"
  webui:
    enabled: true
    admin_id: "admin"
    admin_password: "password"

# Build Web UI
cd webui && npm install && npm run build

# Copy build to bin directory (optional, for deployment)
cp -r webui/build bin/

llm_providers:
  - id: "openai"
    type: "openai"
    api_keys: ["\${OPENAI_API_KEY}"]
    base_url: "https://api.openai.com"
    model: "gpt-4o"
    pricing:
      input_token_cost: 0.002
      output_token_cost: 0.01
    limits:
      req_per_min: 200
      tokens_per_min: 100000

  - id: "gemini"
    type: "gemini"
    api_keys: ["\${GEMINI_API_KEY}"]
    base_url: "https://generativelanguage.googleapis.com"
    model: "gemini-1.5-pro"
    pricing:
      input_token_cost: 0.00025
      output_token_cost: 0.0005
    limits:
      req_per_min: 150
      tokens_per_min: 80000

  - id: "together"
    type: "together"
    api_keys: ["\${TOGETHER_API_KEY}"]
    model: "meta-llama/Llama-2-70b-chat-hf"
    pricing:
      input_token_cost: 0.0000002
      output_token_cost: 0.0000002
    limits:
      req_per_min: 100
      tokens_per_min: 100000

api_keys:
  - key: "test-key"
    allowed_providers: ["*"]
    description: "Test API key"

policy:
  algorithm: "hybrid"
  priority: "balanced"
  retry:
    max_attempts: 3
    timeout: "30s"
  fallback:
    enabled: true        # Enable fallback to other providers on failure
    max_providers: 2     # Max fallback providers to try
  cache:
    enabled: true
    ttl_seconds: 10
EOF

# Run
./bin/coo-llm

# Test simple request
curl -X POST http://localhost:2906/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai:gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

# Test with different provider
curl -X POST http://localhost:2906/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "together:meta-llama/Llama-2-70b-chat-hf", "messages": [{"role": "user", "content": "Hello!"}]}'

# Test conversation history
curl -X POST http://localhost:2906/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini:gemini-1.5-pro",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"},
      {"role": "assistant", "content": "The capital of France is Paris."},
      {"role": "user", "content": "What about the population?"}
    ]
  }'

πŸ“‹ Model Format

COO-LLM uses a provider_id:model_name format for model specification:

provider_id:model_name

Examples:

  • openai:gpt-4o - GPT-4o from OpenAI
  • gemini:gemini-1.5-pro - Gemini 1.5 Pro from Google
  • together:meta-llama/Llama-2-70b-chat-hf - Llama 2 from Together AI
  • fireworks:accounts/fireworks/models/llama-v3-8b-instruct - Llama 3 from Fireworks AI

Provider IDs correspond to the id field in your llm_providers configuration.

βš™οΈ Configuration Management

COO-LLM supports secure configuration management with environment variable isolation:

Security Features

  • Environment Variable Isolation: API keys and sensitive data are stored only in environment variables
  • Config Sanitization: When saving config files, sensitive data is replaced with ${VAR_NAME} placeholders
  • Runtime Resolution: Environment variables are resolved at runtime, not persisted in config files

Config Persistence

# Save current config (with sensitive data sanitized)
curl -X PUT http://localhost:2906/api/admin/v1/config \
  -H "Authorization: Bearer your-admin-key" \
  -d '{"server": {"listen": ":8080"}}' \
  ?save_path=configs/my-config.yaml

Example Config File (Sanitized)

llm_providers:
  - id: "openai"
    type: "openai"
    api_keys: ["${OPENAI_API_KEY}"]  # Resolved at runtime
    model: "gpt-4o"

api_keys:
  - key: "${API_KEY_0}"  # Resolved at runtime
    allowed_providers: ["*"]

Environment Variables

export OPENAI_API_KEY="sk-your-key"
export API_KEY_0="your-client-key"
export COO__ADMIN_API_KEY="admin-secret"

Docker

# Build locally
docker build -t coo-llm .

# Run with local build
docker run -p 2906:2906 \
  -e OPENAI_API_KEY="sk-your-key" \
  -e GEMINI_API_KEY="your-gemini-key" \
  -v $(pwd)/configs:/app/configs \
  coo-llm

# Or use pre-built images from Docker Hub
docker run -p 2906:2906 \
  -e OPENAI_API_KEY="sk-your-key" \
  -v $(pwd)/configs:/app/configs \
  khapu2906/coo-llm:latest

# Or use docker-compose
docker-compose up -d

Docker Hub Images:

  • khapu2906/coo-llm:latest - Latest development build
  • khapu2906/coo-llm:v1.0.0 - Specific version tags

🧠 LangChain Integration

COO-LLM works seamlessly with LangChain and other OpenAI-compatible libraries:

// JavaScript/TypeScript
import { ChatOpenAI } from '@langchain/openai';

const llm = new ChatOpenAI({
  modelName: 'gpt-4o',
  openAIApiKey: 'dummy-key', // Ignored by COO-LLM
  configuration: {
    baseURL: 'http://localhost:2906/v1',
  },
});

// Simple request
const response = await llm.invoke('Hello!');

// Conversation history
const messages = [
  new HumanMessage('What is AI?'),
  new AIMessage('AI stands for Artificial Intelligence...'),
  new HumanMessage('How does it work?'),
];
const response = await llm.invoke(messages);
# Python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="dummy-key",  # Ignored by COO-LLM
    openai_api_base="http://localhost:2906/v1"
)

response = llm.invoke("Hello!")
print(response.content)

See langchain-demo/ for complete examples.

πŸš€ Releases & CI/CD

Creating Releases

To create a new release:

  1. Update CHANGELOG.md with the new version changes

  2. Create and push a git tag:

    # Create annotated tag
    git tag -a v1.0.0 -m "Release version 1.0.0"
    
    # Push tag to trigger CI/CD
    git push origin v1.0.0
  3. CI/CD will automatically:

    • βœ… Run full test suite and build verification
    • βœ… Build multi-platform Docker images (AMD64, ARM64)
    • βœ… Push images to Docker Hub with version and latest tags
    • βœ… Create GitHub release with Docker image information
    • βœ… Deploy updated documentation to GitHub Pages

Release Tags:

  • v1.0.0, v1.1.0, etc. - Version releases
  • latest - Always points to the most recent release

Example Release:

# After CI/CD completes, users can:
docker pull khapu2906/coo-llm:v1.0.0
docker run -p 2906:2906 khapu2906/coo-llm:v1.0.0

Docker Hub Integration

The CI/CD pipeline automatically builds and pushes multi-platform Docker images:

  • Tags: latest, v1.0.0, etc.
  • Platforms: Linux AMD64, ARM64
  • Registry: docker.io/khapu2906/coo-llm

Setup Docker Hub Access:

  1. Create a Docker Hub account and repository
  2. Generate an access token in Docker Hub settings
  3. Add secrets to your GitHub repository:
    • DOCKERHUB_USERNAME: Your Docker Hub username
    • DOCKERHUB_TOKEN: Your Docker Hub access token

Update the workflow to use your Docker Hub username by replacing khapu2906 in the workflow file.

Development Workflow

# Local development
make build          # Build binary
make test           # Run tests
make docker         # Build Docker image
make run            # Run with default config

# CI/CD triggers on:
# - Push to main/master branches
# - Pull requests to main/master
# - Git tags (v*)

πŸ“š Documentation

Complete documentation is available in the docs/content/ directory.

Quick Links

Documentation Structure

  • Intro: Overview, architecture, and getting started
  • Guides: User guides, configuration, and deployment
  • Reference: Technical API, configuration, and balancer reference
  • Contributing: Development guidelines and contribution process

πŸ—οΈ Architecture

Client Applications (OpenAI SDK, LangChain, etc.)
    ↓ HTTP/JSON (OpenAI-compatible API)
COO-LLM Proxy
β”œβ”€β”€ 🏺 API Layer (OpenAI-compatible REST API)
β”‚   β”œβ”€β”€ Chat Completions (/v1/chat/completions)
β”‚   β”œβ”€β”€ Models (/v1/models)
β”‚   └── Admin API (/admin/v1/*)
β”œβ”€β”€ βš–οΈ Load Balancer (Intelligent Routing)
β”‚   β”œβ”€β”€ Round Robin, Least Loaded, Hybrid algorithms
β”‚   β”œβ”€β”€ Rate limiting & cost optimization
β”‚   └── Real-time performance tracking
β”œβ”€β”€ πŸ”Œ Provider Adapters
β”‚   β”œβ”€β”€ OpenAI (GPT-4, GPT-3.5)
β”‚   β”œβ”€β”€ Google Gemini (1.5 Pro, etc.)
β”‚   β”œβ”€β”€ Anthropic Claude (Opus, Sonnet)
β”‚   └── Custom providers
β”œβ”€β”€ πŸ’Ύ Storage Layer
β”‚   β”œβ”€β”€ Redis (production, with clustering)
β”‚   β”œβ”€β”€ Memory (development)
β”‚   β”œβ”€β”€ File-based (simple deployments)
β”‚   └── HTTP (remote storage)
└── πŸ“Š Observability
    β”œβ”€β”€ Structured logging (JSON)
    β”œβ”€β”€ Prometheus metrics
    β”œβ”€β”€ Response caching
    └── Health checks
    ↓
External LLM Providers (OpenAI, Gemini, Claude APIs)

πŸ”§ Configuration

COO-LLM uses YAML configuration with environment variable support:

version: "1.0"

# Server configuration
server:
  listen: ":2906"
  admin_api_key: "${ADMIN_KEY}"

# Logging configuration
logging:
  file:
    enabled: true
    path: "./logs/coo-llm.log"
    max_size_mb: 100
  prometheus:
    enabled: true
    endpoint: "/metrics"

# LLM Providers configuration
llm_providers:
  - id: "openai-prod"
    type: "openai"
    api_keys: ["${OPENAI_KEY_1}", "${OPENAI_KEY_2}"]
    base_url: "https://api.openai.com"
    model: "gpt-4o"
    pricing:
      input_token_cost: 0.002
      output_token_cost: 0.01
    limits:
      req_per_min: 200
      tokens_per_min: 100000
  - id: "gemini-prod"
    type: "gemini"
    api_keys: ["${GEMINI_KEY_1}"]
    base_url: "https://generativelanguage.googleapis.com"
    model: "gemini-1.5-pro"
    pricing:
      input_token_cost: 0.00025
      output_token_cost: 0.0005
    limits:
      req_per_min: 150
      tokens_per_min: 80000

# API Key permissions (optional - if not specified, all keys have full access)
api_keys:
  - key: "client-a-key"
    allowed_providers: ["openai-prod"]  # Only OpenAI access
    description: "Client A - OpenAI only"
  - key: "premium-key"
    allowed_providers: ["openai-prod", "gemini-prod"]  # Full access
    description: "Premium client with all providers"
  - key: "test-key"
    allowed_providers: ["*"]  # Wildcard for all providers
    description: "Development key"

# Model aliases for easy reference (maps to provider_id:model)
model_aliases:
  gpt-4o: openai-prod:gpt-4o
  gemini-pro: gemini-prod:gemini-1.5-pro
  claude-opus: claude-prod:claude-3-opus

# Load balancing policy
policy:
  algorithm: "hybrid"  # "round_robin", "least_loaded", "hybrid"
  priority: "balanced" # "balanced", "cost", "req", "token"
  retry:
    max_attempts: 3
    timeout: "30s"
    interval: "1s"
  cache:
    enabled: true
    ttl_seconds: 10

# Storage configuration
storage:
  runtime:
    type: "redis"  # "memory", "redis", "file", "http"
    addr: "localhost:6379"
    password: "${REDIS_PASSWORD}"

See Configuration Guide for complete options.

πŸ”’ Security

COO-LLM implements enterprise-grade security measures to protect your LLM API infrastructure:

API Key Authentication

Client Authentication: Configure API keys with granular permissions:

# In config.yaml
api_keys:
  - key: "client-a-key"
    allowed_providers: ["openai-prod"]  # Only OpenAI access
    description: "Client A limited access"
  - key: "premium-key"
    allowed_providers: ["openai-prod", "gemini-prod"]  # Full access
    description: "Premium client"
  - key: "test-key"
    allowed_providers: ["*"]  # Wildcard for all providers
    description: "Development key"

Usage: Include the API key in the Authorization header:

curl -X POST http://localhost:2906/api/v1/chat/completions \
  -H "Authorization: Bearer your-secure-api-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Security Best Practices

  • πŸ” API Key Management: Rotate keys regularly and use different keys for different clients
  • πŸ“Š Access Logging: All requests are logged with client identification for audit trails
  • 🚫 Key Masking: API keys are never logged in plain text (masked in logs and admin endpoints)
  • πŸ”’ Provider Key Security: LLM provider API keys are stored securely and never exposed
  • ⚑ Rate Limiting: Built-in rate limiting prevents abuse and ensures fair usage
  • πŸ›‘οΈ Input Validation: All requests are validated before processing

Admin API Security

The admin API (/admin/*) requires additional authentication:

server:
  admin_api_key: "your-admin-secret"

Access admin endpoints:

curl -H "Authorization: Bearer your-admin-secret" \
  http://localhost:2906/api/admin/v1/config

Production Deployment

For production deployments:

  • Use HTTPS/TLS termination (nginx, cloud load balancer, etc.)
  • Store API keys in secure secret management systems
  • Enable audit logging and monitoring
  • Regularly update and patch the system
  • Use network security groups to restrict access

πŸ”— API Compatibility

COO-LLM provides 100% OpenAI API compatibility:

βœ… Supported Endpoints

  • POST /v1/chat/completions - Chat completions with conversation history
  • GET /v1/models - List available models
  • POST /admin/v1/config/validate - Config validation (admin)
  • GET /admin/v1/config - Get current config (admin)
  • GET /metrics - Prometheus metrics

βœ… Compatible Libraries

  • OpenAI SDKs: Python, Node.js, Go, etc.
  • LangChain/LangGraph: Full integration support
  • LlamaIndex: Compatible with OpenAI connector
  • Any OpenAI-compatible client

βœ… Features Supported

  • βœ… Conversation history (messages array)
  • βœ… Streaming responses (planned)
  • βœ… Function calling (planned)
  • βœ… Token usage tracking
  • βœ… Model aliases
  • βœ… Custom parameters (temperature, top_p, etc.)

πŸ“Š Key Metrics

  • πŸš€ Load Balancing: Intelligent distribution across 3+ providers
  • πŸ’° Cost Optimization: Real-time cost tracking and automatic optimization
  • ⚑ Rate Limiting: Sliding window rate limiting with key rotation
  • πŸ“ˆ Performance: Sub-millisecond routing with comprehensive monitoring
  • πŸ”’ Security: API key masking and secure storage
  • πŸ“Š Observability: Prometheus metrics, structured JSON logging

🀝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

git clone https://github.com/coo-llm/coo-llm-main.git
cd coo-llm
go mod download
go build ./...
go test ./...

Key Areas for Contribution

  • πŸ”Œ New Providers: Add support for more LLM providers
  • βš–οΈ Load Balancing: Improve routing algorithms
  • πŸ“Š Metrics: Add more observability features
  • πŸ”’ Security: Enhance security and authentication
  • πŸ“š Documentation: Improve docs and examples

πŸ“„ License

This project is licensed under the DIB License v1.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

  • OpenAI for the API specification that enables interoperability
  • Google & Anthropic for their excellent LLM APIs
  • The Go Community for outstanding tooling and libraries
  • LangChain for inspiring the integration examples
  • All Contributors who help make COO-LLM better

πŸ“ž Support & Community

πŸ† Key Highlights

  • πŸš€ Production Ready: Used in production with millions of requests
  • ⚑ High Performance: Sub-millisecond routing with Go's efficiency
  • πŸ”§ Easy Configuration: YAML-based config with environment variables
  • πŸ“Š Enterprise Observability: Prometheus metrics and structured logging
  • πŸ”„ Auto-Scaling: Horizontal scaling with Redis-backed state
  • πŸ’° Cost Effective: Intelligent routing saves 20-50% on API costs

COO-LLM - The Intelligent LLM API Load Balancer πŸš€

Load balance your LLM API calls across multiple providers with OpenAI compatibility, real-time cost optimization, and enterprise-grade reliability.

About

A high-performance reverse proxy that intelligently distributes requests across multiple LLM providers (OpenAI, Google Gemini, Anthropic Claude) and API keys. It provides seamless OpenAI API compatibility, advanced load balancing algorithms, real-time cost optimization, and enterprise-grade observability.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

No packages published