π Intelligent Load Balancer for LLM APIs with Full OpenAI Compatibility
COO-LLM is a high-performance reverse proxy that intelligently distributes requests across multiple LLM providers (OpenAI, Google Gemini, Anthropic Claude) and API keys. It provides seamless OpenAI API compatibility, advanced load balancing algorithms, real-time cost optimization, and enterprise-grade observability.
- π Full OpenAI API Compatibility: Drop-in replacement with identical request/response formats
- π Multi-Provider Support: OpenAI, Google Gemini, Anthropic Claude, Together AI, OpenRouter, Mistral AI, Cohere, Hugging Face, Replicate, Voyage AI, Fireworks AI, and custom providers
- π§ Intelligent Load Balancing: Advanced algorithms (Round Robin, Least Loaded, Hybrid) with real-time optimization
- π¬ Conversation History: Full support for multi-turn conversations and message history
- π Real-time Cost Tracking: Monitor and optimize API costs across all providers
- β‘ Rate Limit Management: Sliding window rate limiting with automatic key rotation
- π Performance Monitoring: Track latency, success rates, token usage, and error patterns
- π Response Caching: Configurable caching to reduce costs and improve performance
- π Extensible Architecture: Plugin system for custom providers, storage backends, and logging
- π Production Observability: Prometheus metrics, structured logging, admin API for metrics, and health checks
- βοΈ Configuration Management: YAML-based configuration with environment variable support and runtime updates
- π Security: API key masking, secure storage, environment variable isolation, and authentication controls
# Clone and build
git clone https://github.com/coo-llm/coo-llm-main.git
cd coo-llm
go build -o bin/coo-llm ./cmd/coo-llm
# Configure with environment variables
export OPENAI_API_KEY="sk-your-key"
export GEMINI_API_KEY="your-gemini-key"
# Create config file
cat > configs/config.yaml << EOF
version: "1.0"
server:
listen: ":2906"
admin_api_key: "admin-secret"
webui:
enabled: true
admin_id: "admin"
admin_password: "password"
# Build Web UI
cd webui && npm install && npm run build
# Copy build to bin directory (optional, for deployment)
cp -r webui/build bin/
llm_providers:
- id: "openai"
type: "openai"
api_keys: ["\${OPENAI_API_KEY}"]
base_url: "https://api.openai.com"
model: "gpt-4o"
pricing:
input_token_cost: 0.002
output_token_cost: 0.01
limits:
req_per_min: 200
tokens_per_min: 100000
- id: "gemini"
type: "gemini"
api_keys: ["\${GEMINI_API_KEY}"]
base_url: "https://generativelanguage.googleapis.com"
model: "gemini-1.5-pro"
pricing:
input_token_cost: 0.00025
output_token_cost: 0.0005
limits:
req_per_min: 150
tokens_per_min: 80000
- id: "together"
type: "together"
api_keys: ["\${TOGETHER_API_KEY}"]
model: "meta-llama/Llama-2-70b-chat-hf"
pricing:
input_token_cost: 0.0000002
output_token_cost: 0.0000002
limits:
req_per_min: 100
tokens_per_min: 100000
api_keys:
- key: "test-key"
allowed_providers: ["*"]
description: "Test API key"
policy:
algorithm: "hybrid"
priority: "balanced"
retry:
max_attempts: 3
timeout: "30s"
fallback:
enabled: true # Enable fallback to other providers on failure
max_providers: 2 # Max fallback providers to try
cache:
enabled: true
ttl_seconds: 10
EOF
# Run
./bin/coo-llm
# Test simple request
curl -X POST http://localhost:2906/api/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "openai:gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
# Test with different provider
curl -X POST http://localhost:2906/api/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "together:meta-llama/Llama-2-70b-chat-hf", "messages": [{"role": "user", "content": "Hello!"}]}'
# Test conversation history
curl -X POST http://localhost:2906/api/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemini:gemini-1.5-pro",
"messages": [
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."},
{"role": "user", "content": "What about the population?"}
]
}'COO-LLM uses a provider_id:model_name format for model specification:
provider_id:model_name
Examples:
openai:gpt-4o- GPT-4o from OpenAIgemini:gemini-1.5-pro- Gemini 1.5 Pro from Googletogether:meta-llama/Llama-2-70b-chat-hf- Llama 2 from Together AIfireworks:accounts/fireworks/models/llama-v3-8b-instruct- Llama 3 from Fireworks AI
Provider IDs correspond to the id field in your llm_providers configuration.
COO-LLM supports secure configuration management with environment variable isolation:
- Environment Variable Isolation: API keys and sensitive data are stored only in environment variables
- Config Sanitization: When saving config files, sensitive data is replaced with
${VAR_NAME}placeholders - Runtime Resolution: Environment variables are resolved at runtime, not persisted in config files
# Save current config (with sensitive data sanitized)
curl -X PUT http://localhost:2906/api/admin/v1/config \
-H "Authorization: Bearer your-admin-key" \
-d '{"server": {"listen": ":8080"}}' \
?save_path=configs/my-config.yamlllm_providers:
- id: "openai"
type: "openai"
api_keys: ["${OPENAI_API_KEY}"] # Resolved at runtime
model: "gpt-4o"
api_keys:
- key: "${API_KEY_0}" # Resolved at runtime
allowed_providers: ["*"]export OPENAI_API_KEY="sk-your-key"
export API_KEY_0="your-client-key"
export COO__ADMIN_API_KEY="admin-secret"# Build locally
docker build -t coo-llm .
# Run with local build
docker run -p 2906:2906 \
-e OPENAI_API_KEY="sk-your-key" \
-e GEMINI_API_KEY="your-gemini-key" \
-v $(pwd)/configs:/app/configs \
coo-llm
# Or use pre-built images from Docker Hub
docker run -p 2906:2906 \
-e OPENAI_API_KEY="sk-your-key" \
-v $(pwd)/configs:/app/configs \
khapu2906/coo-llm:latest
# Or use docker-compose
docker-compose up -dDocker Hub Images:
khapu2906/coo-llm:latest- Latest development buildkhapu2906/coo-llm:v1.0.0- Specific version tags
COO-LLM works seamlessly with LangChain and other OpenAI-compatible libraries:
// JavaScript/TypeScript
import { ChatOpenAI } from '@langchain/openai';
const llm = new ChatOpenAI({
modelName: 'gpt-4o',
openAIApiKey: 'dummy-key', // Ignored by COO-LLM
configuration: {
baseURL: 'http://localhost:2906/v1',
},
});
// Simple request
const response = await llm.invoke('Hello!');
// Conversation history
const messages = [
new HumanMessage('What is AI?'),
new AIMessage('AI stands for Artificial Intelligence...'),
new HumanMessage('How does it work?'),
];
const response = await llm.invoke(messages);# Python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="gpt-4o",
openai_api_key="dummy-key", # Ignored by COO-LLM
openai_api_base="http://localhost:2906/v1"
)
response = llm.invoke("Hello!")
print(response.content)See langchain-demo/ for complete examples.
To create a new release:
-
Update CHANGELOG.md with the new version changes
-
Create and push a git tag:
# Create annotated tag git tag -a v1.0.0 -m "Release version 1.0.0" # Push tag to trigger CI/CD git push origin v1.0.0
-
CI/CD will automatically:
- β Run full test suite and build verification
- β Build multi-platform Docker images (AMD64, ARM64)
- β
Push images to Docker Hub with version and
latesttags - β Create GitHub release with Docker image information
- β Deploy updated documentation to GitHub Pages
Release Tags:
v1.0.0,v1.1.0, etc. - Version releaseslatest- Always points to the most recent release
Example Release:
# After CI/CD completes, users can:
docker pull khapu2906/coo-llm:v1.0.0
docker run -p 2906:2906 khapu2906/coo-llm:v1.0.0The CI/CD pipeline automatically builds and pushes multi-platform Docker images:
- Tags:
latest,v1.0.0, etc. - Platforms: Linux AMD64, ARM64
- Registry:
docker.io/khapu2906/coo-llm
Setup Docker Hub Access:
- Create a Docker Hub account and repository
- Generate an access token in Docker Hub settings
- Add secrets to your GitHub repository:
DOCKERHUB_USERNAME: Your Docker Hub usernameDOCKERHUB_TOKEN: Your Docker Hub access token
Update the workflow to use your Docker Hub username by replacing khapu2906 in the workflow file.
# Local development
make build # Build binary
make test # Run tests
make docker # Build Docker image
make run # Run with default config
# CI/CD triggers on:
# - Push to main/master branches
# - Pull requests to main/master
# - Git tags (v*)Complete documentation is available in the docs/content/ directory.
- Introduction: Overview and architecture
- Configuration: Complete configuration reference
- API Reference: REST API documentation
- Load Balancing: Load balancing algorithms and policies
- Deployment: Installation and production deployment
- LangChain Demo: Integration examples
- Intro: Overview, architecture, and getting started
- Guides: User guides, configuration, and deployment
- Reference: Technical API, configuration, and balancer reference
- Contributing: Development guidelines and contribution process
Client Applications (OpenAI SDK, LangChain, etc.)
β HTTP/JSON (OpenAI-compatible API)
COO-LLM Proxy
βββ πΊ API Layer (OpenAI-compatible REST API)
β βββ Chat Completions (/v1/chat/completions)
β βββ Models (/v1/models)
β βββ Admin API (/admin/v1/*)
βββ βοΈ Load Balancer (Intelligent Routing)
β βββ Round Robin, Least Loaded, Hybrid algorithms
β βββ Rate limiting & cost optimization
β βββ Real-time performance tracking
βββ π Provider Adapters
β βββ OpenAI (GPT-4, GPT-3.5)
β βββ Google Gemini (1.5 Pro, etc.)
β βββ Anthropic Claude (Opus, Sonnet)
β βββ Custom providers
βββ πΎ Storage Layer
β βββ Redis (production, with clustering)
β βββ Memory (development)
β βββ File-based (simple deployments)
β βββ HTTP (remote storage)
βββ π Observability
βββ Structured logging (JSON)
βββ Prometheus metrics
βββ Response caching
βββ Health checks
β
External LLM Providers (OpenAI, Gemini, Claude APIs)
COO-LLM uses YAML configuration with environment variable support:
version: "1.0"
# Server configuration
server:
listen: ":2906"
admin_api_key: "${ADMIN_KEY}"
# Logging configuration
logging:
file:
enabled: true
path: "./logs/coo-llm.log"
max_size_mb: 100
prometheus:
enabled: true
endpoint: "/metrics"
# LLM Providers configuration
llm_providers:
- id: "openai-prod"
type: "openai"
api_keys: ["${OPENAI_KEY_1}", "${OPENAI_KEY_2}"]
base_url: "https://api.openai.com"
model: "gpt-4o"
pricing:
input_token_cost: 0.002
output_token_cost: 0.01
limits:
req_per_min: 200
tokens_per_min: 100000
- id: "gemini-prod"
type: "gemini"
api_keys: ["${GEMINI_KEY_1}"]
base_url: "https://generativelanguage.googleapis.com"
model: "gemini-1.5-pro"
pricing:
input_token_cost: 0.00025
output_token_cost: 0.0005
limits:
req_per_min: 150
tokens_per_min: 80000
# API Key permissions (optional - if not specified, all keys have full access)
api_keys:
- key: "client-a-key"
allowed_providers: ["openai-prod"] # Only OpenAI access
description: "Client A - OpenAI only"
- key: "premium-key"
allowed_providers: ["openai-prod", "gemini-prod"] # Full access
description: "Premium client with all providers"
- key: "test-key"
allowed_providers: ["*"] # Wildcard for all providers
description: "Development key"
# Model aliases for easy reference (maps to provider_id:model)
model_aliases:
gpt-4o: openai-prod:gpt-4o
gemini-pro: gemini-prod:gemini-1.5-pro
claude-opus: claude-prod:claude-3-opus
# Load balancing policy
policy:
algorithm: "hybrid" # "round_robin", "least_loaded", "hybrid"
priority: "balanced" # "balanced", "cost", "req", "token"
retry:
max_attempts: 3
timeout: "30s"
interval: "1s"
cache:
enabled: true
ttl_seconds: 10
# Storage configuration
storage:
runtime:
type: "redis" # "memory", "redis", "file", "http"
addr: "localhost:6379"
password: "${REDIS_PASSWORD}"See Configuration Guide for complete options.
COO-LLM implements enterprise-grade security measures to protect your LLM API infrastructure:
Client Authentication: Configure API keys with granular permissions:
# In config.yaml
api_keys:
- key: "client-a-key"
allowed_providers: ["openai-prod"] # Only OpenAI access
description: "Client A limited access"
- key: "premium-key"
allowed_providers: ["openai-prod", "gemini-prod"] # Full access
description: "Premium client"
- key: "test-key"
allowed_providers: ["*"] # Wildcard for all providers
description: "Development key"Usage: Include the API key in the Authorization header:
curl -X POST http://localhost:2906/api/v1/chat/completions \
-H "Authorization: Bearer your-secure-api-key-1" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'- π API Key Management: Rotate keys regularly and use different keys for different clients
- π Access Logging: All requests are logged with client identification for audit trails
- π« Key Masking: API keys are never logged in plain text (masked in logs and admin endpoints)
- π Provider Key Security: LLM provider API keys are stored securely and never exposed
- β‘ Rate Limiting: Built-in rate limiting prevents abuse and ensures fair usage
- π‘οΈ Input Validation: All requests are validated before processing
The admin API (/admin/*) requires additional authentication:
server:
admin_api_key: "your-admin-secret"Access admin endpoints:
curl -H "Authorization: Bearer your-admin-secret" \
http://localhost:2906/api/admin/v1/configFor production deployments:
- Use HTTPS/TLS termination (nginx, cloud load balancer, etc.)
- Store API keys in secure secret management systems
- Enable audit logging and monitoring
- Regularly update and patch the system
- Use network security groups to restrict access
COO-LLM provides 100% OpenAI API compatibility:
POST /v1/chat/completions- Chat completions with conversation historyGET /v1/models- List available modelsPOST /admin/v1/config/validate- Config validation (admin)GET /admin/v1/config- Get current config (admin)GET /metrics- Prometheus metrics
- OpenAI SDKs: Python, Node.js, Go, etc.
- LangChain/LangGraph: Full integration support
- LlamaIndex: Compatible with OpenAI connector
- Any OpenAI-compatible client
- β Conversation history (messages array)
- β Streaming responses (planned)
- β Function calling (planned)
- β Token usage tracking
- β Model aliases
- β Custom parameters (temperature, top_p, etc.)
- π Load Balancing: Intelligent distribution across 3+ providers
- π° Cost Optimization: Real-time cost tracking and automatic optimization
- β‘ Rate Limiting: Sliding window rate limiting with key rotation
- π Performance: Sub-millisecond routing with comprehensive monitoring
- π Security: API key masking and secure storage
- π Observability: Prometheus metrics, structured JSON logging
We welcome contributions! Please see our Contributing Guidelines for details.
git clone https://github.com/coo-llm/coo-llm-main.git
cd coo-llm
go mod download
go build ./...
go test ./...- π New Providers: Add support for more LLM providers
- βοΈ Load Balancing: Improve routing algorithms
- π Metrics: Add more observability features
- π Security: Enhance security and authentication
- π Documentation: Improve docs and examples
This project is licensed under the DIB License v1.0 - see the LICENSE file for details.
- OpenAI for the API specification that enables interoperability
- Google & Anthropic for their excellent LLM APIs
- The Go Community for outstanding tooling and libraries
- LangChain for inspiring the integration examples
- All Contributors who help make COO-LLM better
- π GitHub Issues - Bug reports and feature requests
- π¬ Discussions - Questions and general discussion
- π Documentation - Comprehensive guides and API reference
- π§ͺ LangChain Demo - Integration examples
- π Production Ready: Used in production with millions of requests
- β‘ High Performance: Sub-millisecond routing with Go's efficiency
- π§ Easy Configuration: YAML-based config with environment variables
- π Enterprise Observability: Prometheus metrics and structured logging
- π Auto-Scaling: Horizontal scaling with Redis-backed state
- π° Cost Effective: Intelligent routing saves 20-50% on API costs
COO-LLM - The Intelligent LLM API Load Balancer π
Load balance your LLM API calls across multiple providers with OpenAI compatibility, real-time cost optimization, and enterprise-grade reliability.