# llm-api-proxy

English | 简体中文
An LLM API proxy service based on LiteLLM that supports managing multiple LLM API accounts with load balancing and failover capabilities.
## Features

- ✅ Multi-Account Management: Support multiple API keys for the same model with automatic load balancing
- ✅ Alibaba Cloud Support: Full support for Qwen series models (qwen-max, qwen-plus, qwen-turbo, etc.)
- ✅ OpenAI Compatible API: Provides standard OpenAI API interface for seamless integration
- ✅ Automatic Failover: Automatically switches to backup accounts when one becomes unavailable
- ✅ Request Retry: Built-in retry mechanism to improve service availability
- ✅ Model Fallback: Support fallback strategies, e.g., fallback to qwen-plus when qwen-max is unavailable
- ✅ Admin Dashboard: Built-in Admin UI for visual management of API keys and usage statistics
## Table of Contents

- Quick Start
- Configuration
- API Examples
- Multi-Account Load Balancing
- Failover
- Monitoring & Management
- Common Commands
- Contributing
- License
## Quick Start

### Prerequisites

- Python 3.12+
- uv (recommended) or pip
- Docker (for PostgreSQL, optional)
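The prerequisites above can be sanity-checked from Python before installing; a minimal sketch (the 3.12 threshold comes from the list above, while uv and Docker are merely looked up on `PATH`):

```python
import shutil
import sys

def check_prereqs(version=sys.version_info):
    """Report whether the local environment matches the prerequisites above."""
    report = {"python>=3.12": tuple(version[:2]) >= (3, 12)}
    # uv and Docker are optional; just note whether they are on PATH
    for tool in ("uv", "docker"):
        report[tool] = shutil.which(tool) is not None
    return report

print(check_prereqs())
```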
### Installation

```bash
git clone https://github.com/yugasun/llm-api-proxy.git
cd llm-api-proxy

# Using uv (recommended)
make install

# Or manually
uv sync
```

### Configure

```bash
# Copy configuration template
make setup

# Edit .env file
vim .env
```

Key configuration items:
```bash
# Alibaba Cloud API Key (required)
# Get from: https://bailian.console.aliyun.com/
DASHSCOPE_API_KEY_1=sk-your-api-key-1
DASHSCOPE_API_KEY_2=sk-your-api-key-2  # Optional

# Admin UI Configuration (optional, required for admin dashboard)
LITELLM_MASTER_KEY=sk-your-master-key
UI_USERNAME=admin
UI_PASSWORD=admin
DATABASE_URL=postgresql://litellm:litellm@localhost:5432/litellm
```

### Start the Service

```bash
# Option 1: API service only (no database required)
make start

# Option 2: With admin dashboard (requires PostgreSQL)
make db-up   # Start database
make start   # Start service
```

### Test
```bash
# Test Chat Completions API
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself"}]
  }'

# Or use the Makefile
make test
```

## Configuration

```yaml
model_list:
  # Model configuration
  - model_name: qwen-max      # Exposed model name
    litellm_params:
      model: openai/qwen-max  # Actual model (openai/ prefix for OpenAI-compatible API)
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY_1  # Read from environment variable

litellm_settings:
  request_timeout: 120                 # Request timeout (seconds)
  num_retries: 3                       # Number of retries
  fallbacks:                           # Fallback strategy
    - qwen-max: [qwen-plus]

router_settings:
  routing_strategy: simple-shuffle     # Routing strategy
```

### Supported Models

| Model Name | Description | Use Case |
|---|---|---|
| qwen-max | Qwen Max | Complex tasks, most capable |
| qwen-plus | Qwen Plus | Balanced performance, speed, and cost |
| qwen-flash | Qwen Flash/Turbo | Simple tasks, fast and low cost |
| qwen-long | Qwen Long | Ultra-long document processing (10M tokens) |
| qwq-plus | QwQ Reasoning Model | Math and code reasoning |
| qwen-coder | Qwen Coder | Code generation and tool calling |
| deepseek-v3 | DeepSeek V3 | General tasks |
| deepseek-r1 | DeepSeek R1 | Reasoning tasks |
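Client code can map task types onto the model names above; a small illustrative helper (the task-to-model routing choices here are my assumptions, not project recommendations):

```python
# Illustrative mapping from task type to the proxy model names listed above.
# The routing choices are assumptions, not project recommendations.
MODEL_FOR_TASK = {
    "complex": "qwen-max",
    "balanced": "qwen-plus",
    "fast": "qwen-flash",
    "long-context": "qwen-long",
    "math-reasoning": "qwq-plus",
    "code": "qwen-coder",
}

def pick_model(task: str, default: str = "qwen-plus") -> str:
    """Return the exposed model name for a task type, falling back to a default."""
    return MODEL_FOR_TASK.get(task, default)

print(pick_model("code"))     # qwen-coder
print(pick_model("unknown"))  # qwen-plus (default)
```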
## API Examples

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-key",  # Can be any value
    base_url="http://localhost:4000/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Please introduce yourself"}
    ]
)
print(response.choices[0].message.content)
```

### Python (Streaming)

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="http://localhost:4000/v1"
)

stream = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Write a poem about spring"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### curl (Streaming)

```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Hello"}
    ],
    "stream": true
  }'
```

## Multi-Account Load Balancing

Configure multiple entries with the same model_name to enable multi-account load balancing:
```yaml
model_list:
  # Account 1
  - model_name: qwen-max
    litellm_params:
      model: openai/qwen-max
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY_1

  # Account 2 - Automatic load balancing
  - model_name: qwen-max
    litellm_params:
      model: openai/qwen-max
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY_2
```

## Failover

Configure fallback strategies to automatically switch when the primary model is unavailable:
```yaml
litellm_settings:
  fallbacks:
    - qwen-max: [qwen-plus, qwen-flash]  # Try qwen-plus, then qwen-flash when qwen-max fails
    - qwen-plus: [qwen-flash]
```

## Monitoring & Management

LiteLLM Proxy provides rich management features:
| Feature | URL | Description |
|---|---|---|
| API Documentation | http://localhost:4000/docs | Swagger UI |
| Admin Dashboard | http://localhost:4000/ui | API key management and usage statistics |
| Health Check | http://localhost:4000/health | Service Status |
| Model List | http://localhost:4000/v1/models | Available Models |
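These endpoints can be polled from scripts as well as a browser; a minimal probe helper (assumes the proxy is at the default localhost:4000 address from the table above, and returns `None` rather than raising when it is unreachable):

```python
import json
import urllib.request
from urllib.error import URLError

def probe(base: str, path: str, timeout: float = 5.0):
    """Fetch a proxy endpoint and decode the JSON body; return None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base}{path}", timeout=timeout) as resp:
            return json.loads(resp.read())
    except (URLError, OSError, ValueError):
        return None

# Assumption: proxy running locally on the default port
health = probe("http://localhost:4000", "/health")
print(health if health is not None else "proxy not reachable")
```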
## Common Commands

```bash
# Installation & configuration
make install    # Install dependencies
make setup      # Copy configuration templates

# Database management
make db-up      # Start PostgreSQL
make db-down    # Stop PostgreSQL
make db-status  # Check database status

# Service management
make start      # Start service in foreground
make start-bg   # Start service in background
make stop       # Stop service
make restart    # Restart service
make status     # Check service status
make logs       # View service logs

# Testing & debugging
make test       # Test API
make health     # Health check
make models     # List available models
make debug      # Start in debug mode
```

## Contributing

Contributions are welcome! Please check CONTRIBUTING.md for details.
1. Fork this repository
2. Create your branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Create a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.

If this project helps you, please give it a Star ⭐️