LLM API Proxy

English | 简体中文

An LLM API proxy service based on LiteLLM that supports managing multiple LLM API accounts with load balancing and failover capabilities.

✨ Features

  • Multi-Account Management: Supports multiple API keys for the same model with automatic load balancing
  • Alibaba Cloud Support: Full support for the Qwen model series (qwen-max, qwen-plus, qwen-turbo, etc.)
  • OpenAI-Compatible API: Exposes a standard OpenAI API interface for seamless integration
  • Automatic Failover: Automatically switches to backup accounts when one becomes unavailable
  • Request Retry: Built-in retry mechanism improves service availability
  • Model Fallback: Supports fallback strategies, e.g., falling back to qwen-plus when qwen-max is unavailable
  • Admin Dashboard: Built-in admin UI for visual management of API keys and usage statistics

🚀 Quick Start

Requirements

  • Python 3.12+
  • uv (recommended) or pip
  • Docker (for PostgreSQL, optional)

1. Clone the Repository

git clone https://github.com/yugasun/llm-api-proxy.git
cd llm-api-proxy

2. Install Dependencies

# Using uv (recommended)
make install

# Or manually
uv sync

3. Configure Environment Variables

# Copy configuration template
make setup

# Edit .env file
vim .env

Key configuration items:

# Alibaba Cloud API Key (required)
# Get from: https://bailian.console.aliyun.com/
DASHSCOPE_API_KEY_1=sk-your-api-key-1
DASHSCOPE_API_KEY_2=sk-your-api-key-2  # Optional

# Admin UI Configuration (optional, required for admin dashboard)
LITELLM_MASTER_KEY=sk-your-master-key
UI_USERNAME=admin
UI_PASSWORD=admin
DATABASE_URL=postgresql://litellm:litellm@localhost:5432/litellm
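A quick way to catch configuration mistakes before starting the service is a small startup check. The helper below is a hypothetical sketch (it is not part of this repository) that reports which variables are unset or empty; the variable names match the `.env` template above:

```python
import os

REQUIRED = ["DASHSCOPE_API_KEY_1"]  # needed for the API service itself
ADMIN_UI = ["LITELLM_MASTER_KEY", "UI_USERNAME", "UI_PASSWORD", "DATABASE_URL"]  # dashboard only

def missing_vars(names, env=os.environ):
    """Return the subset of names that is unset or empty in env."""
    return [n for n in names if not env.get(n)]

# Demo against a sample environment rather than the real one:
print(missing_vars(REQUIRED, {"DASHSCOPE_API_KEY_1": "sk-test"}))  # → []
```

Running `missing_vars(REQUIRED)` before `make start` (and `missing_vars(ADMIN_UI)` before enabling the dashboard) surfaces missing keys with a clear message instead of a failure at request time.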

4. Start the Service

# Option 1: API service only (no database required)
make start

# Option 2: With admin dashboard (requires PostgreSQL)
make db-up    # Start database
make start    # Start service

5. Verify the Service

# Test Chat Completions API
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself"}]
  }'

# Or use Makefile
make test

⚙️ Configuration

config.yaml Structure

model_list:
  # Model configuration
  - model_name: qwen-max              # Exposed model name
    litellm_params:
      model: openai/qwen-max          # Actual model (openai/ prefix for OpenAI compatible API)
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY_1  # Read from environment variable

litellm_settings:
  request_timeout: 120    # Request timeout (seconds)
  num_retries: 3          # Number of retries
  fallbacks:              # Fallback strategy
    - qwen-max: [qwen-plus]

router_settings:
  routing_strategy: simple-shuffle  # Routing strategy
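Conceptually, the router treats every `model_list` entry with the same `model_name` as one pool of interchangeable deployments. The sketch below illustrates that grouping on an in-memory mirror of the config above; it is a toy model for understanding the structure, not LiteLLM's actual router code:

```python
from collections import defaultdict

# In-memory mirror of the config.yaml structure above (values are placeholders).
config = {
    "model_list": [
        {"model_name": "qwen-max",
         "litellm_params": {
             "model": "openai/qwen-max",
             "api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
             "api_key": "os.environ/DASHSCOPE_API_KEY_1"}},
    ],
    "litellm_settings": {"request_timeout": 120, "num_retries": 3,
                         "fallbacks": [{"qwen-max": ["qwen-plus"]}]},
    "router_settings": {"routing_strategy": "simple-shuffle"},
}

def deployments_by_name(cfg):
    """Group model_list entries by their exposed model_name."""
    groups = defaultdict(list)
    for entry in cfg["model_list"]:
        groups[entry["model_name"]].append(entry["litellm_params"])
    return dict(groups)

print(deployments_by_name(config)["qwen-max"][0]["model"])  # → openai/qwen-max
```

Adding a second entry under the same `model_name` (as in the load-balancing section below) simply grows that pool.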

Supported Models

| Model Name | Description | Use Case |
| --- | --- | --- |
| qwen-max | Qwen Max | Complex tasks, most capable |
| qwen-plus | Qwen Plus | Balanced performance, speed, and cost |
| qwen-flash | Qwen Flash/Turbo | Simple tasks, fast and low cost |
| qwen-long | Qwen Long | Ultra-long document processing (10M tokens) |
| qwq-plus | QwQ reasoning model | Math and code reasoning |
| qwen-coder | Qwen Coder | Code generation and tool calling |
| deepseek-v3 | DeepSeek V3 | General tasks |
| deepseek-r1 | DeepSeek R1 | Reasoning tasks |

📝 API Examples

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="any-key",  # Any value works unless LITELLM_MASTER_KEY is set
    base_url="http://localhost:4000/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Please introduce yourself"}
    ]
)

print(response.choices[0].message.content)

Streaming Output

from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="http://localhost:4000/v1"
)

stream = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Write a poem about spring"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

cURL

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Hello"}
    ],
    "stream": true
  }'
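With `"stream": true`, the response arrives as Server-Sent Events: each line is `data: {json-chunk}`, terminated by `data: [DONE]`. The parser below is a minimal sketch of consuming that format; the sample payloads are illustrative, not captured from a live server:

```python
import json

# Sample SSE lines in the shape a streaming chat-completions endpoint emits
# (illustrative payloads, not a real server capture).
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

def collect_stream(lines):
    """Join the content deltas from 'data:' lines, stopping at [DONE]."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)

print(collect_stream(sse_lines))  # → Hello
```

In practice the OpenAI SDK (shown earlier) does this parsing for you; raw SSE handling is only needed when consuming the cURL-style endpoint directly.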

⚖️ Multi-Account Load Balancing

Configure multiple entries with the same model_name to enable multi-account load balancing:

model_list:
  # Account 1
  - model_name: qwen-max
    litellm_params:
      model: openai/qwen-max
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY_1

  # Account 2 - Automatic load balancing
  - model_name: qwen-max
    litellm_params:
      model: openai/qwen-max
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY_2
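The `simple-shuffle` strategy amounts to picking uniformly at random among the deployments that share a `model_name`, so traffic spreads across accounts over time. The following is a toy model of that behavior, not LiteLLM's router implementation:

```python
import random
from collections import Counter

# Two deployments sharing the exposed name "qwen-max", as in the YAML above.
deployments = [
    {"model_name": "qwen-max", "api_key_env": "DASHSCOPE_API_KEY_1"},
    {"model_name": "qwen-max", "api_key_env": "DASHSCOPE_API_KEY_2"},
]

def pick_deployment(pool):
    """simple-shuffle: choose uniformly at random among the pool."""
    return random.choice(pool)

counts = Counter(pick_deployment(deployments)["api_key_env"] for _ in range(1000))
# Over many requests, both accounts receive a comparable share of traffic.
print(dict(counts))
```

Because selection is random per request, each account's quota is consumed at roughly half the rate it would be with a single key.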

🔄 Failover

Configure fallback strategies to automatically switch when the primary model is unavailable:

litellm_settings:
  fallbacks:
    - qwen-max: [qwen-plus, qwen-flash]  # Try qwen-plus, then qwen-flash when qwen-max fails
    - qwen-plus: [qwen-flash]
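The fallback chain can be pictured as: try the requested model first, then each fallback in order until one succeeds. The loop below is a toy illustration of that logic under the configuration above, not LiteLLM's internals:

```python
# Mirrors the fallbacks section of the YAML above.
FALLBACKS = {"qwen-max": ["qwen-plus", "qwen-flash"], "qwen-plus": ["qwen-flash"]}

def call_with_fallbacks(model, call):
    """call(model) returns a response or raises; walk the chain on failure."""
    last_err = None
    for candidate in [model] + FALLBACKS.get(model, []):
        try:
            return candidate, call(candidate)
        except Exception as err:
            last_err = err
    raise RuntimeError(f"all candidates for {model!r} failed") from last_err

# Simulate qwen-max being unavailable:
def fake_call(m):
    if m == "qwen-max":
        raise TimeoutError("upstream unavailable")
    return f"response from {m}"

print(call_with_fallbacks("qwen-max", fake_call))  # → ('qwen-plus', 'response from qwen-plus')
```

Combined with `num_retries`, this means a request only fails once every candidate model has exhausted its retries.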

📊 Monitoring & Management

LiteLLM Proxy provides rich management features:

| Feature | URL | Description |
| --- | --- | --- |
| API Documentation | http://localhost:4000/docs | Swagger UI |
| Admin Dashboard | http://localhost:4000/ui | Key and usage management |
| Health Check | http://localhost:4000/health | Service status |
| Model List | http://localhost:4000/v1/models | Available models |

🛠️ Common Commands

# Installation & Configuration
make install      # Install dependencies
make setup        # Copy configuration templates

# Database Management
make db-up        # Start PostgreSQL
make db-down      # Stop PostgreSQL
make db-status    # Check database status

# Service Management
make start        # Start service in foreground
make start-bg     # Start service in background
make stop         # Stop service
make restart      # Restart service
make status       # Check service status
make logs         # View service logs

# Testing & Debugging
make test         # Test API
make health       # Health check
make models       # List available models
make debug        # Start in debug mode

🔗 Links

🤝 Contributing

Contributions are welcome! Please check CONTRIBUTING.md for details.

  1. Fork this repository
  2. Create your branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Create a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

⭐ Star History

If this project helps you, please give it a Star ⭐️

