LLM API Proxy

English | 简体中文

An LLM API proxy service based on LiteLLM that supports managing multiple LLM API accounts with load balancing and failover capabilities.

✨ Features

  • Multi-Account Management: Supports multiple API keys for the same model with automatic load balancing
  • Alibaba Cloud Support: Full support for the Qwen model series (qwen-max, qwen-plus, qwen-turbo, etc.)
  • OpenAI-Compatible API: Exposes a standard OpenAI API interface for seamless integration
  • Automatic Failover: Automatically switches to backup accounts when one becomes unavailable
  • Request Retry: Built-in retry mechanism improves service availability
  • Model Fallback: Supports fallback strategies, e.g., falling back to qwen-plus when qwen-max is unavailable
  • Admin Dashboard: Built-in admin UI for visual management of API keys and usage statistics

🚀 Quick Start

Requirements

  • Python 3.12+
  • uv (recommended) or pip
  • Docker (for PostgreSQL, optional)

1. Clone the Repository

git clone https://github.com/yugasun/llm-api-proxy.git
cd llm-api-proxy

2. Install Dependencies

# Using uv (recommended)
make install

# Or manually
uv sync

3. Configure Environment Variables

# Copy configuration template
make setup

# Edit .env file
vim .env

Key configuration items:

# Alibaba Cloud API Key (required)
# Get from: https://bailian.console.aliyun.com/
DASHSCOPE_API_KEY_1=sk-your-api-key-1
DASHSCOPE_API_KEY_2=sk-your-api-key-2  # Optional

# Admin UI Configuration (optional, required for admin dashboard)
LITELLM_MASTER_KEY=sk-your-master-key
UI_USERNAME=admin
UI_PASSWORD=admin
DATABASE_URL=postgresql://litellm:litellm@localhost:5432/litellm
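A quick way to catch configuration mistakes before starting the service is a small startup check. The helper below is a hypothetical sketch (it is not part of this repository) that reports which variables are unset or empty; the variable names match the `.env` template above:

```python
import os

REQUIRED = ["DASHSCOPE_API_KEY_1"]  # needed for the API service itself
ADMIN_UI = ["LITELLM_MASTER_KEY", "UI_USERNAME", "UI_PASSWORD", "DATABASE_URL"]  # dashboard only

def missing_vars(names, env=os.environ):
    """Return the subset of names that is unset or empty in env."""
    return [n for n in names if not env.get(n)]

# Demo against a sample environment rather than the real one:
print(missing_vars(REQUIRED, {"DASHSCOPE_API_KEY_1": "sk-test"}))  # → []
```

Running `missing_vars(REQUIRED)` before `make start` (and `missing_vars(ADMIN_UI)` before enabling the dashboard) surfaces missing keys with a clear message instead of a failure at request time.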

4. Start the Service

# Option 1: API service only (no database required)
make start

# Option 2: With admin dashboard (requires PostgreSQL)
make db-up    # Start database
make start    # Start service

5. Verify the Service

# Test Chat Completions API
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself"}]
  }'

# Or use Makefile
make test

⚙️ Configuration

config.yaml Structure

model_list:
  # Model configuration
  - model_name: qwen-max              # Exposed model name
    litellm_params:
      model: openai/qwen-max          # Actual model (openai/ prefix for OpenAI compatible API)
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY_1  # Read from environment variable

litellm_settings:
  request_timeout: 120    # Request timeout (seconds)
  num_retries: 3          # Number of retries
  fallbacks:              # Fallback strategy
    - qwen-max: [qwen-plus]

router_settings:
  routing_strategy: simple-shuffle  # Routing strategy
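Conceptually, the router treats every `model_list` entry with the same `model_name` as one pool of interchangeable deployments. The sketch below illustrates that grouping on an in-memory mirror of the config above; it is a toy model for understanding the structure, not LiteLLM's actual router code:

```python
from collections import defaultdict

# In-memory mirror of the config.yaml structure above (values are placeholders).
config = {
    "model_list": [
        {"model_name": "qwen-max",
         "litellm_params": {
             "model": "openai/qwen-max",
             "api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
             "api_key": "os.environ/DASHSCOPE_API_KEY_1"}},
    ],
    "litellm_settings": {"request_timeout": 120, "num_retries": 3,
                         "fallbacks": [{"qwen-max": ["qwen-plus"]}]},
    "router_settings": {"routing_strategy": "simple-shuffle"},
}

def deployments_by_name(cfg):
    """Group model_list entries by their exposed model_name."""
    groups = defaultdict(list)
    for entry in cfg["model_list"]:
        groups[entry["model_name"]].append(entry["litellm_params"])
    return dict(groups)

print(deployments_by_name(config)["qwen-max"][0]["model"])  # → openai/qwen-max
```

Adding a second entry under the same `model_name` (as in the load-balancing section below) simply grows that pool.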

Supported Models

| Model Name | Description | Use Case |
| --- | --- | --- |
| qwen-max | Qwen Max | Complex tasks, most capable |
| qwen-plus | Qwen Plus | Balanced performance, speed, and cost |
| qwen-flash | Qwen Flash/Turbo | Simple tasks, fast and low cost |
| qwen-long | Qwen Long | Ultra-long document processing (10M tokens) |
| qwq-plus | QwQ reasoning model | Math and code reasoning |
| qwen-coder | Qwen Coder | Code generation and tool calling |
| deepseek-v3 | DeepSeek V3 | General tasks |
| deepseek-r1 | DeepSeek R1 | Reasoning tasks |

📝 API Examples

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="any-key",  # Any value works unless LITELLM_MASTER_KEY is set
    base_url="http://localhost:4000/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Please introduce yourself"}
    ]
)

print(response.choices[0].message.content)

Streaming Output

from openai import OpenAI

client = OpenAI(
    api_key="any-key",
    base_url="http://localhost:4000/v1"
)

stream = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Write a poem about spring"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

cURL

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Hello"}
    ],
    "stream": true
  }'
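With `"stream": true`, the response arrives as Server-Sent Events: each line is `data: {json-chunk}`, terminated by `data: [DONE]`. The parser below is a minimal sketch of consuming that format; the sample payloads are illustrative, not captured from a live server:

```python
import json

# Sample SSE lines in the shape a streaming chat-completions endpoint emits
# (illustrative payloads, not a real server capture).
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

def collect_stream(lines):
    """Join the content deltas from 'data:' lines, stopping at [DONE]."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)

print(collect_stream(sse_lines))  # → Hello
```

In practice the OpenAI SDK (shown earlier) does this parsing for you; raw SSE handling is only needed when consuming the cURL-style endpoint directly.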

⚖️ Multi-Account Load Balancing

Configure multiple entries with the same model_name to enable multi-account load balancing:

model_list:
  # Account 1
  - model_name: qwen-max
    litellm_params:
      model: openai/qwen-max
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY_1

  # Account 2 - Automatic load balancing
  - model_name: qwen-max
    litellm_params:
      model: openai/qwen-max
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY_2
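The `simple-shuffle` strategy amounts to picking uniformly at random among the deployments that share a `model_name`, so traffic spreads across accounts over time. The following is a toy model of that behavior, not LiteLLM's router implementation:

```python
import random
from collections import Counter

# Two deployments sharing the exposed name "qwen-max", as in the YAML above.
deployments = [
    {"model_name": "qwen-max", "api_key_env": "DASHSCOPE_API_KEY_1"},
    {"model_name": "qwen-max", "api_key_env": "DASHSCOPE_API_KEY_2"},
]

def pick_deployment(pool):
    """simple-shuffle: choose uniformly at random among the pool."""
    return random.choice(pool)

counts = Counter(pick_deployment(deployments)["api_key_env"] for _ in range(1000))
# Over many requests, both accounts receive a comparable share of traffic.
print(dict(counts))
```

Because selection is random per request, each account's quota is consumed at roughly half the rate it would be with a single key.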

🔄 Failover

Configure fallback strategies to automatically switch when the primary model is unavailable:

litellm_settings:
  fallbacks:
    - qwen-max: [qwen-plus, qwen-flash]  # Try qwen-plus, then qwen-flash when qwen-max fails
    - qwen-plus: [qwen-flash]
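The fallback chain can be pictured as: try the requested model first, then each fallback in order until one succeeds. The loop below is a toy illustration of that logic under the configuration above, not LiteLLM's internals:

```python
# Mirrors the fallbacks section of the YAML above.
FALLBACKS = {"qwen-max": ["qwen-plus", "qwen-flash"], "qwen-plus": ["qwen-flash"]}

def call_with_fallbacks(model, call):
    """call(model) returns a response or raises; walk the chain on failure."""
    last_err = None
    for candidate in [model] + FALLBACKS.get(model, []):
        try:
            return candidate, call(candidate)
        except Exception as err:
            last_err = err
    raise RuntimeError(f"all candidates for {model!r} failed") from last_err

# Simulate qwen-max being unavailable:
def fake_call(m):
    if m == "qwen-max":
        raise TimeoutError("upstream unavailable")
    return f"response from {m}"

print(call_with_fallbacks("qwen-max", fake_call))  # → ('qwen-plus', 'response from qwen-plus')
```

Combined with `num_retries`, this means a request only fails once every candidate model has exhausted its retries.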

📊 Monitoring & Management

LiteLLM Proxy provides rich management features:

| Feature | URL | Description |
| --- | --- | --- |
| API Documentation | http://localhost:4000/docs | Swagger UI |
| Admin Dashboard | http://localhost:4000/ui | Key and usage management |
| Health Check | http://localhost:4000/health | Service status |
| Model List | http://localhost:4000/v1/models | Available models |

🛠️ Common Commands

# Installation & Configuration
make install      # Install dependencies
make setup        # Copy configuration templates

# Database Management
make db-up        # Start PostgreSQL
make db-down      # Stop PostgreSQL
make db-status    # Check database status

# Service Management
make start        # Start service in foreground
make start-bg     # Start service in background
make stop         # Stop service
make restart      # Restart service
make status       # Check service status
make logs         # View service logs

# Testing & Debugging
make test         # Test API
make health       # Health check
make models       # List available models
make debug        # Start in debug mode

🔗 Links

🤝 Contributing

Contributions are welcome! Please check CONTRIBUTING.md for details.

  1. Fork this repository
  2. Create your branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Create a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

⭐ Star History

If this project helps you, please give it a Star ⭐️

