anthropic-proxy exposes an Anthropic-compatible /v1/messages API to clients such as Claude Code, then forwards requests to any OpenAI-compatible backend.
It is designed for the practical case where your client always asks for Claude-style model names, but you want to force everything to a model you control from .env.
```text
Claude Code / Anthropic SDK
|
| POST /v1/messages
v
anthropic-proxy
|
| POST /v1/chat/completions
v
OpenAI-compatible upstream
```
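Behind the diagram, each call to `/v1/messages` ends in one HTTP POST to the upstream. Below is a minimal Go sketch of building that upstream request. It assumes only what the configuration table states: the target is `UPSTREAM_URL`, and `UPSTREAM_API_KEY` is sent as a Bearer token. `newUpstreamRequest` is a hypothetical name, and the sketch leaves out the Anthropic-to-OpenAI body translation.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newUpstreamRequest builds the POST sent to an OpenAI-compatible
// /v1/chat/completions endpoint. upstreamURL and apiKey play the roles of
// UPSTREAM_URL and UPSTREAM_API_KEY. Hypothetical helper, not the proxy's code.
func newUpstreamRequest(upstreamURL, apiKey string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, upstreamURL, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// UPSTREAM_API_KEY travels as a Bearer token.
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	// An already-translated OpenAI-style body (translation itself not shown).
	body := []byte(`{"model":"z-ai/glm-5.1","messages":[{"role":"user","content":"hello"}]}`)
	req, err := newUpstreamRequest("http://localhost:11434/v1/chat/completions", "ollama", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Authorization"))
}
```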
Supported upstreams include:
- OpenAI
- NVIDIA NIM
- Ollama
- vLLM
- LM Studio
- Groq
- DeepSeek
- OpenRouter
- LiteLLM
By default, the proxy treats `DEFAULT_MODEL` as the model to actually use upstream.
That means:

- Claude Code may request `claude-opus`, `claude-sonnet`, or anything else.
- The proxy will still send your `.env` model upstream when `FORCE_MODEL=1` (the default).
- You can change the upstream model by editing `.env`.
- Most config changes hot reload automatically without restarting the process.
Features:

- Anthropic-compatible `POST /v1/messages`
- Anthropic-compatible `POST /v1/messages/count_tokens`
- `GET /healthz`
- Sync and streaming support
- Tool call conversion between Anthropic and OpenAI formats
- Image block to `image_url` conversion
- Optional client auth with `PROXY_CLIENT_KEY`
- Force-model mode via `.env`
- Hot reload for request-time config
Build:

```sh
CGO_ENABLED=0 go build -ldflags="-s -w" -o anthropic-proxy .
```

Or:

```sh
make build
```

Run:

```sh
# Default .env
./anthropic-proxy

# Custom config path
./anthropic-proxy /path/to/config.env

# Help
./anthropic-proxy --help
```
Cross-compile examples:

```sh
GOOS=linux GOARCH=amd64 go build -o dist/anthropic-proxy-linux-amd64 .
GOOS=darwin GOARCH=arm64 go build -o dist/anthropic-proxy-darwin-arm64 .
GOOS=windows GOARCH=amd64 go build -o dist/anthropic-proxy-windows-amd64.exe .
```

Quick start:

- Create your local env file:

  ```sh
  cp .env.example .env
  ```

- Edit `.env`.
- Start the proxy:

  ```sh
  ./anthropic-proxy
  ```

- Point Claude Code to the proxy:

  ```sh
  export ANTHROPIC_BASE_URL=http://localhost:8787
  export ANTHROPIC_API_KEY=anything
  claude
  ```

If you set `PROXY_CLIENT_KEY`, use that value instead of `anything`.
On Windows, the simplest setup path is to use the prebuilt binary from the GitHub release.

Download `anthropic-proxy-windows-amd64.exe` from the latest release and place it in a folder such as `C:\tools\anthropic-proxy\`.

Example layout:

```text
C:\tools\anthropic-proxy\
  anthropic-proxy-windows-amd64.exe
  .env
```

You can rename the binary if you want:

```text
anthropic-proxy-windows-amd64.exe -> anthropic-proxy.exe
```
Open PowerShell in the same folder and create `.env` from the example:

```powershell
Copy-Item .env.example .env
```

If you downloaded only the binary, create `.env` manually. Minimal NVIDIA example:

```dotenv
UPSTREAM_URL=https://integrate.api.nvidia.com/v1/chat/completions
UPSTREAM_API_KEY=nvapi-...
DEFAULT_MODEL=z-ai/glm-5.1
FORCE_MODEL=1
LISTEN_ADDR=:8787
REQUEST_TIMEOUT_SEC=600
DEBUG=1
```

Start the proxy from PowerShell:

```powershell
.\anthropic-proxy.exe
```

If your file still has the release name:

```powershell
.\anthropic-proxy-windows-amd64.exe
```

You should see startup logs similar to:
```text
anthropic-proxy
listen : :8787
upstream : https://integrate.api.nvidia.com/v1/chat/completions
default : z-ai/glm-5.1
force : true
```

In another PowerShell window, check that the proxy is up:

```powershell
Invoke-WebRequest http://127.0.0.1:8787/healthz | Select-Object -ExpandProperty Content
```

Expected output:

```text
ok
```

You can also inspect the active config:

```powershell
Invoke-WebRequest http://127.0.0.1:8787/ | Select-Object -ExpandProperty Content
```

To point Claude Code at the proxy for the current PowerShell session only:
```powershell
$env:ANTHROPIC_BASE_URL = "http://127.0.0.1:8787"
$env:ANTHROPIC_API_KEY = "anything"
claude
```

If you enabled proxy auth, use your `PROXY_CLIENT_KEY` value as the API key:

```powershell
$env:ANTHROPIC_BASE_URL = "http://127.0.0.1:8787"
$env:ANTHROPIC_API_KEY = "my-local-proxy-key"
claude
```

For persistent user-level environment variables in PowerShell:

```powershell
[Environment]::SetEnvironmentVariable("ANTHROPIC_BASE_URL", "http://127.0.0.1:8787", "User")
[Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY", "anything", "User")
```

Then open a new terminal before running:

```powershell
claude
```

If you do not want to keep a PowerShell window open, you can start the proxy in a separate process:
```powershell
Start-Process -FilePath ".\anthropic-proxy.exe" -WorkingDirectory (Get-Location)
```

If you want log files:

```powershell
Start-Process -FilePath ".\anthropic-proxy.exe" `
  -WorkingDirectory (Get-Location) `
  -RedirectStandardOutput ".\proxy.stdout.log" `
  -RedirectStandardError ".\proxy.stderr.log"
```

Most config changes are hot reloaded from `.env`. For example, you can change:

```dotenv
DEFAULT_MODEL=z-ai/glm-5.1
```

to:

```dotenv
DEFAULT_MODEL=meta/llama-3.1-8b-instruct
```

Then query the config endpoint:

```powershell
Invoke-WebRequest http://127.0.0.1:8787/ | Select-Object -ExpandProperty Content
```

The new model should appear without restarting the proxy.
Troubleshooting:

- If PowerShell says the file cannot be found, make sure you are in the correct folder and use `.\anthropic-proxy.exe`.
- If Windows SmartScreen warns about the binary, use `More info` and then `Run anyway` if you trust the release you downloaded.
- If port `8787` is already in use, change `LISTEN_ADDR` in `.env` to another port such as `:8788`, restart the proxy, and update `ANTHROPIC_BASE_URL` to match.
- If Claude Code cannot connect, first confirm that `http://127.0.0.1:8787/healthz` returns `ok`.
- If requests hang, test the upstream directly before blaming the proxy.
- If you changed `LISTEN_ADDR`, a restart is required. That one setting is not hot reloaded.
- If you use Command Prompt instead of PowerShell, set the session variables with:

  ```bat
  set ANTHROPIC_BASE_URL=http://127.0.0.1:8787
  set ANTHROPIC_API_KEY=anything
  claude
  ```

When `FORCE_MODEL=1`, every incoming model is replaced with `DEFAULT_MODEL`.
For example, if `.env` contains:

```dotenv
DEFAULT_MODEL=z-ai/glm-5.1
FORCE_MODEL=1
```

this incoming request:

```json
{
  "model": "claude-sonnet-4",
  "max_tokens": 256,
  "messages": [
    { "role": "user", "content": "hello" }
  ]
}
```

becomes this upstream request:

```json
{
  "model": "z-ai/glm-5.1",
  "messages": [
    { "role": "user", "content": "hello" }
  ]
}
```

If you set `FORCE_MODEL=0`, the proxy resolves the model in this order:
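- an exact match in `MODEL_MAP`
- a prefix match in `MODEL_MAP`
- `DEFAULT_MODEL` as the fallback

Example:

```dotenv
FORCE_MODEL=0
DEFAULT_MODEL=meta/llama-3.1-8b-instruct
MODEL_MAP={"claude-opus":"meta/llama-3.1-70b-instruct","claude-sonnet":"meta/llama-3.1-8b-instruct"}
```

With this config, a request for `claude-opus-4` has no exact entry but matches the `claude-opus` prefix, so it is sent upstream as `meta/llama-3.1-70b-instruct`.

The proxy checks `.env` on incoming requests and reloads it automatically when the file changes.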
Hot-reloaded settings:

- `UPSTREAM_URL`
- `UPSTREAM_API_KEY`
- `DEFAULT_MODEL`
- `FORCE_MODEL`
- `MODEL_MAP`
- `PROXY_CLIENT_KEY`
- `REQUEST_TIMEOUT_SEC`
- `DEBUG`
- `RATE_LIMIT`
- `RATE_LIMIT_WINDOW_SEC`

Not hot-reloaded:

- `LISTEN_ADDR`
- Config file path argument (CLI)

Changing `LISTEN_ADDR` still requires restarting the process because the server socket is already bound.
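The model-selection order described earlier fits in one small function: force mode, then an exact `MODEL_MAP` match, then a prefix match, then `DEFAULT_MODEL`. This is a sketch, not the proxy's code. `resolveModel` and `modelConfig` are hypothetical names, and two behaviors are assumptions. First, the longest matching prefix wins when several keys match. Second, with `FORCE_MODEL=0` and an empty `DEFAULT_MODEL`, the requested name passes through unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// modelConfig mirrors the three settings that decide the upstream model.
type modelConfig struct {
	ForceModel   bool              // FORCE_MODEL
	DefaultModel string            // DEFAULT_MODEL
	ModelMap     map[string]string // MODEL_MAP, already parsed from JSON
}

// resolveModel maps an incoming Anthropic model name to an upstream model.
// Longest-prefix tie-breaking and the empty-default pass-through are
// assumptions of this sketch.
func resolveModel(cfg modelConfig, requested string) string {
	if cfg.ForceModel {
		return cfg.DefaultModel
	}
	if m, ok := cfg.ModelMap[requested]; ok {
		return m // exact match
	}
	bestKey := ""
	for k := range cfg.ModelMap {
		if strings.HasPrefix(requested, k) && len(k) > len(bestKey) {
			bestKey = k
		}
	}
	if bestKey != "" {
		return cfg.ModelMap[bestKey] // longest prefix match
	}
	if cfg.DefaultModel != "" {
		return cfg.DefaultModel
	}
	return requested
}

func main() {
	cfg := modelConfig{
		DefaultModel: "meta/llama-3.1-8b-instruct",
		ModelMap: map[string]string{
			"claude-opus":   "meta/llama-3.1-70b-instruct",
			"claude-sonnet": "meta/llama-3.1-8b-instruct",
		},
	}
	fmt.Println(resolveModel(cfg, "claude-opus-4"))  // meta/llama-3.1-70b-instruct
	fmt.Println(resolveModel(cfg, "claude-haiku-3")) // meta/llama-3.1-8b-instruct
	cfg.ForceModel = true
	cfg.DefaultModel = "z-ai/glm-5.1"
	fmt.Println(resolveModel(cfg, "claude-sonnet-4")) // z-ai/glm-5.1
}
```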
| Variable | Required | Default | Hot Reload | Description |
|---|---|---|---|---|
| `UPSTREAM_URL` | no | `https://api.openai.com/v1/chat/completions` | yes | OpenAI-compatible chat completions endpoint |
| `UPSTREAM_API_KEY` | yes | none | yes | Bearer token sent upstream |
| `DEFAULT_MODEL` | yes when `FORCE_MODEL=1` | none | yes | Model to send upstream |
| `FORCE_MODEL` | no | `1` | yes | Force every incoming request to `DEFAULT_MODEL` |
| `MODEL_MAP` | no | `{}` | yes | Optional model mapping JSON |
| `PROXY_CLIENT_KEY` | no | unset | yes | Require clients to send this key |
| `REQUEST_TIMEOUT_SEC` | no | `600` | yes | Per-request upstream timeout, in seconds |
| `DEBUG` | no | `0` | yes | Enable request logging |
| `LISTEN_ADDR` | no | `:8787` | no | HTTP bind address |
| `RATE_LIMIT` | no | `0` (disabled) | yes | Max requests per window |
| `RATE_LIMIT_WINDOW_SEC` | no | `60` | yes | Rate limit window, in seconds |
Example configs.

NVIDIA NIM with Llama 3.1 8B:

```dotenv
UPSTREAM_URL=https://integrate.api.nvidia.com/v1/chat/completions
UPSTREAM_API_KEY=nvapi-...
DEFAULT_MODEL=meta/llama-3.1-8b-instruct
FORCE_MODEL=1
LISTEN_ADDR=:8787
REQUEST_TIMEOUT_SEC=600
DEBUG=1
```

NVIDIA NIM with GLM:

```dotenv
UPSTREAM_URL=https://integrate.api.nvidia.com/v1/chat/completions
UPSTREAM_API_KEY=nvapi-...
DEFAULT_MODEL=z-ai/glm-5.1
FORCE_MODEL=1
LISTEN_ADDR=:8787
REQUEST_TIMEOUT_SEC=600
DEBUG=1
```

NVIDIA NIM with MiniMax:

```dotenv
UPSTREAM_URL=https://integrate.api.nvidia.com/v1/chat/completions
UPSTREAM_API_KEY=nvapi-...
DEFAULT_MODEL=minimaxai/minimax-m2.7
FORCE_MODEL=1
LISTEN_ADDR=:8787
REQUEST_TIMEOUT_SEC=600
DEBUG=1
```

Ollama (local):

```dotenv
UPSTREAM_URL=http://localhost:11434/v1/chat/completions
UPSTREAM_API_KEY=ollama
DEFAULT_MODEL=qwen3-coder:30b
FORCE_MODEL=1
LISTEN_ADDR=:8787
REQUEST_TIMEOUT_SEC=600
DEBUG=0
```

vLLM (local):

```dotenv
UPSTREAM_URL=http://localhost:8000/v1/chat/completions
UPSTREAM_API_KEY=not-needed
DEFAULT_MODEL=Qwen/Qwen3-Coder-30B
FORCE_MODEL=1
LISTEN_ADDR=:8787
REQUEST_TIMEOUT_SEC=600
DEBUG=0
```

NVIDIA NIM with `MODEL_MAP` instead of force mode:

```dotenv
UPSTREAM_URL=https://integrate.api.nvidia.com/v1/chat/completions
UPSTREAM_API_KEY=nvapi-...
FORCE_MODEL=0
DEFAULT_MODEL=meta/llama-3.1-8b-instruct
MODEL_MAP={"claude-opus":"meta/llama-3.1-70b-instruct","claude-sonnet":"meta/llama-3.1-8b-instruct","claude-haiku":"nvidia/llama-3.1-nemotron-nano-8b-v1"}
LISTEN_ADDR=:8787
REQUEST_TIMEOUT_SEC=600
DEBUG=1
```

Claude Code only needs an Anthropic-compatible base URL and some API key value.
If proxy auth is disabled:

```sh
export ANTHROPIC_BASE_URL=http://localhost:8787
export ANTHROPIC_API_KEY=anything
claude
```

If proxy auth is enabled, set this in `.env`:

```dotenv
PROXY_CLIENT_KEY=my-local-proxy-key
```

then:

```sh
export ANTHROPIC_BASE_URL=http://localhost:8787
export ANTHROPIC_API_KEY=my-local-proxy-key
claude
```

Health check:

```sh
curl http://127.0.0.1:8787/healthz
```

Expected:

```text
ok
```
Inspect the active config:

```sh
curl http://127.0.0.1:8787/
```

Example response:

```json
{
  "service": "anthropic-proxy",
  "upstream": "https://integrate.api.nvidia.com/v1/chat/completions",
  "default_model": "z-ai/glm-5.1",
  "force_model": true,
  "models": {},
  "request_timeout_sec": 600
}
```

Send a non-streaming message:

```sh
curl http://127.0.0.1:8787/v1/messages \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "max_tokens": 64,
    "messages": [
      { "role": "user", "content": "Reply with exactly: proxy works" }
    ]
  }'
```

Example response:
```json
{
  "id": "msg_xxx",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4",
  "content": [
    { "type": "text", "text": "proxy works" }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 41,
    "output_tokens": 3
  }
}
```

Streaming (`-N` turns off curl's output buffering so events print as they arrive):

```sh
curl -N http://127.0.0.1:8787/v1/messages \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4",
    "stream": true,
    "max_tokens": 64,
    "messages": [
      { "role": "user", "content": "Say hello" }
    ]
  }'
```

Token count:

```sh
curl http://127.0.0.1:8787/v1/messages/count_tokens \
  -H "content-type: application/json" \
  -d '{"model":"claude-test","max_tokens":10,"messages":[]}'
```

Upstream behavior varies:

- Some upstream models are slower than others.
- Some reasoning models may emit hidden or partial reasoning content differently.
- Some providers behave inconsistently in streaming mode.
- If a model hangs directly at the upstream, the proxy cannot fix that.

For that reason, when debugging:

- test the upstream directly first
- test the same model through the proxy
- compare the behavior
Known limitations:

- `count_tokens` is only a rough estimate.
- `cache_control` blocks are dropped because OpenAI-compatible APIs usually have no equivalent.
- `LISTEN_ADDR` changes require a restart.
- Streaming support depends on how faithfully the upstream implements OpenAI-style SSE.
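This README does not say how `count_tokens` computes its estimate. A common rough approach is about four characters per token. The sketch below assumes that heuristic, handles only string `content` (not content-block arrays), and uses hypothetical names. It shows the shape of the answer (an `input_tokens` field) rather than an accurate count.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message is a simplified Anthropic message with plain-string content only.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type countRequest struct {
	System   string    `json:"system,omitempty"`
	Messages []message `json:"messages"`
}

// estimateTokens applies a ~4 characters per token heuristic, rounding up.
// Hypothetical stand-in: it can be far off for code, non-English text, or
// models whose tokenizers differ.
func estimateTokens(req countRequest) int {
	chars := len([]rune(req.System))
	for _, m := range req.Messages {
		chars += len([]rune(m.Content))
	}
	return (chars + 3) / 4
}

func main() {
	body := `{"messages":[{"role":"user","content":"Reply with exactly: proxy works"}]}`
	var req countRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		panic(err)
	}
	// Anthropic's count_tokens endpoint answers with {"input_tokens": N}.
	out, _ := json.Marshal(map[string]int{"input_tokens": estimateTokens(req)})
	fmt.Println(string(out))
}
```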
License: MIT