A reverse proxy server for OpenRouter.ai that adds prefill functionality and cache control for Anthropic models.
Many LLM clients don't support native prefill configuration or cache control. This proxy lets you enable these features from any message in the conversation: user messages, system prompts, or any other message type.
- Start the server:

  ```
  llm_proxy.exe
  ```

  The server will start on port 8080.
- Configure your OpenRouter API key: The proxy provides OpenAI-compatible endpoints. Include your OpenRouter API key in the `Authorization` header when making requests to `http://localhost:8080`.
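
For example, a minimal request to the proxy could look like the sketch below. The `/v1/chat/completions` path and the model name are assumptions based on the OpenAI-compatible API; substitute whatever model you use through OpenRouter.

```python
import requests

OPENROUTER_API_KEY = "sk-or-..."  # your OpenRouter API key

# Hypothetical request; assumes the proxy exposes the standard
# OpenAI-style /v1/chat/completions path on port 8080.
response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-3.5-sonnet",  # placeholder model ID
        "messages": [
            {"role": "user", "content": "Hello |prefill: I'm an AI assistant|"}
        ],
    },
)
print(response.json())
```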
If any message contains a prefill block delimited by `|prefill:` and `|`, the proxy will:
- Remove the prefill block from the original message content
- Add a new assistant message with the prefill content to the conversation
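
For illustration, a minimal Python sketch of how such a rewrite could work is shown below; the regex, function name, and edge-case handling are assumptions, not the proxy's actual implementation.

```python
import re
from typing import Dict, List

# Hypothetical pattern; the proxy's real parsing may differ.
PREFILL_RE = re.compile(r"\|prefill:\s*(.*?)\|", re.DOTALL)

def apply_prefill(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Strip the first |prefill: ...| block found in any message and
    append its text as a trailing assistant message."""
    out = []
    prefill_text = None
    for msg in messages:
        content = msg.get("content", "")
        match = PREFILL_RE.search(content) if prefill_text is None else None
        if match:
            prefill_text = match.group(1)
            msg = {**msg, "content": PREFILL_RE.sub("", content, count=1)}
        out.append(msg)
    if prefill_text is not None:
        out.append({"role": "assistant", "content": prefill_text})
    return out
```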
Example:
Suppose you send the following message:
```
Hello |prefill: I'm an AI assistant|
```
This would normally be sent as:
```json
{
  "role": "user",
  "content": "Hello |prefill: I'm an AI assistant|"
}
```

The proxy will transform it to:
```json
[
  {
    "role": "user",
    "content": "Hello "
  },
  {
    "role": "assistant",
    "content": "I'm an AI assistant"
  }
]
```

If any message contains the `|cache|` command, the proxy will add cache control to the last user message. This enables prompt caching functionality to reduce costs for repeated prompts. Make sure you are using an Anthropic model.
Example:
```
Your message content |cache|
```
The proxy will transform it to:
```json
{
  "role": "user",
  "content": "Your message content",
  "cache_control": {
    "type": "ephemeral"
  }
}
```
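
As with the prefill feature, here is a minimal sketch of how the cache rewrite could be implemented; the marker handling and whitespace trimming are assumptions, not the proxy's actual code.

```python
from typing import Any, Dict, List

CACHE_MARKER = "|cache|"  # hypothetical constant name

def apply_cache_control(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """If any message contains |cache|, strip the marker and attach
    ephemeral cache control to the last user message."""
    if not any(CACHE_MARKER in str(m.get("content", "")) for m in messages):
        return messages

    out = []
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, str) and CACHE_MARKER in content:
            # Assumes surrounding whitespace is trimmed after removal.
            msg = {**msg, "content": content.replace(CACHE_MARKER, "").strip()}
        out.append(msg)

    # Attach cache_control to the last user message, as in the example above.
    for msg in reversed(out):
        if msg.get("role") == "user":
            msg["cache_control"] = {"type": "ephemeral"}
            break
    return out
```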