A local proxy server that relays Anthropic Messages API-compatible requests to the Kiro (Amazon Q) backend using Kiro CLI credentials.
Just set ANTHROPIC_BASE_URL from any Anthropic API client (e.g., Claude Code) to use Claude models via Kiro.
- Anthropic Messages API compatible — Supports
/v1/messages(streaming / non-streaming),/v1/messages/count_tokens, and/v1/models - Request conversion — Automatically converts Anthropic API requests to Kiro API (AWS Event Stream) format
- Response conversion — Converts Kiro event streams back to Anthropic SSE format
- Automatic auth management — Reads credentials from Kiro CLI's SQLite DB with automatic token refresh (Social / OIDC)
- Model mapping — Maps Anthropic model names (e.g.,
claude-sonnet-4-6) to Kiro model names. Customizable via environment variable - Extended Thinking — Enable via the
[1m]suffix, thethinkingfield, oroutput_config.effort. Reasoning depth travels natively asadditionalModelRequestFields.output_config.effort(validated against each model's enum; defaults tomediumfor effort-capable models when thinking is on without an explicit effort) - Tool Search — Proxy-side implementation of Anthropic's Tool Search Tool. Supports
tool_search_tool_regex_20251119andtool_search_tool_bm25_20251119withdefer_loadingfor on-demand tool discovery - Prompt Caching — Converts Anthropic tool-level
cache_controlto KirocachePoint - Truncation detection — Automatically injects a notice into the next request when a response is truncated
- Retry — Exponential backoff retry for 403 (token expiry), 429, and 5xx errors. Also retries thinking-only (empty visible) responses
- API key auth — Optional access restriction for the proxy itself
- CORS — Allows requests from localhost origins
- File logging — Write structured logs (OTel JSON Lines) to a rotating file via lumberjack. Defaults optimized for coding agent consumption (10 MB, uncompressed)
- OpenTelemetry tracing — Opt-in distributed tracing via
--otelwith OTLP HTTP exporter. Captures request/response headers and body as span events across the full proxy chain
- Go 1.26+
- Kiro CLI installed and logged in
brew install d-kuro/tap/kiroccgo install github.com/d-kuro/kirocc/cmd/kirocc@latestkiroccListens on http://127.0.0.1:3456 by default.
export ANTHROPIC_BASE_URL=http://127.0.0.1:3456
export ANTHROPIC_AUTH_TOKEN=dummy
claudeANTHROPIC_AUTH_TOKEN is required by Claude Code but not used for authentication by kirocc (credentials are read from Kiro CLI's DB). Any non-empty value works unless -api-key is set.
| Flag | Default | Description |
|---|---|---|
-port |
3456 |
Listen port |
-host |
127.0.0.1 |
Bind host |
-db |
(OS-dependent, see below) | Kiro CLI SQLite DB path |
-api-key |
(none) | API key required to access the proxy |
-debug |
false |
Enable debug logging |
-log-file |
(none) | Write logs to file with rotation (file-only by default) |
-log-max-size |
10 |
Max log file size in MB before rotation |
-log-max-backups |
5 |
Max number of old log files to retain |
-log-max-age |
7 |
Max days to retain old log files |
-log-compress |
false |
Compress rotated log files with gzip |
-log-console |
false |
Also write logs to console when -log-file is set |
-otel |
false |
Enable OpenTelemetry tracing (OTLP HTTP exporter) |
-otel-body-limit |
32768 |
Max bytes of request body to capture in OTel spans (0 = unlimited) |
| OS | Path |
|---|---|
| macOS | ~/Library/Application Support/kiro-cli/data.sqlite3 |
| Linux | ~/.local/share/kiro-cli/data.sqlite3 |
Command-line options can be overridden with environment variables.
| Variable | Corresponding option |
|---|---|
KIROCC_PORT |
-port |
KIROCC_HOST |
-host |
KIROCC_DB_PATH |
-db |
KIROCC_API_KEY |
-api-key |
KIROCC_DEBUG |
-debug |
KIROCC_LOG_FILE |
-log-file |
KIROCC_LOG_MAX_SIZE |
-log-max-size |
KIROCC_LOG_MAX_BACKUPS |
-log-max-backups |
KIROCC_LOG_MAX_AGE |
-log-max-age |
KIROCC_LOG_COMPRESS |
-log-compress |
KIROCC_LOG_CONSOLE |
-log-console |
KIROCC_OTEL |
-otel |
KIROCC_OTEL_BODY_LIMIT |
-otel-body-limit |
Enable distributed tracing to visualize the full request chain in Jaeger, Grafana Tempo, or any OTLP-compatible backend.
# Start a local collector (e.g., Grafana LGTM stack)
docker run -d --name lgtm -p 3000:3000 -p 4317:4317 -p 4318:4318 grafana/otel-lgtm
# Start kirocc with tracing enabled
kirocc -otelThe OTLP endpoint defaults to http://localhost:4318 and can be configured via the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable.
Use the KIROCC_MODEL_MAPPINGS environment variable to override model name mappings.
export KIROCC_MODEL_MAPPINGS='[{"anthropic":"my-model","kiro":"claude-sonnet-4.5","context_window_size":200000}]'| Path | Description |
|---|---|
GET /health |
Health check |
GET /v1/models |
List available models |
POST /v1/messages |
Messages API (streaming / non-streaming) |
POST /v1/messages/count_tokens |
Token count (approximate *) |
* count_tokens uses the cl100k_base encoding from tiktoken-go, which differs from Claude's actual tokenizer. The returned value is an approximation.
flowchart TB
subgraph Client
CC["Claude Code / Anthropic API Client"]
end
subgraph kirocc ["kirocc (localhost:3456)"]
direction TB
MW["Middleware<br/>(OTel Tracing, Trace ID, CORS, API Key Auth)"]
Handler["Messages Handler"]
Auth["Auth<br/>(SQLite + Token Refresh)"]
subgraph reqconv ["Request Conversion"]
direction LR
ModelResolve["Model Resolution<br/>claude-sonnet-4-6 → claude-sonnet-4.6"]
MsgNorm["Message Normalization"]
ToolConv["Tool & Schema Conversion"]
ToolSearch["Tool Search<br/>(regex / BM25)"]
EffortResolve["Effort Resolution<br/>(native output_config.effort)"]
EnvState["Env State<br/>(<env> block → operatingSystem/cwd, current message only)"]
CacheConv["Cache Point Conversion<br/>(tool-level only)"]
end
subgraph respconv ["Response Conversion"]
direction LR
EventParse["AWS Event Stream Parser"]
ThinkingParse["Thinking Tag Parser"]
SSEWrite["SSE Writer"]
TruncDetect["Truncation Detection"]
GateWrite["Gate Writer<br/>(buffered retry)"]
end
end
subgraph Kiro ["Kiro API"]
KiroAPI["runtime.{region}.kiro.dev"]
end
CC -- "Anthropic Messages API<br/>(JSON / SSE)" --> MW
MW --> Handler
Handler --> Auth
Handler --> reqconv
reqconv -- "Kiro Payload<br/>(JSON)" --> KiroAPI
KiroAPI -- "AWS Event Stream<br/>(binary frames)" --> respconv
respconv -- "Anthropic SSE / JSON" --> CC
- Client sends an Anthropic Messages API request to kirocc
- Middleware assigns a trace ID, handles CORS, and validates the API key
- Auth reads/refreshes credentials from Kiro CLI's SQLite DB
- Handler resolves the model name and determines thinking mode
- Request conversion pipeline:
- Normalizes messages (merges consecutive same-role messages, extracts text/images/tool_use/tool_result from multi-block content)
- Converts tools and sanitizes JSON Schema (removes unsupported keywords, flattens
anyOf/oneOf/allOf) - If tool search tools are present, partitions tools into active/deferred and injects a proxy-side
ToolSearchtool - Extracts system prompt and places it as a history entry pair
- Parses the
<env>block from the system prompt intoenvState(operatingSystem,currentWorkingDirectory) and attaches it to the current message only - Reorders tool results to match the preceding assistant's tool_use order
- Forwards reasoning effort natively as
additionalModelRequestFields.output_config.effortat the request root (sibling ofconversationState); the resolved effort is validated/clamped per model - Converts Anthropic tool-level
cache_controlto KirocachePoint
- Kiro API returns an AWS Event Stream (binary frames)
- Response conversion pipeline:
- Parses binary event stream frames
- Converts cumulative text to incremental deltas
- Intercepts
ToolSearchtool_use calls, executes search, emitsserver_tool_use/tool_search_tool_resultSSE events, and re-requests Kiro with discovered tools (up to 3 rounds) - Parses
<thinking>tags fromassistantResponseEventor usesreasoningContentEvent(with deduplication) - Enforces
stop_sequencesandmax_tokensadapter-side - Detects truncated responses and stores them; a notice is injected into the next request
- Gate Writer buffers output until visible content arrives, enabling transparent retry of thinking-only responses
kiro-cli 2.5.1 expresses reasoning depth natively through output_config.effort. kirocc forwards it as additionalModelRequestFields.output_config.effort at the request root (sibling of conversationState):
{
"conversationState": { "...": "..." },
"additionalModelRequestFields": {
"output_config": { "effort": "medium" }
}
}Thinking is enabled by any of:
- Model name with
[1m]suffix (e.g.,claude-sonnet-4-6[1m]) Anthropic-Betaheader containingcontext-1m(e.g.,context-1m-2025-01-01)thinking.typeset to"enabled"or"adaptive"in the request
The reasoning effort sent to the backend is resolved as follows:
- An explicit, recognized
output_config.effortwins, validated/clamped to the model's allowed enum (xhighon a 4-value model clamps tomax; unrecognized strings are dropped). - Otherwise, if reasoning is enabled (via
thinking.type, the[1m]suffix, or thecontext-1mheader) without an explicit effort, a default effort ofmediumis sent so the intent reaches the backend. - Otherwise the field is omitted.
Per-model allowed effort levels:
claude-opus-4.8,claude-opus-4.7:low,medium,high,xhigh,maxclaude-opus-4.6,claude-sonnet-4.6(and their-1mvariants):low,medium,high,max(noxhigh; clamps tomax)- All other models omit
additionalModelRequestFieldsentirely
thinking.budget_tokens is accepted in the request but no longer affects behavior; reasoning depth is conveyed entirely through effort.
The Kiro backend does not support Anthropic's Tool Search Tool. kirocc implements it proxy-side with an inner loop:
- Client sends
tool_search_tool_regex_20251119(orbm25) + tools withdefer_loading: true - Proxy partitions tools into active (sent to Kiro) and deferred (held for search)
- Proxy injects a
ToolSearchtool definition that Kiro can understand - When the model calls
ToolSearch, the proxy intercepts the tool_use:- Executes regex or BM25 search against deferred tools
- Emits
server_tool_use+tool_search_tool_resultSSE events to the client - Promotes discovered tools to active and rebuilds the Kiro request
- Calls Kiro again with the updated tool list (up to 3 rounds)
- When the model calls a regular tool or produces text, the response is forwarded to the client
Supported query forms:
select:Read,Edit,Grep— exact tool selection by nameread file— keyword search (regex with word-level OR fallback, or BM25 scoring)
| Input model | Kiro model | Context window |
|---|---|---|
claude-sonnet-4-6 |
claude-sonnet-4.6 |
200k |
claude-sonnet-4-6[1m] |
claude-sonnet-4.6-1m |
1M |
claude-sonnet-4.5 |
claude-sonnet-4.5 |
200k |
claude-sonnet-4.5[1m] |
claude-sonnet-4.5-1m |
1M |
claude-opus-4-8 |
claude-opus-4.8 |
1M |
claude-opus-4-8[1m] |
claude-opus-4.8 |
1M |
claude-opus-4-7 |
claude-opus-4.7 |
1M |
claude-opus-4-7[1m] |
claude-opus-4.7 |
1M |
claude-opus-4-6 |
claude-opus-4.6 |
1M |
claude-opus-4-6[1m] |
claude-opus-4.6 |
1M |
claude-opus-4.5 |
claude-opus-4.5 |
200k |
claude-haiku-4.5 |
claude-haiku-4.5 |
200k |
Opus 4.6, 4.7, and 4.8 always use 1M context (no 200k SKU exists upstream). The explicit [1m]-suffixed aliases (claude-opus-4-8[1m] / claude-opus-4-7[1m] / claude-opus-4-6[1m]) are first-class entries that preserve the suffix verbatim in the response model field — this matches Claude Code's default Max-plan state (lG() emits claude-opus-4-8[1m]) and keeps its mR() 1M-context check happy without spuriously enabling extended thinking. Thinking is still opt-in via Sonnet [1m] suffix, Anthropic-Beta: context-1m header, or thinking field.
Unmatched claude-* models are passed through as-is. Non-claude models fall back to claude-sonnet-4.6.
The model field in /v1/messages responses (streaming message_start, non-streaming body, and tool-search path) is returned as the Anthropic-form ID (e.g. claude-opus-4-7), not the Kiro SKU (claude-opus-4.7).
When the proxy routes to a 1M context window (always-1M SKU such as claude-opus-4.8 / claude-opus-4.7 / claude-opus-4.6, or a model invoked with the [1m] suffix or Anthropic-Beta: context-1m header), a trailing [1m] is appended to the response model ID (e.g. claude-opus-4-8[1m]). Claude Code's client-side context-window logic matches /\[1m\]/i on the response model to pick the 1M window — without the suffix it defaults to 200k and auto-compacts at ~160k even when upstream actually has 1M of context.
Note: [1m] has different meanings on request vs. response. On the request model it is a client-supplied thinking-opt-in signal (and is stripped before upstream routing). On the response model it is purely a context-window advertisement for Claude Code and does not imply that extended thinking was enabled.
Apache License 2.0