LLM smart router. Classifies prompts by complexity and routes to the optimal model.
OpenAI API-compatible proxy — drop in as a base URL and let kani pick the right model automatically.
Request → Distilled Feature Classifier (15 dimensions) → Tier + Agentic Score → Capability Filter → Model Selection (round-robin) → Upstream Provider
│
└─ model unavailable → conservative default
Classification pipeline:
- Distilled feature classifier: deterministic `tokenCount` + 14 learned semantic dimensions
- Axis-based scoring: separate `complexity` and `reasoning` scores drive tier selection
- Independent agentic score: the `agenticTask` dimension is exposed as `agentic_score` without affecting tier
- Conservative default: fall back to `MEDIUM` when the feature model is unavailable
- Capability filter: auto-detects vision/tools/json_mode from the request and escalates to a capable model
Every request is logged to $XDG_STATE_HOME/kani/log/ (default: ~/.local/state/kani/log/) as training data for future model improvement.
kani no longer relies on hand-maintained keyword lists or runtime LLM fallback for routing. The scorer is now distilled-feature-first:
- compute `tokenCount` deterministically
- infer 14 semantic dimensions (`low`/`medium`/`high`) using a learned multi-output classifier
- compute separate complexity and reasoning axis scores from the dimensions
- determine tier from these axis scores (`SIMPLE`/`MEDIUM`/`COMPLEX`/`REASONING`)
- expose `agentic_score` from the `agenticTask` dimension independently (does not affect tier)
- return a conservative default tier when the feature model, embedding configuration, embedding request, or prediction path is unavailable
Axis-based tier thresholds:
- REASONING: `reasoning_score >= 0.75` (4 reasoning dimensions averaged)
- COMPLEX: `complexity_score >= 0.8` (6 complexity dimensions averaged)
- MEDIUM: `complexity_score >= 0.5`
- SIMPLE: below all thresholds
This makes routing behavior easier to improve with data, because changes come from retraining and calibration rather than runtime prompt engineering.
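The threshold rules can be sketched as a small function. This is an illustration only, not kani's actual scorer: the name `tier_from_scores` is hypothetical, and the order in which the thresholds are checked (REASONING first, then COMPLEX, then MEDIUM) is an assumption taken from the order of the list.

```python
def tier_from_scores(complexity_score: float, reasoning_score: float) -> str:
    """Map the two axis scores to a tier using the documented thresholds.

    Assumption: thresholds are evaluated in the order they are listed,
    so a high reasoning score wins over a high complexity score.
    """
    if reasoning_score >= 0.75:
        return "REASONING"
    if complexity_score >= 0.8:
        return "COMPLEX"
    if complexity_score >= 0.5:
        return "MEDIUM"
    return "SIMPLE"


if __name__ == "__main__":
    print(tier_from_scores(complexity_score=0.9, reasoning_score=0.8))  # REASONING
    print(tier_from_scores(complexity_score=0.9, reasoning_score=0.2))  # COMPLEX
    print(tier_from_scores(complexity_score=0.6, reasoning_score=0.2))  # MEDIUM
    print(tier_from_scores(complexity_score=0.1, reasoning_score=0.1))  # SIMPLE
```

Note that `agentic_score` is deliberately absent from the function: it is reported alongside the tier but does not influence it.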
```bash
# Classify a prompt
uvx --from git+https://github.com/tumf/kani kani route "hello world"
# Start the proxy server
uvx --from git+https://github.com/tumf/kani kani serve
```

```bash
git clone https://github.com/tumf/kani.git && cd kani
uv sync
uv run kani route "hello world"
uv run kani serve
```

kani speaks the OpenAI API. Change `base_url` and `model`; everything else stays the same.
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # OpenAI key
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "explain quicksort"}],
)
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter
    api_key="sk-or-...",  # OpenRouter key
)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "explain quicksort"}],
)
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18420/v1",  # ← kani
    api_key="anything",  # kani handles upstream auth
)

# kani picks the best model based on prompt complexity
response = client.chat.completions.create(
    model="kani/auto",
    messages=[{"role": "user", "content": "explain quicksort"}],
)

# Or pin a routing profile
response = client.chat.completions.create(
    model="kani/premium",  # always use best-quality models
    messages=[{"role": "user", "content": "prove P != NP"}],
)
```

```bash
curl http://localhost:18420/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "kani/auto", "messages": [{"role": "user", "content": "hello"}]}'
```

That's it. Any tool or library that supports the OpenAI API works with kani: LangChain, LlamaIndex, Cursor, Continue, etc. Just point `base_url` at kani.
Note: The routing profiles below are sample/reference defaults. Treat them as examples — you should tune the actual profile names, strategies, and model mappings to match your own workload and cost/quality goals.
| Profile | Strategy | Best for |
|---|---|---|
| `kani/auto` | Balanced cost/quality (default) | General use |
| `kani/eco` | Cheapest viable models | High volume, low stakes |
| `kani/premium` | Best quality models | Critical tasks |
| `kani/agentic` | Tool-use optimized | Agent workflows |
kani automatically detects required capabilities from the request and routes to a model that supports them. If no model in the scored tier has the required capabilities, kani escalates to higher tiers.
Detected capabilities:
| Capability | Trigger |
|---|---|
| `vision` | `image_url` content block in messages |
| `tools` | `tools` or `functions` field in request |
| `json_mode` | `response_format.type` is `json_object` or `json_schema` |
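The triggers in the table can be expressed as a small detection function. This is a sketch under stated assumptions rather than kani's implementation: `detect_capabilities` is a hypothetical name, the request is assumed to be an already-parsed OpenAI-style JSON body, and an empty `tools` or `functions` list is treated here as "not present".

```python
from typing import Any


def detect_capabilities(request: dict[str, Any]) -> set[str]:
    """Return the capabilities a chat-completions request appears to need."""
    required: set[str] = set()

    # vision: any message whose content list contains an image_url block
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(block, dict) and block.get("type") == "image_url"
            for block in content
        ):
            required.add("vision")

    # tools: a `tools` field or the legacy `functions` field
    # (assumption: an empty list does not count as a trigger)
    if request.get("tools") or request.get("functions"):
        required.add("tools")

    # json_mode: response_format.type is json_object or json_schema
    response_format = request.get("response_format")
    if isinstance(response_format, dict) and response_format.get("type") in {
        "json_object",
        "json_schema",
    }:
        required.add("json_mode")

    return required
```

A plain text request yields an empty set, in which case no capability filtering is needed.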
Configuration: declare model metadata via prefix matching in `config.yaml` using `model_rules`:

```yaml
model_rules:
  - prefix: "anthropic/claude-"
    capabilities: [vision, tools, json_mode]
  - prefix: "google/gemini-"
    capabilities: [vision, tools, json_mode]
  - prefix: "openai/gpt-4"
    capabilities: [vision, tools, json_mode]
```

`model_rules` is the primary model metadata surface. The older `model_capabilities` key remains a legacy alias and is normalized into `model_rules` only when `model_rules` is unset. When a request requires capabilities and no configured candidate declares the full required set, routing fails closed instead of selecting an incapable model.
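The prefix matching, tier escalation, and fail-closed behavior can be sketched together. Everything here is illustrative: the function names are hypothetical, the tier order used for escalation (`SIMPLE` < `MEDIUM` < `COMPLEX` < `REASONING`) is an assumption, and so is the choice to merge capabilities from every matching rule rather than taking the first match.

```python
# Assumed escalation order; the source only says "higher tiers".
TIER_ORDER = ["SIMPLE", "MEDIUM", "COMPLEX", "REASONING"]

MODEL_RULES = [
    {"prefix": "anthropic/claude-", "capabilities": ["vision", "tools", "json_mode"]},
    {"prefix": "openai/gpt-4", "capabilities": ["vision", "tools", "json_mode"]},
]


def capabilities_for(model: str, rules: list[dict]) -> set[str]:
    """Union of capabilities from every rule whose prefix matches the model ID."""
    caps: set[str] = set()
    for rule in rules:
        if model.startswith(rule["prefix"]):
            caps.update(rule["capabilities"])
    return caps


def pick_capable_model(
    tier: str,
    required: set[str],
    candidates_by_tier: dict[str, list[str]],
    rules: list[dict],
) -> tuple[str, str]:
    """Search the scored tier first, then higher tiers.

    Raises LookupError (fails closed) when no candidate declares every
    required capability.
    """
    start = TIER_ORDER.index(tier)
    for candidate_tier in TIER_ORDER[start:]:
        for model in candidates_by_tier.get(candidate_tier, []):
            if required <= capabilities_for(model, rules):
                return candidate_tier, model
    raise LookupError(f"no configured model supports {sorted(required)}")
```

With `required` empty, the first candidate of the scored tier is returned unchanged, so requests without special needs are unaffected by the filter.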
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Main proxy (OpenAI-compatible) |
| `/v1/models` | GET | List available models |
| `/v1/route` | POST | Debug: returns routing decision without proxying |
| `/admin/reload-config` | POST | Admin-only safe config hot reload |
| `/health` | GET | Health + active config version metadata |
Routed responses include extra headers: X-Kani-Tier, X-Kani-Model, X-Kani-Score, X-Kani-Signals.
config.yaml:
```yaml
host: "0.0.0.0"
port: 18420
default_provider: openrouter
default_profile: auto

providers:
  openrouter:
    name: openrouter
    base_url: "https://openrouter.ai/api/v1"
    api_key: "${OPENROUTER_API_KEY}"
    # reasoning_style: openai | anthropic | dashscope | gemini | none (default: openai)
  cliproxy:
    name: cliproxy
    base_url: "http://127.0.0.1:8317/v1"
    api_key: "local-test-key"

profiles:
  auto:
    tiers:
      SIMPLE:
        # primary can be a single model or an ordered list for round-robin
        primary: ["google/gemini-2.5-flash", "google/gemini-2.5-flash-lite"]
        fallback: ["nvidia/gpt-oss-120b"]
      MEDIUM:
        primary:
          model: "moonshotai/kimi-k2.5"
          max_input_tokens: 128000
        fallback: null  # allowed; normalized to []
      COMPLEX:
        primary: "google/gemini-3.1-pro"
        fallback:
          - model: "anthropic/claude-sonnet-4.6"
            max_input_tokens: 200000
      REASONING:
        primary: "x-ai/grok-4-1-fast-reasoning"
        fallback: ["anthropic/claude-sonnet-4.6"]
        # provider: per-tier override (optional)

smart_proxy:
  fallback_backoff:
    enabled: true
    initial_delay_seconds: 5
    multiplier: 2
    max_delay_seconds: 300
```
- `${VAR}` syntax resolves environment variables
- Provider resolution order is: model-entry `provider` > tier-level `provider` > `default_provider`
- Configured model IDs are sent literally to the selected provider; `anthropic/claude-sonnet-4.6` is not parsed by kani as a provider selector unless you also set a `provider` field
- `primary` accepts a string, a `{model, provider, max_input_tokens}` object, or a list of those; list entries are selected round-robin per `profile+tier` combination
- `fallback` accepts the same string/object entries as `primary`; object entries can set `max_input_tokens` so candidates with a known input limit lower than the estimated prompt tokens are skipped
- Candidates without `max_input_tokens` remain eligible because their input limit is unknown
- `fallback: null` is accepted only at `profiles.*.tiers.*.fallback` and normalized to `[]`
- When primary fails, fallback attempts skip the failed primary candidate and deduplicate repeated `model+provider` entries
- `smart_proxy.fallback_backoff` enables process-local exponential cooldowns for retryable non-streaming `429`/`5xx` failures, keyed by `model+provider`
- Cooled-down `model+provider` pairs are skipped during both primary selection and fallback execution; the same model on a different provider remains eligible
- Successful recovery resets the failure streak for that exact `model+provider` pair, and restarting kani clears the in-memory cooldown registry
- Config path: `--config` flag > `$KANI_CONFIG` env var > `./config.yaml` > `$XDG_CONFIG_HOME/kani/config.yaml` > `/etc/kani/config.yaml`
- Set `KANI_ADMIN_TOKEN` to enable `POST /admin/reload-config` (admin-only, separate from regular API keys)
- Hot reload validates with `strict=True` and rejects non-reloadable field changes (`host`, `port`) with `409`
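The `fallback_backoff` behavior described in the notes above can be pictured as an in-memory registry keyed by `model+provider`. The class below is a sketch, not kani's code: `CooldownRegistry` and its method names are hypothetical, and the delay formula (`initial_delay * multiplier ** (streak - 1)`, capped at `max_delay`) is an assumption about what "exponential cooldown" means here.

```python
import time


class CooldownRegistry:
    """Process-local exponential cooldowns keyed by (model, provider)."""

    def __init__(self, initial_delay=5.0, multiplier=2.0, max_delay=300.0,
                 clock=time.monotonic):
        self.initial_delay = initial_delay
        self.multiplier = multiplier
        self.max_delay = max_delay
        self.clock = clock        # injectable so the logic can be tested
        self._streak = {}         # (model, provider) -> consecutive failures
        self._until = {}          # (model, provider) -> cooldown end time

    def record_failure(self, model, provider):
        """Register a retryable failure and return the cooldown applied."""
        key = (model, provider)
        streak = self._streak.get(key, 0) + 1
        self._streak[key] = streak
        delay = min(self.initial_delay * self.multiplier ** (streak - 1),
                    self.max_delay)
        self._until[key] = self.clock() + delay
        return delay

    def record_success(self, model, provider):
        """A successful call resets the streak for that exact pair."""
        key = (model, provider)
        self._streak.pop(key, None)
        self._until.pop(key, None)

    def is_cooling_down(self, model, provider):
        """True while the pair should be skipped during selection."""
        return self.clock() < self._until.get((model, provider), float("-inf"))
```

With the sample values from the config (5 seconds initial, multiplier 2, 300 seconds maximum), consecutive failures would yield cooldowns of 5, 10, 20, 40 seconds and so on up to the cap. Because the state lives in a plain dictionary, it disappears on restart, which matches the documented behavior.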
kani can optionally reduce context pressure for long-running conversations by compacting oversized message histories before proxying upstream (Phase A) and by pre-computing summaries in the background for reuse on later requests (Phase B).
All compaction behavior is opt-in and disabled by default. When disabled or when compaction fails, kani routes and proxies requests unchanged.
Add a smart_proxy section to your config.yaml:
```yaml
smart_proxy:
  context_compaction:
    enabled: true                     # master switch
    sync_compaction:
      enabled: true                   # Phase A: compact inline before proxying
      threshold_percent: 80.0         # compact when prompt ≥ 80% of context window
      protect_first_n: 1              # turns to keep at head of conversation
      protect_last_n: 2               # turns to keep at tail
      summary_profile: ""             # empty = resolve through default_profile
    background_precompaction:
      enabled: true                   # Phase B: pre-compute summaries async
      trigger_percent: 70.0           # start background job at 70% usage
      max_concurrency: 2              # max parallel background jobs
      summary_ttl_seconds: 3600
    session:
      header_name: X-Kani-Session-Id  # client header for explicit session binding
    context_window_tokens: 128000     # assumed context window for threshold math
```

Summary generation is selected by routing profile. Set `summary_profile` to a profile such as `compress` to route summaries through that profile; leave it empty to fall back to `default_profile` via the router's normal model resolution.
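The threshold math behind `threshold_percent`, `context_window_tokens`, `protect_first_n`, and `protect_last_n` can be sketched as follows. Both function names are hypothetical, and the sketch assumes that one "turn" corresponds to one entry in the `messages` list, which the source does not state.

```python
def should_compact(prompt_tokens: int,
                   context_window_tokens: int = 128000,
                   threshold_percent: float = 80.0) -> bool:
    """True when the estimated prompt reaches the configured share of the window."""
    return prompt_tokens >= context_window_tokens * threshold_percent / 100.0


def split_for_compaction(messages: list, protect_first_n: int = 1,
                         protect_last_n: int = 2):
    """Return (head, middle, tail); only `middle` would be summarized.

    Assumption: one "turn" equals one message in the list.
    """
    if len(messages) <= protect_first_n + protect_last_n:
        # Nothing left between the protected head and tail.
        return list(messages), [], []
    cut = len(messages) - protect_last_n
    return messages[:protect_first_n], messages[protect_first_n:cut], messages[cut:]
```

With the sample values, inline compaction would start at 102,400 estimated prompt tokens (80% of 128,000), and a background job at 89,600 (70%).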
kani uses a stable session key only when the client sends the configured explicit session header:
- Explicit header: value of `session.header_name` (required for Phase B cache hits, persistence, incremental summarization, and background precompaction)
- No header: no session ID is derived; inline compaction may still run for an oversized request, but cache reuse and background precompaction are unavailable

The resolution mode is surfaced in the `X-Kani-Compaction-Session` response header as `explicit` or `none`.
Each routed response includes compaction headers:
| Header | Values | Meaning |
|---|---|---|
| `X-Kani-Compaction` | `off`, `skipped`, `inline`, `cached`, `failed` | What compaction did |
| `X-Kani-Compaction-Session` | `explicit`, `none` | How session was resolved |
| `X-Kani-Compaction-Saved-Tokens` | integer | Estimated tokens saved |
Structured log fields are emitted at INFO level on every compaction decision. Failures are logged at WARNING level and never propagate to the client.
Use admin-only config hot reload without restarting the proxy:
```bash
# 1) set an admin token (separate from normal API keys)
export KANI_ADMIN_TOKEN="your-admin-token"
# 2) trigger reload after editing config.yaml
curl -X POST http://localhost:18420/admin/reload-config \
  -H "Authorization: Bearer ${KANI_ADMIN_TOKEN}"
```

Behavior:
- Reload is applied only when strict config validation succeeds.
- In-flight requests keep the state snapshot captured at request start.
- Changes to `host`/`port` are rejected as non-reloadable with `409` and require a process restart.
No additional services are required. Compaction state is persisted in SQLite under $XDG_DATA_HOME/kani/compaction.db (default: ~/.local/share/kani/compaction.db). Override with KANI_DATA_DIR.
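The database location can be resolved roughly as below. This is a sketch: `compaction_db_path` is a hypothetical helper, and it assumes that `KANI_DATA_DIR` replaces the whole `kani` data directory, so that `compaction.db` sits directly inside it. The source only says the variable overrides the default.

```python
import os
from pathlib import Path


def compaction_db_path(env=None) -> Path:
    """Resolve where the compaction SQLite file would live."""
    env = os.environ if env is None else env

    # Assumption: KANI_DATA_DIR points at the directory that holds compaction.db.
    override = env.get("KANI_DATA_DIR")
    if override:
        return Path(override) / "compaction.db"

    # XDG default: ~/.local/share when XDG_DATA_HOME is unset or empty.
    xdg_data_home = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(xdg_data_home) / "kani" / "compaction.db"
```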
```bash
# Verify compaction is active after startup:
curl -s http://localhost:18420/health | jq .
# Inspect a routed request's compaction outcome:
curl -v -X POST http://localhost:18420/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Kani-Session-Id: my-session-1" \
  -d '{"model":"kani/auto","messages":[{"role":"user","content":"hello"}]}' \
  2>&1 | grep -i "x-kani-compaction"
```

Runtime routing does not call an LLM. LLM usage is limited to offline dataset generation when logs are missing semantic labels.
Optional annotator configuration (for scripts/build_agentic_dataset.py --annotate-missing) can be set in config.yaml under feature_annotator, or overridden with env vars:
```yaml
feature_annotator:
  model: "gemini-2.5-flash-lite"
  provider: "cliproxy"  # optional; defaults to default_provider
```

`feature_annotator` and `llm_classifier` connection details are provider-resolved. In `config.yaml`, set `model` + optional `provider`; do not set `base_url` or `api_key` directly in these sections.
| Env var | Default | Description |
|---|---|---|
| `KANI_LLM_ANNOTATOR_MODEL` | `google/gemini-2.5-flash-lite` | Annotation model |
| `KANI_LLM_ANNOTATOR_BASE_URL` | `https://openrouter.ai/api/v1` | API endpoint |
| `KANI_LLM_ANNOTATOR_API_KEY` | `$OPENROUTER_API_KEY` | API key |
Priority is: CLI flags > env vars > config.yaml feature_annotator > built-in defaults.
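As an illustration of that priority chain for the annotator model, consider the sketch below. `resolve_annotator_model` is a hypothetical helper, and the shape of the parsed config (a dict with a `feature_annotator` mapping) is assumed from the YAML example rather than taken from kani's code.

```python
import os

DEFAULT_ANNOTATOR_MODEL = "google/gemini-2.5-flash-lite"  # built-in default from the table


def resolve_annotator_model(cli_model=None, config=None, env=None) -> str:
    """Pick the annotator model: CLI flag > env var > config.yaml > default."""
    env = os.environ if env is None else env
    config = config or {}
    candidates = (
        cli_model,
        env.get("KANI_LLM_ANNOTATOR_MODEL"),
        (config.get("feature_annotator") or {}).get("model"),
    )
    for value in candidates:
        if value:  # the first non-empty source wins
            return value
    return DEFAULT_ANNOTATOR_MODEL
```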
All decisions are logged to $XDG_STATE_HOME/kani/log/routing-YYYY-MM-DD.jsonl (default: ~/.local/state/kani/log/):
```json
{"timestamp":"2025-03-21T19:50:00","prompt_preview":"prove the Riemann...","tier":"REASONING","score":0.82,"confidence":0.87,"method":"distilled-features","agentic_score":1.0,"signals":{"tokenCount":38,"semanticLabels":{"reasoningMarkers":"high","agenticTask":"high"},"featureVersion":"v1"}}
```

Use these logs to build distilled feature training data:

```bash
uv run python scripts/build_agentic_dataset.py \
  --output data/distilled_feature_dataset.json
```

When existing logs do not yet include semantic labels, you can annotate missing examples offline:

```bash
uv run python scripts/build_agentic_dataset.py \
  --annotate-missing \
  --output data/distilled_feature_dataset.json
```

Then train the multi-output feature classifier bundle:

```bash
uv run python scripts/train_classifier.py \
  --data data/distilled_feature_dataset.json \
  --output models
```

This writes `models/feature_classifier.pkl` with the sklearn multi-output classifier, per-dimension label encoders, weights, thresholds, and embedding metadata.
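Before building a dataset, it can help to check how logged decisions are distributed across tiers. The sketch below relies only on the `routing-YYYY-MM-DD.jsonl` file naming and the `tier` field from the sample log entry; `tier_counts` is a hypothetical helper and not part of kani.

```python
import json
from collections import Counter
from pathlib import Path


def tier_counts(log_dir: Path) -> Counter:
    """Count routing decisions per tier across all routing-*.jsonl files."""
    counts: Counter = Counter()
    for path in sorted(log_dir.glob("routing-*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                counts[json.loads(line)["tier"]] += 1
    return counts


if __name__ == "__main__":
    # Default log location when XDG_STATE_HOME is unset.
    default_dir = Path.home() / ".local" / "state" / "kani" / "log"
    for tier, count in tier_counts(default_dir).most_common():
        print(f"{tier}: {count}")
```

A heavily skewed distribution (for example, almost everything landing in `MEDIUM`) may be a hint that the feature model is unavailable and the conservative default is being used.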
kani supports API key authentication to restrict proxy access. Keys are managed via the CLI and stored in $XDG_DATA_HOME/kani/api_keys.json.
When no keys are configured, all requests pass through without authentication (backward-compatible). As soon as one key is added, every API request must include a valid Authorization: Bearer <key> header.
```bash
# Create a key (auto-generated, shown once)
kani keys add hermes
# kani-aBcDeFgH...  ← save this

# List keys (prefix only, secrets are not stored in plaintext)
kani keys list

# Remove a key by name or prefix
kani keys remove hermes
```

Using the key:

```bash
curl http://localhost:18420/v1/chat/completions \
  -H "Authorization: Bearer kani-aBcDeFgH..." \
  -H "Content-Type: application/json" \
  -d '{"model": "kani/auto", "messages": [{"role": "user", "content": "hello"}]}'
```

```python
client = OpenAI(
    base_url="http://localhost:18420/v1",
    api_key="kani-aBcDeFgH...",  # kani API key
)
```

`/health` and `/docs` are exempt from authentication. No server restart required: keys take effect immediately.
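The authentication rules (open access with no keys, mandatory bearer token once a key exists, exempt paths) can be summarized as a decision function. This is a simplified sketch: `is_authorized` is a hypothetical name, and it compares plaintext keys for brevity even though kani does not store secrets in plaintext, so the real check necessarily differs.

```python
def is_authorized(path: str, authorization_header, valid_keys,
                  exempt_paths=("/health", "/docs")) -> bool:
    """Decide whether a request may proceed under the documented rules."""
    if path in exempt_paths:
        return True   # /health and /docs never require a key
    if not valid_keys:
        return True   # no keys configured: backward-compatible open access
    if not authorization_header:
        return False  # keys exist, so the header is mandatory
    scheme, _, token = authorization_header.partition(" ")
    return scheme == "Bearer" and token.strip() in valid_keys
```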
```bash
kani serve [--config path] [--host 0.0.0.0] [--port 18420]
kani route "your prompt here" [--config path]
kani config [--config path]
kani keys add <name>
kani keys list
kani keys remove <name|prefix>
```

```
src/kani/
├── scorer.py    # distilled feature scoring (15-dimensional classifier)
├── router.py    # Tier → model+provider mapping
├── proxy.py     # FastAPI OpenAI-compatible server
├── config.py    # YAML config loading, env var resolution
├── dirs.py      # XDG-compliant directory paths (config, data, logs)
├── logger.py    # JSONL routing log
└── cli.py       # Click CLI
```

```bash
uv sync
uv run pytest tests/ -q   # 176 tests
uv run ruff check src/
uv run pyright src/
```

Scoring logic ported from ClawRouter (MIT license).
MIT