🦀 kani

LLM smart router. Classifies prompts by complexity and routes to the optimal model.

OpenAI API-compatible proxy — drop in as a base URL and let kani pick the right model automatically.

How it works

Request → Distilled Feature Classifier (15 dimensions) → Tier + Agentic Score → Capability Filter → Model Selection (round-robin) → Upstream Provider
                                                     │
                                                     └─ model unavailable → conservative default

Classification pipeline:

Distilled feature classifier — deterministic tokenCount + learned 14 semantic dimensions
Axis-based scoring — separate complexity and reasoning scores drive tier selection
Independent agentic score — agenticTask dimension is exposed as agentic_score without affecting tier
Conservative default — fall back to MEDIUM when the feature model is unavailable
Capability filter — auto-detects vision/tools/json_mode from the request and escalates to a capable model

Every request is logged to $XDG_STATE_HOME/kani/log/ (default: ~/.local/state/kani/log/) as training data for future model improvement.

Scoring approach

kani no longer relies on hand-maintained keyword lists or runtime LLM fallback for routing. The scorer is now distilled-feature-first:

compute tokenCount deterministically
infer 14 semantic dimensions (low / medium / high) using a learned multi-output classifier
compute separate complexity and reasoning axis scores from the dimensions
determine tier from these axis scores (SIMPLE / MEDIUM / COMPLEX / REASONING)
expose agentic_score from the agenticTask dimension independently (does not affect tier)
return a conservative default tier (MEDIUM) when the feature model or embedding configuration is unavailable, or when the embedding request or prediction path fails

Axis-based tier thresholds:

REASONING: reasoning_score >= 0.75 (4 reasoning dimensions averaged)
COMPLEX: complexity_score >= 0.8 (6 complexity dimensions averaged)
MEDIUM: complexity_score >= 0.5
SIMPLE: below all thresholds
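The tier cascade above can be sketched in a few lines of Python. This is a minimal illustration, not kani's scorer.py. It assumes that low / medium / high map to 0.0 / 0.5 / 1.0, that each axis score is an unweighted mean of its dimensions, and that REASONING is checked before COMPLEX. The dimension names r1..r4 and c1..c6 are hypothetical placeholders; only agenticTask comes from the source.

```python
from statistics import mean

# Assumed numeric value for each categorical label (not taken from kani).
LEVEL_SCORES = {"low": 0.0, "medium": 0.5, "high": 1.0}

# Hypothetical placeholder names for the 4 reasoning and 6 complexity dimensions.
REASONING_DIMS = ["r1", "r2", "r3", "r4"]
COMPLEXITY_DIMS = ["c1", "c2", "c3", "c4", "c5", "c6"]


def axis_score(labels: dict[str, str], dimensions: list[str]) -> float:
    """Unweighted mean of the dimension scores; missing dimensions count as "low"."""
    return mean(LEVEL_SCORES[labels.get(dim, "low")] for dim in dimensions)


def select_tier(reasoning_score: float, complexity_score: float) -> str:
    """Apply the documented thresholds, highest tier first (assumed order)."""
    if reasoning_score >= 0.75:
        return "REASONING"
    if complexity_score >= 0.8:
        return "COMPLEX"
    if complexity_score >= 0.5:
        return "MEDIUM"
    return "SIMPLE"


def score(labels: dict[str, str]) -> tuple[str, float]:
    """Return (tier, agentic_score); agenticTask never influences the tier."""
    tier = select_tier(axis_score(labels, REASONING_DIMS), axis_score(labels, COMPLEXITY_DIMS))
    agentic_score = LEVEL_SCORES[labels.get("agenticTask", "low")]
    return tier, agentic_score


print(score({"r1": "high", "r2": "high", "r3": "high", "r4": "medium", "agenticTask": "high"}))
# ('REASONING', 1.0)
```

The agentic score is read from its own dimension after the tier is chosen, which mirrors the rule that it is exposed independently of tier selection.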

This makes routing behavior easier to improve with data, because changes come from retraining and calibration rather than runtime prompt engineering.

Quick start

Try without installing (uvx)

# Classify a prompt
uvx --from git+https://github.com/tumf/kani kani route "hello world"

# Start the proxy server
uvx --from git+https://github.com/tumf/kani kani serve

Local install

git clone https://github.com/tumf/kani.git && cd kani
uv sync

uv run kani route "hello world"
uv run kani serve

Usage — drop-in replacement for OpenAI / OpenRouter

kani speaks the OpenAI API. Change base_url and model; everything else stays the same.

Before (direct OpenAI)

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                          # OpenAI key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "explain quicksort"}],
)

Before (OpenRouter)

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter
    api_key="sk-or-...",                       # OpenRouter key
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "explain quicksort"}],
)

After (kani) — auto-routed

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18420/v1",      # ← kani
    api_key="anything",                        # kani handles upstream auth
)

# kani picks the best model based on prompt complexity
response = client.chat.completions.create(
    model="kani/auto",
    messages=[{"role": "user", "content": "explain quicksort"}],
)

# Or pin a routing profile
response = client.chat.completions.create(
    model="kani/premium",  # always use best-quality models
    messages=[{"role": "user", "content": "prove P != NP"}],
)

curl

curl http://localhost:18420/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "kani/auto", "messages": [{"role": "user", "content": "hello"}]}'

That's it. Any tool or library that supports the OpenAI API works with kani — LangChain, LlamaIndex, Cursor, Continue, etc. Just point base_url at kani.

Routing profiles

Note: The routing profiles below are sample/reference defaults. Treat them as examples — you should tune the actual profile names, strategies, and model mappings to match your own workload and cost/quality goals.

Profile	Strategy	Best for
`kani/auto`	Balanced cost/quality (default)	General use
`kani/eco`	Cheapest viable models	High volume, low stakes
`kani/premium`	Best quality models	Critical tasks
`kani/agentic`	Tool-use optimized	Agent workflows

Capability-aware routing

kani automatically detects required capabilities from the request and routes to a model that supports them. If no model in the scored tier has the required capabilities, kani escalates to higher tiers.

Detected capabilities:

Capability	Trigger
`vision`	`image_url` content block in messages
`tools`	`tools` or `functions` field in request
`json_mode`	`response_format.type` is `json_object` or `json_schema`
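The trigger table above can be expressed as a small detection function over an OpenAI-style request body. This is a sketch, not kani's code: it treats empty tools / functions lists as absent (an assumption) and only inspects list-form message content, which is where image_url blocks appear.

```python
from typing import Any


def detect_capabilities(request: dict[str, Any]) -> set[str]:
    """Infer required capabilities from an OpenAI-style chat request (sketch)."""
    required: set[str] = set()

    # vision: any message whose list content contains an image_url block
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") == "image_url" for part in content
        ):
            required.add("vision")
            break

    # tools: a non-empty `tools` field or the older `functions` field (emptiness check assumed)
    if request.get("tools") or request.get("functions"):
        required.add("tools")

    # json_mode: response_format.type is json_object or json_schema
    response_format = request.get("response_format") or {}
    if isinstance(response_format, dict) and response_format.get("type") in {"json_object", "json_schema"}:
        required.add("json_mode")

    return required
```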

Configuration: declare model metadata via prefix matching in config.yaml using model_rules:

model_rules:
  - prefix: "anthropic/claude-"
    capabilities: [vision, tools, json_mode]
  - prefix: "google/gemini-"
    capabilities: [vision, tools, json_mode]
  - prefix: "openai/gpt-4"
    capabilities: [vision, tools, json_mode]

model_rules is the primary model metadata surface. The older model_capabilities key remains a legacy alias and is normalized into model_rules only when model_rules is unset. When a request requires capabilities and no configured candidate declares the full required set, routing fails closed instead of selecting an incapable model.
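Prefix matching, tier escalation, and the fail-closed rule can be sketched together. This is an illustration under stated assumptions, not kani's router.py: the first matching rule wins, tiers escalate in the order SIMPLE, MEDIUM, COMPLEX, REASONING, and ModelRule / route_with_capabilities are hypothetical names.

```python
from dataclasses import dataclass

# Assumed escalation order from the scored tier upward.
TIER_ORDER = ["SIMPLE", "MEDIUM", "COMPLEX", "REASONING"]


@dataclass(frozen=True)
class ModelRule:
    prefix: str
    capabilities: frozenset[str]


def capabilities_for(model: str, rules: list[ModelRule]) -> frozenset[str]:
    """Capabilities of the first rule whose prefix matches (first-match is an assumption)."""
    for rule in rules:
        if model.startswith(rule.prefix):
            return rule.capabilities
    return frozenset()  # models without a matching rule declare nothing


def route_with_capabilities(
    scored_tier: str,
    tier_models: dict[str, list[str]],
    required: set[str],
    rules: list[ModelRule],
) -> tuple[str, list[str]]:
    """Escalate from the scored tier until some candidate declares every required capability."""
    for tier in TIER_ORDER[TIER_ORDER.index(scored_tier):]:
        capable = [m for m in tier_models.get(tier, []) if required <= capabilities_for(m, rules)]
        if capable:
            return tier, capable
    # Fail closed: never fall back to a model that lacks a required capability.
    raise LookupError(f"no configured model declares {sorted(required)}")
```

Because unmatched models declare no capabilities, they remain usable for requests that need nothing special but are skipped as soon as any capability is required.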

API endpoints

Endpoint	Method	Description
`/v1/chat/completions`	POST	Main proxy (OpenAI-compatible)
`/v1/models`	GET	List available models
`/v1/route`	POST	Debug — returns routing decision without proxying
`/admin/reload-config`	POST	Admin-only safe config hot reload
`/health`	GET	Health + active config version metadata

Routed responses include extra headers: X-Kani-Tier, X-Kani-Model, X-Kani-Score, X-Kani-Signals.

Configuration

config.yaml:

host: "0.0.0.0"
port: 18420
default_provider: openrouter
default_profile: auto

providers:
  openrouter:
    name: openrouter
    base_url: "https://openrouter.ai/api/v1"
    api_key: "${OPENROUTER_API_KEY}"
    # reasoning_style: openai | anthropic | dashscope | gemini | none (default: openai)
  cliproxy:
    name: cliproxy
    base_url: "http://127.0.0.1:8317/v1"
    api_key: "local-test-key"

profiles:
  auto:
    tiers:
      SIMPLE:
        # primary can be a single model or an ordered list for round-robin
        primary: ["google/gemini-2.5-flash", "google/gemini-2.5-flash-lite"]
        fallback: ["nvidia/gpt-oss-120b"]
      MEDIUM:
        primary:
          model: "moonshotai/kimi-k2.5"
          max_input_tokens: 128000
        fallback: null  # allowed; normalized to []
      COMPLEX:
        primary: "google/gemini-3.1-pro"
        fallback:
          - model: "anthropic/claude-sonnet-4.6"
            max_input_tokens: 200000
      REASONING:
        primary: "x-ai/grok-4-1-fast-reasoning"
        fallback: ["anthropic/claude-sonnet-4.6"]
      # provider: per-tier override (optional)

smart_proxy:
  fallback_backoff:
    enabled: true
    initial_delay_seconds: 5
    multiplier: 2
    max_delay_seconds: 300
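The flexible primary / fallback shapes in this file can be normalized into a single list form before routing. The sketch below is illustrative, not kani's config.py or router.py: it assumes the internal form is a dict with model, provider, and max_input_tokens keys, keeps one in-memory counter per (profile, tier) pair for round-robin, and applies the documented provider order (model entry, then tier, then default_provider).

```python
from typing import Any, Optional

Entry = dict[str, Any]


def normalize_entries(raw: Any) -> list[Entry]:
    """Turn a string, an object, or a list of either into a list of entry dicts.

    kani only accepts null for `fallback`; this sketch does not enforce that.
    """
    if raw is None:
        return []  # `fallback: null` is normalized to []
    items = raw if isinstance(raw, list) else [raw]
    entries: list[Entry] = []
    for item in items:
        if isinstance(item, str):
            item = {"model": item}
        if not isinstance(item, dict) or "model" not in item:
            raise ValueError(f"unsupported model entry: {item!r}")
        entries.append({
            "model": item["model"],
            "provider": item.get("provider"),
            "max_input_tokens": item.get("max_input_tokens"),
        })
    return entries


class RoundRobin:
    """Independent rotation per (profile, tier); not thread-safe, process-local only."""

    def __init__(self) -> None:
        self._next: dict[tuple[str, str], int] = {}

    def pick(self, profile: str, tier: str, primaries: list[Entry]) -> Entry:
        key = (profile, tier)
        index = self._next.get(key, 0)
        self._next[key] = index + 1
        return primaries[index % len(primaries)]


def resolve_provider(entry: Entry, tier_provider: Optional[str], default_provider: str) -> str:
    """Model-entry provider > tier-level provider > default_provider."""
    return entry.get("provider") or tier_provider or default_provider
```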

${VAR} syntax resolves environment variables
Provider resolution order is: model-entry provider > tier-level provider > default_provider
Configured model IDs are sent literally to the selected provider; anthropic/claude-sonnet-4.6 is not parsed by kani as a provider selector unless you also set a provider field
primary accepts a string, a {model, provider, max_input_tokens} object, or a list of those; list entries are selected round-robin per profile+tier combination
fallback accepts the same string/object entries as primary; object entries can set max_input_tokens so candidates with a known input limit lower than the estimated prompt tokens are skipped
Candidates without max_input_tokens remain eligible because their input limit is unknown
fallback: null is accepted only at profiles.*.tiers.*.fallback and normalized to []
When primary fails, fallback attempts skip the failed primary candidate and deduplicate repeated model+provider entries
smart_proxy.fallback_backoff enables process-local exponential cooldowns for retryable non-streaming 429 / 5xx failures, keyed by model+provider
Cooled-down model+provider pairs are skipped during both primary selection and fallback execution; the same model on a different provider remains eligible
Successful recovery resets the failure streak for that exact model+provider pair, and restarting kani clears the in-memory cooldown registry
Config path: --config flag > $KANI_CONFIG env var > ./config.yaml > $XDG_CONFIG_HOME/kani/config.yaml > /etc/kani/config.yaml
Set KANI_ADMIN_TOKEN to enable POST /admin/reload-config (admin-only, separate from regular API keys)
Hot reload validates with strict=True and rejects non-reloadable field changes (host, port) with 409
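The fallback and backoff rules above can be sketched as a small registry plus a candidate filter. This is a hypothetical illustration, not kani's implementation: the delay formula min(initial_delay_seconds * multiplier^(streak - 1), max_delay_seconds) is an assumption consistent with the config keys, and callers are assumed to report only retryable non-streaming failures.

```python
import time
from typing import Callable, Iterable, Optional

Key = tuple[str, str]  # (model, provider)


class CooldownRegistry:
    """Process-local exponential cooldowns keyed by model+provider; cleared on restart."""

    def __init__(
        self,
        initial_delay: float = 5.0,
        multiplier: float = 2.0,
        max_delay: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.initial_delay = initial_delay
        self.multiplier = multiplier
        self.max_delay = max_delay
        self._clock = clock
        self._streak: dict[Key, int] = {}
        self._until: dict[Key, float] = {}

    def record_failure(self, model: str, provider: str, status: int) -> Optional[float]:
        """Cool down after a retryable 429/5xx failure; return the delay used."""
        if status != 429 and not 500 <= status <= 599:
            return None  # non-retryable statuses do not start a cooldown
        key = (model, provider)
        streak = self._streak.get(key, 0) + 1
        self._streak[key] = streak
        delay = min(self.initial_delay * self.multiplier ** (streak - 1), self.max_delay)
        self._until[key] = self._clock() + delay
        return delay

    def record_success(self, model: str, provider: str) -> None:
        """A success resets the streak for this exact model+provider pair only."""
        self._streak.pop((model, provider), None)
        self._until.pop((model, provider), None)

    def is_cooling_down(self, model: str, provider: str) -> bool:
        return self._clock() < self._until.get((model, provider), float("-inf"))


def eligible_candidates(
    candidates: Iterable[tuple[str, str, Optional[int]]],
    registry: CooldownRegistry,
    estimated_tokens: int,
    failed: Optional[Key] = None,
) -> list[Key]:
    """Filter (model, provider, max_input_tokens) candidates for the next attempt."""
    seen: set[Key] = set()
    eligible: list[Key] = []
    for model, provider, max_input_tokens in candidates:
        key = (model, provider)
        if key == failed or key in seen:
            continue  # skip the failed primary and repeated model+provider entries
        seen.add(key)
        if max_input_tokens is not None and max_input_tokens < estimated_tokens:
            continue  # known limit is too small; unknown limits stay eligible
        if registry.is_cooling_down(model, provider):
            continue  # the same model on another provider is still allowed
        eligible.append(key)
    return eligible
```

Injecting `clock` keeps the sketch testable; in a running proxy the default time.monotonic is the natural choice because it is unaffected by wall-clock adjustments.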

Smart-proxy context compaction

kani can optionally reduce context pressure for long-running conversations by compacting oversized message histories before proxying upstream (Phase A) and by pre-computing summaries in the background for reuse on later requests (Phase B).

All compaction behavior is opt-in and disabled by default. When disabled or when compaction fails, kani routes and proxies requests unchanged.

Configuration

Add a smart_proxy section to your config.yaml:

smart_proxy:
  context_compaction:
    enabled: true                        # master switch

    sync_compaction:
      enabled: true                      # Phase A: compact inline before proxying
      threshold_percent: 80.0            # compact when prompt ≥ 80% of context window
      protect_first_n: 1                 # turns to keep at head of conversation
      protect_last_n: 2                  # turns to keep at tail
      summary_profile: ""                # empty = resolve through default_profile

    background_precompaction:
      enabled: true                      # Phase B: pre-compute summaries async
      trigger_percent: 70.0              # start background job at 70% usage
      max_concurrency: 2                 # max parallel background jobs
      summary_ttl_seconds: 3600

    session:
      header_name: X-Kani-Session-Id    # client header for explicit session binding

    context_window_tokens: 128000        # assumed context window for threshold math

Summary generation is selected by routing profile. Set summary_profile to a profile such as compress to route summaries through that profile; leave it empty to fall back to default_profile via the router's normal model resolution.
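The threshold settings above can be sketched as a per-request decision plus a split of the history into protected and summarizable parts. This is an illustration under assumptions, not kani's compaction code: usage is measured against context_window_tokens, one "turn" is treated as one message, and "background" is an internal label here rather than a header value.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def compaction_action(
    prompt_tokens: int,
    context_window_tokens: int = 128_000,
    threshold_percent: float = 80.0,
    trigger_percent: float = 70.0,
    has_session: bool = False,
) -> str:
    """Return an illustrative action label for one request."""
    usage_percent = 100.0 * prompt_tokens / context_window_tokens
    if usage_percent >= threshold_percent:
        return "inline"  # Phase A: compact before proxying
    if has_session and usage_percent >= trigger_percent:
        return "background"  # Phase B requires the explicit session header
    return "skipped"


def split_for_summary(
    messages: Sequence[T], protect_first_n: int = 1, protect_last_n: int = 2
) -> tuple[list[T], list[T], list[T]]:
    """Split into (protected head, middle to summarize, protected tail)."""
    if len(messages) <= protect_first_n + protect_last_n:
        return list(messages), [], []  # nothing left to compact
    head = list(messages[:protect_first_n])
    middle = list(messages[protect_first_n:len(messages) - protect_last_n])
    tail = list(messages[len(messages) - protect_last_n:])
    return head, middle, tail
```

With the defaults, a 102,400-token prompt sits exactly at 80% of a 128,000-token window and is compacted inline, while a 90,000-token prompt (about 70.3%) only qualifies for background work when a session header is present.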

Session identity

kani uses a stable session key only when the client sends the configured explicit session header:

Explicit header — value of session.header_name (required for Phase B cache hits, persistence, incremental summarization, and background precompaction)
No header — no session ID is derived; inline compaction may still run for an oversized request, but cache reuse and background precompaction are unavailable

The resolution mode is surfaced in the X-Kani-Compaction-Session response header as explicit or none.

Operator telemetry

Each routed response includes compaction headers:

Header	Values	Meaning
`X-Kani-Compaction`	`off` \| `skipped` \| `inline` \| `cached` \| `failed`	What compaction did
`X-Kani-Compaction-Session`	`explicit` \| `none`	How session was resolved
`X-Kani-Compaction-Saved-Tokens`	integer	Estimated tokens saved

Structured log fields are emitted at INFO level on every compaction decision. Failures are logged at WARNING level and never propagate to the client.
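On the client side, the three compaction headers can be collected into one record for monitoring. This is a sketch, not a kani client API: it accepts any mapping of response headers (for example, the headers object of an HTTP client response), and the defaults used when a header is missing are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping

OUTCOMES = {"off", "skipped", "inline", "cached", "failed"}
SESSION_MODES = {"explicit", "none"}


@dataclass(frozen=True)
class CompactionTelemetry:
    outcome: str
    session: str
    saved_tokens: int


def parse_compaction_headers(headers: Mapping[str, str]) -> CompactionTelemetry:
    """Read the X-Kani-Compaction* headers case-insensitively (defaults are assumptions)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    outcome = lowered.get("x-kani-compaction", "off")
    session = lowered.get("x-kani-compaction-session", "none")
    if outcome not in OUTCOMES or session not in SESSION_MODES:
        raise ValueError(f"unexpected compaction headers: {outcome!r}, {session!r}")
    saved = int(lowered.get("x-kani-compaction-saved-tokens", "0"))
    return CompactionTelemetry(outcome, session, saved)
```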

Safe config hot reload (admin)

Reload the configuration without restarting the proxy through the admin-only endpoint:

# 1) set an admin token (separate from normal API keys)
export KANI_ADMIN_TOKEN="your-admin-token"

# 2) trigger reload after editing config.yaml
curl -X POST http://localhost:18420/admin/reload-config \
  -H "Authorization: Bearer ${KANI_ADMIN_TOKEN}"

Behavior:

Reload is applied only when strict config validation succeeds.
In-flight requests keep the state snapshot captured at request start.
Changes to host / port are rejected as non-reloadable with 409 and require process restart.
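The three reload rules above (validate first, keep per-request snapshots, refuse host / port changes) can be sketched with an atomically swapped read-only mapping. This is a hypothetical illustration, not kani's proxy.py: ConfigHolder, NonReloadableChange, and the `validate` callable are assumed names, and the snapshot is only shallowly read-only.

```python
import threading
from types import MappingProxyType
from typing import Any, Callable, Mapping

NON_RELOADABLE_FIELDS = ("host", "port")


class NonReloadableChange(Exception):
    """Raised when a reload would change host or port (kani answers these with 409)."""

    status_code = 409


class ConfigHolder:
    """Atomically swappable, read-only config snapshot (shallow; nested values stay mutable)."""

    def __init__(self, config: Mapping[str, Any], validate: Callable[[Mapping[str, Any]], None]) -> None:
        self._lock = threading.Lock()
        self._validate = validate  # hypothetical strict validator; raises ValueError on bad input
        self._snapshot: Mapping[str, Any] = MappingProxyType(dict(config))

    def snapshot(self) -> Mapping[str, Any]:
        """Call once per request; the request keeps this object even if a reload happens later."""
        return self._snapshot

    def reload(self, new_config: Mapping[str, Any]) -> None:
        self._validate(new_config)  # a failure leaves the current snapshot untouched
        with self._lock:
            changed = [f for f in NON_RELOADABLE_FIELDS if new_config.get(f) != self._snapshot.get(f)]
            if changed:
                raise NonReloadableChange(f"restart required to change: {', '.join(changed)}")
            self._snapshot = MappingProxyType(dict(new_config))
```

Replacing the whole snapshot object, instead of mutating it, is what lets in-flight requests keep a consistent view without holding a lock for their full duration.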

Docker Compose / local deployment

No additional services are required. Compaction state is persisted in SQLite under $XDG_DATA_HOME/kani/compaction.db (default: ~/.local/share/kani/compaction.db). Override with KANI_DATA_DIR.

# Verify compaction is active after startup:
curl -s http://localhost:18420/health | jq .
# Inspect a routed request's compaction outcome:
curl -v -X POST http://localhost:18420/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Kani-Session-Id: my-session-1" \
  -d '{"model":"kani/auto","messages":[{"role":"user","content":"hello"}]}' \
  2>&1 | grep -i "x-kani-compaction"
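The storage location described above follows the XDG convention, which can be sketched as a small path resolver. This is an assumption-labeled sketch, not kani's dirs.py: in particular, it assumes KANI_DATA_DIR replaces the whole kani data directory rather than XDG_DATA_HOME.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def kani_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """KANI_DATA_DIR if set, else $XDG_DATA_HOME/kani, else ~/.local/share/kani."""
    env = os.environ if env is None else env
    override = env.get("KANI_DATA_DIR")
    if override:
        return Path(override)  # assumption: the override replaces the whole kani data directory
    # Per the XDG spec, an unset or empty XDG_DATA_HOME falls back to ~/.local/share.
    xdg_data_home = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(xdg_data_home) / "kani"


def compaction_db_path(env: Optional[Mapping[str, str]] = None) -> Path:
    return kani_data_dir(env) / "compaction.db"
```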

Offline feature annotation

Runtime routing does not call an LLM. LLM usage is limited to offline dataset generation when logs are missing semantic labels.

Optional annotator configuration (for scripts/build_agentic_dataset.py --annotate-missing) can be set in config.yaml under feature_annotator, or overridden with env vars:

feature_annotator:
  model: "gemini-2.5-flash-lite"
  provider: "cliproxy"  # optional; defaults to default_provider

feature_annotator and llm_classifier connection details are provider-resolved. In config.yaml, set model + optional provider; do not set base_url or api_key directly in these sections.

Env var	Default	Description
`KANI_LLM_ANNOTATOR_MODEL`	`google/gemini-2.5-flash-lite`	Annotation model
`KANI_LLM_ANNOTATOR_BASE_URL`	`https://openrouter.ai/api/v1`	API endpoint
`KANI_LLM_ANNOTATOR_API_KEY`	`$OPENROUTER_API_KEY`	API key

Priority is: CLI flags > env vars > config.yaml feature_annotator > built-in defaults.
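The precedence rule can be sketched as "first non-empty value wins" per setting. This is illustrative only: the `cli` keys are hypothetical (the script's actual flag names are not listed here), and `config` stands for the values kani derives from feature_annotator and its resolved provider, since base_url and api_key are not set directly in that section.

```python
from typing import Mapping, Optional


def _first(*values: Optional[str]) -> Optional[str]:
    """Return the first non-empty value."""
    return next((value for value in values if value), None)


def resolve_annotator(
    cli: Mapping[str, str], env: Mapping[str, str], config: Mapping[str, str]
) -> dict[str, Optional[str]]:
    """CLI flags > env vars > config.yaml feature_annotator > built-in defaults."""
    return {
        "model": _first(
            cli.get("model"), env.get("KANI_LLM_ANNOTATOR_MODEL"),
            config.get("model"), "google/gemini-2.5-flash-lite",
        ),
        "base_url": _first(
            cli.get("base_url"), env.get("KANI_LLM_ANNOTATOR_BASE_URL"),
            config.get("base_url"), "https://openrouter.ai/api/v1",
        ),
        "api_key": _first(
            cli.get("api_key"), env.get("KANI_LLM_ANNOTATOR_API_KEY"),
            config.get("api_key"), env.get("OPENROUTER_API_KEY"),
        ),
    }
```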

Routing logs

All decisions are logged to $XDG_STATE_HOME/kani/log/routing-YYYY-MM-DD.jsonl (default: ~/.local/state/kani/log/):

{"timestamp":"2025-03-21T19:50:00","prompt_preview":"prove the Riemann...","tier":"REASONING","score":0.82,"confidence":0.87,"method":"distilled-features","agentic_score":1.0,"signals":{"tokenCount":38,"semanticLabels":{"reasoningMarkers":"high","agenticTask":"high"},"featureVersion":"v1"}}
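Because each line is a standalone JSON object, the logs are easy to inspect before building a dataset. The sketch below counts tiers and flags records whose signals lack semanticLabels (the ones --annotate-missing would target); it relies only on the fields visible in the sample line and is not part of kani.

```python
import json
from collections import Counter
from typing import Any, Iterable


def summarize_routing_log(lines: Iterable[str]) -> dict[str, Any]:
    """Count tiers and records that still lack semantic labels in JSONL log lines."""
    tiers: Counter[str] = Counter()
    unlabeled = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        tiers[record["tier"]] += 1
        if not record.get("signals", {}).get("semanticLabels"):
            unlabeled += 1  # candidates for offline annotation
    return {"tiers": dict(tiers), "unlabeled": unlabeled}
```

An open text file handle for a routing-YYYY-MM-DD.jsonl file can be passed directly, since iterating a file yields its lines.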

Use these logs to build distilled feature training data:

uv run python scripts/build_agentic_dataset.py \
  --output data/distilled_feature_dataset.json

When existing logs do not yet include semantic labels, you can annotate missing examples offline:

uv run python scripts/build_agentic_dataset.py \
  --annotate-missing \
  --output data/distilled_feature_dataset.json

Then train the multi-output feature classifier bundle:

uv run python scripts/train_classifier.py \
  --data data/distilled_feature_dataset.json \
  --output models

This writes models/feature_classifier.pkl with the sklearn multi-output classifier, per-dimension label encoders, weights, thresholds, and embedding metadata.

API key authentication

kani supports API key authentication to restrict proxy access. Keys are managed via the CLI and stored in $XDG_DATA_HOME/kani/api_keys.json.

When no keys are configured, all requests pass through without authentication (backward-compatible). As soon as one key is added, every API request must include a valid Authorization: Bearer <key> header.

# Create a key (auto-generated, shown once)
kani keys add hermes
#   kani-aBcDeFgH...  ← save this

# List keys (prefix only, secrets are not stored in plaintext)
kani keys list

# Remove a key by name or prefix
kani keys remove hermes

Using the key:

curl http://localhost:18420/v1/chat/completions \
  -H "Authorization: Bearer kani-aBcDeFgH..." \
  -H "Content-Type: application/json" \
  -d '{"model": "kani/auto", "messages": [{"role": "user", "content": "hello"}]}'

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18420/v1",
    api_key="kani-aBcDeFgH...",  # kani API key
)

/health and /docs are exempt from authentication. No server restart required — keys take effect immediately.
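The authentication rules above (open access with no keys, Bearer check otherwise, exempt endpoints) can be sketched with only the standard library. This is a hypothetical sketch: the format of api_keys.json is not described here, so storing a SHA-256 digest per key name is an assumption, as is the key length.

```python
import hashlib
import hmac
import secrets
from typing import Mapping, Optional

EXEMPT_PATHS = {"/health", "/docs"}


def generate_key() -> str:
    """Random key with the kani- prefix shown by `kani keys add` (length is an assumption)."""
    return "kani-" + secrets.token_urlsafe(32)


def hash_key(key: str) -> str:
    """One-way digest so the plaintext secret never needs to be stored."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def is_authorized(path: str, authorization: Optional[str], stored_hashes: Mapping[str, str]) -> bool:
    """stored_hashes maps key name -> SHA-256 hex digest (hypothetical storage format)."""
    if path in EXEMPT_PATHS or not stored_hashes:
        return True  # exempt endpoint, or no keys configured (open access)
    if not authorization or not authorization.startswith("Bearer "):
        return False
    presented = hash_key(authorization[len("Bearer "):])
    # Constant-time comparison against each stored digest.
    return any(hmac.compare_digest(presented, digest) for digest in stored_hashes.values())
```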

CLI

kani serve [--config path] [--host 0.0.0.0] [--port 18420]
kani route "your prompt here" [--config path]
kani config [--config path]
kani keys add <name>
kani keys list
kani keys remove <name|prefix>

Architecture

src/kani/
├── scorer.py    # distilled feature scoring (15-dimensional classifier)
├── router.py    # Tier → model+provider mapping
├── proxy.py     # FastAPI OpenAI-compatible server
├── config.py    # YAML config loading, env var resolution
├── dirs.py      # XDG-compliant directory paths (config, data, logs)
├── logger.py    # JSONL routing log
└── cli.py       # Click CLI

Development

uv sync
uv run pytest tests/ -q    # 176 tests
uv run ruff check src/
uv run pyright src/

Credits

Scoring logic ported from ClawRouter (MIT license).

License

MIT
