itssujeeth/claude-controller

claude-controller

A backend/model switcher for Claude Code. One command picks your inference backend, sets the right environment, and launches Claude Code — no manual export juggling, no stale env vars between sessions.

Architecture

architecture.svg

claude-controller sits between you and Claude Code. It resolves dynamic model facts from the Ollama API, merges them with curated compatibility knowledge, pre-flight checks the target endpoint, then hands off to Claude Code with a clean environment. After your session, cc-health and cc-diagnose analyse what happened.

Repository structure

claude-controller/
├── claude-controller.sh      # main entry point — interactive picker, env manager, launcher
├── ollama_resolver.py        # queries Ollama API live for installed models, num_ctx, architecture
├── model_profiles.json       # curated compatibility knowledge: tool_compat, known_issues
├── README.md
├── docs/
│   └── architecture.svg      # system architecture diagram
└── tools/
    ├── cc-health             # post-session JSONL analyser — flags anomalies and failure patterns
    └── cc-diagnose           # Ollama performance diagnostic — VRAM, tok/s, GPU offload

Installation

# Clone or copy the repo, then make scripts executable
chmod +x claude-controller.sh tools/cc-health tools/cc-diagnose

# Optional: symlink to PATH
ln -sf "$(pwd)/claude-controller.sh"  ~/.local/bin/claude-controller
ln -sf "$(pwd)/tools/cc-health"       ~/.local/bin/cc-health
ln -sf "$(pwd)/tools/cc-diagnose"     ~/.local/bin/cc-diagnose

Prerequisites: Claude Code CLI, Python 3, nc/ss/lsof, curl

For Ollama backend: install Ollama and pull at least one model.

For OpenAI or Gemini backends, install LiteLLM:

pip install 'litellm[proxy]' --break-system-packages

Quick start

claude-controller

Pick a backend, pick a model — Claude Code launches. That's it.

All commands

claude-controller                    # interactive picker → launches Claude Code
claude-controller --status           # show active backend, model, compat profile
claude-controller --resolve <model>  # full resolved profile (dynamic + curated)
claude-controller --profiles         # list all curated model profiles
claude-controller --diagnose <model> # Ollama performance diagnostic
claude-controller --start-litellm    # start LiteLLM proxy (kills stale process if needed)
claude-controller --stop-litellm     # stop LiteLLM proxy
claude-controller --setup-litellm    # regenerate LiteLLM config
claude-controller --export           # print env vars for sourcing into current shell

API keys

mkdir -p ~/.config/switchmodel
cat >> ~/.config/switchmodel/keys.env <<'EOF'
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
EOF
chmod 600 ~/.config/switchmodel/keys.env

The Anthropic backend uses your existing claude login session — no API key needed.

Post-session tools

cc-health — session quality analyser

cc-health                  # analyse most recent session
cc-health --last 5         # analyse last 5 sessions
cc-health <session.jsonl>  # analyse a specific file
cc-health --watch          # watch for new sessions and auto-analyse

Reads Claude Code's JSONL session logs and flags:

| Flag | What it means |
| --- | --- |
| SYNTHETIC_ERROR | API call died before reaching any model (LiteLLM config or connectivity issue) |
| TOOL_LOOP | Same tool called with identical params 2+ times (model stuck in a retry loop) |
| WRONG_PARAMS | Known bad patterns: pages:"" on Read (gpt-4o), ReadFile instead of Read (qwen3) |
| AGENT_FAILURE | Agent tool called with Claude model names LiteLLM cannot resolve |
| RAW_XML | <function=...> leaked as plain text (model only partially trained on tool syntax) |
| CONTEXT_CAP | All turns show identical input_tokens (conversation history is being truncated) |
| ROLE_SWITCH | Model spontaneously responded as a different assistant persona |
| ZERO_OUTPUT | Session produced no substantive output (task not completed) |
| CACHE_EFFICIENCY | Low cache hit rate on a Claude model |
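
To make the TOOL_LOOP rule concrete, here is a minimal Python sketch of how repeated identical tool calls could be counted. The function name `find_tool_loops` and the input shape (a list of `(tool_name, params)` pairs already extracted from a session log) are assumptions for illustration; cc-health's actual parser and the JSONL schema it reads are not shown in this README.

```python
import json
from collections import Counter


def find_tool_loops(calls, threshold=2):
    """Return {(tool, canonical_params): count} for calls seen >= threshold times.

    `calls` is a list of (tool_name, params_dict) pairs, assumed to have been
    pulled out of a session log by some earlier step.
    """
    # Serialize params with sorted keys so key order cannot hide a duplicate.
    counts = Counter(
        (name, json.dumps(params, sort_keys=True)) for name, params in calls
    )
    return {key: n for key, n in counts.items() if n >= threshold}


# Made-up calls for illustration only.
calls = [
    ("Read", {"file_path": "a.py"}),
    ("Read", {"file_path": "a.py"}),
    ("Glob", {"pattern": "*.py"}),
]
loops = find_tool_loops(calls)
```

The default threshold of 2 mirrors the "2+ times" wording in the table; a stricter analyser might only count consecutive repeats.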

cc-diagnose — Ollama performance diagnostic

cc-diagnose                        # diagnose last-used model
cc-diagnose devstral-small-2:24b   # diagnose a specific model
claude-controller --diagnose <model>  # same, via the controller

Measures and reports:

  • GPU vs CPU layer split and VRAM usage
  • KV cache size at various num_ctx values vs available VRAM headroom
  • Token speed: prefill (tok/s) and decode (tok/s)
  • Reload cost between requests (cold-start detection)
  • Context cap detection from recent JSONL logs
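
As a rough illustration of the token-speed measurement, the sketch below derives prefill and decode rates from the timing fields that Ollama returns with a completed generate or chat response (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, `load_duration`, with durations in nanoseconds). The function name `token_speeds` and the sample numbers are hypothetical; how cc-diagnose actually collects these values is not documented here.

```python
NS_PER_S = 1e9  # Ollama reports durations in nanoseconds


def token_speeds(resp):
    """Compute prefill/decode tok/s and load time from an Ollama response dict."""

    def rate(count, duration_ns):
        # Guard against a zero or missing duration.
        return count / (duration_ns / NS_PER_S) if duration_ns else 0.0

    return {
        "prefill_tok_s": rate(
            resp.get("prompt_eval_count", 0), resp.get("prompt_eval_duration", 0)
        ),
        "decode_tok_s": rate(resp.get("eval_count", 0), resp.get("eval_duration", 0)),
        # A non-trivial load time on a follow-up request suggests a cold start.
        "load_s": resp.get("load_duration", 0) / NS_PER_S,
    }


# Made-up timing values for illustration only.
sample = {
    "prompt_eval_count": 1000,
    "prompt_eval_duration": 2_000_000_000,
    "eval_count": 200,
    "eval_duration": 4_000_000_000,
    "load_duration": 0,
}
speeds = token_speeds(sample)
```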

Design

Two sources of model knowledge, kept strictly separate

Dynamic facts — read live from the Ollama API via ollama_resolver.py:

  • Which models are installed
  • Real num_ctx from the modelfile and native context length from model weights
  • Architecture, parameter size, quantization, capabilities

Curated knowledge — stored in model_profiles.json:

  • tool_compat rating: native / partial / broken
  • known_issues from observed Claude Code session behavior
  • Human notes and recommendations

These are never mixed. model_profiles.json contains no num_ctx values or parameter counts — nothing Ollama already knows. This prevents the two sources from contradicting each other or going stale independently.

Profile lookup chain

For any Ollama model, ollama_resolver.py runs:

  1. Exact match — full model name (e.g. qwen3-coder:latest is rated broken specifically)
  2. Family prefix — progressively shorter prefix matches (qwen3-coder, devstral, llama3...)
  3. Architecture — the general.architecture string from Ollama's model_info (e.g. qwen3, llama)
  4. Unknown fallback — warns the user, proceeds with conservative defaults

A brand new model you just pulled gets a sensible answer based on its architecture even if it has no explicit profile entry.
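
The lookup chain above can be sketched in Python as follows. This is an illustration, not the code in ollama_resolver.py: the function name `lookup_profile`, the `architectures` key, the character-by-character prefix shortening, and the shape of the unknown fallback are all assumptions. Only the `exact` and `families` keys and the broken rating for qwen3-coder:latest come from this README.

```python
def lookup_profile(model, architecture, profiles):
    """Resolve a curated profile: exact -> family prefix -> architecture -> unknown."""
    exact = profiles.get("exact", {})
    families = profiles.get("families", {})
    archs = profiles.get("architectures", {})  # assumed key name

    # 1. Exact match on the full name, tag included.
    if model in exact:
        return "exact", exact[model]

    # 2. Drop the tag, then try progressively shorter prefixes of the base name.
    base = model.split(":", 1)[0]
    for end in range(len(base), 0, -1):
        if base[:end] in families:
            return "family", families[base[:end]]

    # 3. Fall back to the architecture string reported by Ollama.
    if architecture in archs:
        return "architecture", archs[architecture]

    # 4. Unknown: placeholder default; the real conservative defaults are not shown here.
    return "unknown", {"tool_compat": "unknown", "known_issues": []}


# Illustrative data; only the qwen3-coder:latest rating is taken from the README.
profiles = {
    "exact": {"qwen3-coder:latest": {"tool_compat": "broken"}},
    "families": {"devstral": {"tool_compat": "partial"}},
    "architectures": {"llama": {"tool_compat": "partial"}},
}
source, profile = lookup_profile("devstral-small-2:24b", "llama", profiles)
```

Returning the match source alongside the profile lets the caller warn when the answer came from a coarse architecture match rather than an explicit entry.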

num_ctx resolution and pinned variants

num_ctx is resolved in priority order:

  1. If the modelfile explicitly sets num_ctx → use that value, warn if below 16K
  2. Use the native context_length from model weights (from model_info), capped at 131072
  3. Absolute fallback: 32768
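
The priority order above translates into a short function. The sketch below is a hedged illustration: the name `resolve_num_ctx`, its arguments, and the reading of "16K" as 16384 are assumptions, while the 131072 cap and 32768 fallback are the values stated in the list.

```python
AGENTIC_MIN_CTX = 16 * 1024  # "16K" read as 16384; an assumption
NATIVE_CTX_CAP = 131072
FALLBACK_CTX = 32768


def resolve_num_ctx(modelfile_num_ctx=None, native_context_length=None):
    """Return (num_ctx, source, warning) following the three-step priority order."""
    # 1. An explicit modelfile value wins, with a warning if it is too small.
    if modelfile_num_ctx is not None:
        warning = None
        if modelfile_num_ctx < AGENTIC_MIN_CTX:
            warning = f"num_ctx {modelfile_num_ctx} is below {AGENTIC_MIN_CTX}"
        return modelfile_num_ctx, "modelfile", warning

    # 2. Otherwise use the native context length, capped.
    if native_context_length:
        return min(native_context_length, NATIVE_CTX_CAP), "native", None

    # 3. Nothing known: absolute fallback.
    return FALLBACK_CTX, "fallback", None
```

For example, `resolve_num_ctx(None, 262144)` yields the capped native value, and `resolve_num_ctx(4096, 262144)` keeps the modelfile value but attaches a warning.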

If the resolved value is missing or below the 16K agentic minimum, the controller automatically creates a pinned Modelfile variant (e.g. devstral-small-2-cc:49152ctx) with num_ctx baked in. This is the only reliable way to set context window size for Claude Code's HTTP requests to Ollama — environment variables like OLLAMA_NUM_CTX are ignored at the API layer.

The VRAM budget for the KV cache is estimated from nvidia-smi free memory minus model weight allocation, and num_ctx is scaled down automatically if it would cause spillage to RAM.
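
For intuition on the VRAM budget, the sketch below uses the standard KV cache size for a transformer with grouped-query attention (two tensors, key and value, per layer per KV head) and scales num_ctx down until the cache fits. The function names, the 1024-token rounding step, and the example model dimensions are hypothetical, and the estimate ignores compute buffers and other overhead; the controller's real calculation may differ.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV cache bytes needed per token of context (fp16 by default)."""
    # Factor 2: one key vector and one value vector per layer per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def fit_num_ctx(requested_ctx, free_vram_bytes, weight_bytes, per_token_bytes, step=1024):
    """Return requested_ctx if its KV cache fits, else the largest fitting multiple of step."""
    budget = free_vram_bytes - weight_bytes
    if budget <= 0:
        return 0
    max_tokens = budget // per_token_bytes
    if requested_ctx <= max_tokens:
        return requested_ctx
    return (max_tokens // step) * step


GIB = 1024**3
# Hypothetical model: 40 layers, 8 KV heads, head_dim 128, fp16 cache.
per_token = kv_bytes_per_token(40, 8, 128)
fitted = fit_num_ctx(131072, 24 * GIB, 14 * GIB, per_token)
```

With these made-up numbers each token of context costs 160 KiB, so a 10 GiB budget holds 65536 tokens and a requested 131072 is halved.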

Environment sanitization

Every apply_* function calls sanitize_env() before setting new variables, clearing all backend-related env vars (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_API_KEY, CLAUDE_MODEL, OLLAMA_KEEP_ALIVE). This prevents stale vars from a previous backend session leaking into the new one.
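
The controller does this in bash, but the idea is easy to show in Python: build the child environment by removing every backend-related variable first, then applying the new backend's values. The function name `sanitized_env` and the example values are assumptions; the variable list is the one given above.

```python
import os

BACKEND_VARS = (
    "ANTHROPIC_BASE_URL",
    "ANTHROPIC_AUTH_TOKEN",
    "ANTHROPIC_API_KEY",
    "CLAUDE_MODEL",
    "OLLAMA_KEEP_ALIVE",
)


def sanitized_env(new_vars, base=None):
    """Return a copy of the environment with backend vars cleared, then new ones set."""
    env = dict(os.environ if base is None else base)
    for name in BACKEND_VARS:
        env.pop(name, None)  # ignore variables that were never set
    env.update(new_vars)
    return env


# Placeholder values: a stale key from a previous session must not survive.
stale = {"ANTHROPIC_API_KEY": "old-key", "PATH": "/usr/bin"}
env = sanitized_env({"ANTHROPIC_BASE_URL": "http://localhost:11434"}, base=stale)
```

Clearing before setting, rather than only overwriting, is what removes variables the new backend does not define at all.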

Pre-flight health checks

Before launching Claude Code, the controller verifies the target endpoint with a two-stage check: TCP connect (port open?) followed by an HTTP health request (process actually responding?). This catches the common failure where a crashed process leaves a socket in TIME_WAIT — the port appears open but no requests are served. A failed pre-flight exits immediately with a clear error, rather than letting Claude Code retry silently for 3 minutes.
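
A two-stage check of this kind might look like the following Python sketch, which uses only the standard library. The function name `preflight`, the default health path `/`, and the 3-second timeout are assumptions; the endpoints and timeouts the controller actually uses are not listed in this README.

```python
import http.client
import socket
import urllib.error
import urllib.request


def preflight(host, port, health_path="/", timeout=3.0):
    """Return (ok, stage, detail): TCP connect first, then an HTTP GET."""
    # Stage 1: is anything accepting connections on the port?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return False, "tcp", str(exc)

    # Stage 2: does the process actually answer an HTTP request?
    url = f"http://{host}:{port}{health_path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, "http", f"status {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server responded, but with an error status.
        return False, "http", f"status {exc.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        return False, "http", str(exc)
```

Reporting the failing stage separately makes the error message actionable: a `tcp` failure means nothing is listening, while an `http` failure means something holds the port but is not serving requests.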

LiteLLM configuration

LiteLLM is always configured with drop_params: true, which silently strips Claude-specific parameters (like context_management) that non-Anthropic models reject with HTTP 400. A system prompt (CLAUDE_CODE_SYSTEM_PROMPT) is also injected for non-Claude backends to prevent known failure modes: gpt-4o spawning Claude sub-agents via the Agent tool, shallow Glob-only fallback after Agent failures, and the pages:"" empty-string parameter bug on Read tool calls.

Compatibility ratings

Ratings are derived from real Claude Code JSONL session data:

| Rating | Meaning |
| --- | --- |
| native | Full Claude Code tool protocol support. No known issues. |
| ⚠️ partial | Works for most tasks. Known issues shown as warnings before launch. |
| broken | Fundamental tool-calling incompatibilities. Must type yes to proceed. |
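
The launch behaviour implied by the three ratings can be sketched as a small gate. The function name `confirm_launch`, the prompt text, and the choice to treat unrecognized ratings like broken are assumptions made for illustration.

```python
def confirm_launch(tool_compat, known_issues, ask=input):
    """Return True if the launch should proceed for the given rating."""
    if tool_compat == "native":
        return True

    # Both partial and broken surface their known issues before launch.
    for issue in known_issues:
        print(f"warning: {issue}")

    if tool_compat == "partial":
        return True

    # broken (and, as a cautious assumption, anything unrecognized)
    # requires the user to type the full word.
    return ask("Type yes to proceed: ").strip() == "yes"
```

Passing `ask` as a parameter keeps the gate testable without a terminal, for example `confirm_launch("broken", [], ask=lambda _: "yes")`.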

Known issues from session evidence

gpt-4o (session a7385301, 3ae0395b): Emits pages:"" on Read tool calls — Claude Code rejects this every time and the model never self-corrects. Also attempts to spawn Claude sub-agents (claude-sonnet-4-6, claude-opus-4-7) via the Agent tool, which LiteLLM cannot resolve without Anthropic credentials. Both are addressed by the injected system prompt.

gemini-2.0-flash (session 2e393533): Rejected Claude Code's context_management parameter with HTTP 400 — fixed by enforcing drop_params: true.

qwen3-coder:latest (session 84b1c5cc): Hallucinated tool names (ReadFile instead of Read), used wrong parameter names, read the wrong files, spontaneously became a Google Drive assistant, emitted raw XML as plain text, and hit a fixed context cap of 32768 tokens per turn.

devstral-small-2-cc:49152ctx (validated): 100% GPU offload on RTX 3090, 52 tok/s decode, 48K context, no reload cost between turns. Optimal configuration for 24GB VRAM.

claude-sonnet-4-6 (session 97299b69): 100% success rate, 92.2% cache efficiency, 340 output tokens/turn, correct tool usage throughout. The reference baseline.

Adding a new profile

Edit model_profiles.json. Add to exact for a specific model tag, or families for a whole family:

"families": {
  "my-new-model": {
    "tool_compat": "partial",
    "known_issues": [
      "issue_name: Description of the problem and when it manifests."
    ],
    "notes": "Human-readable recommendation.",
    "session_evidence": ["session-uuid-here"]
  }
}

Run cc-health after a test session to gather evidence before writing a profile entry.
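
Before committing a new entry, a quick sanity check can catch typos in the rating or a dynamic fact that slipped into the curated file. The validator below is a hypothetical helper, not part of the repo; it checks only the fields shown in the example above and the rule that num_ctx does not belong in model_profiles.json.

```python
VALID_RATINGS = {"native", "partial", "broken"}


def validate_profile(entry):
    """Return a list of problems found in one profile entry (empty if none)."""
    errors = []
    if entry.get("tool_compat") not in VALID_RATINGS:
        errors.append("tool_compat must be native, partial, or broken")

    for field in ("known_issues", "session_evidence"):
        value = entry.get(field, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{field} must be a list of strings")

    if "num_ctx" in entry:
        # Dynamic facts come from Ollama, not from the curated file.
        errors.append("num_ctx is a dynamic fact and should not be curated")
    return errors


entry = {
    "tool_compat": "partial",
    "known_issues": ["issue_name: Description of the problem and when it manifests."],
    "notes": "Human-readable recommendation.",
    "session_evidence": ["session-uuid-here"],
}
problems = validate_profile(entry)
```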

Extending LiteLLM model support

Edit the generated config at ~/.cache/agentrun/litellm_config.yaml, or regenerate it:

claude-controller --setup-litellm

The config always includes drop_params: true. Do not remove it.

Runtime files

| Path | Purpose |
| --- | --- |
| ~/.config/switchmodel/keys.env | API keys for OpenAI, Gemini |
| ~/.cache/agentrun/backend_state.json | Last-used backend and model |
| ~/.cache/agentrun/litellm_config.yaml | Generated LiteLLM config |
| ~/.cache/agentrun/litellm.log | LiteLLM proxy log |
| ~/.claude/projects/**/*.jsonl | Claude Code session logs (read by cc-health) |

About

claude-controller is a bash script that gives you a single command to switch Claude Code between four AI backends — Ollama (local/free), Anthropic API (your existing login), OpenAI via LiteLLM, and Gemini via LiteLLM — and launches Claude Code automatically after selection. No manual environment variable juggling required.
