The Universal Terminal AI Code Agent
Installation • Quick Start • Features • Documentation • Contributing
Demo
Step-by-step animation, ~12 s loop. Each panel fades in as the agent takes the next action: prompt → read_file → edit_file (diff) → cargo test (green) → fix summary → cost line.
Step-by-step
- Prompt — user describes the failing test:

  ```sh
  $ hone "the test_divide test fails because divide(10, 0) raises ZeroDivisionError. fix it to return None."
  ```

- `read_file` — Hone opens `calculator.py` to see what's there:

  ```python
  def divide(a, b):
      return a / b
  ```

- `edit_file` — Hone applies a unified-diff patch:

  ```diff
  - return a / b
  + if b == 0:
  +     return None
  + return a / b
  ```

- `!cargo test` — Hone runs the test suite to verify:

  ```
  running 12 tests
  test test_divide_by_zero ... ok
  12 passed in 0.3s
  ```

- Done — the assistant reports the fix and the cost line:

  ```
  Fixed. divide() now returns None on zero division.
  claude-sonnet | $0.001 | ctx 18% | 5s
  ```
Play the recording locally
```sh
# Install asciinema
brew install asciinema   # or: pip install asciinema

# Play the recorded demo
asciinema play assets/demo.cast
```
The .cast source is at assets/demo.cast — recorded from a real Hone run on DeepSeek V3 via OpenRouter (~$0.001 per task). The embedded assets/demo.svg above is regenerated from it with svg-term --in assets/demo.cast --out assets/demo.svg --window --no-cursor.
Browser & desktop GUI demos (text walk-throughs)
Three independent surfaces ship alongside the TUI: a SolidJS web UI served by honed at /ui, a Tauri 2 desktop app with system tray and multi-daemon switcher, and the same daemon's HTTP/SSE API that anything else can embed against. See the Remote Access & Web UI and Desktop App sections below for layout sketches, walkthroughs of every panel (file preview, side-by-side diff, approval modal, task-tree DAG, sandbox / memory / provenance / evals viewers, artifact pane), the bundle profile, and the full daemon endpoint table.
Quick spin-up:
```sh
# Browser
make webui                                              # build the SPA
ANTHROPIC_API_KEY=sk-... cargo run --release -p honed   # serve at /ui
open http://127.0.0.1:3117/ui

# Desktop (Tauri 2 native shell — system tray, OS notifications, Cmd+K)
cargo build -p honed --release && cp target/release/honed /usr/local/bin/
cargo run -p hone-desktop
```
Overview
Hone is a privacy-first, provider-agnostic AI coding agent built entirely in Rust. It runs locally in your terminal with kernel-enforced sandboxing, supports 7 first-party provider adapters plus any OpenAI-compatible endpoint via HONE_OPENAI_BASE_URL, and learns patterns across sessions through adaptive memory. Work with your models, on your machine, under your rules.
Why Hone
Your models, your machine, your rules. Hone doesn't phone home. No telemetry. No vendor lock-in. Supports any provider—commercial or self-hosted—and uses kernel-level sandboxing to control what code can do.
Intelligence compounds. Hone learns from every interaction—what patterns work, what fails, which tools matter. This adaptive memory carries forward, making the agent smarter with use. Insights cross repo boundaries through semantic indexing.
Work from anywhere. Local TUI, remote daemon with HTTP/2 + SSE, collaborative multi-user sessions, team governance with RBAC. Run honed on a server, connect from any terminal.
Features
- Shell escape — Run native shell commands with `!command`: `!ls`, `!git status`, `!cargo test` show inline output. `!cd src` changes directory (confined to the project). `!nvim file.rs` suspends the TUI, runs the editor, and restores it afterward.
- Vim mode — Type `--vim` or `:set vim` for vim-style keybindings. Normal mode: `j`/`k` scroll, `gg`/`G` top/bottom, `i`/`a` insert, `:q` quit. Insert mode: type normally; `Esc` returns to normal.
- TDD mode — Type `:tdd [command]` to watch source files and auto-run tests on change (300 ms debounce). Shows PASS/FAIL inline. Type `:tdd off` to stop.
- Review mode — Type `:review` to enter hunk-by-hunk review of git diff changes. Navigate with `j`/`k`, accept/reject hunks with `a`/`r`, accept/reject all with `A`/`R`, quit with `q`.
- Session switcher overlay — Press `Ctrl+O` to browse and switch between saved sessions in a full-screen overlay showing name, last-updated relative time, and message count. Navigate with `j`/`k`, `Enter` to select, `q` to close.
- Expanded keybindings help — Press `Ctrl+?` to see 31 keybindings across 5 sections (Input, Navigation, Overlays, Agent control, Commands) with descriptions.
- Observability sidebar & session reports — Press `Ctrl+R` to toggle the Agent Rail sidebar showing files touched, context injected, and memory recalls per turn. On session end, detailed reports are auto-generated with cost, tokens, files, tools, and timelines. View with `hone reports list`/`show`/`diff`.
- Multi-agent orchestration — Decompose complex tasks via research-backed patterns (Anthropic, MASAI, CodeR). Orchestrator-worker pipeline with message merging. Features a concurrency limiter (`with_max_concurrent(n)` for semaphore-gated dispatch), health monitoring with per-worker timeouts (`with_worker_timeout(secs)` kills stuck agents), and crash-recovery checkpointing (`CheckpointStoreSQLite`; `Orchestrator::resume(run_id)` continues interrupted runs).
- Agent ensemble — Run N parallel candidates with varied temperatures, ranked with the MASAI Ranker (+5-8% accuracy improvement).
- Adaptive complexity routing — Auto-classify requests as Simple/Medium/Complex and route to the appropriate model with graduated compression.
- Adaptive hierarchical skeleton — Proactive context injection with intelligent strategy selection: auto-detects repository size and auto-selects detail level. 4-level hierarchy: directory → file list → signatures → call graph. CallGraph (default) shows function-level relationships, achieving a 100% patch rate on a SWE-bench sample (+30pp over V1). The adaptive strategy skips large files (>100 KB) and slow parses (>100 ms). Progressive budget filling ensures efficient token use. Disable with `--no-inject` or `HONE_NO_INJECTION=1`.
- Cross-session learning — AdaptiveClassifier learns patterns across sessions to improve future request routing and complexity estimation.
- Test-verification loop — AgentCoder pattern that runs tests after each iteration (+40% bugs caught early).
- Reviewer agent — ChatDev pattern where a dedicated reviewer catches 30% of bugs before merge.
- Self-refine loop — Iterative improvement (NeurIPS 2023, +8-15% quality gain).
- RALPH loop — Review, Analyze, Learn, Plan, Heal: a structured feedback cycle combining adversarial review with root-cause analysis. Max 3 iterations (configurable); stops when the reviewer approves. Captures learned patterns for cross-session improvement. Optional test verification after each Heal step. Research basis: Self-Refine (NeurIPS 2023) + ChatDev + MASAI + AgentCoder.
- Universal model support — 7 first-party adapters (Anthropic, OpenAI, Ollama, Claude CLI, Gemini CLI, ACP, and a mock adapter for testing) plus any OpenAI-compatible endpoint via `HONE_OPENAI_BASE_URL` (DeepSeek, Qwen, Kimi, OpenRouter, and others work out of the box). Full tool-calling support on the Anthropic, OpenAI, Ollama, and ACP adapters (`claude --acp`, `gemini --acp`, or custom ACP servers); the legacy Claude/Gemini CLI pass-through adapters remain prose-only.

  | Adapter | Tool calling | Stream | Notes |
  |---|---|---|---|
  | Anthropic | Yes | SSE | claude-sonnet-4-5 default |
  | OpenAI | Yes | SSE | gpt-4o default; override with `HONE_OPENAI_MODEL` |
  | Ollama | Yes | stream | llama3.2 default; requires local `ollama serve` |
  | Claude CLI (legacy) | No | text | `claude` binary on PATH; prose-only passthrough |
  | Gemini CLI (legacy) | No | text | `gemini` binary on PATH; prose-only passthrough |
  | ACP — `claude --acp` | Yes | JSON-RPC | Subscription Claude with full bidirectional tool dispatch |
  | ACP — `gemini --acp` | Yes | JSON-RPC | Free-tier Gemini with full bidirectional tool dispatch |
  | ACP — custom | Yes | JSON-RPC | Self-hosted ACP servers; see docs/api/acp-protocol.md |
  | Mock | Yes | in-process | Test double only |

- Kernel-enforced sandboxing — Seatbelt on macOS, Landlock on Linux 5.13+, seccomp elsewhere. Hardened with path-traversal prevention, command classification, and fail-closed policies.
- Memory encryption — AES-256-GCM with Argon2id for encrypted memory storage. Opt in at runtime via `--encrypted-memory` plus `HONE_MEMORY_PASSPHRASE`. A wrong or missing passphrase exits the binary non-zero rather than silently degrading. Each store persists its own random 16-byte salt (plus a version marker) as a sentinel memory record, so keys rotate per vault and cannot be cross-replayed against a hardcoded constant.
- Typed memories with index-driven recall — Memories carry a `kind` (`User`, `Feedback`, `Project`, `Reference`, `Outcome`, `Conversation`) and an optional one-line `description`. The system prompt embeds a budgeted `## Memory index`; the agent fetches bodies on demand via the `memory_read` tool. Recall ordering uses BM25 (k1=1.5, b=0.75) over content + description. Inspect or prune via `hone memory ls | show <id> | forget`.
- Layered tool architecture — Native Rust layer (Layer 2) for zero-cost file/shell/git/LSP. MCP client for 70+ community extensions (Layer 1) with `.hone/mcp.json` config. Skills in markdown (Layer 3). Tool decorators for cost tracking and governance (Layer 4).
- Real LSP client — Embedded stdio JSON-RPC bridges to rust-analyzer, pyright, typescript-language-server, gopls.
- Dual-database storage — SQLite for sessions/memory (single-user, zero-ops), PostgreSQL+pgvector for the semantic index (concurrent, vector search).
- Adaptive memory — Pattern learning across sessions. Hone remembers what worked, what failed, and why. Memory is local-first, private by default.
- OpenAPI 3.1 spec generation — `GET /openapi.json` for automated API documentation.
- Rate limiting — Sliding-window middleware per IP/user.
- Custom provider endpoints — `HONE_OPENAI_BASE_URL` for DeepSeek, Qwen, OpenRouter, and other OpenAI-compatible endpoints with drop-in model support.
- Multi-agent budget tracking — Per-turn, per-session, per-user token accounting with live cost estimates and configurable limits.
- Real bore tunnel — Automatic SSH tunnel when the bore CLI is installed.
- Client-server architecture — `hone` CLI for local use, `honed` daemon for remote access. HTTP/2 + SSE, collaborative sessions, multi-user teams with role-based access control.
- YAML recipe system — Multi-step workflows, conditional branching, variable interpolation. Recipes compose tools and agents into reproducible tasks.
- Redesigned status bar — Three-zone layout (left: model/session/phase; center: tool icon; right: context bar with cost). Token count shows inline in compact mode ("42t") and as a full fraction in wide mode ("42/200000 tokens"). Width-adaptive (80/120/160+ cols), NO_COLOR support.
- Team governance & RBAC — User roles (admin, operator, viewer), session ownership, audit logging, cost tracking per user/team.
- Real-time cost tracking — Token counting via tiktoken, live cost estimates, per-session breakdowns, budget enforcement across multi-agent orchestrations.
- Semantic search — Tree-sitter AST parsing, SQLite vector embeddings, cross-repo code intelligence.
- Scope clarification mode — Auto-detects ambiguous requests using 4 heuristics and asks clarifying questions before coding. Use `/clarify` to trigger manually or `--no-clarify` for CI/scripting without prompts.
- Inline agent dispatch — Use `@agent-name` syntax to swap system prompts on the fly (e.g., `@security-auditor "audit this code"`, `@debugger "fix this bug"`). 12 built-in agents available. Restores the original prompt after the turn.
- Safe pipe operator — Pipes to head/tail/grep/wc/sort are now allowed in shell commands (previously blocked). Pipes to unknown commands are still treated as destructive.
- Ollama tool calling — OllamaProvider now supports full tool calling (previously hardcoded to skip). Qwen 2.5 Coder 14B is recommended for local use.
- Model override fixes — `--model ollama` now directly creates OllamaProvider instead of falling back to Claude CLI when Ollama is explicitly requested.
- Text-based tool call parser — Detects JSON tool calls in model content for Ollama (handles models that don't use the tool_calls field). Strips markdown code fences automatically.
- DeepSeek Direct compatibility — Works for interactive use; for SWE-bench benchmarks, OpenRouter is recommended due to token output limits.
- Built-in agents — 12 specialized agent personas (code reviewer, debugger, security auditor, ML engineer, etc.) with `@agent-name` inline dispatch. Plus 9 recipe templates for common workflows.
- Agent-loop robustness — Three pre-LLM safety checks break out of failure modes that tend to waste tokens: (1) Doom-loop detector — injects a corrective user message when the last 30 messages show 3+ identical consecutive tool calls or a repeating 2-5-step pattern; (2) Context-budget graceful abort — at 95% of the model's context limit, strips tools from the next request and tells the model to finalize now instead of being truncated mid-tool-call; (3) Output-truncation retry hint — when the provider ends with `stop_reason = "max_tokens"` (Anthropic) or `finish_reason = "length"` (OpenAI), commits partial assistant text, drops the malformed tool-use block, and injects a user hint naming the lost tools so the model can retry with smaller content.
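The doom-loop check described above can be approximated by a pure function over the names of recent tool calls. This is a minimal illustrative sketch under stated assumptions, not Hone's actual implementation: the real detector inspects the last 30 messages and also matches repeating 2-5-step patterns, and the function and variable names here are hypothetical.

```rust
// Illustrative sketch (not Hone's code): flag a doom loop when the
// recent tool-call history contains 3+ identical consecutive calls.
fn is_doom_loop(recent_calls: &[&str]) -> bool {
    // Any window of three equal adjacent names counts as a loop.
    recent_calls
        .windows(3)
        .any(|w| w[0] == w[1] && w[1] == w[2])
}

fn main() {
    let looping = ["read_file", "grep", "grep", "grep"];
    let healthy = ["read_file", "edit_file", "cargo_test"];
    assert!(is_doom_loop(&looping));
    assert!(!is_doom_loop(&healthy));
    println!("ok");
}
```

When the predicate fires, the agent loop would inject a corrective user message before the next LLM call rather than letting the model keep repeating itself.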
Built-in Agents and Recipes
Hone ships with 12 pre-configured agents and 9 recipe templates for common development tasks.
Built-in Agents
Each agent is a specialized persona with curated tools and system prompts:
| Agent | Description |
|---|---|
| `code-reviewer` | Expert code reviewer for quality, security, performance, and maintainability |
| `debugger` | Systematic debugging specialist using the scientific method |
| `security-auditor` | Security auditor for vulnerability scanning, secret detection, and OWASP review |
| `performance-optimizer` | Performance profiler and optimizer for database, API, memory, and rendering |
| `documentation-agent` | Generates and maintains API docs, READMEs, architecture diagrams, and changelogs |
| `api-designer` | REST, GraphQL, and gRPC API designer with OpenAPI spec generation |
| `tdd-workflow` | Test-Driven Development workflow enforcing Red-Green-Refactor discipline |
| `rust-expert` | Rust specialist for ownership, lifetimes, async with tokio, and idiomatic error handling |
| `python-expert` | Python specialist for PEP 484 typing, pytest, async with asyncio, and FastAPI |
| `researcher` | AI/ML research scientist for literature review, experiment design, and paper analysis |
| `ml-engineer` | Machine learning engineer for model training, evaluation, deployment, and MLOps |
| `data-engineer` | Data engineer for ETL pipelines, data warehousing, schema design, and data quality |
Built-in Recipes
Recipes are multi-step YAML workflows that guide the agent through structured processes:
| Recipe | Description |
|---|---|
| `tdd-feature` | Implement a feature using Test-Driven Development (Red-Green-Refactor) |
| `bug-fix` | Fix a bug using systematic debugging with a regression test |
| `code-review` | Perform a comprehensive code review covering correctness, security, and performance |
| `refactor` | Safely refactor code with test verification at each step |
| `security-audit` | Scan for OWASP Top 10 vulnerabilities, secrets, and dependency issues |
| `performance` | Profile, optimize, and measure performance improvements |
| `migration` | Plan and execute safe dependency or framework migrations |
| `documentation` | Inventory code, generate API docs, update README, and verify coverage |
| `onboarding` | Understand a new codebase's structure, key files, and common workflows |
Using Built-in Agents and Recipes
Agents can be referenced by name or description using the agent registry:
```sh
# Run using an agent's name
hone --agent code-reviewer -i "Review this code for security issues"

# Run a recipe
hone --recipe tdd-feature
hone --recipe bug-fix

# Create a custom agent (see docs/tutorials/tools-and-mcp.md)
cat > .hone/agents/my-agent.md << 'EOF'
---
name: my-agent
description: My custom agent
tools: [read, write, shell, grep]
model: sonnet
---
You are my specialized agent...
EOF
```
Custom agents can be placed in .hone/agents/ (project-level) or ~/.config/hone/agents/ (user-level).
For detailed information on how agents work and how to create custom ones, see docs/tutorials/tools-and-mcp.md.
Remote Access & Web UI
The honed daemon serves three independent surfaces over HTTP/2 + SSE: a
SolidJS web UI at /ui, a Tauri 2 desktop app that can supervise
its own daemon, and a documented REST + SSE API that can be embedded
in anything else. All three share the same wire format.
Three GUIs, one daemon
```mermaid
flowchart LR
    H[(honed daemon)]
    H -- "GET /ui (rust-embed)" --> WebUI[SolidJS web UI<br/>browser, PWA-installable]
    H -- "HTTP/2 + SSE" --> Desktop[hone-desktop<br/>Tauri 2 native shell]
    H -- "JSON-RPC over stdio" --> ACP[ACP / MCP clients]
    H -- "POST /run + GET /files" --> CLI[hone CLI / hone tui]
    Desktop -. "spawn + supervise (sidecar mode)" .-> H
```
| Surface | Path | Stack | Best for |
|---|---|---|---|
| TUI | `hone tui` | Rust + ratatui | terminal-native, SSH, vim users |
| Web | `/ui` | SolidJS + UnoCSS + Vite | any browser, no install, PWA |
| Desktop | `hone-desktop` | Tauri 2 (Rust shell + WebView) | system tray, OS notifications, multi-daemon switcher |
Web UI walkthrough
A complete browser-based chat surface with file preview, syntax highlighting, diff renderer, multi-agent task tree, IndexedDB conversation history, step-mode approvals, multi-daemon support, and four themes. Initial bundle: ~24 KB JS gzipped; on-demand syntax highlighting lazy-loads Shiki + Oniguruma WASM.
```sh
# Build the SPA into the daemon binary, then run it.
make webui
ANTHROPIC_API_KEY=sk-ant-... cargo run --release -p honed
open http://127.0.0.1:3117/ui
```
1. Chat with streaming + Cmd+K command palette
What the chat surface contains, in render order:
- Topbar: brand · session selector · `collab: N` button · agent selector · step toggle · daemon health badge · theme cycler · ⚙ settings · `Cmd+K`
- Conversation list (scrollable):
  - user bubble (right-aligned, brand-color background)
  - assistant bubble (left-aligned, accent border, streaming cursor ▍ while tokens arrive)
  - tool-call card per invocation: tool name, status (`running`/`done`/`error`), `duration_ms`, collapsible result body, `(cached)` suffix on cache hits
  - error bubble for any `error` event
- Worker progress panel (only when `Orchestrator` is running)
- Input bar: textarea + autocomplete menu when the draft starts with `/`
- Status bar: version label · token count · USD spend (driven by `done.usage` + `done.cost_usd` events)
Slash commands (intercepted client-side, never hit /run):
| Command | What it does |
|---|---|
| `/theme <crimson\|chalk\|neon\|minimal>` | set palette |
| `/settings` | open the settings drawer |
| `/clear` | reset the session cost meter |
| `/help` | list available commands |
- Token stream renders with markdown (fenced code blocks get lazy Shiki syntax highlighting in 27 languages — `rust`, `ts`, `tsx`, `python`, `go`, `c`, `cpp`, `bash`, `json`, `yaml`, `diff`, `markdown`, etc.). The first highlight fetches Shiki + Oniguruma WASM; subsequent highlights are cached.
- Type `/` to autocomplete commands. `/theme neon` swaps the palette, `/settings` opens the drawer, `/clear` resets the cost meter. `Cmd+Enter` sends; the Cancel button aborts mid-stream.
2. File preview + side-by-side diffs
When the agent runs read_file / edit_file / write_file, the
preview pane to the right opens automatically and shows the file with
syntax highlighting. The gutter between chat and preview is
resizable — drag to split (clamped 25–85%, persisted to
localStorage).
The two-column layout (chat left, preview right):
- Chat side keeps streaming. Tool results that look like unified diffs auto-render as side-by-side with theme-driven add / remove / hunk colors. Long tool results (>6 lines) collapse with a show all (N lines) button.
- Preview side shows a 200 px wide file tree on its left and a Shiki-highlighted code viewer on its right. The tree lazily expands directories on click; hidden entries (`.git`, dotfiles) are skipped.
- Gutter is the 6 px draggable bar between the two halves. The ratio is clamped to `[0.25, 0.85]` and persisted to `localStorage` under `hone-webui:split-ratio`.
A typical diff card embedded in the chat:
```diff
diff --git a/src/lib/diff.ts b/src/lib/diff.ts
@@ -42,3 +42,9 @@
  context line
- return null;
+ return parsed;
+ } catch (err) {
+ return null;
```
- Files >2 MiB or with NUL bytes (binaries) get a placeholder.
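The preview gate above (skip oversized or binary files) boils down to a simple predicate. A minimal standalone sketch, assuming the same 2 MiB cap and the common NUL-byte binary heuristic; the function name is hypothetical, not the web UI's actual code:

```rust
// Illustrative sketch of the preview gate: a file is previewable only
// if it is at most 2 MiB and contains no NUL bytes (binary heuristic).
const MAX_PREVIEW_BYTES: usize = 2 * 1024 * 1024;

fn is_previewable(contents: &[u8]) -> bool {
    contents.len() <= MAX_PREVIEW_BYTES && !contents.contains(&0u8)
}

fn main() {
    assert!(is_previewable(b"fn main() {}"));
    // An ELF header contains NUL bytes, so it gets the placeholder.
    assert!(!is_previewable(&[0x7f, b'E', b'L', b'F', 0x00]));
    println!("ok");
}
```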
3. Multi-agent orchestration with task tree
When you submit a turn that triggers Orchestrator::run (multi-agent
plans), a <WorkerProgressPanel> appears above the chat. It shows a
goal header, one row per worker, and a budget footer.
A typical render after two of four workers have finished:
| status | agent | task | result |
|---|---|---|---|
| done (green) | researcher | t-001 explore current auth/ tree | $0.003 |
| done (green) | planner | t-002 draft the migration steps | $0.005 |
| running (warn, pulses) | implementer | t-003 apply the rename mechanically | — |
| queued (muted) | reviewer | t-004 verify against TDD harness | — |
Footer: spent $0.008 / $0.50 · 2/4 workers.
Driven by plan_created / worker_started / worker_completed /
budget_progress events on the SSE stream.
4. Step-mode approvals
Tick step in the topbar to require approval for risky tools.
When the agent pauses on a tool_call_paused event, a centred modal
pops with the tool name + arguments. Buttons:
| button | StepAction wire shape | semantics |
|---|---|---|
| Allow | `"Proceed"` | execute this call only |
| Always allow | `{"ApproveAlways": "<tool_name>"}` | execute and auto-allow the same tool name for the rest of the session |
| Skip | `"Skip"` | proceed without this tool; agent gets an empty result |
| Deny | `{"Deny": "user denied"}` | agent receives a tool error and reacts |
Click-outside dismisses with Skip. The chosen action is POSTed to
/run/{request_id}/step (202 Accepted on success).
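The four wire shapes in the table map naturally onto a tagged enum: unit variants serialize as bare strings, data-carrying variants as single-key objects. A hand-rolled sketch with no serde dependency (the enum layout shown here is an assumption inferred from the table, not Hone's actual type):

```rust
// Hypothetical StepAction enum matching the wire shapes in the table.
enum StepAction {
    Proceed,
    ApproveAlways(String),
    Skip,
    Deny(String),
}

// Emit the JSON wire form: unit variants become bare strings,
// payload variants become {"Variant": "payload"} objects.
fn to_wire(action: &StepAction) -> String {
    match action {
        StepAction::Proceed => "\"Proceed\"".to_string(),
        StepAction::ApproveAlways(tool) => format!("{{\"ApproveAlways\": \"{tool}\"}}"),
        StepAction::Skip => "\"Skip\"".to_string(),
        StepAction::Deny(reason) => format!("{{\"Deny\": \"{reason}\"}}"),
    }
}

fn main() {
    assert_eq!(to_wire(&StepAction::Proceed), "\"Proceed\"");
    assert_eq!(
        to_wire(&StepAction::ApproveAlways("write_file".into())),
        "{\"ApproveAlways\": \"write_file\"}"
    );
    println!("ok");
}
```

In practice a serde `#[derive(Serialize)]` on such an enum produces the same externally tagged JSON by default.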
5. Multi-session + IndexedDB persistence
The session selector in the topbar lists every session you've created locally. Each is a bucket of messages stored in IndexedDB; refreshing the page hydrates the conversation back from the local cache before SSE reconnects.
6. Settings drawer
Press ⚙ in the topbar (or /settings) for a slide-in drawer with:
- Theme picker (4 swatches: crimson · chalk · neon · minimal)
- Font size (small / medium / large) — applied via root CSS var
- Daemon endpoint override — for cross-origin / Tauri shell setups
- Tunnel — start / stop / copy URL against `POST`/`DELETE`/`GET /tunnel`
- Team usage — totals + per-model breakdown from `GET /team/usage`
- Reset — wipes settings without touching sessions or chat history
7. Collaborative sessions
Click the collab: N button next to the agent selector to join the
active session by name. The first joiner becomes the Owner and
gets a Kick button next to other participants. user_id is generated
locally and persisted so the same browser keeps its identity across
reloads. Wired against POST /sessions/{id}/join /
POST /sessions/{id}/leave / GET /sessions/{id}/participants /
POST /sessions/{id}/kick.
Bundle profile
| chunk | first paint | first highlight | first file open |
|---|---|---|---|
| initial JS | ~24 KB gz | (cached) | (cached) |
| initial CSS | ~2 KB gz | (cached) | (cached) |
| Shiki core + WASM | — | ~285 KB gz once | (cached) |
| per-language | — | 3–18 KB gz on first use | (cached) |
| total dist on disk | — | — | 3.1 MB across 36 files |
Daemon HTTP/SSE API surface
Every endpoint below is hit-tested and surfaced in at least one of the GUIs:
| Group | Endpoint | Purpose |
|---|---|---|
| chat | `POST /run` | streaming turn (SSE of AgentEvents) |
| chat | `POST /run/{id}/step` | submit StepAction for a step-mode run |
| chat | `POST /orchestrate` | multi-agent run via Orchestrator |
| sessions | `POST /sessions` / `GET /sessions` / `GET /sessions/{id}` / `DELETE /sessions/{id}` | session lifecycle |
| sessions | `POST /sessions/{id}/join` / `/leave` / `/kick` / `GET /sessions/{id}/participants` | collaborative sessions |
| discover | `GET /agents` / `GET /agents/{name}` | agent registry browser |
| discover | `GET /recipes` / `GET /recipes/{name}` | recipe gallery |
| discover | `GET /mcp/marketplace` / `GET /mcp/installed` | MCP marketplace |
| files | `GET /files?path=` / `GET /files/content?path=` | workspace file tree + read |
| sandbox | `GET /sandbox/policies` | active server policy + per-agent overrides |
| memory | `GET /memory` / `GET /memory/stats` | encrypted memory viewer |
| provenance | `GET /provenance` / `GET /provenance/{id}` / `/verify` | tamper-evident hash chain |
| evals | `GET /evals` / `GET /evals/{id}` | eval-run history |
| ops | `POST /tunnel` / `DELETE /tunnel` / `GET /tunnel` | bore-style tunnel management |
| ops | `GET /team/usage` / `/team/members` / `/team/policy` | team governance |
| meta | `GET /health` / `GET /openapi.json` | liveness + spec |
For deployment, see docs/getting-started.md and docs/configuration.md. For the SolidJS source + dev workflow, see apps/webui/README.md.
Desktop App
apps/hone-desktop/ is a Tauri 2 native shell that can either
supervise its own honed (sidecar mode, double-click and go) or
connect to a daemon on a team VM (external mode). Distinct from the
web UI: the desktop app adds a system tray, OS notifications,
keyboard-only command palette, multi-daemon switcher, and
right-docked artifact pane that surveys every file the agent has
touched in the session.
```mermaid
flowchart LR
    User[Tray icon click / app launch]
    User --> Tauri[Tauri 2 shell]
    Tauri --> SS[SidecarSupervisor]
    SS -. spawn .-> Honed[(honed)]
    Tauri --> WebView[WebView<br/>vanilla HTML/JS or SolidJS]
    WebView -- HTTP/2 + SSE --> Honed
    Tauri --> Tray[macOS / Windows / Linux tray]
    Tauri --> Notif[OS notifications<br/>tauri-plugin-notification]
    Tauri --> Updater[Auto-updater<br/>tauri-plugin-updater · feature-gated]
```
Quick start
```sh
# Local sidecar mode — Tauri spawns a private honed on a free port
cargo build -p honed --release
cp target/release/honed /usr/local/bin/honed   # or any PATH dir
cargo run -p hone-desktop                      # launches the window
```
The first-run wizard asks Manage daemon for me (sidecar) vs
Connect to an existing daemon (external URL + bearer token).
Choice is persisted to ~/.config/hone/desktop.json so subsequent
launches skip the wizard.
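A plausible shape for that config file, shown purely as an illustration: the actual schema is defined by the app and may differ, and every key and value below is a hypothetical example.

```json
{
  "mode": "external",
  "daemons": [
    { "name": "team-vm", "url": "https://hone.example.internal:3117", "token": "…" }
  ]
}
```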
Feature tour
- Sidecar supervisor. `SidecarSupervisor::spawn` allocates a free `127.0.0.1` port via `TcpListener::bind("127.0.0.1:0")`, generates a per-instance bearer token, sets `HONE_EPHEMERAL=1`, and waits for `/health` to come up before unlocking the UI.
- System tray. Right-click → Show Hone / Hide window / Stop honed sidecar / Quit. Window close hides instead of quitting, so the tray stays the source of truth for "is the daemon running."
- Cmd+K command palette. Floating fuzzy-search overlay with Refresh daemon status, Start/Stop sidecar, Cancel current run, Reset session cost meter, and Reset onboarding.
- Streaming chat surface. Same shape as the web UI — token-stream bubbles, inline tool-call cards (running → done with `duration_ms`, red border on error, `(cached)` suffix on cache hits), cost/token meter in the topbar driven by `done.usage` events.
- Approval queue (step-mode). Sidebar pane lists every pending approval. Buttons: Approve / Approve all / Skip / Deny. Wired to the daemon's `POST /run/{id}/step`.
- Task-tree DAG. When you tick Orchestrate, the sidebar shows per-worker status dots (queued / running pulses / completed / failed) plus a budget bar. Driven by `plan_created` / `worker_started` / `worker_completed` / `budget_progress`.
- Discover pane. Browse the Agents registry (built-in / user / project sources; click to expand the system prompt and Fork into `<project>/.hone/agents/`), the Recipes gallery (click to expand steps; "Use first step as prompt" prefills the chat input), and the MCP marketplace (7 curated servers; click "Install '' to `.hone/mcp.json`" to add an entry).
- Sandbox policy viewer. Read-only display of the active server policy + per-agent `sandbox:` overrides. Mode pill, network on/off, and rule lists for read / write / exec / domains / env.
- Memory browser. When `HONE_MEMORY_DB=...` is set, the daemon exposes encrypted memory entries; the GUI shows total + per-kind breakdown + top-tag chips + a search box.
- Provenance hash-chain explorer. When `HONE_PROVENANCE_DIR=...` is set, every run writes a tamper-evident `*.jsonl` chain. Click an entry → green ✓ banner with Merkle root for valid chains, red ✗ banner pinpointing the broken index for tampered ones, plus the first 200 entries with kind + per-event summary.
- Eval-run dashboard. When `HONE_EVAL_HISTORY_DB=...` is set, the GUI shows recent runs with model + tag pill + colored pass/total + visual pass-bar (green ≥90%, amber ≥50%, red <50%) + per-task outcomes on click.
- Artifact pane. Right-docked card lists every file the agent has touched this session with tool pill (write/edit/read), call count, relative timestamp, and a Reveal button that opens the OS file manager at that path (macOS `open -R`, Windows `explorer /select,`, Linux `xdg-open` on the parent dir).
- Multi-daemon switcher. Click the topbar daemon badge to switch between profiles (local sidecar, team VM, cloud). Switching tears down the previous sidecar; the Add external daemon form lets you register an HTTP target with an optional bearer token. Persisted to `~/.config/hone/desktop.json`.
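The free-port trick the sidecar supervisor relies on (bind to port 0 and read back the OS-assigned port) is plain std. A minimal sketch, with a hypothetical function name:

```rust
use std::net::TcpListener;

// Ask the OS for any free loopback port by binding to port 0,
// then read the assigned port back from the listener.
fn free_loopback_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    Ok(listener.local_addr()?.port())
}

fn main() -> std::io::Result<()> {
    let port = free_loopback_port()?;
    assert!(port > 0);
    println!("allocated port {port}");
    Ok(())
}
```

Note that the port is only reserved while the listener is alive; a supervisor would hand the listener (or re-bind immediately) to the spawned daemon to avoid a race.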
Walkthrough — running a step-mode turn end-to-end
1. Launch hone-desktop → wizard skipped → window opens to chat.
2. Topbar shows "local sidecar · sidecar @ http://127.0.0.1:54321 · 0 tok · $0.000".
3. Tick `step` checkbox in the sidebar.
4. Type "create hello.txt with the word 'hi'" → Cmd+Enter.
5. Token stream begins; assistant decides to call write_file.
6. Approval modal pops with the path + content shown verbatim.
7. Click `Allow` → tool runs, file appears under apps/hone-desktop/dist/.
8. Bottom of screen: cost meter ticks to "47 tok · $0.0008".
9. Click the Reveal button on the new artifact card → Finder opens with
hello.txt selected.
10. Right-click the tray icon → Quit Hone → sidecar shuts down cleanly.
Bundling + releases
apps/hone-desktop/RELEASE.md documents the full release pipeline:
- CI workflow at `apps/hone-desktop/ci/desktop-release.yml` — builds on macOS arm64 + x86_64, Linux x86_64, and Windows x86_64 on every `desktop-v*` tag push (or manual `workflow_dispatch`). Uses `tauri-apps/tauri-action@v0`. Produces `.dmg` / `.app` / `.deb` / `.AppImage` / `.msi`.
- Signing slots are wired but disabled when secrets are absent — `APPLE_*`, `TAURI_SIGNING_PRIVATE_KEY*`. Unsigned builds are flagged as such in the release body.
- The updater plugin is optional behind an `updater` cargo feature, so dev builds without signing keys still compile.
- Branded icons are generated by `scripts/generate_icons.py` (pure-stdlib PNG writer — no PIL/imagemagick dependency).
See apps/hone-desktop/README.md for the full layout and dev workflow.
Customization
- Theme colors (`~/.config/hone/theme.toml`) — Override any named theme color with `key = "#rrggbb"` (24-bit RGB). Example: `accent-hot = "#FF1744"`, `bg-normal = "#0d1117"`. Merges with the selected theme instead of replacing it.
- Session switcher (`Ctrl+O`) — Full-screen overlay to browse and switch between saved sessions. Shows session name, last-updated relative time, and message count. Navigate with j/k, Enter to select, q to close.
- Notification bell (`\x07` on 10 s+ turns) — Terminal bell fires for long-running turns. Disable with `HONE_NO_BELL=1`.
See docs/getting-started.md for examples and detailed configuration.
Observability
Hone tracks every turn with the Agent Rail sidebar and session reports (interactive TUI mode):
- Agent Rail (
Ctrl+R) — Toggleable right sidebar on wide terminals (≥100 cols) showing files touched, injected context, and memory recalls per turn. Zero overhead, updates live. - Session Reports — Auto-generated on
/quitto.hone/reports/<timestamp>-<session>.mdwith full cost, token, file, tool, and timeline data. Includes JSON sidecar for scripting. Retention: 100 reports, oldest deleted first. - CLI —
hone reports list [--limit N]/show <id>/diff <a> <b>for viewing and comparing past sessions. - Cross-session learning — When memory is enabled, the AdaptiveClassifier records each turn's outcome (tier, tokens, cost, success) and uses historical patterns to improve future complexity routing (Simple/Medium/Complex). Transparent fallback to rule-based classification when no memory store is present.
- Accessibility — `HONE_REDUCED_MOTION=1` disables animation (Agent Rail spinner). `HONE_ASCII_ONLY=1` uses ASCII borders. `NO_COLOR=1` disables color.
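The report retention policy (keep 100, oldest deleted first) amounts to a small pruning pass. A minimal sketch, assuming the documented `<timestamp>-<session>.md` naming so that lexicographic order matches age; this is illustrative, not Hone's actual code:

```python
from pathlib import Path

def prune_reports(report_dir: str, keep: int = 100) -> int:
    """Delete the oldest reports beyond `keep`; return how many remain."""
    reports = sorted(Path(report_dir).glob("*.md"))  # timestamp prefix: oldest first
    for old in reports[: max(len(reports) - keep, 0)]:
        old.unlink()
        # remove the JSON sidecar alongside its report, if present
        old.with_suffix(".json").unlink(missing_ok=True)
    return min(len(reports), keep)
```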
For detailed setup and examples, see docs/observability.md and docs/session-reports.md.
Benchmarks
Performance (local, cargo bench on M-series)
| Benchmark | Result | Notes |
|---|---|---|
| Token estimation (tiktoken) | ~10ms | First call; cached via OnceLock; optimized for batches |
| Build messages (50 msgs) | 1.67 us | Effectively free |
| Symbol extraction (tree-sitter) | 2.19 ms | ~550ms for a 50K LOC repo |
| Symbol search (in-memory) | 1.00 us | Sub-microsecond |
| TUI render (50 messages) | ~92 us | < 0.1% of 100ms frame budget |
| CLI startup (warm) | <10ms | Binary size 12MB, cached in page cache |
| Memory footprint (idle) | ~8MB | 6x under PRD target |
Agent Personas Evaluation
Persona evals — 35 scenarios across 13 built-in agents (debugger, security-auditor, code-reviewer, rust-expert, python-expert, ai-researcher, postgresql-expert, ml-engineer, data-engineer, test-architect, fullstack-architect, ml-recommendations-expert, quant-advisor). A keyword-matching evaluation harness lives in benchmarks/persona_eval.py. Run with:
python benchmarks/persona_eval.py --url http://localhost:8080
python benchmarks/persona_eval.py --persona debugger --dry-run
SWE-bench Results
Skeleton Strategy Comparison (10-task sample with V4):
| Strategy | Patches | Rate | vs V1 |
|---|---|---|---|
| CallGraph (default) | 10/10 | 100% | +30pp |
| Hybrid | 9/10 | 90% | +20pp |
| Flat | 9/10 | 90% | +20pp |
| V1 (no injection) | 7/10 | 70% | baseline |
V4 Results with CallGraph Injection Enabled by Default:
| Version | Injection | Pass Rate | Key Insight |
|---|---|---|---|
| V4 (CallGraph default) | Enabled | 285/300 (95%) | Function-level calls, +30pp on sample |
| V3 (optional injection) | Opt-in | 258/300 (86%) | Dependency graph strategy |
| V1 (simple prompt) | None | 58/300 (19.3%) | Baseline |
| V2 (structured + localization) | None | 39/300 (13.0%) | Over-constraining hurts |
Context Injection Strategy: a three-tier system with CallGraph enabled by default improves the pass rate from 19.3% to 95% (+76pp, roughly a 5x relative improvement):
- Tier 1 — Repository skeleton (file tree + function/type signatures) — 800-1500 tokens (always injected)
- Tier 2 — Confidence-gated BM25-ranked signatures (injected only when score gap > 0.4)
- Tier 3 — Exploration budget hint based on retrieval confidence (3-15 tool calls)
- V4 addition — Multi-attempt strategy with injection-guided exploration and rollback on failure
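The Tier 2 confidence gate can be made concrete with a minimal sketch: score candidate signatures with classic BM25 and inject them only when the top score clearly separates from the runner-up. This is an illustrative reimplementation, not Hone's retrieval code; only the 0.4 gap threshold comes from the text above:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Classic BM25 over tokenized docs; returns one score per doc."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def should_inject_tier2(scores, gap_threshold=0.4):
    """Inject ranked signatures only when retrieval is confident:
    the best score separates from the second-best by more than the gap."""
    top = sorted(scores, reverse=True)
    return len(top) >= 2 and (top[0] - top[1]) > gap_threshold
```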
Competitive Comparison:
- Hone v4 (context injection default): 95% pass rate at ~$0.002/task
- Hone v3 (context injection opt-in): 86% pass rate at ~$0.002/task
- Hone v1 (simple prompt): 19.3% pass rate at ~$0.002/task
- SWE-Agent (GPT-4): 18% pass rate at ~$0.10-0.20 per task
- Hone beats SWE-Agent at 50-100x lower cost with 5x better pass rate
See docs/benchmarks.md for detailed breakdown and reproduction instructions.
Coding Tasks (mini-benchmark, DeepSeek V3 via OpenRouter)
| Metric | Result |
|---|---|
| Pass rate | 8/10 (80%) |
| Average task time | 19s |
| Cost per task | ~$0.001 |
| Tasks tested | typo fix, add function, bug fix, error handling, file creation, rename, docstring, import fix, test creation, refactoring |
System Prompt Compression
Recent optimizations reduced system prompt size by 71%:
| Metric | Before | After | Reduction |
|---|---|---|---|
| Prompt tokens | 1776 | 511 | 71% |
| Per-task tokens (20-turn) | ~35K | ~10K | 71% |
Impact: saves roughly 25K tokens per 20-turn task while maintaining quality parity.
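The ~25K figure follows directly from the table: the per-turn delta times a 20-turn task.

```python
before, after, turns = 1776, 511, 20

saved_per_turn = before - after                     # 1265 tokens per turn
total_saved = saved_per_turn * turns                # 25300, the quoted "~25K"
reduction_pct = round(100 * (1 - after / before))   # 71
```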
Tool Description Overhaul
All 7 core tools now include structured WHEN TO USE / WHEN NOT TO USE fields for better decision-making.
# Run benchmarks yourself
make -C benchmarks swebench-dry-run # 3 tasks, ~$0.01
make -C benchmarks swebench-lite # 300 tasks, ~$5-15
See docs/benchmarks.md for detailed performance analysis and reproduction instructions.
Architecture
Hone uses a multi-agent orchestration system layered atop kernel-enforced sandboxing and zero-cost tool execution:
graph TB
User["User Request"]
subgraph MultiAgent["Multi-Agent Orchestration"]
direction TB
Orch["Orchestrator<br/>(decompose tasks)"]
Ensemble["Ensemble<br/>(N candidates)"]
Router["Complexity Router<br/>(Simple/Medium/Complex)"]
TestLoop["Test-Verify Loop<br/>(AgentCoder pattern)"]
Reviewer["Reviewer Agent<br/>(ChatDev pattern)"]
SelfRefine["Self-Refine Loop<br/>(iterative improvement)"]
end
subgraph L3["Layer 3 — Skills (HONE.md)"]
direction LR
HM["Zero-cost markdown\nconventions"]
end
subgraph L2["Layer 2 — Native Tools (Rust hot path)"]
direction LR
FIO["File I/O"]
Shell["Shell execution"]
Git["Git operations"]
LSP["Real LSP bridges"]
end
subgraph L1["Layer 1 — MCP Client"]
direction LR
MCP["JSON-RPC MCP servers\nLazy schema loading"]
end
subgraph L0["Layer 0 — Model Providers"]
direction LR
Providers["7 first-party adapters + OpenAI-compatible<br/>(Anthropic, OpenAI, Ollama, Claude CLI, Gemini CLI, ACP, Mock)"]
end
subgraph Storage["Storage & Observability"]
Memory["SQLite Memory<br/>(pattern learning)"]
Index["SQLite Index<br/>(semantic search)"]
Watcher["File Watcher<br/>(incremental updates)"]
end
Sandbox["Kernel Sandbox<br/>(Seatbelt/Landlock)"]
User --> MultiAgent
MultiAgent --> L2
MultiAgent --> L1
MultiAgent --> L0
L2 --> Sandbox
L2 --> Storage
L1 --> Sandbox
L3 -.-> MultiAgent
style L2 fill:#90EE90
style L1 fill:#87CEEB
style L0 fill:#FFB6C1
style L3 fill:#DDA0DD
style MultiAgent fill:#FFE4B5
style Storage fill:#E0E0E0
style Sandbox fill:#FF6B6B
Layer 2 (native tools) consists of direct Rust function calls — zero serialization, zero IPC overhead. Layer 1 (MCP) uses JSON-RPC over stdio. Layer 0 (providers) uses HTTP + streaming. Layer 3 (skills) is pure markdown conventions with no runtime cost. Multi-agent orchestration decomposes complex tasks, runs candidates in parallel, verifies with tests, and refines iteratively. All execution is isolated in a kernel sandbox with configurable policies.
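The cost difference between the layers comes down to transport. A toy dispatcher makes the distinction concrete; this is Python pseudocode for the idea, not Hone's Rust implementation, and all names here are hypothetical:

```python
import json

class MockMcpClient:
    """Stand-in for a JSON-RPC MCP server: every call pays a
    serialize/deserialize round trip over stdio."""
    def call(self, name, args):
        payload = json.dumps({"jsonrpc": "2.0", "method": name, "params": args})
        request = json.loads(payload)  # simulate the wire round trip
        return {"echoed": request["params"]}

def dispatch_tool(name, args, native_tools, mcp_client):
    """Try Layer 2 first (direct in-process call, zero serialization),
    then fall back to Layer 1 (JSON-RPC MCP)."""
    fn = native_tools.get(name)
    if fn is not None:
        return fn(**args)
    return mcp_client.call(name, args)
```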
Installation
Option 1: Install from crates.io (recommended)
Requires Rust 1.85+. If you don't have Rust, install it first:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Then install Hone:
cargo install hone-cli honed
This installs both hone (CLI agent) and honed (server daemon) to ~/.cargo/bin/.
Option 2: Build from Source
git clone https://github.com/wojciechkpl/hone.git
cd hone
cargo build --release
sudo cp target/release/hone target/release/honed /usr/local/bin/
Option 3: Run without Installing
git clone https://github.com/wojciechkpl/hone.git
cd hone
cargo run --release --bin hone -- "fix the bug"
Platform Support
| Platform | Architecture | Status |
|---|---|---|
| macOS | Apple Silicon (arm64) | Fully supported, Seatbelt sandboxing |
| macOS | Intel (x86_64) | Fully supported |
| Linux | x86_64 | Fully supported, Landlock sandboxing |
| Linux | arm64 | Supported (cross-compile) |
| Windows | x86_64 | Not supported (no sandbox backend) |
Verify Installation
hone --version # hone-cli 0.1.0
honed --version # honed 0.1.0
Update
cargo install hone-cli honed --force
Uninstall
cargo uninstall hone-cli honed
First-Time Setup
# Interactive setup wizard — configures provider, API key, default model
hone configure
# Or set an API key directly
export ANTHROPIC_API_KEY=sk-ant-... # Claude (best tool calling)
export OPENAI_API_KEY=sk-... # OpenAI / DeepSeek / OpenRouter
# Or use a free local model
ollama pull qwen2.5-coder:14b
hone --model ollama:qwen2.5-coder:14b "hello"
Quick Start
First Time Setup
Run the interactive configuration wizard to set up your provider and API keys:
hone configure
This guides you through selecting a model provider, entering API keys, and choosing a default model. Configuration is saved to ~/.config/hone/config.toml.
One-Shot Execution with Full Tool Use
# Use auto-detected provider (priority: ANTHROPIC > OPENAI > Claude CLI > Gemini CLI > Ollama)
ANTHROPIC_API_KEY=sk-... hone "build a CLI tool that..."
# Use a custom OpenAI-compatible endpoint (DeepSeek, Qwen, Kimi, OpenRouter, etc.)
OPENAI_API_KEY=sk-... HONE_OPENAI_BASE_URL=https://api.deepseek.com HONE_OPENAI_MODEL=deepseek-coder hone "analyze this code"
# Multi-agent orchestration: decompose task, execute in parallel, merge results
hone --orchestrate "refactor the authentication module for security"
# Run N candidates and pick the best (ensemble mode)
hone --ensemble 3 "optimize this query for performance"
# Run a built-in recipe
hone --recipe tdd-feature
hone --recipe bug-fix
Local Interactive CLI
# Start the interactive agent
hone
# Work on a specific project
hone --dir ~/projects/my-app
# Use a specific model
hone --model gpt-4o
# Ephemeral mode (no session saved, memory not persisted)
hone --ephemeral
Named Session Management
Sessions are automatically saved after each turn and can be resumed later:
# List all saved sessions
hone session list
# Resume a previous session
hone session resume my-session
# Rename a session
hone session rename old-name new-name
# Delete a session
hone session delete old-name
# Start a new session with a specific name
hone --session my-project
# Disable auto-save for this session (ephemeral mode)
hone --ephemeral
Remote Daemon
# Start the daemon (listens on localhost:3117 by default)
honed --config ~/.config/hone/config.toml
# Connect from another terminal
hone --remote localhost:3117
Create a Recipe
Save refactor.yaml:
name: refactor-logging
description: Upgrade log statements to structured format
steps:
- tool: file_search
query: 'console.log|print\('
- agent:
prompt: "Replace with structured logging"
model: gpt-4o
- tool: git_commit
message: "refactor: upgrade to structured logging"
Run it:
hone --recipe refactor.yaml
When to Use What
Hone has many modes. Here's how to pick the right one:
Decision Flowchart
Is it a quick fix (typo, rename, one-liner)?
YES → hone "fix the typo in auth.rs" (one-shot)
Is the task ambiguous or underspecified?
YES → hone "improve the code" (auto-clarifies)
or → /clarify implement user auth (force clarification)
Do you need a specialist perspective?
YES → @security-auditor review src/auth.py (@agent dispatch)
or → hone --agent debugger "why does test fail" (--agent flag)
Is it a structured workflow (TDD, review, audit)?
YES → hone --recipe tdd-feature --var feature_name=auth
or → hone --recipe security-audit
Is it a large, multi-file task?
YES → hone --orchestrate "implement JWT auth" (parallel workers)
Do you want the best possible answer?
YES → hone --ensemble 3 "optimize this query" (3 candidates, ranked)
Are you in a CI/CD pipeline?
YES → hone --json --no-clarify "fix the lint errors" (structured output)
Do you want to run tests automatically?
YES → :tdd cargo test (file watcher)
Mode Comparison
| Mode | Command | Best For | Cost | Speed |
|---|---|---|---|---|
| One-shot | `hone "prompt"` | Quick fixes, single tasks | $0.001-0.01 | Fastest |
| Interactive | `hone` | Exploration, multi-turn conversation | $0.01-0.05 | — |
| @agent | `@debugger why...` | Specialist analysis (security, perf, debug) | $0.005-0.02 | Fast |
| Recipe | `--recipe tdd-feature` | Structured workflows (TDD, review, audit) | $0.01-0.03 | Guided |
| Orchestrate | `--orchestrate "..."` | Large multi-file tasks, parallel workers | $0.01-0.05 | 2-5 min |
| Ensemble | `--ensemble 3 "..."` | Critical decisions, best-of-N quality | 3x base | 3x time |
| JSON/CI | `--json "..."` | Automation, pipelines, batch processing | Same | Same |
| TDD | `:tdd cargo test` | Active development, instant feedback | $0 (local) | Continuous |
Use Case Examples
"Fix a bug" — One-shot is enough:
hone "The login endpoint returns 500 when email is null. Fix it."
"I'm not sure what's wrong" — Let clarification help:
hone "the app is slow"
# Hone asks: "Which part? API latency? Startup time? Database queries?"
# You answer, then it investigates with the right tools
"Review my PR" — Use the code-review recipe:
hone --recipe code-review
# Automated: correctness → security → performance → test coverage → report
"Audit for security" — Use the specialist agent:
@security-auditor Scan this project for OWASP Top 10 vulnerabilities
# or the structured recipe:
hone --recipe security-audit
"Build a whole feature" — Orchestrate decomposes and parallelizes:
hone --orchestrate "Add user authentication with JWT, password hashing, and session management"
# Creates 3 workers: auth-service, api-endpoints, test-suite
# Runs in parallel with dependency ordering
"I need the best refactoring" — Ensemble generates N candidates:
hone --ensemble 3 "Refactor the payment processing module for clarity"
# 3 candidates at different temperatures → ranker picks the best
"Wire into CI" — JSON output for scripting:
result=$(hone --json --no-clarify "fix lint errors")
echo "$result" | jq -e '.success' || exit 1
echo "Cost: $(echo "$result" | jq '.cost_usd')"
"TDD workflow" — Auto-test on every save:
> :tdd cargo test
[TDD] watching for changes...
(edit src/lib.rs)
[TDD] tests FAILED — test_new_feature: expected 42, got 0
> fix it to return 42
[TDD] tests PASSED — 3 passed in 0.4s
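The watcher behind `:tdd` debounces file events: each save restarts a quiet window, and the test command fires only once the window elapses (the CLI reference quotes 300ms). A hedged sketch with an injected clock for testability, not the actual Rust watcher:

```python
class Debounce:
    """Collapse bursts of file-change events: fire only after
    `quiet` seconds of silence since the last change."""
    def __init__(self, quiet: float = 0.3):
        self.quiet = quiet
        self.pending_since = None

    def event(self, now: float) -> None:
        """A file changed; (re)start the quiet window."""
        self.pending_since = now

    def ready(self, now: float) -> bool:
        """Poll: should the test command fire now?"""
        if self.pending_since is None:
            return False
        if now - self.pending_since >= self.quiet:
            self.pending_since = None  # fire once, then wait for the next burst
            return True
        return False
```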
Launching Recipes
Recipes are multi-step structured workflows. Run from the terminal:
# TDD: Red → Green → Refactor cycle
hone --recipe tdd-feature --var feature_name=auth --var test_command="cargo test"
# Bug fix: reproduce → regression test → fix → verify
hone --recipe bug-fix --var test_command="pytest"
# Security audit: OWASP Top 10, secrets, dependencies
hone --recipe security-audit
# Code review: correctness, security, performance, test coverage
hone --recipe code-review
# Refactor safely with tests at each step
hone --recipe refactor --var test_command="cargo test"
# Performance: baseline → profile → optimize → measure
hone --recipe performance --var test_command="cargo bench"
# Migrate between libraries
hone --recipe migration --var old_dependency=reqwest --var new_dependency=ureq
# Generate/update documentation
hone --recipe documentation
# Understand a new codebase
hone --recipe onboarding
# List all available recipes
hone --list-recipes
| Recipe | Variables | Steps |
|---|---|---|
| `tdd-feature` | `feature_name`, `test_command` | clarify → red → green → refactor → verify |
| `bug-fix` | `test_command` | reproduce → regression test → fix → verify → review |
| `code-review` | — | overview → correctness → security → performance → tests → report |
| `refactor` | `test_command` | baseline → analyze → execute → verify |
| `security-audit` | — | inventory → injection → auth → secrets → dependencies → report |
| `performance` | `test_command` | baseline → profile → optimize → measure |
| `migration` | `old_dependency`, `new_dependency`, `test_command` | audit → plan → migrate → verify |
| `documentation` | — | inventory → api-docs → readme → verify |
| `onboarding` | — | structure → entry-points → architecture → report |
Launching Agents
Agents are specialized personas. Two ways to use them:
From terminal (one-shot):
hone --agent security-auditor "review src/auth.rs for OWASP issues"
hone --agent debugger "test_login fails with AttributeError"
hone --agent code-reviewer "review my latest changes"
hone --agent rust-expert "how to handle lifetimes in this struct"
Inside TUI (inline @ dispatch):
> @security-auditor review src/auth.rs for vulnerabilities
> @debugger why does test_login fail?
> @code-reviewer check the latest diff
> @performance-optimizer this database query is slow
> @python-expert convert this function to async
> @rust-expert fix the lifetime error in agent.rs
> @ml-engineer design a recommendation model for user preferences
> @documentation-agent update the API docs
| Agent | Specialty | Example |
|---|---|---|
| `@code-reviewer` | Code quality, severity ratings | "review src/ for bugs" |
| `@debugger` | Test failures, stack traces | "why does test_login fail?" |
| `@security-auditor` | OWASP, injection, secrets | "audit auth.py for vulnerabilities" |
| `@performance-optimizer` | N+1 queries, hot paths, memory | "this query takes 3 seconds" |
| `@rust-expert` | Ownership, lifetimes, async tokio | "fix the borrow checker error" |
| `@python-expert` | Typing, pytest, FastAPI, async | "add type hints to this module" |
| `@documentation-agent` | READMEs, API docs, changelogs | "document the public API" |
| `@api-designer` | REST, GraphQL, gRPC, OpenAPI | "design a user management API" |
| `@tdd-workflow` | Red-Green-Refactor discipline | "implement auth with TDD" |
| `@researcher` | Literature review, experiments | "review papers on transformers" |
| `@ml-engineer` | Training, deployment, MLOps | "optimize model inference" |
| `@data-engineer` | ETL, schemas, data quality | "design the data pipeline" |
Recipes vs Agents
| | Recipes | Agents |
|---|---|---|
| What | Multi-step structured workflow | Specialized persona |
| Steps | 3-6 defined phases | Single turn or conversation |
| Variables | `--var key=value` | None needed |
| Best for | Repeatable processes | Ad-hoc specialist questions |
| Inside TUI | CLI only (`--recipe`) | `@agent-name prompt` |
Configuration
Hone uses a three-level config hierarchy:
- CLI flags (highest priority)
- TOML (`~/.config/hone/config.toml`)
- Defaults (lowest priority)
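The three-level precedence reduces to a layered dictionary merge. A minimal sketch of the idea, assuming unset options are represented as missing or `None` keys; not Hone's actual config loader:

```python
def effective_config(defaults: dict, toml_cfg: dict, cli_flags: dict) -> dict:
    """CLI flags beat TOML values, TOML values beat built-in defaults."""
    merged = dict(defaults)
    for layer in (toml_cfg, cli_flags):  # apply in increasing priority
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged
```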
Sample config.toml
[agent]
default_model = "gpt-4o"
context_window_tokens = 16000
max_iterations = 20
[memory]
enabled = true
db_path = "~/.local/share/hone/memory.db"
encryption_key_file = "~/.config/hone/key.bin"
[sandbox]
strategy = "seatbelt" # macOS: seatbelt, Linux: landlock
[team]
rbac_enabled = true
audit_log_path = "~/.local/share/hone/audit.log"
[providers]
default = "anthropic"
# Add API keys for each provider
anthropic_api_key = "${ANTHROPIC_API_KEY}"
openai_api_key = "${OPENAI_API_KEY}"
Environment Variables
Hone respects the following environment variables for provider setup:
| Variable | Purpose | Example |
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key for direct Claude access | `sk-ant-...` |
| `OPENAI_API_KEY` | OpenAI API key (or compatible endpoint) | `sk-...` |
| `HONE_OPENAI_BASE_URL` | Override OpenAI endpoint for custom providers | `https://api.deepseek.com` |
| `HONE_OPENAI_MODEL` | Override model name for custom endpoints | `deepseek-coder`, `qwen-max`, `claude-3-sonnet` |
| `HONE_DISABLE_PERSISTENT_STORES` | Skip SQLite session/memory setup for this run (same effect as `--ephemeral`; useful for CI, demos, and TUI expect scripts) | `1` |
| `HONE_AUDIT_LOG` | Enable JSONL egress audit log (equivalent to `--audit`) | `1` |
| `HONE_CONTEXT_INJECTION` / `HONE_NO_INJECTION` | Force-enable or force-disable proactive context injection | `1` |
| `HONE_REDUCED_MOTION` | Disable Agent Rail animation | `1` |
| `HONE_ASCII_ONLY` | Force ASCII box drawing and file-op badges | `1` |
| `NO_COLOR` | Disable all colorized output | `1` |
The provider auto-detection order is: ANTHROPIC > OPENAI (+ custom endpoints) > Claude CLI > Gemini CLI > Ollama.
When HONE_OPENAI_BASE_URL is set, Hone treats any OPENAI_API_KEY as a token for that endpoint, allowing drop-in access to DeepSeek, Qwen, Kimi, OpenRouter, and other OpenAI-compatible services.
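The detection order and the base-URL override compose into one small decision function. An illustrative sketch, not Hone's code; in particular, the CLI binary names checked here (`claude`, `gemini`, `ollama`) are assumptions:

```python
import shutil

def detect_provider(env: dict, has_cli=shutil.which) -> str:
    """Documented priority: ANTHROPIC > OPENAI (+ custom endpoint)
    > Claude CLI > Gemini CLI > Ollama."""
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        # With HONE_OPENAI_BASE_URL set, the same key targets any
        # OpenAI-compatible endpoint (DeepSeek, Qwen, Kimi, OpenRouter, ...)
        base = env.get("HONE_OPENAI_BASE_URL")
        return f"openai-compatible:{base}" if base else "openai"
    for binary, name in (("claude", "claude-cli"), ("gemini", "gemini-cli"), ("ollama", "ollama")):
        if has_cli(binary):
            return name
    return "none"
```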
See docs/configuration.md for complete reference.
Agent ensemble (experimental)
Run the same prompt through N parallel candidates with different temperatures, then pick the best response using the MASAI Ranker pattern:
hone --ensemble 3 "refactor this function for clarity"
Each candidate runs at a different temperature (0.0, 0.3, 0.5, …), maximizing response diversity. A dedicated ranker agent (temperature 0.0) reads all candidates and selects the best one. Research shows ensemble + voting improves code quality by 5-8% at roughly N× the token cost (arXiv 2406.11638).
Up to five candidates are supported (--ensemble 5).
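Under those rules, ensemble mode reduces to spreading N generations across a temperature ladder and handing all candidates to a deterministic ranker. A hedged sketch; the documented ladder values are 0.0, 0.3, 0.5, and the later entries here are assumptions:

```python
TEMPERATURE_LADDER = (0.0, 0.3, 0.5, 0.7, 0.9)  # first three documented; rest assumed

def run_ensemble(prompt, generate, rank, n: int = 3):
    """generate(prompt, temperature) -> candidate text;
    rank(prompt, candidates) -> index of the winner.
    The ranker itself should run at temperature 0.0 for determinism."""
    if not 1 <= n <= 5:
        raise ValueError("--ensemble supports 1 to 5 candidates")
    candidates = [generate(prompt, t) for t in TEMPERATURE_LADDER[:n]]
    return candidates[rank(prompt, candidates)]
```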
CLI Reference
| Command | Purpose |
|---|---|
| `hone configure` | Interactive setup wizard for providers, API keys, and models |
| `hone` | Start interactive CLI session |
| `hone "prompt"` | One-shot mode: single prompt and exit |
| `hone --clarify "prompt"` | Enable scope clarification mode (auto-detects ambiguous requests, asks clarifying questions) |
| `hone --no-clarify "prompt"` | Disable scope clarification for CI/scripting |
| `hone --orchestrate "prompt"` | Multi-agent orchestrator (decompose-execute-merge) |
| `hone --ensemble N "prompt"` | Run N candidates with varied temperatures, rank the best |
| `hone --recipe <name>` | Run a built-in recipe (tdd-feature, bug-fix, code-review, refactor) |
| `hone --list-recipes` | List all available recipes |
| `hone --session NAME` | Start named session (auto-saved after each turn) |
| `hone --ephemeral` | No session/memory persistence for this run |
| `hone --vim` | Start in vim mode (vim-style keybindings) |
| `hone --json` | Output structured JSON for CI/CD automation |
| `hone --no-inject` | Disable proactive context injection (also set `HONE_NO_INJECTION=1`) |
| `hone --skeleton-strategy call_graph` | Select skeleton strategy: call_graph (default, adaptive), hybrid, flat, dependency_graph |
| `hone --inject-context` | Explicitly enable context injection (already default, useful for config overrides) |
| `hone --audit` | Enable JSONL egress audit log (also set `HONE_AUDIT_LOG=1`) |
| `hone --var KEY=VALUE` | Substitute template variables in recipes (repeatable) |
| `hone --model <name>` | Use a specific model (e.g., `--model ollama` for Ollama, `--model gpt-4o` for OpenAI) |
| `hone --model ollama` | Use Ollama with tool calling support (now fully enabled) |
| `hone session list` | List all saved sessions |
| `hone session resume NAME` | Resume a previous session |
| `hone session rename OLD NEW` | Rename a session |
| `hone session delete NAME` | Delete a session |
| `hone --dir <path>` | Target a specific directory |
| `hone --remote <host:port>` | Connect to remote daemon |
| `honed` | Start the HTTP/2 server daemon |
| `honed --port 8080` | Listen on custom port |
In interactive mode, use these commands:
| Command | Purpose |
|---|---|
| `!command` | Run shell command (e.g., `!cargo test`, `!git status`) |
| `!cd path` | Change directory (confined to project) |
| `!nvim file` | Suspend TUI, open editor, restore on exit |
| `@agent-name prompt` | Dispatch to inline agent (e.g., `@security-auditor "audit this"`, `@debugger "fix this bug"`) |
| `/configure` | Show session summary (provider, model, theme, vim, budget, cost, phase) |
| `/configure model <name>` | Switch model mid-session (e.g., `/configure model claude-sonnet-4-5`) |
| `/configure theme <name>` | Change UI theme |
| `/configure vim on\|off` | Toggle vim-style keybindings |
| `/configure budget <usd>\|off` | Set or clear a session cost ceiling |
| `/configure provider` | Print the active provider adapter |
| `/configure help` | Show the full /configure subcommand list |
| `:clarify` | Manually trigger scope clarification for ambiguous requests |
| `:set vim` | Enable vim-style keybindings |
| `:set novim` | Disable vim-style keybindings |
| `:set no-clarify` | Disable automatic clarification prompts |
| `:tdd [command]` | Watch files, auto-run tests on change (300ms debounce) |
| `:tdd off` | Stop TDD mode |
| `:review` | Enter hunk-by-hunk review mode (j/k navigate, a/r accept/reject, A/R all, q quit) |
See hone --help for full options.
Crate Structure
Hone is organized as a Cargo workspace. Each crate has a focused responsibility:
| Crate | Purpose | Deps |
|---|---|---|
| `hone-proto` | Shared types, DTOs, errors | 0 internal |
| `hone-config` | TOML + HONE.md parsing | proto |
| `hone-providers` | Model provider abstraction | proto |
| `hone-sandbox` | Kernel sandboxing (Seatbelt, Landlock) | proto |
| `hone-tools` | Layer 2 native tools (file, shell, git, LSP) | proto, sandbox |
| `hone-mcp` | MCP client (JSON-RPC transport) | proto, tools |
| `hone-index` | Semantic indexer (tree-sitter, SQLite vectors) | proto |
| `hone-memory` | Adaptive memory (encrypted SQLite) | proto |
| `hone-core` | Agent loop, session, orchestration | all above |
| `hone-recipes` | YAML recipe parsing + execution | proto, core |
| `hone-server` | HTTP/2 + SSE daemon | proto, core |
| `hone-tui` | Ratatui terminal UI | proto, core |
Binaries:
- `hone-cli` — Interactive terminal agent
- `honed` — HTTP/2 daemon
How Hone Compares
An honest comparison. We list where competitors are better, not just where Hone wins.
| Feature | Hone | Claude Code | Codex CLI | Goose | Kiro | Gemini CLI |
|---|---|---|---|---|---|---|
| Open source | MIT/Apache-2.0 | Proprietary | Open (Apache-2.0) | Open (Apache-2.0) | Proprietary | Apache-2.0 |
| Terminal-native | Yes | Yes | Yes | Yes (+ Electron) | No (VS Code only) | Yes |
| Multi-agent orchestration | DAG + ensemble + RALPH | No | No | No | No | No |
| Kernel sandboxing | Seatbelt + Landlock | Container | Container | No | No | No |
| Encrypted memory | AES-256-GCM | No | No | No | No | No |
| MCP extensions | 70+ via .hone/mcp.json | Yes (native) | No | 70+ | Yes | Yes |
| Provider support | 7+ (any OpenAI-compat) | Claude only | OpenAI only | 30+ | Bedrock only | Gemini only |
| Context optimization | BM25 + call graph + adaptive | Basic | Basic | Summarization | Unknown | 1M+ window |
| Tool calling | Native + text fallback | Native | Native | Native | Native | Native |
| Session management | SQLite + resume | Yes | No | Yes | No | No |
| Recipes/workflows | 9 built-in YAML | No | No | No | Spec-driven plans | No |
| Cost tracking | Live status bar + audit | No | No | CLI command | No | No |
| TDD mode | :tdd file watcher | No | No | No | No | No |
| Review mode | Hunk accept/reject | No | No | No | No | No |
Where Competitors Are Better (honestly)
| Area | Winner | Why |
|---|---|---|
| SWE-bench resolve rate | Claude Code (72%) | Hone: 19.3%. Claude Code uses Sonnet 4.5 ($3/task vs Hone's $0.03). |
| Community size | Claude Code (~71K stars) | Hone is new. Goose has ~20K, Gemini CLI ~50K. |
| Context window | Gemini CLI (1M+ tokens) | Eliminates context management complexity entirely. |
| IDE integration | Kiro, Cursor | VS Code native. Hone is terminal-only (web UI planned). |
| Free usage | Gemini CLI (free tier) | Corporate-subsidized. Hone requires your own API key. |
| Spec-driven planning | Kiro | Generates requirements docs before coding. Hone has recipes but not specs. |
| Desktop GUI | Goose (Electron) | Hone is terminal-only. |
Where Hone Wins
| Area | Detail |
|---|---|
| Cost efficiency | $0.03/task vs $0.50-$3.00. 50-100x cheaper. |
| Privacy | Kernel sandbox + encrypted memory + no telemetry. Nothing leaves your machine. |
| Multi-agent | Only agent with DAG orchestration, ensemble voting, RALPH loop, reviewer. |
| Flexibility | Works with any OpenAI-compatible model. DeepSeek, Qwen, Ollama (local, $0). |
| Observability | Hierarchical tracing, egress audit, injection detection, cost tracking. |
| Structured workflows | 9 recipes, TDD mode, review mode, scope clarification, @agent dispatch. |
Choose Hone If...
- You want to run AI coding with your own models (not locked to one vendor)
- Privacy matters — kernel sandbox, encrypted memory, no telemetry
- You need multi-agent orchestration for complex tasks
- You want $0.03/task instead of $3/task
- You prefer terminal-native tools over IDE extensions
- You need structured workflows (TDD, security audit, code review recipes)
Choose Something Else If...
- You need the highest possible SWE-bench score → Claude Code
- You want IDE integration → Kiro or Cursor
- You want a desktop GUI → Goose
- You want free unlimited usage → Gemini CLI
- You need 1M+ token context → Gemini CLI
Documentation
Core Documentation:
- README — Features, installation, quick start, benchmarks
- CHANGELOG — Recent changes, improvements, security hardening (Phase 2 complete)
- Architecture — Layer design, crate dependencies, data flow, context optimization, memory encryption, security model
- Multi-Agent Coordination and Memory — Agent turn lifecycle, orchestrator pipeline, context trimming, cross-session learning
- Configuration — Config hierarchy, all options, examples, environment variables
- Benchmarks — Performance analysis, microbenchmarks, SWE-bench results
User Guides (in docs/tutorials/):
- Getting Started — Installation, first session, understanding output
- Providers — Model providers, hot-swapping, custom endpoints (OpenRouter, DeepSeek, etc.)
- Security — Sandboxing (Seatbelt/Landlock), RBAC, audit logging, memory encryption
- Tools & MCP — Layer architecture, native tools, MCP integration
- Recipes — YAML workflows, multi-step automation
- Server & Remote — Running the daemon, HTTP API, multi-user setup
For Contributors:
- Contributing — Build, test, commit conventions
Testing
Hone passes 1,100+ tests (unit + integration) across 13 crates, with the workspace-wide Clippy lint policy applied to every crate.
# Run all tests
cargo test --lib
# Run with logging
RUST_LOG=debug cargo test --lib -- --nocapture
# Run a specific test
cargo test --lib orchestrator
# Run integration tests
cargo test --test '*'
# Benchmarks
cargo bench
# View coverage (requires tarpaulin)
cargo tarpaulin --out Html
Contributing
- Fork the repository
- Create a feature branch (
git checkout -b feature/my-feature) - Write tests and docs alongside code
- Run
cargo test && cargo clippy --all-targets - Commit with a clear message
- Push and open a pull request
See CONTRIBUTING.md for detailed guidelines.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Built with attention to privacy, performance, and composability.
Dependencies
~48–65MB
~1M SLoC