iceCoder

Self-hosted runtime governance for tool-using LLM agents — keeps long coding sessions on goal through L1/L2 dual-mode supervision, checkpoints, file memory, and a governed tool loop (217+ rounds verified).

中文简介 · Usage & commands · Full guide · 项目介绍

Preview

Getting started

Windows desktop (recommended)

No Node.js required — download the installer (bundled server + Electron shell + floating Ice Bean):

Download iceCoder — Windows x64

Configure your API key on first launch. Data directory: ~/.iceCoder/. Build from source: npm run build:desktop or see docs/使用文档.md.

Source / Web / CLI

Install, configure API keys, start dev/Web/CLI, run one-shot tasks, tests, and coverage — all commands are in docs/使用文档.md (not duplicated here).

Node.js 18+ (22+ recommended) · Dev data: ./data/ · Prod: ~/.iceCoder/ — env vars: docs/environment-variables.md.

Core features

Dual-mode runtime (L0 / L1 / L2)

Three layers: L0 is the supervision tier you pick in the sidebar footer (off / adaptive / strict); L1 is Harness free ↔ forced with ToolGate, branch budget, and TaskGraph constraints when risk rises; L2 watches no_progress, file_loop, tool_repeat_fail, goal_drift, etc., and can takeover → correct → handoff back to the main loop (events in supervisor-events.jsonl).

adaptive (default) balances freedom and enforcement; strict stays near forced, builds the graph on round 1, and is required for some L2 cases (e.g. file_loop).
Tied to TaskGraph and Verification Gate — code changes without verification cannot “finish by chat alone”; critical_* domains affect L2 takeover.
Config: supervisorMode in data/config.json + data/supervisor-config.json; optional ICE_SUPERVISOR_SHADOW=1 shadow run.

Harness loop & TaskGraph

Harness turns model intent into executed tools: permission rules (allow/confirm/deny), no-tool recovery, repeat-failure detection, micro/hard compaction with runtime re-injection. TaskGraph is the only structured execution context (replaces the old Execution Plan): enabled for edit / debug / test / refactor; question / inspect stay in free mode without a graph.

TaskState + RepoContext track intent, phase, touched files, and verification status.
CheckpointEngine v2 adds runtimeV2 on the same {sessionId}.checkpoint.json (tool trail, failures, branch budget, supervisor snapshot).
Sub-agents: delegate_to_subagent explores with a read-only tool whitelist; the main thread only gets a short summary (~60–80% less context bloat from search/read).

Ice Bean (Web session indicator)

Web-only Canvas pet — visual feedback from Harness / Supervisor / TaskGraph events; does not drive runtime decisions.

L0 eye color reflects off / adaptive / strict (toggle in the sidebar footer); the welcome dashboard shows Harness and L2·Gate status at a glance.
L1 chip shows forced · … aligned with execution_mode_enter reasons.
~20 expressions plus a token ring; step summary and “round N” sync over WebSocket.

File memory

Long-term facts live as Markdown under data/memory-files/ (no external vector DB).

Coarse recall (pre-LLM): keyword pull, up to 3 snippets — cheap and auditable.
Extract & maintenance (post-tool): writes/updates after tools; Dream and eviction control size and conflicts.
Session notes with icecoder-runtime blocks cooperate with Harness memory integration.

Multi-session & cross-device sync

Sidebar CRUD per session — isolated history, session-notes, and checkpoints.

~scan: QR pairs phone and PC on the same session id (shared context, not a read-only mirror).
Keep-alive + runningTurn restores streaming UI after refresh or reconnect.

Checkpoint, compaction & resume

Disk checkpoints bundle task state, TaskGraph/runtimeV2, BranchBudget, supervisor snapshot; writes are serialized via Harness tail to avoid torn files.

After hard compaction, TaskState + RepoContext are re-injected (goal, changed files, pending verification).
F5 / WS reconnect uses runningTurn + checkpoint replay so half-finished streams are not lost.
Legacy checkpoints without runtimeV2 remain readable.

Tools (27 built-in + MCP)

File/Git/Shell, search, sub-agent delegation, URL fetch, Office parse, etc.; MCP registers external server tools into the same ToolRegistry.

Shell dual-track: long run_command jobs go background, short ones foreground; soft timeout can escalate.
Diff viewer embeds Git-style diffs for edit tools in chat.
confirm without a UI callback defaults to deny for unattended safety.

Telemetry & observability

Harness rounds, memory ops, L1 mode changes, L2 timeline → data/*/telemetry.jsonl and supervisor-events.jsonl.

Web: ~telemetry, ~supervisor (days= / event= / limit=), ~memory, etc.; type ~ for the command palette.
HTTP: GET /api/supervisor/events mirrors chat reports — useful for long-run debugging and shadow comparisons.
Commands and paths: docs/使用文档.md.

Tests & quality baseline

~1,867 Vitest cases (~78% line coverage on src/; Harness ~84%, Supervisor ~95%, memory ~71% — run npm run test:coverage locally).

Covers gates, TaskGraph, dual-mode, memory lifecycle, Web routes, and long-session scenarios.
npm run eval:agent: agent metric skeleton (success rate, tool rate, verification rate) for regression watch.
How to run: docs/使用文档.md.

Local benchmarks

Blind same-model runs vs Claude Code (judge: Cursor Composer 2.5); SR (acceptance pass) + Composite (Gate 40 + Judge 60), not turn count alone.

Layers L1–L6 plus L4+ (e.g. billing: 97 files, 19 cross-module logic bugs); repair tasks often +3 Composite for iceCoder — see table below.
How to run: docs/使用文档.md §本地 Benchmark · rubric benchMark/md/三平台同模对比评测与裁判评分体系.md.

Read more: dual-mode → docs/requirement/L2测试过程.md · overview → docs/项目介绍.md · memory → docs/requirement/记忆系统调整-finish.md · multi-session → docs/requirement/多会话-web侧栏-finish.md

Local benchmark scores

Judge: Cursor Composer 2.5 · Rubric: benchMark/md/三平台同模对比评测与裁判评分体系.md
Reports: benchMark/reports/ · Tasks: benchMark/tasks/ · How to run: docs/使用文档.md §本地 Benchmark

Do not mix model batches when comparing platforms (m2.7 vs m2.5-pro vs MiniMax-M3).

Task	Platform	Model	SR	Composite	vs CC	Duration
Order pipeline	iceCoder	m2.7	✅	86	+3	—
Saga warehouse	iceCoder	m2.7	✅	88	+3	—
Spell Brigade	iceCoder / CC	m2.5-pro	1 / 1	81 / 80	+1	~120m / 87m
Billing 19 bugs 01	iceCoder	MiniMax-M3	✅ 19/19	93	+1 vs 02	≈3.6 min · 23 turns
Billing 19 bugs 02	CC	MiniMax-M3	✅ 19/19	92	—	5m 45s

Takeaway: Same model + same repo — iceCoder leads on governed delivery (acceptance pass, composite, or wall-clock on L4+ billing); CC can win on isolated code style (e.g. tax-line-builder). The +1 on billing (93 vs 92) is D6 delivery notes only — see report §「分差解读」.

Architecture (compact)

CLI / Web / WS → memory recall → Harness (tools, verify, compact)
  → TaskGraph → Supervisor L1/L2 → Checkpoint + BranchBudget → 27 tools + MCP

Piece	Role
Harness	Main agent loop, verification gate, compaction, telemetry
Supervisor	L1 execution mode + L2 runtime takeover/handoff
TaskGraph	Structured plan injection
File memory	Long-term Markdown facts + session notes
Web	Pet, multi-session, keep-alive, diff viewer, `~` commands

Docs

Doc	Content
使用文档	Commands — install, dev, CLI, tests, benchmark, `~` commands
PROJECT-GUIDE	Architecture & modules
项目介绍	Chinese full reference
PACKAGE_USAGE	`npm pack` install
Benchmark rubric	Scoring methodology
debug-billing-settlement	L4+ 19-bug run (M3, iceCoder vs CC)

License

MIT — see LICENSE

Name		Name	Last commit message	Last commit date
Latest commit History 403 Commits
.iceCoder		.iceCoder
LoCoMo		LoCoMo
benchMark		benchMark
data		data
desktop		desktop
docs		docs
releases/windows		releases/windows
scripts		scripts
src		src
test		test
user-memory		user-memory
.gitattributes		.gitattributes
.gitignore		.gitignore
.npmrc		.npmrc
LICENSE		LICENSE
MEMORY.md		MEMORY.md
PACKAGE_USAGE.md		PACKAGE_USAGE.md
README.md		README.md
README.zh-CN.md		README.zh-CN.md
ice-coder-1.0.0.tgz		ice-coder-1.0.0.tgz
package-lock.json		package-lock.json
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
tsconfig.json		tsconfig.json
tsconfig.pack.json		tsconfig.pack.json
vite.config.ts		vite.config.ts
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

iceCoder

Preview

Getting started

Windows desktop (recommended)

Source / Web / CLI

Core features

Dual-mode runtime (L0 / L1 / L2)

Harness loop & TaskGraph

Ice Bean (Web session indicator)

File memory

Multi-session & cross-device sync

Checkpoint, compaction & resume

Tools (27 built-in + MCP)

Telemetry & observability

Tests & quality baseline

Local benchmarks

Local benchmark scores

Architecture (compact)

Docs

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

iceCoder

Preview

Getting started

Windows desktop (recommended)

Source / Web / CLI

Core features

Dual-mode runtime (L0 / L1 / L2)

Harness loop & TaskGraph

Ice Bean (Web session indicator)

File memory

Multi-session & cross-device sync

Checkpoint, compaction & resume

Tools (27 built-in + MCP)

Telemetry & observability

Tests & quality baseline

Local benchmarks

Local benchmark scores

Architecture (compact)

Docs

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages