NomadDev is an experimental, mobile-first remote execution environment. It provides a secure, natural-language-driven interface for managing remote servers, testing code, and orchestrating containers from your phone without exposing an SSH port or relying on messy terminal emulators.
By combining mesh networking, ephemeral container sandboxing, and LLM-driven RPC mapping, NomadDev allows you to interact with a headless VPS daemon securely and seamlessly.
The system is built on a "local-first" philosophy extended to remote infrastructure. Data and execution remain strictly within your private mesh network.
The architecture is divided into six modular, decoupled components:
- The Secure Mesh (Connectivity): A Tailscale overlay network ensuring the remote host and mobile client communicate exclusively over a private IP range.
- The Orchestrator Daemon (Backend): A lightweight, concurrent WebSocket server written in Go that acts as the central nervous system, handling secure client connections and job routing.
- The Ephemeral Sandbox (Worker): A Go-based wrapper around the Docker SDK that runs each tool call in a one-shot container with no network, a read-only rootfs, and gVisor (runsc) isolation when the host advertises it. Hard memory / CPU / pids caps and a wall-clock timeout bound every execution.
- The NLP-to-RPC Middleware (Logic): A translation layer that maps natural language requests to predefined JSON schemas and remote procedure calls (RPC). Pluggable provider backends: Google GenAI (Gemini), OpenAI Chat Completions, Anthropic Messages API, and DeepSeek, each selectable via the NOMADDEV_MIDDLEWARE_RUNTIME env var. The SDK-backed providers are gated behind build tags (DeepSeek reuses the OpenAI client).
- The GitHub MCP Backend (Integration): A subprocess-managed embedding of the official github-mcp-server exposing ~75 GitHub operations as additional tool calls. Mutating operations flow through the same approval gate as shell scripts.
- The Control Hub (Client): A React Native mobile application that consumes JSON event streams to render a clean, native UI instead of raw terminal output.
Objective: Establish secure, passwordless communication between devices.
- Configure host VPS with Ubuntu 24.04.
- Install and configure Tailscale subnet routing.
- Verify ICMP and basic TCP packet transmission exclusively over the Tailscale IP range.
- Disable public SSH access on the host (port 22).
Provisioning lives at infra/. The flow is documented end-to-end
in infra/RUNBOOK.md: walk through
infra/scripts/provision.sh on a fresh
host, run infra/scripts/tailscale-verify.sh
to confirm the mesh, then
infra/scripts/ssh-lockdown.sh to close
the public interface. infra/scripts/smoke.sh
drives a JWT-authed command.request round-trip and exits non-zero on any
regression — point it at 100.x.y.z:8080 to verify the live deploy.
Objective: Build the core message relay system.
- Initialize the Go module and set up a basic TCP listener.
- Implement a WebSocket server utilizing gorilla/websocket.
- Create a standard JSON event structure for inbound/outbound payloads.
- Implement JWT-based authentication to reject unauthorized WebSocket connections.
- Build a robust logging and state-recovery mechanism for dropped connections.
Implementation lives under cmd/orchestrator and
internal/. See docs/architecture.md,
docs/events.md, and docs/auth.md.
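The sketch below makes the "standard JSON event structure" concrete. It shows a minimal envelope type and the decode step that a read loop might run after the JWT has been validated at upgrade time. The field names (type, sid, seq, payload) and the decodeEnvelope helper are assumptions for illustration only. docs/events.md is the reference for the real wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope is a hypothetical shape for the JSON events exchanged over /ws.
// The authoritative field list lives in docs/events.md.
type Envelope struct {
	Type    string          `json:"type"`              // e.g. "user.intent", "command.chunk"
	SID     string          `json:"sid,omitempty"`     // session ID, used for replay on reconnect
	Seq     uint64          `json:"seq,omitempty"`     // hypothetical per-session sequence number
	Payload json.RawMessage `json:"payload,omitempty"` // type-specific body, decoded by the handler
}

// decodeEnvelope parses one inbound frame and rejects frames without a type.
func decodeEnvelope(frame []byte) (Envelope, error) {
	var env Envelope
	if err := json.Unmarshal(frame, &env); err != nil {
		return Envelope{}, fmt.Errorf("bad_envelope: %w", err)
	}
	if env.Type == "" {
		return Envelope{}, fmt.Errorf("bad_envelope: missing type")
	}
	return env, nil
}

func main() {
	env, err := decodeEnvelope([]byte(`{"type":"user.intent","sid":"s1","payload":{"text":"uptime"}}`))
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(env.Type, env.SID, string(env.Payload))
}
```

Keeping Payload as json.RawMessage lets the router dispatch on Type first. Each handler can then decode only the body shape it expects.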
Objective: Safely execute commands and capture outputs without risking the host system.
- Integrate the official Docker SDK for Go.
- Create a function to dynamically pull and spin up lightweight worker images (e.g., Alpine or Ubuntu).
- Implement secure volume bind-mounts for a designated workspace directory.
- Build an execution loop that runs bash commands inside the container and streams stdout/stderr back to the Orchestrator via channels.
- Implement hard timeouts and resource limits (RAM/CPU) for the sandbox.
Runner implementation lives at internal/sandbox/;
the orchestrator wires it in at internal/wsserver/sandbox.go.
See docs/sandbox.md for the architecture, threat model,
and how to switch between the mock and Docker runners.
Objective: Standardize natural language into actionable system commands.
- Integrate the Gemini API via Google AI Studio.
- Define JSON schemas for core system tools (e.g., execute_script, read_file, write_patch, apply_code_patch, search_syntax).
- Persistent reference buffer: the pin_file / unpin_file tools store raw file contents in an in-memory, per-session map in internal/history. The map is kept out of the event log so the summarization compactor can't drop it, and LoadWindow's caller injects the pinned files at the top of the system prompt every turn, keeping critical architectural files in context through long execution chains.
- Build the loop that receives user intent, queries the LLM, and captures the resulting Function Call.
- Map the generated Function Calls directly to the Go Sandbox Runner from Phase 3.
- Format execution results back into JSON for the LLM to interpret.
- Audit / dry-run mode: user.intent envelopes may carry mode: "audit". The orchestrator strips execute_script, write_patch, apply_code_patch, and destructive github_* tools from the catalogue before the schema reaches the LLM, and the dispatcher also refuses to run them as defense in depth. The assistant is steered to produce a markdown report.
- Multi-provider LLM support: alongside Gemini, the middleware ships
drop-in Translator implementations for OpenAI Chat Completions (internal/middleware/openai.go), the Anthropic Messages API (internal/middleware/anthropic.go), and DeepSeek (which reuses the OpenAI client with the DeepSeek base URL pre-filled by the factory, since DeepSeek's API is OpenAI-compatible). Each SDK-backed provider is gated behind its own build tag (-tags openai, -tags anthropic) so the default orchestrator binary stays SDK-free. Operators select a backend with NOMADDEV_MIDDLEWARE_RUNTIME=mock|gemini|openai|anthropic|deepseek|none and supply per-provider credentials via NOMADDEV_{OPENAI,ANTHROPIC,DEEPSEEK}_API_KEY (plus optional _MODEL overrides and NOMADDEV_OPENAI_BASE_URL for Azure / proxy deployments). See internal/middleware/README.md for the build matrix.
- Per-LLM transport-level retry budget configurable via
NOMADDEV_LLM_MAX_RETRIES (default 2). The OpenAI and Anthropic SDKs both back off exponentially on 408/409/429/5xx responses; Gemini's retry policy is hardcoded by the upstream SDK and cannot be overridden. Sandbox / tool-call retries continue to flow through the separate NOMADDEV_MAX_AUTORETRIES recovery budget at the dispatch layer.
- Cost accounting in USD: a hard-coded per-(provider, model) price table at internal/middleware/pricing/ derives a nomaddev_llm_cost_usd_total Prometheus counter (labeled by provider + model) alongside the existing nomaddev_llm_tokens_total (which now also carries provider + model labels). The terminal assistant.message.usage envelope shipped to the Mobile Control Hub carries an additional cost_usd field so the per-session 'Session Cost' ticker can render real dollars instead of just tokens.
- Anthropic extended thinking surfaced as a distinct
assistant.thinking wire envelope and AssistantEvent.Thinking field. Enable per-deploy with NOMADDEV_ANTHROPIC_THINKING_BUDGET (>=1024); other backends ignore it. Thinking frames stream alongside (not within) the regular assistant.chunk stream so clients can render the model's reasoning separately from its final answer.
- Multimodal (image) inputs on
user.intent envelopes. The mobile Composer has an attachment button backed by expo-image-picker; picked images are base64-encoded and sent in the envelope's images field (an array of {media_type, data} blocks). The orchestrator validates count + decoded size against NOMADDEV_USER_INTENT_MAX_IMAGES (default 4) and NOMADDEV_USER_INTENT_MAX_IMAGE_BYTES (default 5 MiB), then forwards to whichever backend is active: Gemini wraps each image as an InlineData part, Anthropic as an ImageBlock, OpenAI as an image_url content part (DeepSeek inherits the OpenAI path). Images are persisted in history.Turn.Parts so subsequent turns in the same session can refer back to them. Allowed media types are restricted to image/jpeg, image/png, image/gif, and image/webp (the intersection of the three providers' supported sets).
  - Vision-capable models: the OpenAI, Anthropic, and Gemini defaults (gpt-4o-mini, claude-sonnet-4-5, gemini-2.0-flash) accept images. DeepSeek's deepseek-chat and deepseek-reasoner are text-only; pair the DeepSeek runtime with NOMADDEV_DEEPSEEK_MODEL=deepseek-vl2 to use vision. OpenAI's o3-mini is also text-only.
  - Guardrail: image-bearing user.intent envelopes are rejected up-front with a bad_envelope error when the active runtime+model is known to be text-only (see pricing.SupportsVision), so the operator sees a clear "switch to deepseek-vl2" diagnostic instead of an opaque upstream-provider 4xx. Unknown models pass through, and the upstream provider surfaces any model-specific error.
- Runtime model switching from the mobile UI. The hello envelope advertises the active provider, the current model, and the provider's available_models catalogue (from middleware.KnownModels()); the mobile Settings screen renders a picker from it. Tapping a row sends a user.command {action:"set_model"} envelope. The orchestrator validates the model against the active provider's catalogue and stores a per-session (per-SID) override, and the next user.intent picks it up via TurnInput.Model. reset_history clears the override; the client re-applies its remembered choice on reconnect by reading hello.model. Switching the provider itself (e.g. openai → anthropic) stays a startup-only knob, since it needs different credentials and a different build tag.
Translator + dispatcher + approval gate live at
internal/middleware/; filesystem-only tools live
at internal/fsops/; per-session conversation memory at
internal/history/. See
docs/middleware.md for the full architecture and
docs/approval.md for the human-in-the-loop state
machine.
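Audit mode applies the same predicate in two places: once when building the tool catalogue and again in the dispatcher. The sketch below is minimal. It assumes a hypothetical Tool type whose Destructive flag stands in for the upstream DestructiveHint, and the GitHub tool names are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// Tool is a minimal, hypothetical stand-in for a function-calling catalogue entry.
type Tool struct {
	Name        string
	Destructive bool // stand-in for the upstream DestructiveHint on github_* tools
}

// mutatingTools are the built-in tools that audit mode removes.
var mutatingTools = map[string]bool{
	"execute_script":   true,
	"write_patch":      true,
	"apply_code_patch": true,
}

// allowedInAudit reports whether a tool may be offered or dispatched in audit mode.
func allowedInAudit(t Tool) bool {
	if mutatingTools[t.Name] {
		return false
	}
	return !(strings.HasPrefix(t.Name, "github_") && t.Destructive)
}

// auditCatalogue filters the catalogue before the schema is sent to the model.
func auditCatalogue(all []Tool) []Tool {
	var out []Tool
	for _, t := range all {
		if allowedInAudit(t) {
			out = append(out, t)
		}
	}
	return out
}

// dispatch re-checks the same predicate as defense in depth, in case the model
// names a tool it was never offered.
func dispatch(mode string, t Tool) error {
	if mode == "audit" && !allowedInAudit(t) {
		return fmt.Errorf("tool %s refused in audit mode", t.Name)
	}
	return nil // hand off to the sandbox, fsops, or GitHub backend here
}

func main() {
	catalogue := []Tool{
		{Name: "read_file"},
		{Name: "execute_script"},
		{Name: "github_create_issue", Destructive: true},
		{Name: "github_get_issue"},
	}
	for _, t := range auditCatalogue(catalogue) {
		fmt.Println("offered in audit mode:", t.Name)
	}
	fmt.Println(dispatch("audit", Tool{Name: "write_patch"}))
}
```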
search_syntax shells out to ast-grep (sg)
inside the sandbox worker so the model can run structural AST queries
(e.g. fn $F($_: context.Context)) instead of authoring fragile regex. The
binary is pre-baked into the dedicated sandbox image built from the
sandbox Dockerfile target:
docker build --target sandbox -t nomaddev/sandbox:bookworm-sg .
NOMADDEV_SANDBOX_IMAGE=nomaddev/sandbox:bookworm-sg ./orchestrator
The envelope returned to the model is capped by the same
NOMADDEV_GITHUB_MAX_RESULT_BYTES (default 1 MiB) that gates GitHub MCP
results, so a permissive pattern can't blow the context window.
Objective: Ditch the terminal for a native, reactive mobile interface.
- Scaffold a new React Native (or Expo) project.
- Implement a WebSocket client that connects to the Orchestrator's Tailscale IP.
- Build the main chat/event feed UI components.
- Create custom UI cards for "Action Approvals" (intercepting sensitive commands before they run).
- Implement background synchronization to fetch state history upon app resume.
- Live Terminal inside each Action Card: a virtualised, auto-tailing view of streamed command.chunk output with a heartbeat-driven elapsed-time indicator (sandbox.heartbeat), so the operator can see that long-running jobs are still alive between bursts of output.
Expo + TypeScript SPA at mobile/, exported as static web
assets and embedded into the orchestrator binary via
internal/wsserver/spa.go. The same
Tailscale IP that exposes /ws also serves the UI at /. Three routes
(/onboard, /chat, /settings) over
@react-navigation/native-stack. JWT onboarding ships as a QR helper at
scripts/qr-jwt/. See
docs/mobile.md for the architecture and
docs/auth.md for the onboarding flow.
Objective: Take the stack from feature-complete to operable on real hosts.
- Persistent session replay buffer (SQLite write-through, rehydrates on restart).
- Prometheus /metrics endpoint covering WS, replay, sandbox, middleware turns, and LLM token usage.
- Multi-stage Dockerfile (distroless/static, pure-Go SQLite, no cgo) + docker-compose.yml.
- Hardened systemd unit + non-destructive installer script.
- Mobile offline outbox + interactive Settings (Reset history, Force reconnect, Model picker).
- Tag-driven release workflow → binaries + multi-arch GHCR image.
mTLS / per-cert subject mapping is an explicit non-goal for this round —
the Tailscale tailnet already gates network reachability, and JWT
remains the single auth source for /ws.
docs/operations.md is the operator reference;
infra/RUNBOOK.md is the deploy walkthrough.
Objective: Let the mobile chat drive GitHub (issues, PRs, repos, …) the same way it drives shell scripts and files, with the same approval gate.
- Subprocess-based MCP client embedding the official github-mcp-server — no exposure to its "Go API is unstable" warning.
- All ~75 tools across 19 toolsets exposed to Gemini via the existing function-calling loop; the tool list is narrowable via NOMADDEV_GITHUB_TOOLSETS.
- Auto-approval gating: every tool the upstream marks DestructiveHint=true (with a verb-prefix fallback) is added to the required-approval set at startup. PRs, issues, and file writes all surface the same ApprovalSheet the mobile UI already renders for shell scripts.
- A TokenSource interface keeps per-user PAT / GitHub App / OAuth as drop-in future implementations.
- Build-tag-gated (-tags github) so default builds stay slim; an empty NOMADDEV_GITHUB_TOKEN is a silent no-op for development.
- nomaddev_github_calls_total{tool,outcome} counter for per-tool observability.
- The mobile ApprovalSheet surfaces a GITHUB badge for github_* tools so operators instantly distinguish remote-state approvals from local sandbox/fsops ones.
- Opt-in live round-trip test (make test-github-live) that drives the real upstream binary; CI skips silently when the PAT env var and binary aren't present.
- Production deploy paths: the GHCR Docker image bundles a pinned github-mcp-server so docker compose up works with no extra install; release-workflow binaries are built with -tags "gemini github" so .tar.gz downloads from the releases page have the integration compiled in.
- Per-call timeout honored: DispatchOptions.Timeout caps the upstream MCP round-trip so a hung GitHub request surfaces as SandboxErrTimeout instead of hanging the turn.
- Subprocess supervision: a crashed github-mcp-server is detected on the next tool call, respawned, and the call retried once. Respawns are cooldown-throttled (at least 5 s between attempts) so a flapping upstream binary can't loop.
- Latency histogram (nomaddev_github_call_seconds) for SLO dashboards; bad-args / approval-denied pre-flights are excluded so the histogram tracks only real upstream round-trips.
- quickstart-systemd.sh auto-installs github-mcp-server when NOMADDEV_GITHUB_TOKEN is configured, so the systemd path gets the same single-command deploy as the Docker path.
- Pre-flight argument size cap (NOMADDEV_GITHUB_MAX_ARG_BYTES, default 256 KiB): an LLM emitting a 100 MB blob is rejected as SandboxErrBadRequest before the stdio pipe sees it.
- Sensitive-arg redaction in the command.request / tool.approval.request wire envelopes: values for keys matching token / password / secret / auth / api_key / credential / etc. are masked on the wire (display only; dispatch still gets the originals). Long strings are truncated to 4 KiB.
- Upstream API drift CI guard (.github/workflows/upstream-drift.yml) runs a weekly + on-PR smoke against the latest github-mcp-server release so breaking changes surface before we bump the pinned version in the Dockerfile.
- Result size cap (NOMADDEV_GITHUB_MAX_RESULT_BYTES, default 1 MiB): a get_file_contents call returning a 50 MB blob is replaced with a preview-bearing truncated envelope (truncated: true, original_bytes, head-of-payload) so it can't blow Gemini's context window.
- Per-user PAT routing via NOMADDEV_GITHUB_USER_TOKENS_PATH: a JSON file mapping JWT sub → fine-grained PAT, plumbed via WithUserSub(ctx, sub) from the wsserver layer to a PerUserTokenSource that falls through to the shared default on a miss. The file is hot-reloaded when its mtime changes. The TokenSource interface remains the seam for DB-backed or OAuth-onboarded variants.
- Live API CI smoke (.github/workflows/github-mcp-live.yml): a weekly + manual workflow that drives TestLive_* against the real GitHub API on the pinned upstream version. It is secret-gated (GITHUB_MCP_LIVE_TOKEN) so forks and external PRs skip cleanly.
See docs/github.md for setup, PAT scopes,
troubleshooting, and the auth-extension seam. The GitHub MCP
integration is 100% feature-complete; future work tracks upstream
catalogue growth, not capability gaps.
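The sensitive-argument redaction can be illustrated with a display-only copy of the argument map. This is a sketch under stated assumptions. The key list comes from the description above. The substring regex, the "***" mask, and the truncation marker are illustrative choices, and only top-level keys are inspected.

```go
package main

import (
	"fmt"
	"regexp"
)

// sensitiveKey matches argument names whose values are masked on the wire.
// The substring rule is an assumption and errs on the side of masking.
var sensitiveKey = regexp.MustCompile(`(?i)(token|password|secret|auth|api_key|credential)`)

const maxDisplayString = 4 << 10 // 4 KiB

// redactForDisplay returns a display copy of args for approval envelopes.
// args itself is not modified, so dispatch still receives the original values.
func redactForDisplay(args map[string]any) map[string]any {
	out := make(map[string]any, len(args))
	for k, v := range args {
		if sensitiveKey.MatchString(k) {
			out[k] = "***"
			continue
		}
		if s, ok := v.(string); ok && len(s) > maxDisplayString {
			out[k] = s[:maxDisplayString] + "…(truncated)"
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	args := map[string]any{"owner": "octocat", "repo": "hello", "api_key": "sk-123"}
	fmt.Println(redactForDisplay(args))
	fmt.Println(args["api_key"]) // the original map still holds the real value
}
```

A plain substring rule also masks harmless keys such as "author". For a display-only copy that over-masking is the safer failure mode.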
Objective: Work the prioritized top-10 from the missing-features
review at /root/.claude/plans/review-this-repository-and-delegated-moon.md.
Each numbered subsection shipped independently as its own PR. 10/10
complete; the review's wider gap list (~50 items grouped by lens)
remains the backlog source.
Closes the "stolen JWT is good until expiry" gap and stops forcing mobile users to re-onboard every time their access token rolls.
- Two token kinds. Tokens carry a kind claim: access (short-lived, presented at /ws) or refresh (long-lived, only valid at POST /auth/refresh). Defaults: access 1h, refresh 720h (30 days). Tokens minted before Phase 8 (no kind claim) are accepted as access for back-compat.
- POST /auth/refresh. Mobile clients exchange a refresh token for a fresh (access, refresh) pair. The presented refresh JTI is rotated into the revocation list so it can never be replayed. Accepts the token in the Authorization header, a JSON body, or a form field.
- POST /auth/revoke. Authenticated revocation endpoint: the caller's own token (access or refresh) is added to the revocation list. Idempotent (204 on every call). A leaked token can now be killed before it expires naturally.
- JTI revocation list with three backends: sqlite (durable across restarts and the default, with the file at NOMADDEV_AUTH_REVOCATION_PATH), memory (lost on restart), and none (pre-Phase-8 behavior). A janitor goroutine prunes entries whose exp has passed.
- gen-jwt -kind {access|refresh|pair} for issuing the new token shapes; pair emits both as JSON for piping into onboarding.
- /ws enforces kind=access. Refresh tokens presented at /ws are rejected with 401 before upgrade, as defense in depth against accidental or malicious replay.
See docs/auth.md for the full claim shape,
endpoint contracts, and revocation backend notes.
Closes the supply-chain hole where a compromised registry could
repoint alpine:3.20 at a malicious manifest between deploys.
- NOMADDEV_SANDBOX_IMAGE accepts a content-addressed ref (alpine:3.20@sha256:…). Docker enforces the digest at pull time. The runner additionally re-inspects the local image before every exec and refuses to start the container if RepoDigests no longer contains the expected digest. This catches a host-local docker tag attack that would otherwise bypass pull verification.
- NOMADDEV_SANDBOX_REQUIRE_DIGEST=true hard-fails at boot on a tag-only image so a misconfigured production deploy can't silently fall back to the unpinned path. Default false for back-compat.
- The parser is shared across builds (no -tags docker needed for the validation tests) and emits a structured warning when the configured image is unpinned, so operators see the recommendation in the startup log.
See docs/sandbox.md for the
verification flow and threat-model rationale.
Closes the trivial-DoS surface where a hostile client can either send a 1 GB envelope (OOM) or stream tens of thousands of small frames a second (starve the dispatcher) without hitting any per-server cap.
- NOMADDEV_WS_MAX_MESSAGE_BYTES (default 256 KiB) bounds inbound frame size via gorilla/websocket's SetReadLimit. An oversized frame closes the connection with the standard 1009 (message too big) code and is counted on nomaddev_ws_inbound_rejected_total{reason="message_too_large"}.
- NOMADDEV_WS_RATE_LIMIT (envelopes/sec) + NOMADDEV_WS_RATE_BURST (bucket size) cap inbound envelopes per connection via a token-bucket limiter (golang.org/x/time/rate). Rejected frames return a structured error{code: "rate_limited"} envelope without dropping the connection, so a well-behaved client can throttle and resume.
- Both knobs default to permissive-but-safe values; set NOMADDEV_WS_RATE_LIMIT=0 to disable rate limiting entirely.
- Metric nomaddev_ws_inbound_rejected_total{reason} for SLO dashboards and abuse alerts.
Lets operators verify the binary / image they downloaded was built by this repo on a tag push and contains no known HIGH/CRITICAL CVEs.
- Release artifacts now ship SBOMs. Every binary in the GitHub release has a matching .spdx.json (Syft, SPDX-JSON predicate) plus a .sig + .pem cosign signature pair (keyless via Sigstore Fulcio and Rekor). The container image is signed by digest with cosign sign, and the SBOM is attached as a cosign attest --type spdxjson attestation.
- CI fails on supply-chain regressions. aquasecurity/trivy-action scans the production Dockerfile build on every PR and fails on HIGH/CRITICAL CVEs in OS or Go-library layers (with ignore-unfixed: true so we don't block on unpatched upstream CVEs that the SBOM still surfaces downstream). golang.org/x/vuln's govulncheck covers reachable vulns in the Go module graph on the same trigger.
- Verification is documented. docs/supply-chain.md walks through cosign verify-blob, cosign verify, and cosign verify-attestation with the exact --certificate-identity-regexp operators should require.
Until now the per-session replay buffer doubled as an audit trail — fine for client reconnect, useless for "who did what when" queries without scraping every SID's ring buffer. This carves out a dedicated JSON-Lines sink so security tooling has one stable stream to consume.
- New internal/audit package: an Event struct, a Sink interface (Log, Close), and four backends: none (silent), stderr (the default; interleaves with regular slog, grep by kind), stdout (sidecar-friendly), and file (append-only at 0o600, parent dir created at 0o700).
- Wired into the four security-critical paths: ws.connect (sub, sid, remote, jti), ws.auth_failed (remote, reason), auth.refresh and auth.revoke (sub, sid, jti, token_kind), and approval.granted / approval.denied (sub, sid, approval id, deny reason). Each line is self-contained JSON; pipe it straight into jq, promtail, or a SIEM agent.
- Defaults to stderr so operators see audit events from the first boot without configuring a path; flip to file for durable per-deploy logs.
- Audit calls never block or fail the action they record. Write errors fall back to slog rather than propagating; the approval grant/deny flow proceeds whether or not the sink wrote.
See internal/audit/audit.go for the
event schema and internal/wsserver/audit_integration_test.go
for the end-to-end wiring tests.
The original README claimed "explicit biometric approval" but the SPA shipped a one-tap Approve button. Native biometrics (Face ID / Touch ID) are unavailable in the web-only export, and WebAuthn requires HTTPS — which the default deploy doesn't have because Tailscale handles transport encryption end-to-end. This phase aligns the README with reality and adds a real explicit-consent gate that works on the plain-HTTP deploy.
- Typed-confirmation gate (ApprovalSheet): the operator must type the exact tool name (case-insensitive) before the Approve button enables. The disabled state surfaces as accessibilityState.disabled so screen readers announce it. Deny remains one-tap with the existing optional reason field.
- A requireTypedConfirmation prop (default true) lets callers opt out (test fixtures, low-risk deployments).
- README accuracy fix. The Security Considerations bullet now describes typed-confirmation as the default and points WebAuthn-based biometrics at the TLS-reverse-proxy upgrade path.
- WebAuthn is the documented next step for operators behind TLS termination; it stays out of this phase to keep scope tight and avoid forcing an HTTPS dependency on the default deploy.
Protects existing user state from a bad upgrade. The previous code
ran CREATE TABLE IF NOT EXISTS and called it done — fine for a
fresh deploy, useless for catching a corrupted page mid-upgrade
or refusing to start when an operator accidentally downgrades to a
binary that doesn't know about the current schema.
- PRAGMA integrity_check on every store (sessions.db, history.db, the JTI revocation DB). Constructors refuse to boot on anything other than ok, so page-level corruption that a normal query path might miss surfaces immediately at startup.
- Forward-only migration framework (internal/dbutil). Each store declares a []dbutil.Migration slice keyed by Version. Each migration runs in its own transaction that also bumps PRAGMA user_version, so a failed migration rolls back atomically and the same step retries on the next boot. Versions must be contiguous starting at 1.
- Refuse-to-boot on accidental downgrade. If user_version > max(migrations), the constructor returns ErrSchemaTooNew instead of silently writing to a schema it doesn't understand.
- A cross-package integration test confirms every real store bumps user_version to ≥ 1 on first open and stays at the same version after a restart. This catches the failure mode where a future maintainer wires a migration list but forgets to call Migrate.
See docs/operations.md
for inspection commands and the migration authoring rules.
The old /healthz returned 200 even when the SQLite stores were
unreachable, and docker-compose.yml had healthcheck: disable: true
because distroless/static ships no shell or wget. Both are fixed
here.
- New GET /readyz that probes each configured SQLite store (sessions.db, history.db, the JTI revocation DB) with a 2-second per-probe budget and returns 200 {"status":"ok","checks":{...}} or 503 {"status":"degraded","checks":{"<name>":"<error>", ...}}.
- /healthz stays pure liveness: always 200 if the process is responding. Restart loops bind to that; alerting binds to /readyz.
- A -healthcheck <url> flag on the orchestrator binary does a 3-second GET and exits 0/1. The binary doubles as its own probe client, so distroless/static doesn't need a shell.
- docker-compose.yml wires a healthcheck with test: ["CMD", "/usr/local/bin/orchestrator", "-healthcheck", "http://127.0.0.1:8080/readyz"], a 30s interval, 3 retries, and a 15s start period. Compose marks the container unhealthy after three consecutive failures. Note that restart: unless-stopped only restarts a container whose process exits; plain Docker does not restart a container for being unhealthy, so alerting (or an external supervisor) has to act on that status.
- PingContext(ctx) added to the three SQLite stores so the probe is a cheap SELECT 1 round-trip, not a write.
See docs/operations.md
for the liveness-vs-readiness contract and the systemd notes.
Until now, a primary or secondary GitHub rate limit hit during a github_* tool call surfaced straight to the model mid-turn. That was the biggest source of "your assistant just died" failures under any serious workload.
- Pattern-matches the upstream's error text (api rate limit exceeded, secondary rate limit, abuse detection, rate limit reset at, …). The github-mcp-server subprocess can't pass headers through stdio, so the marker scan is the only signal we have.
- Bounded exponential backoff with jitter between retries (NOMADDEV_GITHUB_RATE_LIMIT_BASE_BACKOFF, default 1s; capped at 30s). The upstream's Retry-After hint, when surfaced in the error text, takes precedence over the calculated value.
- NOMADDEV_GITHUB_RATE_LIMIT_RETRIES caps re-invocations (default 3). Setting it to 0 disables retry entirely (pre-8.9 behavior: the first rate-limit error surfaces to the model).
- nomaddev_github_rate_limit_retries_total{outcome}, with outcome ∈ {retried, gave_up}. Alert on a non-zero gave_up rate or a spike in retried; either means the PAT scope or tool mix is hitting the API too hard.
- Caller ctx honored mid-backoff: if the user.intent ctx fires while we're sleeping for a retry, we surface the rate-limit message immediately and bump gave_up rather than blocking past the turn budget.
- The marker-matcher and backoff helpers are tag-free so the default-build suite covers them; the *mcp.CallToolResult-aware wrappers live under -tags github with their own test file.
The previous deploy mentioned sqlite3 .backup as a footnote and
left scheduling to the operator. Now the systemd quickstart installs
a daily backup timer; the Docker path inherits the same script via
documented host-cron usage.
- infra/scripts/nomaddev-backup.sh uses sqlite3 .backup (online API, safe with concurrent writers) for each of the three SQLite stores (sessions.db, history.db, revocations.db). It verifies every snapshot with PRAGMA integrity_check before gzipping, so a corrupt source DB fails the timer rather than poisoning the archive directory, and it prunes archives older than the configurable retention horizon.
- nomaddev-backup.service + .timer: a Type=oneshot unit driven by a daily timer with RandomizedDelaySec=15min and Persistent=true (a host that was offline at 03:00 runs the missed backup on next boot).
- quickstart-systemd.sh installs the script to /usr/local/bin/nomaddev-backup, drops the service + timer in place, ensures sqlite3 is present (via apt-get), and enables the timer. The done-message surfaces the timer's next run, the snapshot destination, and the retention.
- Configurable via env vars: NOMADDEV_BACKUP_DIR (default ${DATA_DIR}/backups) and NOMADDEV_BACKUP_RETENTION_DAYS (default 14). Operators on external storage (NFS, object-store gateway) point NOMADDEV_BACKUP_DIR at the mount, and the existing systemd hardening (ProtectSystem=strict, explicit ReadWritePaths) keeps the unit tight.
- Restore procedure documented in docs/operations.md: stop the orchestrator, decompress the chosen snapshot, swap files, restart. The orchestrator's startup integrity check (Phase 8.7) catches any inconsistency in the restored file before it accepts writes.
All ten items from the review's /root/.claude/plans/review-this-repository-and-delegated-moon.md
top-10 are now shipped (8.1 through 8.10). The review's wider gap
list still has ~50 unaddressed items grouped by lens — see the plan
file for the inventory.
Objective: Work the Developer Experience lens from the review's wider gap list. Small, cohesive items that unblock contributors. 4/4 batches shipped (9.1 governance, 9.2 CI coverage + ADR + ChatScreen test + dev-loop docs, 9.3 session-export CLI + SQLite chaos tests, 9.4 mobile E2E).
- SECURITY.md: disclosure policy via GitHub Security Advisories, supported-versions matrix, response-timeline commitments, and a clear in/out-of-scope list.
- CONTRIBUTING.md: local-dev setup, build-tag matrix, commit + PR style, CI job rollup, ADR convention, test layout.
- CODE_OF_CONDUCT.md: Contributor Covenant 2.1 by reference + the reporting channel.
- CI coverage floor. The test job now emits a coverprofile, prints the func-level summary, enforces a 55% minimum (currently measured at 64%), and uploads the report as a 14-day artifact. The floor is set well below the current level so legitimate refactors don't bounce the build; tighten it as the suite grows.
- ADR practice adopted. docs/adr/0001-record-architecture-decisions.md codifies when a decision warrants an ADR and pins the four-section format (Status / Context / Decision / Consequences). Past decisions stay un-ADR'd; new cross-cutting ones get one.
- ChatScreen.test.tsx. The mobile suite covered ApprovalSheet, SettingsScreen, the store, and the wire client, but the top-level screen that ties them together had zero coverage. New tests exercise the empty state, turn rendering, Composer submit + disabled-when-not-open, the approval grant (with the typed-confirmation gate from 8.6) + deny paths, and the gear-button navigation. 7 new tests; the full mobile suite is at 34.
- GitHub MCP local-dev loop. A new section in docs/github.md documents the no-PAT default path plus the tiered fidelity ladder (upstream binary install at the pinned version → fine-grained PAT against a throwaway repo → mock-translator orchestrator with auto-grant approvals → wsclient one-shot tool call). This avoids burning the live-CI PAT rate budget for contributor exploration.
- `cmd/session-export`: a small Go binary that dumps one SID's data from `sessions.db` or `history.db` as JSON Lines. It opens the DB read-only so a running orchestrator isn't disturbed, and it auto-detects which store the file is via `sqlite_master`. 7 tests cover SID filtering, both auto-detect paths, the both-tables ambiguity case, and an explicit `--kind` override on the wrong store.
- SQLite chaos / failure-injection tests. New `internal/dbutil/chaos_test.go` covers four real-world failure modes: bit-flip corruption (integrity_check surfaces `ErrIntegrityCheckFailed`), a half-truncated file (integrity_check or the first read fails), a non-SQLite file at the configured path (`Ping` fails cleanly), and atomic rollback of a partially-applied migration (the `alpha` table must not exist and `user_version` must not bump).
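The auto-detect step in `session-export` can be sketched as a pure function over the table names that `SELECT name FROM sqlite_master WHERE type='table'` returns. This is an illustration under assumptions, not the binary's code: `turns` is the history table named later in this README, while `session_events` is a hypothetical stand-in for the `sessions.db` table.

```go
package main

import (
	"errors"
	"fmt"
)

// Table names used for detection. "turns" appears elsewhere in this README;
// "session_events" is a hypothetical name for the sessions.db table.
const (
	historyTable  = "turns"
	sessionsTable = "session_events"
)

// detectKind picks "history" or "sessions" from a store's table names.
// A non-empty override (the --kind flag) is honored only if the matching
// table actually exists, so a wrong-store override fails loudly.
func detectKind(tables []string, override string) (string, error) {
	has := map[string]bool{}
	for _, t := range tables {
		has[t] = true
	}
	hasHistory, hasSessions := has[historyTable], has[sessionsTable]
	switch override {
	case "history":
		if !hasHistory {
			return "", fmt.Errorf("--kind=history but no %q table", historyTable)
		}
		return "history", nil
	case "sessions":
		if !hasSessions {
			return "", fmt.Errorf("--kind=sessions but no %q table", sessionsTable)
		}
		return "sessions", nil
	case "":
		// no override: fall through to auto-detection below
	default:
		return "", fmt.Errorf("unknown --kind %q", override)
	}
	switch {
	case hasHistory && hasSessions:
		return "", errors.New("both stores' tables present; pass --kind")
	case hasHistory:
		return "history", nil
	case hasSessions:
		return "sessions", nil
	}
	return "", errors.New("not a NomadDev store")
}

func main() {
	fmt.Println(detectKind([]string{"turns"}, ""))
}
```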
Closes the DX-lens follow-up that needed its own PR because Playwright brings a test stack (real browser, full orchestrator round-trip) separate from the Jest unit suite.
- `@playwright/test` added as a mobile devDependency (Chromium only; extra browsers add test time without catching real regressions for a small web SPA).
- `mobile/e2e/onboarding-to-first-turn.spec.ts` drives the exact code path operators hit on a phone: fragment-based deep link (`#token=…&sid=…`), fragment stripped on first paint, navigation to `/chat`, WS handshake to "open", the Composer un-disables, the user types a turn, and the mock translator's canned reply lands in the feed.
- New `mobile-e2e` CI job. It builds the SPA + orchestrator with `make build-full`, starts the binary with the mock translator + auto-grant approvals + memory backends, waits for `/healthz`, mints a JWT via `scripts/gen-jwt` (masked in the workflow log), and runs Playwright. It uploads the HTML report + traces on failure for post-mortem.
- Jest excludes `e2e/` via `testPathIgnorePatterns` so the unit-test stack doesn't trip on Playwright's Node-only globals. The full mobile Jest suite still has 34 passing tests.
Phase 9: Developer-experience lens (done). All four items (9.1–9.4) shipped. The reproducible-build verification was deferred because the next attempt needs diffoscope; the investigator-friendly diagnostic context is recorded in the revert commit on `claude/dx-tooling`.
Objective: Work the Security-gaps-beyond-top-10 lens from the review's wider gap list — items the original top-10 prioritization left for follow-up because they're either narrower in blast radius or carry more architectural weight. Both batches (10.1 + 10.2) shipped.
- `CheckOrigin` allowlist. The `gorilla/websocket` upgrader previously accepted any origin unconditionally. New `NOMADDEV_WS_ALLOWED_ORIGINS` (CSV) populates a strict, case-insensitive same-origin gate on `/ws`. Leaving it empty preserves the pre-10.1 behavior (Tailscale deploys have no meaningful browser-origin boundary); operators behind a TLS reverse proxy can turn on the gate without code changes. Same-origin and non-browser clients that send no `Origin` header always pass.
- CSP + hardening headers on the SPA. `withSecurityHeaders` wraps the SPA handler with `Content-Security-Policy` (`default-src 'self'`, `connect-src 'self' ws: wss:`, `frame-ancestors 'none'`), `X-Content-Type-Options: nosniff`, `Referrer-Policy: strict-origin-when-cross-origin`, and `X-Frame-Options: DENY`. The `/ws` and `/metrics` paths keep their existing shapes; CSP only applies to browser-context responses.
- JWT secret rotation grace window. New `NOMADDEV_JWT_PREV_SECRETS` (CSV) lets the verifier accept tokens signed under previous-generation secrets while new tokens are signed under `NOMADDEV_JWT_SECRET`. The rotation workflow lives in `docs/auth.md`. At startup the orchestrator logs `orchestrator: JWT rotation grace active` when any previous secrets are configured.
- Inline-script secret redaction. The Phase-7 `RedactArgs` helper masks values of sensitive-keyed args but left `script` content alone, so an `export TOKEN=abc123` line in a bash script reached the approval card in plain text. New `redactScript` scans script-shaped arg values for `(export|set)? NAME=VALUE` shapes and masks the value when `NAME` matches the same sensitive-key list. It is heuristic on purpose: prose-shaped fields (`body`, `description`) don't get the scanner; `script` / `command` keys do.
- Per-session sandbox workspace. New `NOMADDEV_SANDBOX_PER_SESSION_WORKSPACE` flag (default false for back-compat). When true, the docker runner bind-mounts `<WorkspaceDir>/<sanitized-sid>/` at `/work` instead of the shared root. `sandbox.ExecRequest.SessionID` carries the SID from the WS layer through both the direct `command.request` path and the middleware tool-dispatcher path. The SID is sanitized (alphanumerics plus `-_.`, capped at 64 bytes, `..` collapsed to `__`) so a malformed claim can't escape the workspace root.
- `sanitizeSID` is tested in 4 scenarios covering allowed characters, path-traversal collapse, shell-meta stripping, and the 64-byte length cap.
- Known limitation captured. `fsops` still operates on the shared root; per-fsops isolation is a separate plumb-through, deferred because the engine is a Service-level singleton today. This is documented in `docs/sandbox.md` so multi-tenant operators know to treat sandbox isolation as defense-in-depth on top of per-user PAT scoping rather than as a complete boundary.
- User-namespace remapping documented in `docs/sandbox.md`. This is daemon-level config (`/etc/docker/daemon.json` with `"userns-remap": "default"`); the orchestrator can't drive it from inside, but the doc captures the workspace-ownership trade-off (`chown 100000:100000` vs running the orchestrator as `dockremap`).
- Total-resource budgeting documented in `docs/sandbox.md`. Worst-case container RSS is `MAX_CONCURRENT × MEMORY`; the existing semaphore caps concurrent runs. A sizing table covers the common deploy profiles (CX22, CAX11, multi-tenant). A pool-style "total memory budget" model is architecturally bigger than per-run caps, and the per-run × concurrent product covers the same blast radius for any realistic deploy.
Phase 10: Security gaps not in the top-10 (done). Both batches (10.1: wire + auth + redaction hardening; 10.2: per-session isolation + userns / quota docs) shipped. Future security follow-ups would target per-fsops session isolation (engine refactor), a real total-memory pool model (only if a multi-tenant deploy hits the worst-case sizing), and per-tool scopes on the JWT.
Objective: Work the Production-hardening lens, the last remaining lens from the missing-features review: operator-facing observability, deployment automation, and the docs that turn the orchestrator from "runs on a box" into "operable in production by someone who didn't write it."
- Grafana dashboard at `monitoring/grafana-dashboard.json`: 10 panels covering the SLO surface area (active WS conns, connect rate by outcome, sandbox p50/p95/p99, middleware turn rate + latency, per-tool GitHub MCP rate, rate-limit retries, inbound rejection reasons, session-event throughput by kind). Import it via the UI (uid `nomaddev-overview`) or provision it as config.
- Prometheus alert rules at `monitoring/alertmanager-rules.yml`: 7 rules across three groups (availability, capacity, security). Every rule binds to a metric already exported from `internal/metrics`; no new instrumentation is required.
- Tailscale ACL example at `infra/tailscale/acl-example.hujson`: a default-deny tailnet policy with two invariants (the `nomaddev-users` group reaches `:8080`; only `nomaddev-admins` can shell into the host), with the server tagged `tag:nomaddev-server`. Test stanzas pin the invariants so the admin console refuses to publish a broken policy.
- Cloud-init template at `infra/cloud-init/nomaddev-bootstrap.yaml`: drop it into a fresh Ubuntu 24.04 VPS at provision time and the orchestrator is up and on the tailnet without an SSH session. It pairs with the Tailscale ACL above and templates the JWT secret, Tailscale auth key, and Gemini API key from cloud-provider user-data substitution.
- Data-handling / privacy doc at `docs/privacy.md`. It inventories every piece of data the orchestrator touches: what's persisted, where, for how long, what leaves the host (Gemini, GitHub, Tailscale), audit-trail content, wire redaction limits, a retention-policy summary, and a wipe-everything recipe.
- Single-node disclaimer + log-rotation guidance added to `docs/operations.md`. It captures the supported-deploy posture explicitly (no active-active, no failover, hub state is in-process), sketches what a real HA shape would need (shared DBs + stateless hub + network-attached audit), and ships a `/etc/logrotate.d/nomaddev` recipe for the file-backend audit log using `copytruncate`.
- New `internal/tracing` package. `Init(ctx, Config, log)` wires the global `TracerProvider` with an OTLP/HTTP exporter and returns a Shutdown hook that callers defer unconditionally (a no-op when disabled). It falls back quietly on misconfiguration: a typo in the OTLP URL logs a warning and disables tracing instead of taking the orchestrator down.
- Default off. `NOMADDEV_OTEL_ENABLED=false` is the shipping default; `otel.Tracer(...)` returns a no-op tracer at every call site, so the codebase pays only the tens-of-nanoseconds no-op cost when tracing is off.
- First span: `ws.dispatch.<envelope.type>`. One root span per inbound envelope at the dispatcher entry point, with `envelope.type`, `session.sub`, and `session.sid` attributes. This gives operators immediate trace-side visibility per turn / per `command.request` without spreading instrumentation through every package; a future Phase 11.3 can add child spans on `sandbox.Exec` / `githubmcp.Call` once the trace shape stabilizes.
- Config knobs. `NOMADDEV_OTEL_OTLP_ENDPOINT` (collector URL), `NOMADDEV_OTEL_SERVICE_NAME` / `_VERSION` (resource attributes), `NOMADDEV_OTEL_SAMPLE_RATIO` (0.0–1.0, parent-based head sampling), and `NOMADDEV_OTEL_INSECURE` (plain-HTTP collector on a Tailscale tailnet, default true). Documented in `.env.example` and tested in `internal/tracing/tracing_test.go` (disabled default, bad endpoint, defaults filled in).
- `SIGHUP` reopens `audit.log`. New `audit.Reopener` interface; `JSONSink.Reopen()` closes the current file and opens a fresh fd at the same path. Non-file sinks (stderr/stdout/noop) treat Reopen as a no-op, so the SIGHUP handler in `cmd/orchestrator/main.go` calls it unconditionally. The logrotate recipe in `docs/operations.md` swaps `copytruncate` for a `postrotate` SIGHUP: no events truncated, no in-flight buffer lost.
- `sandbox.exec` span (Phase 11.3) on the docker runner, with `sandbox.tool` / `sandbox.session_id` / `sandbox.shell` / `sandbox.timeout_ms` attributes. It wraps the bind-mount + container lifecycle so the span's wall-clock covers the full run.
- `github.call` span (Phase 11.3) on the GitHub MCP client, with `github.tool` / `github.session_id` attributes. Args are deliberately omitted from span attributes: they'd dwarf trace storage and could leak secrets.
- Two new audit tests pin the file-Reopen path (write, rename, reopen, write: the pre-HUP event lands in the rotated file, the post-HUP event in the fresh file) and the non-file-sink no-op invariant.
- `traceparent` extraction at upgrade. `wsHandler` calls `otel.GetTextMapPropagator().Extract` against the upgrade headers BEFORE the connection's lifetime begins; the resulting `connCtx` is threaded into `runConnection` → `readPump` → `dispatch`. A `traceparent` from an otel-instrumented client (browser SPA, curl `--header`, sibling service) lands as the parent of the `ws.dispatch.<envelope.type>` span.
- W3C propagator registered. `tracing.Init` now installs a composite `TraceContext{}` + `Baggage{}` propagator. The global default is a no-op, so without this the extract call would silently lose every parent context.
- Dispatcher ctx threaded through to runners. `handleCommandRequest` / `handleUserIntent` now take a `dispatchCtx` from `dispatch`; both derive their per-job cancel ctx (`execCtx` / `turnCtx`) from it instead of `context.Background()`. The 11.3 `sandbox.exec` and `github.call` spans now chain under the `ws.dispatch` root, so the flame-graph view shows the full upstream → dispatch → tool tree end-to-end.
- New `trace_propagation_test.go` uses the otel in-memory exporter to assert that a synthetic `traceparent` on the upgrade lands on the dispatch span's `TraceID` and `Parent.SpanID`, pinning the contract.
Phase 11: Production hardening (done). Four batches shipped (11.1: observability + IaC + privacy + ops docs; 11.2: OpenTelemetry wiring + dispatch span; 11.3: SIGHUP reopen + per-tool child spans; 11.4: trace propagation + dispatcher ctx threading). The tracing story is now complete: end-to-end spans run from any otel-instrumented upstream, through the orchestrator, and out to the sandbox / GitHub MCP tool.
- Per-tool JWT scopes. New `internal/auth/scopes.go` plus scope checks at both dispatch entry points (the direct `command.request` path and the middleware tool-dispatch path). Two-tier policy: tokens whose `scopes` list has no `tools:` entry are legacy-permissive (pre-12 mints keep working); once any `tools:<x>` is named, strict mode kicks in and only listed tools are allowed. `tools:*` is the wildcard; `tools:github` authorizes the whole `github_*` family; a per-tool `tools:github_<name>` always wins over the family scope. 7 unit tests pin the policy, which is documented in `docs/auth.md`.
- `traceparent` via query string. The browser WebSocket API doesn't let JS set custom upgrade headers, so the SPA can't send a `traceparent` header. New `wsHandler` fallback: when the upgrade carries no `traceparent` header, the orchestrator extracts it from `?traceparent=…` on the URL instead. The header wins when both are present, so a transparent reverse proxy can override. Pinned by a second propagation test using an in-memory exporter.
- Reproducible-build report-only CI job. Picks up the PR #32 deferral. A new `reproducible-build-report` job in `ci.yml` builds the orchestrator twice with the release-workflow flags, runs `diffoscope` against the two binaries when the hashes mismatch, and uploads the report as a 14-day artifact. It is non-blocking (`continue-on-error: true`) so a real reproducibility regression doesn't bounce unrelated PRs; the artifact is the deliverable.
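The two-tier scope policy can be sketched as a single predicate. This covers only the allow rules stated above (legacy-permissive without any `tools:` entry, `tools:*`, the `tools:github` family, exact per-tool scopes); the precedence note about per-tool scopes winning over the family is not modeled. `toolAllowed` is a hypothetical name, not the signature in `internal/auth/scopes.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// toolAllowed: with no "tools:" scope the token is legacy-permissive;
// once any "tools:" scope is present, only matching tools pass.
func toolAllowed(scopes []string, tool string) bool {
	strict := false
	for _, s := range scopes {
		if !strings.HasPrefix(s, "tools:") {
			continue // non-tool scopes don't affect tool authorization
		}
		strict = true
		name := strings.TrimPrefix(s, "tools:")
		if name == "*" || name == tool {
			return true
		}
		if name == "github" && strings.HasPrefix(tool, "github_") {
			return true // family scope covers every github_* tool
		}
	}
	return !strict
}

func main() {
	fmt.Println(toolAllowed([]string{"tools:github"}, "github_create_issue")) // true
}
```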
- SPA-side `traceparent` mint + inject. New `mobile/src/wire/traceparent.ts` generates a W3C `00-<32hex>-<16hex>-01` value per connection using `crypto.getRandomValues`; the WS URL builder appends it as `?traceparent=…`. This pairs with 12.1's server-side query-string fallback so mobile-side timing shares a `trace_id` with the server-side dispatch spans (Phase 11.2 / 11.4). 3 unit tests pin the W3C format, randomness, and the crypto-required invariant.
- Per-fsops session isolation. Phase 10.2's known limitation (`fsops` still operates on the unscoped root) is now closed. `fsops.Engine` gains a `PerSession` field; the middleware dispatcher attaches the calling SID via `fsops.WithSessionID(ctx, sid)` before invoking `Engine.Run`. `resolveSafe` reads the SID from ctx and routes paths through `<root>/<sanitized-sid>/` (created at `0o700` on first use) when per-session mode is enabled. It reuses the Phase-10.2 `NOMADDEV_SANDBOX_PER_SESSION_WORKSPACE` knob, so sandbox and fsops isolate in lockstep. 4 new tests pin per-SID path separation, the empty-SID fallback to the shared root, `perSession=false` ignoring the SID, and `..` traversal still being rejected under the per-SID prefix.
- New `internal/webauthn` package wrapping `github.com/go-webauthn/webauthn`. `Service` owns the four ceremony entry points (BeginRegistration / FinishRegistration / BeginLogin / FinishLogin); the SQLite-backed `Store` persists per-(sub, credential_id) rows with the public key, sign count, and attestation type. It uses the Phase 8.7 dbutil migration pattern.
- In-memory `SessionCache` for in-flight ceremony challenges: 5-minute TTL and used-once `Take` semantics, so a replayed finish gets a clean miss; expired entries are pruned on every `Put` / `Take`.
- Four new HTTP endpoints under `/auth/webauthn/`:
  - `register/begin` + `register/finish`: JWT-gated; an operator must already be authenticated to add a security key to their `sub`.
  - `login/begin` + `login/finish`: unauthenticated; takes `sub` and returns a fresh JWT pair on a successful assertion.
- Probe resistance. `login/begin` returns the same 401 message whether the sub exists with no keys or doesn't exist at all; the server log carries the real reason for the operator.
- Disabled by default. WebAuthn requires HTTPS or localhost, which the default Tailscale plain-HTTP deploy doesn't have. The routes only register when `NOMADDEV_WEBAUTHN_ENABLED=true`; unregistered routes return 404 (the canonical "not configured" signal).
- 9 unit tests + 5 handler tests pin the store round-trip, the session-cache TTL + used-once semantics, the disabled-route 404, JWT-required behavior, the begin-register options + session-token shape, and the probe-resistant login-begin error.

See docs/webauthn.md for the operator workflow, threat model, and SPA-side integration sketch.
- New `mobile/src/wire/webauthn.ts` wraps the four server endpoints with a `registerSecurityKey(...)` / `signInWithSecurityKey(...)` pair. It owns the base64url ↔ ArrayBuffer conversion the W3C API requires for `challenge`, `user.id`, `excludeCredentials[].id`, and `allowCredentials[].id`, plus the W3C-shaped attestation / assertion JSON that the server's go-webauthn parser expects on finish.
- The Settings screen gains a "Register security key" button with an optional label input. The button is gated on `isWebAuthnAvailable()`: it appears only when the page is loaded over HTTPS or `http://localhost` (matching the WebAuthn spec requirement and the docs/webauthn.md prerequisite).
- The Onboard screen gains a "Sign in with security key" path alongside the existing JWT-paste flow. On success the returned JWT pair lands in the same `setCredentials(url, token)` slot, so the WS client picks it up immediately.
- Probe-resistant error passthrough. When the server returns its deliberately opaque "no security key registered for that account" 401, the SPA surfaces the server message verbatim rather than inventing a clearer "user not found" string, which preserves the threat model end-to-end.
- 16 new unit tests (`mobile/src/__tests__/webauthn.test.ts`) pin the base64url round-trip, option decoding (creation + request), attestation / assertion serialization, the `isWebAuthnAvailable` feature gate, and full register / login ceremonies with mocked `fetch` + `navigator.credentials`. Browser-side `navigator.credentials.create` / `get` will be covered end-to-end by Playwright's virtual authenticator once the real ceremony is wired into the E2E suite (future follow-up).
Closes the unbounded-growth gap in `history.db`: long-running sessions inflated Gemini context tokens on every `user.intent` (via `LoadWindow`) and grew the on-disk file forever. A background goroutine now collapses the oldest half of a session's text into one `system.summary` row once it crosses a configurable word budget.
- New `Compactor` + `Summarizer` in `internal/history/summarizer.go`. A janitor goroutine ticks every `NOMADDEV_HISTORY_SUMMARY_INTERVAL` (default 5m). For each session in `turns`, it sums `strings.Fields` word counts across `role IN ('user','assistant')` rows; if the total crosses `NOMADDEV_HISTORY_SUMMARY_WORD_THRESHOLD` (default 15000), it POSTs the oldest 50% to `NOMADDEV_HISTORY_SUMMARY_URL` as a `[{role,text,ts}]` array and reads `{"summary": "..."}` back. One transaction deletes the victims and inserts a single `role = 'system.summary'` row at the smallest freed `turn_idx`, so chronological order is preserved. Opt-in (`NOMADDEV_HISTORY_SUMMARY_ENABLED`, default off); SQLite backend only.
- No schema change; the Phase 8.7 contract is preserved. The `system.summary` value is just data in the existing `role TEXT` column. The migrations slice in `internal/history/sqlite.go` stays at `Version: 1`, `PRAGMA user_version` is still `1` after the change, and `internal/dbutil`'s integrity-check and downgrade-protection invariants are untouched.
- Concurrency-safe. Compaction acquires the same per-SID mutex that `Append` uses, so `turn_idx` stays monotonic against concurrent wsserver appends. Tested by a 20-append / 1-compaction race in `internal/history/summarizer_test.go`.
- Audit-safe. `tool_call` / `tool_result` rows are never selected for summarization: the LLM-bound textual chatter goes through the summarizer, while structured tool I/O stays intact.
- Failure-safe. Any non-2xx response, decode error, or empty `summary` aborts the transaction; the database is left untouched and the next tick retries naturally.
- Wire-compatible. Summary rows carry the same `{"text": "..."}` `parts_json` shape as user/assistant turns, so the translator's history-replay path needs no special-casing.
- 8 new unit tests cover the below-threshold no-op, oldest-half replacement with tool-row preservation, idx-monotonic `Append` after compaction, summarizer-error rollback, concurrent-`Append` safety, reopen survival, the HTTP client wire shape, and multi-session sweeps.
See docs/middleware.md
for the architecture and
docs/operations.md
for the env var table and inspection commands.
Remaining Phase-12 follow-ups: a pool-style memory quota (only if a multi-tenant deploy hits the worst-case sizing; the sizing approach documented in docs/sandbox.md covers the same blast radius) and a mobile native build (Expo EAS, which needs a separate infra setup).
Closes the "every failing tool call burns a human-input turn" gap: when a middleware-dispatched `command.request` returns a retryable failure (non-zero exit, `sandbox_timeout`, `sandbox_oom`), the orchestrator now formats the captured stderr as a structured `system.error_report` and feeds it back into the translation layer so the LLM can author a fix as a new `command.request`. A bounded retry budget prevents an infinite loop; the final failure is escalated to the Mobile Control Hub as a wire envelope.
- New `system.error_report` event type in `internal/event/types.go` with payload `{tool, original_call_id, exit_code, error_code, error_message, stderr, attempt, max_attempts, escalated}`. It is used in two places: as a `ToolResult.Output["error_report"]` enrichment that the translator reads on the next stage, and as a wire envelope to the Mobile Control Hub on budget exhaustion (`escalated: true`).
- Recovery state machine in `internal/middleware/recovery.go`. `ShouldAutoRetry(exitCode, errCode)` classifies retry-eligible failures (non-zero exit, `sandbox_timeout`, `sandbox_oom`; structural errors like `sandbox_bad_request` / `sandbox_unauthorized` are terminal). `BuildErrorReport(...)` formats the payload and tail-truncates stderr to 8 KiB. `RetryBudget` tracks consecutive failures, so a sporadic transient doesn't burn budget for the rest of a multi-step turn.
- Orchestration loop in `internal/wsserver/middleware.go`. `consumeStage` now allocates a per-turn `RetryBudget(MaxAutoRetries)`, enriches the resumed `ToolResult` on retry, and on exhaustion emits the `system.error_report` envelope via `bufferAndSend`, then closes the turn with `finish_reason="error"`.
- Configuration knob. `NOMADDEV_MAX_AUTORETRIES` (default `2`) wires through `config.MiddlewareConfig.MaxAutoRetries` → `middleware.RuntimeConfig.MaxAutoRetries`. `0` disables the loop entirely: the first retryable failure escalates immediately.
- Test coverage. Recovery primitives are unit-tested in `internal/middleware/recovery_test.go`. End-to-end behavior is pinned by `TestMiddleware_AutoRetry_*` in `internal/wsserver/middleware_test.go`: single-failure recovery (no wire envelope), budget exhaustion (exactly one `system.error_report` envelope and three `command.request` envelopes for `MaxAutoRetries=2`), zero-budget immediate escalation, and non-retryable failures bypassing the loop.
See docs/middleware.md
for the architecture and
docs/events.md for the
wire-level sequence diagram.
Closes the "the LLM applied a patch that breaks the build, and now the next tool call is fighting a corrupted workspace" gap. `apply_code_patch` gains an optional `verify_command` that runs in the ephemeral sandbox immediately after the write; a non-zero exit rolls the file back to its pre-edit contents and feeds the verify command's stderr into the Phase 13 auto-recovery loop so the LLM authors a fix on the next stage.
- Schema + validation. `verify_command` (optional string, ≤ 8 KiB) is added to the `apply_code_patch` tool spec in `internal/middleware/tools.go`. `Validate(ToolApplyCodePatch, …)` type-checks and length-caps it.
- Snapshot-aware fsops. New `Engine.ApplyCodePatchWithSnapshot` in `internal/fsops/run.go` returns the pre-edit file bytes alongside the apply result, and new `Engine.RestoreFile` provides a scope-checked restore primitive. `applyCodePatchPlan` carries the original bytes, so the snapshot is captured during the same read that drives the TOCTOU-closing dry-run, with no extra disk hit.
- Composition path. `CompositeDispatcher.applyCodePatchWithVerify` in `internal/middleware/dispatcher.go` applies the patch, dispatches `verify_command` as an `execute_script` run in the same workspace, streams its chunks through the same channel the caller already consumes, and on any non-zero exit or runner failure restores the file and appends a `rolled back` stderr notification. The terminal frame carries the verify command's exit code with no `SandboxErr*` code, so `ShouldAutoRetry` treats it as retryable and the recovery loop feeds the verify stderr back to the translator.
- Approval surfacing. `Server.buildApprovalPreview` in `internal/wsserver/sandbox.go` copies `verify_command` into the approval `preview` payload alongside the diff, so the operator sees what will run AND that a non-zero exit will roll the patch back. The mobile ApprovalSheet renders a new "Verify after apply" row labeled "rollback on non-zero exit" (`mobile/src/components/ApprovalSheet.tsx`).
- Test coverage. Unit tests in `internal/middleware/tools_test.go` pin schema validation; round-trip and out-of-root tests in `internal/fsops/engine_test.go` exercise `ApplyCodePatchWithSnapshot` and `RestoreFile`; end-to-end composition tests in `internal/middleware/dispatcher_apply_verify_test.go` cover verify-success, verify-failure-rollback, dispatch-error-rollback, missing-sandbox fast-fail, and the empty-string fallback to the plain fsops path. The mobile ApprovalSheet test asserts the verify row renders only when the preview carries one.
See
docs/middleware.md
for the dispatcher composition walkthrough.
Closes the "one big migration runs strictly serially" gap: a refactor that touches a dozen independent files used to be one long tool-call chain, each edit waiting on the last. `dispatch_worker_pool` lets the orchestrator fan a migration out across isolated git worktrees, fork the conversation context into one headless sub-dispatcher per sub-task, run them in parallel under a concurrency cap, and merge each finished branch back into the primary branch. It is opt-in and off by default because it grants the orchestrator a new host-side git privilege.
- New `dispatch_worker_pool` tool. The tool spec, arg/result types, `ParseWorkerPoolArgs`, and the up-front disjointness validator live in `internal/middleware/workerpool.go`; the tool is registered in `IsMutatingBaseTool` / `KnownTool` / `Validate` in `internal/middleware/tools.go` and appended to the catalogue (with approver gating) in `internal/middleware/factory.go` only when `NOMADDEV_WORKER_POOL_ENABLED=true`.
- Host-side git control plane. New `internal/gitctl` package shells out to the host `git` binary (worktree add/remove, commit, merge) against `NOMADDEV_SANDBOX_WORKSPACE_DIR`, which must be a pre-cloned git repo. Every invocation passes `-c core.hooksPath=/dev/null` (repo-supplied hooks never run; they would be host RCE), plus `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_TERMINAL_PROMPT=0`, and a fixed argv (no shell).
- Headless sub-dispatchers. `runWorkerPool` / `dispatchOneTask` / `runWorkerToolCall` in `internal/wsserver/workerpool.go` create one git worktree + temp branch per sub-task under `<workspace>/.nomaddev-worktrees/<id>`, fork the parent session's windowed conversation history into an independent headless turn loop seeded with that sub-task's prompt, and run the loop confined to the worktree.
- Approval gate intact. The `dispatch_worker_pool` launch is a mutating tool and takes one human approval; every mutating tool call a headless sub-dispatcher makes (`write_patch`, `apply_code_patch`, `execute_script`, …) still goes through the normal human-approval round-trip. Nothing is auto-granted.
- Conflict-free by construction. Each sub-task declares a `paths` array of files/dirs it will modify; the orchestrator rejects the call up front if any two scopes overlap (equal or nested). After a sub-dispatcher finishes, a post-commit `git diff` check verifies that the changed files stayed inside the declared scope; a task that escaped is marked `scope_violation` and is not merged (its branch is kept for inspection). Disjoint scopes mean the merge-back never conflicts.
- Fork-bomb guard. A sub-dispatcher's tool catalogue (`SubDispatcherTools`) excludes `dispatch_worker_pool`, so workers cannot spawn pools.
- Bounds. `NOMADDEV_WORKER_POOL_MAX` (default `4`) is a server-wide concurrency semaphore; `NOMADDEV_WORKER_POOL_MAX_TASKS` (default `8`) caps the `tasks` array length; `NOMADDEV_WORKER_POOL_TASK_TIMEOUT` (default `10m`) is a per-sub-dispatcher wall-clock timeout. Worktrees are removed after the pool finishes; temp branches are deleted for merged tasks and kept for failed / scope-violated ones. The tool requires `NOMADDEV_SANDBOX_PER_SESSION_WORKSPACE=false` and returns a clear error otherwise.
- Wire + metrics. A new `worker.update` event (`internal/event/types.go`) streams sub-task lifecycle progress; three Prometheus instruments (`nomaddev_worker_pool_dispatches_total`, `nomaddev_worker_pool_tasks_total`, `nomaddev_worker_pool_task_seconds`) land in `internal/metrics/metrics.go`.
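The "equal or nested" overlap rule and the post-commit scope check are both prefix tests on cleaned, slash-separated repo paths. The sketch below assumes repo-relative paths and a hypothetical `task` shape; it is not the validator in `internal/middleware/workerpool.go`. Comparing against `b + "/"` rather than `b` is what keeps `internal/a` and `internal/ab` disjoint.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

type task struct {
	ID    string
	Paths []string
}

// overlaps reports whether two repo-relative scopes are equal or nested.
func overlaps(a, b string) bool {
	a, b = path.Clean(a), path.Clean(b)
	if a == "." || b == "." {
		return true // the repo root contains everything
	}
	return a == b || strings.HasPrefix(a, b+"/") || strings.HasPrefix(b, a+"/")
}

// validateDisjoint rejects the pool if any two tasks' scopes overlap.
func validateDisjoint(tasks []task) error {
	for i := range tasks {
		for j := i + 1; j < len(tasks); j++ {
			for _, a := range tasks[i].Paths {
				for _, b := range tasks[j].Paths {
					if overlaps(a, b) {
						return fmt.Errorf("tasks %s and %s overlap on %q / %q",
							tasks[i].ID, tasks[j].ID, a, b)
					}
				}
			}
		}
	}
	return nil
}

// inScope is the post-commit check for one changed file.
func inScope(file string, scopes []string) bool {
	f := path.Clean(file)
	for _, s := range scopes {
		s = path.Clean(s)
		if s == "." || f == s || strings.HasPrefix(f, s+"/") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(validateDisjoint([]task{
		{"t1", []string{"internal/a"}},
		{"t2", []string{"internal/a/b.go"}},
	}))
}
```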
See docs/middleware.md
for the orchestration walkthrough,
docs/events.md for the
worker.update wire shape, and
docs/adr/0002-concurrent-worker-pool.md
for the design decisions.
```bash
export NOMADDEV_JWT_SECRET="$(head -c 48 /dev/urandom | base64 | tr -d '\n')"
make build
./bin/orchestrator -listen :8080
```

In another shell, mint a token and connect:
```bash
TOKEN="$(go run ./scripts/gen-jwt -sub matt -sid sess-1 -ttl 1h)"
./bin/wsclient -url ws://127.0.0.1:8080/ws -token "$TOKEN" -send ping
```

For the Phase 8 access/refresh flow, mint both tokens at once and use the refresh endpoint to rotate the access token without re-running `gen-jwt`:
```bash
PAIR="$(go run ./scripts/gen-jwt -kind pair -sub matt -sid sess-1)"
ACCESS="$(echo "$PAIR" | jq -r .access_token)"
REFRESH="$(echo "$PAIR" | jq -r .refresh_token)"
# Use ACCESS at /ws (above). Later, exchange REFRESH for a new pair:
curl -sS -X POST http://127.0.0.1:8080/auth/refresh \
  -H "Authorization: Bearer $REFRESH" | jq .
# Revoke a token before it expires naturally:
curl -sS -X POST http://127.0.0.1:8080/auth/revoke \
  -H "Authorization: Bearer $ACCESS" -o /dev/null -w '%{http_code}\n'
```

Drive the Phase 3 sandbox runner end-to-end against the mock backend:
```bash
./bin/wsclient -url ws://127.0.0.1:8080/ws -token "$TOKEN" \
  -send command.request -script 'echo hi' \
  -disconnect-after command.result -timeout 5s
```

Drive the Phase 4 NLP middleware turn loop with the mock translator and the auto-grant approval bypass (memory history, so it doesn't touch `/var/lib`):
```bash
export NOMADDEV_MIDDLEWARE_RUNTIME=mock
export NOMADDEV_APPROVAL_AUTO_GRANT=true
export NOMADDEV_HISTORY_BACKEND=memory
./bin/orchestrator -listen :8080 &
./bin/wsclient -url ws://127.0.0.1:8080/ws -token "$TOKEN" \
  -send user.intent -text "hello there" \
  -disconnect-after assistant.message -timeout 10s
```

Build the Phase 5 SPA into the orchestrator binary and connect with a browser:
```bash
make build-full   # npm install + expo export → embed → go build
./bin/orchestrator -listen :8080 &
go run ./scripts/qr-jwt \
  -server-url http://127.0.0.1:8080 -sub matt -sid sess-1 -ttl 1h \
  -out qr.png
# stdout prints the deep-link URL; open it in a browser or scan qr.png.
```

For SPA dev with Metro hot-reload, run `make dev-mobile` and point the Expo dev server at the orchestrator (Expo serves the UI on its own port; the WebSocket connects back to `:8080/ws`).
Run the test suite:

```bash
make test-race     # default Go suite: mock sandbox + mock translator
make test-mobile   # mobile SPA tests (Jest + mock-socket)
make test-docker   # real Docker runner round-trip (requires daemon)
make test-gemini   # real Gemini API (requires NOMADDEV_GEMINI_API_KEY)
```

CI exercises the default Go suite, the SPA test suite (Jest), the Docker-tagged sandbox tests (the `ubuntu-latest` runner has Docker pre-installed), and tag-build smoke tests covering `-tags docker`, `-tags gemini`, and the combined build. See .github/workflows/ci.yml.
The Docker-tagged tests (`internal/sandbox/docker_test.go`) call `requireDaemon(t)` and skip cleanly on machines without a daemon. The Gemini-tagged tests (`internal/middleware/gemini_test.go`) call `requireKey(t)` and skip when `NOMADDEV_GEMINI_API_KEY` is absent. The OpenAI- and Anthropic-tagged tests (`internal/middleware/{openai,anthropic}_test.go`) drive the translators against an `httptest` SSE stub, so they run in CI without any API key.
Build the Docker-enabled binaries with `make build-docker`, or pick an LLM backend with `make build-gemini`, `make build-openai` (which also enables `runtime=deepseek`), or `make build-anthropic`. `make build-all` links Docker, GitHub MCP, and all three LLM SDKs into one binary. See `.env.example` for all configuration knobs.
Prerequisites: A fresh Ubuntu VPS (any provider — verified on Hetzner CX22 / CAX11), a Tailscale account. No DNS, no certificate, no extra infrastructure.
Pick one path:
| Path | When to use | One-command deploy |
|---|---|---|
| Docker / GHCR | Default. Sidesteps Go 1.25 / npm build on the VPS by pulling the prebuilt multi-arch image. | sudo bash infra/scripts/quickstart-docker.sh |
| systemd | When you don't want Docker on the box. Downloads the matching prebuilt binary from the latest GitHub release. | sudo bash infra/scripts/quickstart-systemd.sh |
Both quickstarts auto-detect the tailnet IPv4, generate NOMADDEV_JWT_SECRET,
install/start the service, and run the smoke test. Re-runnable.
See infra/RUNBOOK.md for the full manual
walkthrough (review-every-script discipline), Hetzner-specific notes
(Cloud Firewall, CX22 sizing, IPv6), and incident response. The Docker
image is built from the multi-stage Dockerfile
(distroless/static, pure-Go SQLite, CGO_ENABLED=0); the systemd unit
at infra/systemd/nomaddev-orchestrator.service
runs as a dedicated nomaddev user with NoNewPrivileges,
ProtectSystem=strict, and ReadWritePaths=/var/lib/nomaddev.
Metrics: the orchestrator exposes Prometheus instruments at /metrics
(connection counts, replay events, sandbox-run histograms, middleware turn
histograms). Scrape from a Prometheus instance on the tailnet.
NomadDev is designed with paranoia as a feature. The public internet never touches the orchestrator. The LLM never touches the host system. The client never touches raw SSH.
- No Open Ports: The orchestrator is reachable only over the Tailscale mesh, so no inbound port is exposed to the public internet.
- Total Isolation: Every tool call runs in a one-shot Docker container with no network and a read-only root filesystem, under gVisor when the host advertises it.
- Human-in-the-Loop: Destructive commands parsed by the middleware require explicit operator approval on the mobile client. By default, the Approve button stays disabled until the operator types the exact tool name. Typed confirmation works over plain HTTP via Tailscale (the default deploy). Operators who front the orchestrator with a TLS reverse proxy can layer WebAuthn / platform-biometric authenticators on top; see Phase 8.6 below.
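The typed-confirmation rule is simple enough to state as code. This Go sketch assumes "exact" means byte-for-byte equality (no trimming or case folding) and uses a hypothetical set of mutating tool names. The real gate lives in the React Native client and the middleware, and its classification rules are not shown in this section.

```go
package main

import "fmt"

// approveEnabled mirrors the documented UX rule, assuming "exact" means
// byte-for-byte equality: no whitespace trimming, no case folding.
func approveEnabled(typed, toolName string) bool {
	return typed == toolName
}

// mutating is a hypothetical set of tool calls that must pass the gate.
var mutating = map[string]bool{
	"run_shell_script":   true,
	"merge_pull_request": true,
}

// needsApproval reports whether a tool call must wait for the operator.
func needsApproval(tool string) bool { return mutating[tool] }

func main() {
	tool := "merge_pull_request"
	fmt.Println(needsApproval(tool), approveEnabled("merge_pull", tool), approveEnabled(tool, tool))
}
```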
No SSL/TLS certificate is required to run NomadDev. The orchestrator
listens on plain HTTP (:8080) by design — Tailscale already encrypts
every byte between the host and the client device, and the JWT gates
/ws. There is no HSTS, no http→https redirect, and no cert manager
in the stack. The mobile SPA does not use any secure-context-only
browser APIs (crypto.subtle, service workers, etc.); the only crypto
call is crypto.getRandomValues, which works on plain HTTP.
If your organization demands HTTPS, drop Caddy or nginx in front of
:8080 on the tailnet and point QR onboarding at the proxy URL. The
WS client adapts http:// → ws:// and https:// → wss://
automatically. See docs/auth.md for
details. Adding TLS support to the orchestrator binary itself is an
explicit non-goal.
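The scheme rewrite the WS client performs can be sketched in Go with net/url. The real client is the TypeScript SPA; wsURL is a hypothetical helper, and appending /ws to any existing path prefix is an assumption about how a subpath proxy would be handled.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// wsURL derives the WebSocket endpoint from the onboarding server URL:
// http -> ws, https -> wss, then /ws appended to any existing path prefix.
func wsURL(server string) (string, error) {
	u, err := url.Parse(server)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "http":
		u.Scheme = "ws"
	case "https":
		u.Scheme = "wss"
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + "/ws"
	u.RawPath = ""
	u.RawQuery = ""
	u.Fragment = ""
	return u.String(), nil
}

func main() {
	fmt.Println(wsURL("http://127.0.0.1:8080"))
}
```

Keeping the scheme mapping mechanical means the same QR onboarding flow works unchanged whether the operator points it at the plain-HTTP orchestrator or at a TLS proxy.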