Skip to content

shaal/shipyard

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

shipyard

Drain a backlog one quality-gated task at a time — autonomously.

Shipyard is three small pieces for Claude Code that work together:

Piece What it is Replaces
/ship A skill that completes one task end-to-end behind a four-axis confidence gate — it won't commit until the work is honestly proven ≥95/100. a careful human review
/ship-next A command that drives /ship unattended on the next backlog/roadmap task, then ships it via a squash-merged GitHub PR into the target branch. "do the next one"
shipyard A CLI that loops /ship-next, draining the whole backlog one merged PR at a time, and stops cleanly when it's done or stuck. babysitting the loop
shipyard (loop the backlog)  →  /ship-next (one task, unattended, PR-merged)  →  /ship (do it + gate it)

How each task ships: the target branch is whatever you have checked out when you launch shipyard. Each iteration syncs that branch, cuts a short-lived ship/<task> branch, does the work behind the gate, then pushes, opens a GitHub PR, and squash-merges it back into the target with gh pr merge --squash --admin (an unattended admin merge that doesn't wait for CI). The iteration ends back on the target branch, fast-forwarded to the merge — which is how the loop detects progress.

The point is the confidence gate. Each task is scored on four 0–25 axes — Requirements Fit · Functional Robustness · Verification Evidence · System Safety — each backed by executed evidence (tests, real runs, adversarial review), not "I read the code and it looks right." Total must be ≥95 with no axis below 15. A task that can't clear the bar parks itself (no commit) instead of shipping weak work, so an unattended run never piles junk onto your branch.

Install

Needs Node.js 18+, git, and Claude Code on your PATH. Works on Windows, macOS, and Linux. Pick either — no npm install required:

Zero-install (npx) — nothing left on your system but the skill files:

npx @shaal/shipyard init     # one-time: add /ship + /ship-next to ~/.claude
npx @shaal/shipyard 5        # run the loop (prefix every command with npx)

Global install — shorter command, nicer if you'll use it often:

npm install -g @shaal/shipyard
shipyard init
shipyard 5

Run … init once either way: it copies the /ship skill into ~/.claude/skills/ship/ and the /ship-next command into ~/.claude/commands/ship-next.md (cross-platform — uses your home dir, no symlinks). This is a file copy Claude Code loads from disk, so it's needed even with npx. Restart Claude Code and /ship + /ship-next are available everywhere.

Before it's on npm, run straight from GitHub: npx github:shaal/shipyard init. With npx, pin freshness via npx @shaal/shipyard@latest … to dodge a stale cache.

Quick start

In any git repo that has a task list — a markdown checklist with - [ ] items (e.g. ROADMAP.md, TASKS.md) or a beads backlog:

shipyard 5          # ship up to 5 tasks, each behind the gate, one PR each
shipyard            # ship until the backlog is empty or progress stalls
shipyard --dry-run  # show the plan + exact command, run nothing

Point it at a specific task file with --tasks (handy when a repo has several checklists, or the file lives somewhere shipyard wouldn't auto-find):

shipyard --tasks ROADMAP.md        # draw tasks only from ROADMAP.md
shipyard 5 --tasks docs/PLAN.md    # ...capped at 5 tasks
SHIPYARD_TASK_FILE=ROADMAP.md shipyard   # env form (the flag wins if both set)

An explicit task file is authoritative: it overrides beads detection and auto-discovery, and the loop stops on its own once the file has no unchecked - [ ] boxes left (no need to pass an iteration cap).

Using npx instead of a global install? Prefix each command: npx @shaal/shipyard 5.

Each iteration runs claude -p "/ship-next", which finds the next unchecked task, branches off your target, completes it, runs an adversarial review, scores the gate, then pushes, opens a PR, and squash-merges it back into the target. Shipyard watches your git HEAD: if an iteration ships no merge (a task parked at the gate), it counts that as no-progress and stops after a couple of those — so it halts instead of spinning.

Needs the GitHub CLI (gh), authenticated. The default flow opens and admin-merges a PR per task. Run gh auth login first; shipyard warns at startup if gh is missing.

Tuning quality vs speed

Shipyard defaults to max quality: claude-opus-4-8 on every task, plus an adversarial-review pass (independent subagents that try to refute the work before the gate passes). That's thorough but slow (~10–25 min/task). Dial it with env vars:

# Faster / cheaper — sonnet does the work, gate still runs:
SHIPYARD_MODEL=claude-sonnet-4-6 shipyard

Configuration

All optional; sensible defaults:

Env var Default Meaning
SHIPYARD_CMD /ship-next Slash command run each iteration. A project can ship its own .claude/commands/ship-next.md to override.
SHIPYARD_TASK_FILE Markdown - [ ] checklist to draw tasks from (same as --tasks; the flag wins if both set). Authoritative — overrides beads/auto-discovery, and the loop stops when the file has no unchecked boxes.
SHIPYARD_MODEL claude-opus-4-8 --model for claude. claude-sonnet-4-6 for speed.
SHIPYARD_MCP off off drops all MCP servers (faster cold start; none are needed to ship code). on keeps them.
SHIPYARD_PROGRESS_REF HEAD Git ref whose advance = "a task shipped". Use origin/<branch> for a push flow.
SHIPYARD_READY_CMD auto A command that succeeds while work remains. Auto-detects bd ready in a beads workspace; otherwise relies on stall-detection.
SHIPYARD_ZERO_STREAK_LIMIT 2 Stop after N consecutive iterations that ship nothing.
SHIPYARD_CLAUDE_ARGS Extra args appended to the claude invocation.

Per-iteration logs (the full stream-json) land in <tmp>/shipyard-<timestamp>/iter-NN.jsonl.

How a task ships (the four phases of /ship)

  1. Complete the next task to its "done when" criteria.
  2. Gate — score the four axes against executed evidence; iterate until ≥95 (no axis <15). Adversarial reviewers try to break it first.
  3. Document — update only what actually changed.
  4. Commit — one task, specific files, a conventional message. (Interactively, /ship pauses for your "go ahead" here; /ship-next skips that pause and commits autonomously, since the loop is unattended.)

Under the shipyard loop, /ship-next wraps those phases in a branch-per-task PR flow: it cuts a ship/<task> branch off the target, runs the four phases on it, then pushes and squash-merges a PR back into the target. The PR body embeds the four confidence scores verbatim — per-axis number, evidence, and self-critique from the Phase 2 gate — so every merged PR carries the introspection that justified shipping it.

Platform notes

  • macOS / Linux — works out of the box. On macOS, long runs stay awake via caffeinate.
  • Windows — runs natively (no WSL/Git Bash required); the CLI is plain Node. Ensure claude and git are on your PATH.

Components, separately

License

MIT © Ofer Shaal

About

Drain a backlog one quality-gated task at a time with Claude Code — a confidence-gate skill (/ship), an unattended driver (/ship-next), and a cross-platform loop runner (shipyard).

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors