Tags: jahwag/clem
fix(runner): keep _backend assignment inside the python MCP-config block (#220)

## Bug

The generated `clem-runner.sh` placed the backend assignment on a **bash** line directly above the inline `python3 -c` MCP-config script:

```sh
_backend = '{{.CoordinationBackend}}'
python3 -c "
import json, os
...
if _backend != 'github' and os.environ.get('DISCORD_TOKEN'):
```

So the shell executed `_backend = 'github'` as a command:

```
/home/<user>/.local/bin/clem-runner.sh: line 36: _backend: command not found
Traceback (most recent call last):
  File "<string>", line 12, in <module>
NameError: name '_backend' is not defined
```

The Python block (which gates Discord/Slack MCP registration) never had `_backend` defined. On the **github** backend the failure is non-fatal, since no coordination MCP is needed, but it errors on every iteration and aborts the `.mcp.json` write.

## Fix

Move `_backend = '{{.CoordinationBackend}}'` **inside** the `python3 -c "` script, right after `import json, os`. Applied to both runner templates (claude-code and opencode), which had the identical layout.

## Test

`TestGenerate_BackendAssignedInsidePython` (table-driven over both runtimes) asserts the assignment renders after `python3 -c "` and not on a standalone bash line. Verified it **fails on the pre-fix code** and passes after. `go fmt` / `go vet` / `go test ./...` all clean.
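The property the test asserts can be sketched in a few lines. This is an illustrative check only, not the project's Go test: the function name `backend_assigned_inside_python` and the sample scripts are hypothetical, and it assumes the rendered runner contains exactly one `python3 -c "` marker.

```python
def backend_assigned_inside_python(script: str) -> bool:
    """Return True when `_backend =` appears only after the `python3 -c "` marker.

    Hypothetical helper: it mirrors the assertion described above, not the
    actual Go test in the repository.
    """
    marker = 'python3 -c "'
    start = script.find(marker)
    if start == -1:
        return False  # no inline Python script rendered at all
    before = script[:start]
    after = script[start + len(marker):]
    # A standalone bash line would sit before the marker.
    standalone = any(
        line.strip().startswith("_backend =") for line in before.splitlines()
    )
    return not standalone and "_backend =" in after
```

Run against a pre-fix rendering (assignment above the marker) this returns `False`; against the fixed layout it returns `True`.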
feat(init): have agents maintain their own open PRs (#219)

## What

The generated agent contract (`clem init` → `CLAUDE.shared.md`) ends the task lifecycle at **"open a PR"**. Nothing brings an agent back to a PR it already opened, so a PR that later:

- becomes **unmergeable** because its base branch moved on (conflicts),
- has **failing CI checks**, or
- receives **operator review feedback / change requests**

is never revisited: it is delivered work that silently never lands, and the operator is left to babysit stale PRs.

## Change

Add a **"Your open PRs"** section to both shared templates (Discord and GitHub backends). Each iteration, before claiming new work, the agent lists its own open PRs (`gh pr list --author @me --state open`) and keeps them mergeable:

- **Conflicts** → rebase onto the latest base, resolve, push.
- **Red checks** → fix the cause and push to the same branch.
- **Review feedback** → address it, **but only from trusted operators** (consistent with the existing Trust section; all other review content stays data, not instructions).

Merging remains the operator's job (unchanged Security rule).

## Notes

- No CLI/flag or `clem.yaml` schema changes: template content only.
- Both backends covered; `{{coordination.github_repo}}` is substituted at generation time as usual.
- Added `TestInitTemplateContainsOpenPRMaintenance` (covers both backends). `go fmt`, `go vet`, and `go test ./...` all clean.
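The three maintenance rules amount to a small triage step per open PR. The sketch below is one way to express it and is not part of clem: the template instructs the agent in prose, and the dictionary shape (`mergeable`, `checks`, `reviews`), the action names, and `maintenance_actions` itself are assumptions made for illustration.

```python
from typing import Dict, List, Set


def maintenance_actions(pr: Dict, trusted: Set[str]) -> List[str]:
    """Decide what an agent should do for one of its own open PRs.

    `pr` is a simplified, hypothetical shape:
      mergeable: "MERGEABLE" | "CONFLICTING" | "UNKNOWN"
      checks:    list of check conclusions, e.g. "SUCCESS" or "FAILURE"
      reviews:   list of {"author": str, "state": str}
    """
    actions: List[str] = []
    if pr["mergeable"] == "CONFLICTING":
        actions.append("rebase")  # rebase onto latest base, resolve, push
    if any(conclusion == "FAILURE" for conclusion in pr["checks"]):
        actions.append("fix-checks")  # fix the cause, push to same branch
    # Review content from anyone outside the trusted set stays data only.
    if any(
        r["author"] in trusted and r["state"] == "CHANGES_REQUESTED"
        for r in pr["reviews"]
    ):
        actions.append("address-review")
    return actions
```

Merging never appears among the actions, matching the rule that it stays with the operator.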
feat(coordination): add GitHub Issues as coordination backend (#173)

## Summary

Adds `github` as a third `coordination.backend` option alongside Discord (default) and Slack. Engineering fleets can coordinate tasks with native GitHub Issues primitives (labels, assignees, comments, and PR linkage) instead of reconstructing state from chat messages.

This does not replace Discord or Slack. It completes the existing swappable-backend abstraction with an **issue-first** mode for teams whose work naturally ends in pull requests.

## Why add GitHub Issues as a coordination backend?

`clem` already models coordination as a swappable backend selected through `coordination.backend`. Discord and Slack are good defaults for conversational fleets, but they are not the only useful coordination surface. For engineering-focused fleets, the work naturally starts and ends in GitHub:

```text
task → claim → implementation → progress updates → PR → review → merge
```

GitHub Issues already provides the native primitives required to represent this workflow:

| clem concept | GitHub primitive |
|--------------|------------------|
| Task queue | Open issues with a configured label |
| Task state | `clem:todo`, `clem:in-progress`, `clem:done`, `clem:blocked` labels |
| Claim | Self-assignment (`gh issue edit N --add-assignee @me`) |
| Progress updates | Issue comments |
| Delivery | Pull request with `Closes #N` |
| Alerts | Comments on a configured alerts issue |
| Lessons and post-mortems | Comments on a configured lessons issue |

The GitHub backend is useful when users want:

1. **A durable task queue.** Each task has a persistent object with explicit state, assignee, history, and linked delivery.
2. **Traceability from task to code.** A task can be followed from issue creation to claim, implementation, PR review, and merge.
3. **Lower operational overhead.** Teams that already use GitHub do not need an additional chat platform or coordination MCP server.
4. **Human-in-the-loop governance.** Humans can inspect, reprioritize, block, or unblock tasks using familiar GitHub workflows.
5. **Asynchronous operation.** Coordination does not depend on reconstructing state from a stream of chat messages.

### Field validation (pre-merge integration run)

The backend was exercised against a real shared repository in a pre-merge integration run, not against mocks or unit tests alone. A provisioned fleet autonomously processed six issues and produced six pull requests:

| Evidence observed | Result |
|-------------------|--------|
| Tasks coordinated | 6 issues |
| Claims | 6 issues with exactly 1 assignee each |
| Double-claim race | None observed |
| Deliveries | 6 PRs |
| Task → delivery link | 6 PRs with correct `Closes #N` |
| Mergeability | 6 PRs reported as `MERGEABLE` |
| CI | Green; docs-only jobs correctly skipped by path filters |
| Operational memory | 24 comments on the lessons issue |
| Alerts and post-mortems | Recorded as comments on dedicated issues |

The run also surfaced meaningful engineering findings rather than only generating code: divergence between design sketches and merged code, shallow acceptance criteria hiding blocking defects, a script with no effective entrypoint, and CI behaviour requiring human attention.

This should not be described as a production test: no generated PR was merged and no production workload was executed. It is a real pre-merge integration validation of the coordination loop.

**Claim semantics:** the pilot did not observe a double claim; each task ended with exactly one assignee. The protocol is adequate for cooperative coordination between agents, but should not be described as a formal mutual-exclusion guarantee. Stronger arbitration can be a follow-up.

**Identity note:** the pilot used a single GitHub identity for all events. Per-task and per-PR audit worked, but per-agent attribution appeared only in comment content. The clem model already supports per-agent Linux users, git identity, and PR authorship; separate tokens per agent would improve this dimension and are the recommended production configuration.

## What changed

### Coordination backend (`internal/coordination`)

- Register `github` in `Known()` with `AlertTemplate` posting to `api.github.com/repos/{repo}/issues/{n}/comments` via `GITHUB_TOKEN`.
- New `RenderAlert()` + `AlertParams`: unified alert rendering for Discord, Slack, and GitHub (watchdog and runner now share this path).

### Configuration (`internal/config`)

- `coordination.github_repo` (`owner/name`, required when `backend: github`).
- Validation for GitHub channels: `tasks` = label (e.g. `clem:todo`); `alerts` / `lessons` = issue numbers.
- Helpers: `UsesGitHubCoordination()`, `GitHubWatchServiceName()`, `BackendOrDefault()`.
- `api.github.com` added to default egress allowlist.

### Issue watcher sidecar (`internal/githubwatch`, new)

- `clem provision` writes `~/.local/bin/clem-github-watch.sh` per agent.
- Polls `GET /repos/{repo}/issues?labels=…&state=open` every 60s with **conditional requests** (`ETag` / `If-None-Match`) per [GitHub REST API best practices](https://docs.github.com/rest/guides/best-practices-for-using-the-rest-api).
- Detects new unassigned issues and wakes the tmux session (`tmux send-keys`).
- Installs `clem-github-watch-{project}-{agent}.service` with `JoinsNamespaceOf` the agent unit.
- Respects egress containment (loopback proxy export + `IPAddressDeny` when enabled).

**Why polling, not webhooks?** Deliberately simple and compatible with clem's self-hosted model: no public endpoint, no webhook receiver service, no extra infrastructure. Webhooks may be a future option for installations that already have inbound HTTP infrastructure.

### Runner (`internal/runner`)

- Skips Discord/Slack MCP registration when `backend: github` (agents use `gh` CLI).
- Agent unit gains `Wants=clem-github-watch-…` when GitHub coordination is active.
- Alert curl rendered via `coordination.RenderAlert()`.

### Provision / init

- `cmd/provision`: installs watcher script + systemd unit when `UsesGitHubCoordination()`.
- `clem init --backend github`: scaffolds `clem.yaml` and `CLAUDE.shared.md` with GitHub task-board semantics and claim protocol.

### Watchdog (`internal/watchdog`)

- `send_alert` uses `coordination.RenderAlert()` with repo + issue number for GitHub backends.

### Agent docs (`internal/agentdoc`)

- `{{coordination.github_repo}}` placeholder for templates.

### Samples and docs

- `samples/github-tasks/`: reference `clem.yaml` and setup guide.
- README: coordination backends table, GitHub coordination section, updated `clem.yaml` reference.
- `docs/index.html`: multi-backend copy updated.

### CI

- New `github-coordination` e2e job: provisions with `backend: github`, asserts watcher script syntax, API polling, systemd wiring.

## Design scope (intentionally narrow)

- Opt-in via `coordination.backend: github`; Discord and Slack unchanged.
- Reuses existing `GITHUB_TOKEN` and standard `gh` CLI; no coordination MCP.
- Lightweight polling watcher; preserves egress-containment model.
- Quality gates and closed-loop verification are **out of scope** for this PR (separate future work).

## Test plan

- [x] `go test ./...`: 238 tests pass
- [x] `go vet ./...`: clean
- [x] `go fmt ./...`: no diff
- [x] Unit tests: `coordination`, `config`, `runner`, `watchdog`, `agentdoc`, `githubwatch`
- [x] e2e job `github-coordination` in `.github/workflows/e2e.yml`
- [x] Pre-merge fleet integration (6 issues → 6 PRs, described above)
- [ ] After merge: operator smoke test with `clem init --backend github` → `clem provision` → verify watcher service active and agent wakes on new `clem:todo` issue

## Notes

- GitHub coordination is **not** chat emulation. Alerts and lessons use issue comments because they are durable and auditable; free-form conversation remains better suited to Discord or Slack.
- Optimistic `clem:done` labels with pending concerns observed in the pilot reflect a **quality-policy** gap, not a coordination-transport failure. Quality gates are a separate concern from task state, wake-up, and traceability.
- Recommended follow-ups: per-agent GitHub tokens for attribution, stronger claim arbitration, optional webhook mode for high-throughput installations.
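The watcher's polling loop (conditional request, then detection of new unassigned issues) can be modelled without any network access. The sketch below is an assumption-laden illustration, not the shipped shell script: `IssueWatcher`, the injected `fetch` callable, and the in-memory `seen` set are all hypothetical. It relies only on the documented GitHub behaviour that a request whose `If-None-Match` matches the current `ETag` is answered with `304 Not Modified`.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# fetch(etag) -> (http_status, new_etag, issues); hypothetical transport hook
Fetch = Callable[[Optional[str]], Tuple[int, Optional[str], List[Dict]]]


class IssueWatcher:
    """Tracks the last ETag and reports issue numbers worth waking the agent for."""

    def __init__(self, fetch: Fetch) -> None:
        self.fetch = fetch
        self.etag: Optional[str] = None
        self.seen: Set[int] = set()

    def poll(self) -> List[int]:
        status, etag, issues = self.fetch(self.etag)
        if status == 304:
            return []  # nothing changed since the last poll
        self.etag = etag
        # Only unassigned issues that were not reported before count as new.
        fresh = [
            issue["number"]
            for issue in issues
            if not issue.get("assignees") and issue["number"] not in self.seen
        ]
        self.seen.update(fresh)
        return fresh
```

In a real deployment the non-empty return value would be the trigger for the `tmux send-keys` wake-up; here it is simply returned to the caller.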
feat(watchdog): daily transcript prune + night-aware stale threshold (#211)

- prune_transcripts: session JSONLs + UUID sidecar dirs older than 30d are deleted once daily (observed ~1.5 GB/agent-month unbounded growth; a production host hit 88% disk). memory/ and other non-UUID dirs at the same depth are deliberately not matched.
- stale threshold now derives from max(iteration, iteration_night): sizing on the day value made every healthy 30m night sleep look stale and would have hard-restarted agents all night.
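Both rules reduce to small predicates. The following sketch is illustrative only: the watchdog is generated shell, and the names `is_prunable` and `stale_threshold`, the assumption that session files are named `<uuid>.jsonl`, and the multiplier `factor=2.0` are not taken from the source.

```python
import re

# Assumed naming: session transcripts and sidecar dirs are keyed by a UUID.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)
MAX_AGE_DAYS = 30


def is_prunable(name: str, is_dir: bool, age_days: float) -> bool:
    """True for UUID-named dirs or `<uuid>.jsonl` files older than 30 days."""
    if age_days <= MAX_AGE_DAYS:
        return False
    if is_dir:
        stem = name
    elif name.endswith(".jsonl"):
        stem = name[: -len(".jsonl")]
    else:
        return False
    # Non-UUID names such as "memory" never match.
    return UUID_RE.fullmatch(stem) is not None


def stale_threshold(iteration_s: float, iteration_night_s: float,
                    factor: float = 2.0) -> float:
    """Size the threshold on the longer of the two sleeps (factor is assumed)."""
    return factor * max(iteration_s, iteration_night_s)
```

With a hypothetical 10-minute day iteration and a 30-minute night iteration, sizing on the day value alone gives 1200 s, which a healthy 1800 s night sleep exceeds; sizing on the maximum gives 3600 s, which it does not.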
feat(runner): iteration_night, next-effort handshake, runner warnings, quota snapshot (#210)

Four runner/config features driven by a production cache+quota audit (2026-06-13, consultant.dev team host):

- iteration_night: separate night-hours (22-07) sleep. The hardcoded night doubler was removed when the prompt-cache TTL was believed to be 5 min; subscription Claude Code actually gets the 1h TTL refreshed on access (verified from session-log usage fields), so night intervals up to ~45m still start warm. Default: match iteration.
- next-effort handshake: agent writes low|medium|high|xhigh to ~/.claude/next-effort; runner validates, exports session-scoped CLAUDE_CODE_EFFORT_LEVEL for the next launch, deletes the file. No reset bookkeeping, no drift.
- runner warnings: sync-skills failures and <1h-to-expiry OAuth tokens are prepended to the injected prompt so the agent itself escalates. Also fixes sync-skills failure detection: 'sync | tee || log' tested tee's exit status (no pipefail), so failures never logged; a dirty clone silently blocked one production agent's skill sync for 3 weeks. Now uses PIPESTATUS[0].
- quota snapshot: runner refreshes ~/.claude/quota.json from the OAuth usage endpoint at most every 25m; agents read the file instead of polling per-iteration (which 429s with multiple agents per host).

claude-code runtime gets all four; opencode gets the warnings prepend plus the sync-skills fix (effort/quota are Claude-specific).
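The next-effort handshake is a read-validate-delete cycle on a single file. A minimal sketch of that cycle follows; the real runner is a shell script, `consume_next_effort` is a hypothetical name, and deleting the file even when its content is invalid is an assumption rather than something the source states.

```python
from pathlib import Path
from typing import Optional

VALID_LEVELS = {"low", "medium", "high", "xhigh"}


def consume_next_effort(path: Path) -> Optional[str]:
    """Read the agent's requested effort level once, then remove the file.

    Returns the level to export (e.g. as CLAUDE_CODE_EFFORT_LEVEL) for the
    next launch, or None when the file is missing or holds an unknown value.
    """
    try:
        raw = path.read_text()
    except FileNotFoundError:
        return None
    # Assumption: the file is removed whether or not the value validates,
    # so a bad value cannot linger across iterations.
    path.unlink()
    level = raw.strip()
    return level if level in VALID_LEVELS else None
```

Because the file is consumed on every read, the setting applies to exactly one launch, which is what removes the need for reset bookkeeping.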
feat(skills): team skills repo sync (provision seed + per-iteration refresh) (#205)

Rebases the skills feature (059e5bf + 822c997, previously only on the v0.10.0-snapshot.1 channel) onto current main, restoring per-provision and per-iteration team-skills sync that went dormant after the box moved to mainline v0.13.0. Closes #204.

## What

- Top-level skills_repo config key: clem provision clones the repo per agent and symlinks shared/<skill> and <agentKey>/<skill> into ~/.claude/skills/; idempotent re-runs git pull --ff-only, stale symlinks pruned.
- clem sync-skills subcommand + runner hook: skills refresh at the top of every iteration, no operator round-trip after a skills PR merges.
- clem update --snapshot flag: opt-in prerelease channel (goreleaser prerelease: auto keeps snapshot tags off stable hosts).

## Rebase conflict resolutions (vs v0.13.0-era main)

- config.go: SkillsRepo registered as a real struct field, so it passes the new strict unknown-key validation; isPlausibleGitURL check runs in Load().
- IsValidExtensionName moved to extensions.go next to extensionNameRe (file was split since the original commits).
- update.go: kept main's exact-name selectBinaryAsset (#201) and test-overridable URL vars; added Prerelease/Draft fields + allReleasesURL for the snapshot channel.
- runner.go/provision.go: skills hooks re-inserted into the refactored provisionAgent / Params paths alongside ProxyExport/SidecarServers.

## Verification

- go build ./... clean, gofmt clean, go vet clean
- go test ./... all packages ok, including restored skills tests: TestLoad_SkillsRepoAccepted/Rejected, TestGenerate_SkillsSyncInjectedWhenRepoSet/AbsentWhenRepoUnset, SyncSkillsRepo manager tests
- Pre-push secret-scan flag is a false positive: neither commit diff contains a GH_TOKEN read (grep of both diffs is empty; provision.go:46 is pre-existing main code), pushed with CLEM_HOOK_SKIP_CODE_SCAN=1

Release plan per jahwag: merge, then tag v0.14.0 (new feature = minor bump rather than v0.13.1).

---------

Co-authored-by: jahwag <540380+jahwag@users.noreply.github.com>
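The symlink step described under "What" can be sketched as a desired-state reconcile. This is a hypothetical illustration, not clem's Go implementation: `sync_skill_links` is an invented name, letting an agent-specific skill shadow a shared one with the same name is an assumption, and so is treating every symlink in the skills directory as managed by the sync.

```python
from pathlib import Path
from typing import Dict


def sync_skill_links(repo: Path, agent_key: str, skills_dir: Path) -> None:
    """Link shared/<skill> and <agent_key>/<skill> into skills_dir, pruning stale links."""
    desired: Dict[str, Path] = {}
    # Later groups overwrite earlier ones, so agent-specific skills win (assumed).
    for group in ("shared", agent_key):
        group_dir = repo / group
        if group_dir.is_dir():
            for skill in sorted(group_dir.iterdir()):
                if skill.is_dir():
                    desired[skill.name] = skill

    skills_dir.mkdir(parents=True, exist_ok=True)

    # Prune symlinks whose skill no longer exists in the repo.
    for link in list(skills_dir.iterdir()):
        if link.is_symlink() and link.name not in desired:
            link.unlink()

    # (Re)create links; real files or directories are left untouched.
    for name, target in desired.items():
        link = skills_dir / name
        if link.is_symlink():
            link.unlink()
        if not link.exists():
            link.symlink_to(target, target_is_directory=True)
```

Running it twice in a row leaves the directory unchanged, which is the idempotence the re-run behaviour depends on.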
fix(config): reject control characters in agent name/role at Load() (#198)

Closes #124.

## Problem

`AgentConfig.Name` and `AgentConfig.Role` are free-form strings from `clem.yaml` with no validation at `Load()`. `Name` is interpolated into systemd unit `Description=` lines via `serviceTemplate` and `ttydServiceTemplate` in `internal/runner/runner.go`. systemd unit files are newline-delimited, so a name containing a literal newline terminates the `Description=` directive and injects arbitrary subsequent directives, including a second `[Service]` section with a crafted `ExecStart` that systemd merges and runs at service start.

## Fix

Reject all ASCII control characters (`[\x00-\x1f\x7f]`) in `name` and `role` at `Load()`, following the same pattern as the `git_email` validation (#183). Spaces remain legal: display names like "Lead Software Engineer" are the common case in every sample config.

Scope notes:

- systemd splits unit files only on ASCII newline, so unicode separators (U+2028, NEL) are not line breaks in that sink; the ASCII control-char class matches the sink parser (verified empirically).
- Shell metacharacter escaping in the runner bash templates is deliberately out of scope; that's #112.
- The `ac.Name` JSON-injection vector in the alert message is #115, also untouched here.

## Testing

- New `TestLoad_AgentNameRoleRejectControlCharacters` covers newline / CR / tab / \x01 / \x7f across both fields (fixtures pass through the Go-string → YAML double-quoted-scalar decode chain, so real control bytes reach `Load()`).
- New `TestLoad_AgentNameRoleAllowSpaces` pins that ordinary multi-word names/roles still load.
- Full `go test ./...` passes.

Adversarial review ran before this PR: independent reviewers confirmed the validation sits on the only path to the unit-file templates (single `Load()` call site, no post-Load mutation of `Name`/`Role` anywhere), confirmed no existing sample/doc/test config would be rejected, and verified the regex class against the systemd sink parser. Review caught an initially-incomplete sink list in the regex comment (now also names the runner bash sinks) and an unpinned tab-rejection behavior (now tested).
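The validation rule itself is a single character-class check. The sketch below reproduces the class quoted above (`[\x00-\x1f\x7f]`) in Python for illustration; the actual check lives in the Go `Load()` path, and `validate_agent_field` and its error wording are hypothetical.

```python
import re

# Same class as in the fix: ASCII control characters, including DEL.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def validate_agent_field(field: str, value: str) -> None:
    """Raise ValueError when `value` contains an ASCII control character."""
    if CONTROL_CHARS.search(value):
        raise ValueError(f"{field} must not contain control characters")


# A display name with spaces passes; an injected unit-file directive does not.
validate_agent_field("name", "Lead Software Engineer")
try:
    validate_agent_field("name", "dev\n[Service]\nExecStart=/bin/false")
except ValueError as err:
    print(err)
```

Unicode separators such as U+2028 are outside the class and are accepted, consistent with the scope note that systemd does not treat them as line breaks.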
fix(cmd): honour [agent...] args in clem login (#182)

Closes #152.

`clem login` advertised `[agent...]` positional args in its Use string but `runLogin` never read them: every invocation looped all configured agents, so selective login was silently ignored.

## Changes

- New `selectAgents` helper filters `cfg.Agents` to the keys given on the command line; unknown keys return `unknown agent: <key>` (same convention as `clem logs`). No args keeps current behaviour (all agents).
- Agents are now iterated in sorted key order for deterministic interactive prompts, matching the sorted-output convention in `clem status`.
- Combining agent args with `--remote` now errors instead of silently dropping the selection: `remote.Login` only takes a host and cannot forward agent filtering, so an honest error beats logging in every remote agent against the operator's intent.

## Testing

- `go build ./...`, `go vet ./cmd/`, `go test ./cmd/` green.
- New `TestSelectAgents` covers no-args (all), single key, multiple keys, and unknown-key error.

Adversarial review ran before this PR (multi-angle finders + verifiers). It caught two confirmed issues that are fixed in this diff: the `--remote` + agent-args silent ignore, and nondeterministic map-iteration order for login prompts.
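The selection behaviour described under "Changes" can be sketched as follows. This is a Python illustration of the rules, not the Go `selectAgents` helper: the signature, the use of `ValueError`, and the de-duplication of repeated keys are assumptions.

```python
from typing import Dict, List


def select_agents(agents: Dict[str, object], keys: List[str]) -> List[str]:
    """Return the agent keys to log in, in sorted order.

    No keys means every configured agent; an unknown key is an error.
    """
    if not keys:
        return sorted(agents)
    for key in keys:
        if key not in agents:
            raise ValueError(f"unknown agent: {key}")
    # Sorted output keeps interactive prompts deterministic (duplicates
    # collapsed here as an assumption).
    return sorted(set(keys))
```

For a config with agents `qa`, `dev`, and `lead`, calling it with no keys yields all three in sorted order, while `["lead", "dev"]` yields `["dev", "lead"]`.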