Skip to content

fix(tmux): suppress nudge Escape while agent is generating (stop interrupting the Mayor)#4242

Merged
Bella-Giraffety merged 2 commits into
gastownhall:mainfrom
andrewboldi:fix/nudge-escape-interrupts-mayor
Jun 12, 2026
Merged

fix(tmux): suppress nudge Escape while agent is generating (stop interrupting the Mayor)#4242
Bella-Giraffety merged 2 commits into
gastownhall:mainfrom
andrewboldi:fix/nudge-escape-interrupts-mayor

Conversation

@andrewboldi

Copy link
Copy Markdown
Contributor

Problem

System nudges are delivered to agents via tmux send-keys. The delivery protocol (NudgeSessionWithOpts / NudgePane) sends a literal Escape keystroke (step 5) before Enter — originally to exit a vim-mode composer's INSERT mode so Enter submits (GH#307), commented as "harmless for bash/Claude Code".

It is not harmless for Claude Code: Escape cancels in-flight generation. So when a nudge lands while the agent is mid-turn — the Mayor being the common victim — the Escape interrupts its current work. This affects every direct NudgeSession caller: witness handlers, the daemon health-check heartbeat, the mail router, broadcast, and gt nudge.

EscapeCancelsRequest already existed to skip the Escape for agents where it cancels generation (Gemini; Copilot auto-skipped), but Claude was wrongly assumed exempt.

Fix

Gate the Escape on the agent's busy state instead of sending it unconditionally:

  • shouldSendEscape(target) captures the pane and returns false when the busy indicator (esc to interrupt) is present — i.e. the agent is generating.
  • Step 5 now runs only when !SkipEscape && shouldSendEscape(...).
  • Applied to both NudgeSessionWithOpts and NudgePane (the latter previously sent Escape with no gate at all).

This reuses the exact signal IsIdle / WaitForIdle already trust, so it adds no new external assumption. Behavior is strictly safer:

  • Idle agent → Escape still sent (vim-mode GH#307 preserved).
  • Generating agent → Escape suppressed (no interrupt); the typed text + Enter simply queue as the next turn.
  • Non-Claude agents that don't render esc to interrupt are unaffected (gate returns true → unchanged).

Why not the alternatives

  • Set EscapeCancelsRequest: true for Claude — simplest, but drops the vim-mode INSERT-exit entirely, even when idle.
  • Sniff vim mode from the pane — fragile and version-dependent; false negatives strand vim users, false positives reintroduce the interrupt. The busy-state signal is the actual harm condition and is already battle-tested.

Design note

On pane-capture error, shouldSendEscape returns false (suppress). When we can't confirm idle, not interrupting the agent is the safer default, and it's harmless for the common non-vim case where Enter alone submits. Matches IsIdle's return-false-on-error convention.

Fragility (tracked)

Detection relies on the literal esc to interrupt status string. This is pre-existing (shared with IsIdle / WaitForIdle), now load-bearing for nudge correctness. Documented in-code and tracked in #4240.

Tests

  • TestShouldSendEscapeForLines — pure decision logic (busy ⇒ suppress; idle/empty ⇒ allow).
  • TestShouldSendEscape_LivePane — real tmux pane: idle ⇒ allow, esc to interrupt rendered ⇒ suppress. Capture-only, so it sidesteps the sendEnterVerified flakiness of the full nudge integration tests.

go build / go vet / gofmt clean on internal/tmux.

Note: 3 pre-existing TestNudgeSession_* real-tmux integration tests fail in my local sandbox — verified identical with these changes stashed (a sendEnterVerified environment timing issue, not introduced here).

Relates to #4240

System nudges (witness, daemon health-check, mail, broadcast, gt nudge)
deliver via tmux send-keys and, by default, send a literal Escape keystroke
(step 5) to exit a vim-mode composer's INSERT mode (GH#307). In Claude Code
that Escape cancels in-flight generation, so a nudge landing while an agent is
mid-turn — e.g. the Mayor — interrupts its current work.

Gate the Escape on the agent's busy state: skip it when the pane shows the
"esc to interrupt" indicator (agent generating), send it only when idle. This
reuses the same signal IsIdle/WaitForIdle already trust, rather than fragile
vim-mode sniffing, and is strictly safer than the previous always-send
behavior. Applied to both NudgeSessionWithOpts and NudgePane (which previously
sent Escape unconditionally).

Adds a pure-function test (shouldSendEscapeForLines) and a live-pane wiring
test (shouldSendEscape). The "esc to interrupt" heuristic's fragility is
documented in code and tracked in gastownhall#4240.

Relates to gastownhall#4240

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented Jun 11, 2026

Copy link
Copy Markdown

❌ 12 Tests Failed:

Tests completed Failed Passed Skipped
10683 12 10671 52
View the full list of 12 ❄️ flaky test(s)
github.com/steveyegge/gastown/internal/cmd::TestEnsureBeadsConfigYAML_CreatesWhenMissing

Flake rate in main: 100.00% (Passed 0 times, Failed 16 times)

Stack Traces | 0s run time
=== RUN   TestEnsureBeadsConfigYAML_CreatesWhenMissing
    install_test.go:98: config.yaml = "prefix: hq\nissue-prefix: hq\ndolt.idle-timeout: \"0\"\nexport.auto: \"false\"\n", want "prefix: hq\nissue-prefix: hq\ndolt.idle-timeout: \"0\"\n"
--- FAIL: TestEnsureBeadsConfigYAML_CreatesWhenMissing (0.00s)
github.com/steveyegge/gastown/internal/cmd::TestFilterAndSortSessions_SortOrder

Flake rate in main: 86.36% (Passed 3 times, Failed 19 times)

Stack Traces | 0s run time
=== RUN   TestFilterAndSortSessions_SortOrder
    agents_test.go:323: filterAndSortSessions returned 7 agents, want 8
--- FAIL: TestFilterAndSortSessions_SortOrder (0.00s)
github.com/steveyegge/gastown/internal/cmd::TestGetTrackedIssues_FallsBackToShowTrackedDependencies

Flake rate in main: 100.00% (Passed 0 times, Failed 16 times)

Stack Traces | 0s run time
=== RUN   TestGetTrackedIssues_FallsBackToShowTrackedDependencies
    convoy_external_tracking_test.go:98: getTrackedIssues: querying tracked issues for hq-cv-ext: bd dep: unexpected bd args: dep list hq-cv-ext --direction=down --type=tracks --json
--- FAIL: TestGetTrackedIssues_FallsBackToShowTrackedDependencies (0.00s)
github.com/steveyegge/gastown/internal/cmd::TestGuessSessionFromWorkerDir

Flake rate in main: 86.36% (Passed 3 times, Failed 19 times)

Stack Traces | 0s run time
=== RUN   TestGuessSessionFromWorkerDir
--- FAIL: TestGuessSessionFromWorkerDir (0.00s)
github.com/steveyegge/gastown/internal/cmd::TestGuessSessionFromWorkerDir/different_rig

Flake rate in main: 86.36% (Passed 3 times, Failed 19 times)

Stack Traces | 0s run time
=== RUN   TestGuessSessionFromWorkerDir/different_rig
    agents_test.go:626: guessSessionFromWorkerDir(".../myrig/crew/alpha", "/town") = "gt-crew-alpha", want "mr-crew-alpha"
--- FAIL: TestGuessSessionFromWorkerDir/different_rig (0.00s)
github.com/steveyegge/gastown/internal/cmd::TestJSONOutput_ErrorsReturnNonZeroExit

Flake rate in main: 86.36% (Passed 3 times, Failed 19 times)

Stack Traces | 0.01s run time
=== RUN   TestJSONOutput_ErrorsReturnNonZeroExit
    convoy_stage_test.go:2519: error output should still be valid JSON: unexpected end of JSON input
        raw:
--- FAIL: TestJSONOutput_ErrorsReturnNonZeroExit (0.01s)
github.com/steveyegge/gastown/internal/cmd::TestJSONOutput_NoHumanReadableText

Flake rate in main: 86.36% (Passed 3 times, Failed 19 times)

Stack Traces | 0.01s run time
=== RUN   TestJSONOutput_NoHumanReadableText
    convoy_stage_test.go:2463: stdout is not valid JSON: unexpected end of JSON input
        raw:
--- FAIL: TestJSONOutput_NoHumanReadableText (0.01s)
github.com/steveyegge/gastown/internal/cmd::TestRunConvoyList_UsesTownRootAndStripsBeadsDir

Flake rate in main: 100.00% (Passed 0 times, Failed 16 times)

Stack Traces | 0.01s run time
=== RUN   TestRunConvoyList_UsesTownRootAndStripsBeadsDir
⚠ Warning: skipping convoy hq-cv-town: querying tracked issues for hq-cv-town: bd dep: unexpected bd args: dep list hq-cv-town --direction=down --type=tracks --json
    convoy_bd_routing_test.go:156: expected convoy JSON output, got:
        []
--- FAIL: TestRunConvoyList_UsesTownRootAndStripsBeadsDir (0.01s)
github.com/steveyegge/gastown/internal/cmd::TestRunConvoyStatus_UsesTownRootAndStripsBeadsDir

Flake rate in main: 100.00% (Passed 0 times, Failed 16 times)

Stack Traces | 0s run time
=== RUN   TestRunConvoyStatus_UsesTownRootAndStripsBeadsDir
    convoy_bd_routing_test.go:211: runConvoyStatus: getting tracked issues for hq-cv-status: querying tracked issues for hq-cv-status: bd dep: unexpected bd args: dep list hq-cv-status --direction=down --type=tracks --json
--- FAIL: TestRunConvoyStatus_UsesTownRootAndStripsBeadsDir (0.00s)
github.com/steveyegge/gastown/internal/polecat::TestManagerAgentLifecycleUsesTownBeadsDir

Flake rate in main: 100.00% (Passed 0 times, Failed 16 times)

Stack Traces | 0.01s run time
=== RUN   TestManagerAgentLifecycleUsesTownBeadsDir
    manager_test.go:2178: manager create did not use town BEADS_DIR; log:
        env=.../TestManagerAgentLifecycleUsesTownBeadsDir3558607980/001/.beads args=init --prefix gt --server
        env=.../TestManagerAgentLifecycleUsesTownBeadsDir3558607980/001/.beads args=config set issue_prefix gt
        env=.../TestManagerAgentLifecycleUsesTownBeadsDir3558607980/001/.beads args=migrate --yes
        env=.../TestManagerAgentLifecycleUsesTownBeadsDir3558607980/001/.beads args=config set types.custom agent,role,rig,convoy,slot,queue,event,message,molecule,gate,merge-request
        env=.../TestManagerAgentLifecycleUsesTownBeadsDir3558607980/001/.beads args=config get types.custom
        env=.../TestManagerAgentLifecycleUsesTownBeadsDir3558607980/001/.beads args=--allow-stale version
        env=.../TestManagerAgentLifecycleUsesTownBeadsDir3558607980/001/.beads args=create --json --id=gt-gastown-polecat-rust --title=gt-gastown-polecat-rust --description=gt-gastown-polecat-rust
        
        role_type: polecat
        rig: gastown
        agent_state: spawning
        hook_bead: null
        cleanup_status: null
        active_mr: null
        notification_level: null --type=task --labels=gt:agent --force
        env=.../TestManagerAgentLifecycleUsesTownBeadsDir3558607980/001/.beads args=show gt-gastown-polecat-rust --json
        env=.../TestManagerAgentLifecycleUsesTownBeadsDir3558607980/001/.beads args=update gt-gastown-polecat-rust --description=gt-gastown-polecat-rust
        
        role_type: polecat
        rig: gastown
        agent_state: nuked
        hook_bead: null
        cleanup_status: null
        active_mr: null
        notification_level: null
--- FAIL: TestManagerAgentLifecycleUsesTownBeadsDir (0.01s)
github.com/steveyegge/gastown/internal/refinery::TestEngineerClearAgentActiveMRUsesTownBeadsDir

Flake rate in main: 100.00% (Passed 0 times, Failed 16 times)

Stack Traces | 0.02s run time
=== RUN   TestEngineerClearAgentActiveMRUsesTownBeadsDir
    engineer_test.go:127: refinery active_mr cleanup did not use town BEADS_DIR; log:
        env=.../TestEngineerClearAgentActiveMRUsesTownBeadsDir4243007610/001/.beads args=--allow-stale version
        env=.../TestEngineerClearAgentActiveMRUsesTownBeadsDir4243007610/001/.beads args=show gt-gastown-polecat-rust --json
        env=.../TestEngineerClearAgentActiveMRUsesTownBeadsDir4243007610/001/.beads args=update gt-gastown-polecat-rust --description=gt-gastown-polecat-rust
        
        role_type: polecat
        rig: gastown
        agent_state: idle
        hook_bead: null
        cleanup_status: null
        active_mr: null
        notification_level: null
--- FAIL: TestEngineerClearAgentActiveMRUsesTownBeadsDir (0.02s)
github.com/steveyegge/gastown/internal/rig::TestInitBeadsWritesConfigOnFailure

Flake rate in main: 100.00% (Passed 0 times, Failed 16 times)

Stack Traces | 0.01s run time
=== RUN   TestInitBeadsWritesConfigOnFailure
    manager_test.go:682: config.yaml = "prefix: gt\nissue-prefix: gt\ndolt.idle-timeout: \"0\"\nexport.auto: \"false\"\n", want "prefix: gt\nissue-prefix: gt\ndolt.idle-timeout: \"0\"\n"
--- FAIL: TestInitBeadsWritesConfigOnFailure (0.01s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

(cherry picked from commit 1580c738dbd76fd223a39181009f95464db9c970)
@Bella-Giraffety Bella-Giraffety added status/review-approved Automated review approved priority/p1 High — core workflow broken kind/bug Broken behavior and removed status/needs-triage Inbox — we haven't looked at it yet labels Jun 12, 2026
@Bella-Giraffety

Copy link
Copy Markdown
Collaborator

PR Sheriff review complete for #4242.

15 research legs:
R1 nudge flow: scoped bugfix, core approach sound with heuristic caveat.
R2 tmux API: reuses existing IsIdle/busy signal idiomatically.
R3 escape logic: found one actionable false-positive risk when checking after typed nudge text.
R4 tests: helper/live coverage useful; avoid flaky full delivery test.
R5 CI: original lint/test/integration failures match red main, not touched files.
R6 lint: failing lint is unrelated repo-wide debt.
R7 integration: behavior intentionally narrows Escape delivery; targeted tmux tests are the right local gate.
R8 busy signal: reuse is convergent, though long-term structured idle events would be cleaner.
R9 SkipEscape options: Copilot/Gemini paths preserved; stale Claude comment needed cleanup.
R10 NudgePane: gating is required because sling paths can target live busy panes.
R11 command safety: no new shell/quoting risk.
R12 user impact: Mayor interruption fixed while idle vim-mode behavior is preserved.
R13 style: acceptable with duplicated rationale collapsed.
R14 linked issues: real bug, must preserve GH#307 vim-mode behavior.
R15 policy: bugfix, P1, kind/bug, maintainer fixup allowed.

5 pre-implementation reviews:
P1 correctness: approved pre-text busy snapshot.
P2 regression: approved if snapshot occurs after rewind/copy-mode cleanup.
P3 tests: helper/live coverage sufficient; add fail-closed capture-error test.
P4 style: update stale EscapeCancelsRequest comment.
P5 policy: allowed to fix up rather than request contributor changes.

Fixup pushed to this PR branch:

  • Preserves contributor's central shouldSendEscape approach.
  • Computes sendEscape before injecting nudge text in both NudgeSessionWithOpts and NudgePane, so the message itself cannot match esc to interrupt and suppress idle vim-mode Escape.
  • Keeps the 600ms readline wait and rewind recovery only when Escape is actually sent.
  • Adds capture-error fail-closed coverage and updates the stale agent config comment.

5 post-implementation reviews:
V1 correctness: approved; both nudge paths consistent.
V2 tests: sufficient and non-flaky; no E2E blocker.
V3 cleanup: passes cleanup-first review; no new shim/bandaid surface.
V4 risk review: only non-blocking residual heuristic/race caveats remain.
V5 merge review: approve with baseline-CI note.

Local verification:

  • PASS: go test ./internal/tmux -run 'Test(HasBusyIndicator|ShouldSendEscapeForLines|ShouldSendEscape_LivePane|ShouldSendEscape_CaptureErrorSuppressesEscape)' -count=1
  • PASS: go build ./...
  • BASELINE FAIL: go test ./internal/tmux -count=1 fails on existing sleep vs coreutils tmux environment expectations, unrelated to this change.

Recommendation: approve/merge the current fixed-up PR branch. If GitHub CI remains red with the same repo-wide baseline failures seen on main, do not attribute those failures to this PR.

@Bella-Giraffety

Copy link
Copy Markdown
Collaborator

CI reran after the fixup commit and remains red in unrelated baseline areas: repo-wide lint debt, convoy/session/beads runtime tests, and config expectation tests. No failure is in internal/tmux or in the new Escape-gating tests. Sheriff recommendation remains: approve/merge this fixed-up PR branch if baseline CI failures are waived or resolved separately.

@andrewboldi

Copy link
Copy Markdown
Contributor Author

Restoring a comment I removed earlier by mistake.

CI triage — red checks are a pre-existing baseline, not from this PR

This change touches only internal/tmux (two unexported helpers + one gated keystroke). The red Test / Integration Tests / Lint checks are inherited from main, verified three independent ways:

1. This PR's own tests all pass in CI:

  • internal/tmux.TestShouldSendEscapeForLines (all 6 subtests) ✅
  • internal/tmux.TestShouldSendEscape_LivePane
  • internal/tmux.TestNudgeSession_WithRetry / WithStoredPaneID / WakesAgentWindowNotActiveWindow ✅ (these only flaked in my local sandbox; CI's tmux env passes them)

2. Every failing test/lint finding is in a package/file this PR does not touch:

  • Failing tests: internal/cmd (TestGuessSessionFromWorkerDir, TestFilterAndSortSessions_SortOrder, TestRunConvoyList/Status_UsesTownRootAndStripsBeadsDir, TestGetTrackedIssues_*, TestJSONOutput_*, TestEnsureBeadsConfigYAML_CreatesWhenMissing), internal/polecat, internal/refinery, internal/rig — all beads / town-root / session-dir tests.
  • Lint findings: internal/doltserver/doltserver.go, internal/cmd/statusline.go, internal/cmd/capacity_dispatch.go, internal/git/git.go — none in internal/tmux.

3. The base commit already fails these jobs. main @ ccd16c18 (the #4213 merge this branch forks from) shows X Test and X Lint in its own CI run — i.e. main is red independently of this PR. (Every recent main push run is failure.)

No action needed on this PR for the red checks; they reflect existing breakage on main.

@andrewboldi

Copy link
Copy Markdown
Contributor Author

Reviewed the maintainer fixup (0ef7d665) on the contributor side — it's a strict improvement and I agree with it:

  • R3 fix is correct. Snapshotting sendEscape before injecting the nudge text closes the false-positive where a message containing esc to interrupt could suppress the idle vim-mode Escape. No inverse risk: an idle pane can't become busy before the Enter (typing doesn't start generation — only the Enter does), so this doesn't add false-negatives. Applied symmetrically in NudgePane. 👍
  • The capture-error fail-closed test and the stale EscapeCancelsRequest comment cleanup are good catches.

Local verification on the fixed-up branch: go test ./internal/tmux -run 'Test(HasBusyIndicator|ShouldSendEscape)' passes; go build / go vet / gofmt clean.

No further changes from my side — good to merge. The esc to interrupt heuristic fragility is tracked separately in #4240.

@Bella-Giraffety Bella-Giraffety merged commit 8d0ca12 into gastownhall:main Jun 12, 2026
7 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Broken behavior priority/p1 High — core workflow broken status/review-approved Automated review approved

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants