fix: per-channel dispatch state for cross-thread concurrent turns by yourbuddyconner · Pull Request #48 · tkhq/valet

yourbuddyconner · 2026-06-12T04:55:00Z

Summary

Reworks the abort / completion / dispatch path to be per-channel and per-row so concurrent thread turns enabled by PR Fix scheduled trigger prompt queue preservation #42 actually dispatch in parallel, the Stop button is scoped to one thread, and a class of "default to wipe everything" fallbacks no longer destroy sibling threads' state.
Adds channel context (channelType / channelId) to the runner→DO aborted frame so completion routes correctly under concurrent dispatch when messageId is missing (getProcessingChannelKey() returns null on 2+ processing rows; the stuck-processing watchdog only fires when the runner is disconnected).
Client now tracks per-thread status / pending follow-ups / abort sentinel / in-flight HTTP send, so cross-thread turns each render their own chrome and a Stop on one thread doesn't optimistically clear siblings.

Test plan

pnpm typecheck — clean
pnpm test — 748 worker tests pass (added regression coverage for aborted-frame channel routing, no-context drain skip, markCompletedById(undefined) no-op, markCompletedMostRecentByChannel, and watchdog action payloads)
cd packages/client && pnpm build — clean
Manual: trigger three manual automations and confirm they dispatch in parallel (the original PR Fix scheduled trigger prompt queue preservation #42 symptom)
Manual: Stop button on one thread while siblings are running — only that thread aborts
Manual: Slack-channel-scoped stop with a threaded message in flight — abort fans out to the active thread, not session-wide
Manual: thread error followed by a sibling-thread completion — session chrome clears, errored thread chip stays red until next prompt

Deploy notes

Bump IMAGE_BUILD_VERSION in backend/images/base.py before make deploy-modal — packages/runner/ changed and the sandbox image is cache-keyed by that version. Without the bump, the DO ships expecting new aborted-frame fields that old runner images don't send (graceful fallback exists, but the R6-F2 fix won't actually activate).
Watch CF observability for the first ~24h after merge: grep for markCompletedById log lines and stuck-processing watchdog firings. A row that wedges from concurrent dispatch under a connected runner is invisible to the 5-min watchdog (symptom: thread silently stuck on 'thinking' until session reload).

Process

Six rounds of adversarial code review during the rewrite; each round found real regressions in the prior round's fixes. The recurring pattern was scoping mistakes at the abort/completion boundary under concurrent dispatch, plus latent "when in doubt, do it to everything" fallbacks that were harmless in the single-thread era and catastrophic with concurrent threads in flight. The single squashed commit is the converged form after all six rounds.

PR #42 enabled cross-thread concurrent dispatch (TKAI-65) but several session-wide state slots and code paths still assumed only one prompt could be in flight at a time. The result: three manual automations triggered three sandbox threads but the prompts dispatched serially; the Stop button on one thread aborted every active thread; errored threads left the session chrome stuck on 'thinking' forever; and a class of "default to wipe everything" fallbacks could destroy sibling threads' state under the right race. This branch reworks the dispatch / abort / completion path to be per-channel and per-row, with channel-scoped fan-out and routing fallbacks that never escalate to session-wide destruction. Worker (SessionAgentDO + PromptQueue) - prompt_queue rows now carry per-row dispatched_at and received_at, replacing the global lastPromptDispatchedAt / promptReceivedAt state-key slots. Watchdog reads min(dispatched_at) for stuck- processing detection. - channel_state table tracks per-channel busy + per-channel idle_queued_since + per-channel error_safety_net_at. The watchdog's RecoveryAction discriminated union carries the channel_key (or messageId) for each variant so force_complete / revert_and_drain / clear_safety_net target the right row. - sendNextQueuedPrompt drains every dispatchable cross-channel queued prompt in one pass (skippedBusyIds Set), instead of dispatching one and exiting. Per-channel busy guards prevent same-channel double dispatch; cross-channel concurrency is unlocked. - handleAbort with channelType='thread' aborts only that thread. Channel-scoped aborts (e.g. {slack, C1}) fan out one frame per DISTINCT active thread on that channel. Lookup matches BOTH channel_key AND (reply_channel_type, reply_channel_id) so threaded Slack/Telegram messages (whose channel_key was rewritten to 'thread:<tid>') are still found. - aborted frame from the runner now carries channelType/channelId. The DO uses it as the second-choice channel resolver when messageId is missing — getProcessingChannelKey() returns null when there are 2+ processing rows (the exact concurrent-dispatch case), so without this the row would wedge forever (the stuck-processing watchdog only fires when the runner is disconnected). - markCompletedById(undefined) is now a no-op. The previous escalation to unscoped markCompleted() ran DELETE FROM prompt_queue WHERE status='completed' after wiping every processing row, orphaning sibling channels' runner state. Replaced with markCompletedMostRecentByChannel(channelKey) so a missing-messageId ack completes exactly one row on the resolved channel. - handlePromptComplete skips queue drain when neither messageId nor channel context is resolvable, so a confused-abort frame can't surprise the user by dispatching an unrelated channel's queued prompt. - 'aborted' handler emits a scoped agentStatus:idle (via getChannelTargetById) — N fan-out frames no longer produce N unscoped idle broadcasts that flip session-wide chrome. Runner (PromptHandler / AgentClient) - Abort handler dispatched fire-and-forget (IIFE wrapper captures synchronous throws) so a channel-wide fan-out of N abort frames doesn't serialize the inbound WS loop behind N OpenCode /abort round-trips. - sendAborted forwards channelType/channelId so the DO can route the completion under concurrent dispatch. - Removed dead lastPromptSentAt / pendingReplyChannelType / pendingReplyChannelId fields from PromptHandler. Client (use-chat / chat-container) - threadStatuses: per-thread {status, detail} map replaces the single agentStatus slot for concurrent-thread rendering. Each thread renders its own Stop button. - threadPendingFollowups: per-thread map of queued follow-up prompts, replacing the single pendingPrompt slot. - abortingThreadsRef: 30s sentinel on the aborting thread suppresses late agentStatus events that left the runner before the abort frame arrived; cleared on the runner's confirmation 'idle' (or 'error' — for teardown errors). - pendingHttpSendsRef: per-thread AbortController cancels the in-flight HTTP POST when the user aborts, so a large-payload send in flight when /stop is clicked doesn't land in the DO as an orphaned prompt. - Session-switch effect clears all three refs so cross-session thread-id collisions don't suppress lazy loading or apply stale optimistic state. - 'error' status persists against the runner's trailing 'idle' (in both the agentStatus reducer AND the runnerBusy=false status reducer) so the error chip stays visible until the next prompt. anyOtherActive excludes 'error' so the session chrome can clear even when a sibling thread is in a sticky error state. - isDispatchBusy uses per-thread state when activeThreadId is set, falling back to session-wide isSessionBusy otherwise — fixes command-bar state on threadless flows while keeping per-thread state on threaded ones. Protocol - call-tool envelope carries opencodeSessionId; OpenCode tool wrapper sets x-opencode-session-id so the gateway routes tool calls to the originating session. - aborted frame carries channelType?/channelId? (optional, backward- compatible with old runners — DO falls back to legacy resolution). Tests - 748 worker tests pass; pnpm typecheck clean; client pnpm build clean. - Added regression tests: - aborted{channelType,channelId} routes completion under 2+ processing rows - aborted{} with no context does not wipe siblings or drain queued - markCompletedById(undefined) is a no-op (does not escalate) - markCompletedMostRecentByChannel scopes completion to the channel - SessionHealthMonitor: force_complete carries channelKey; clear_safety_net carries channelKey - prompt-queue test mock extended to handle WHERE channel_key = ?, SELECT DISTINCT thread_id, MIN/MAX(dispatched_at). Six rounds of adversarial code review during the rewrite; each found real regressions in the prior round's fixes. The pattern was almost always the same: scoping mistakes at the abort/completion boundary under concurrent dispatch, plus latent "when in doubt, do it to everything" fallbacks that were harmless in the single-thread era and catastrophic with concurrent threads in flight.

github-actions · 2026-06-12T04:56:26Z

Preview deployment: https://pr-48.dev-valet-turnkey-client.pages.dev

packages/runner/src/agent-client.ts and prompt.ts changed in this branch, both of which run inside the sandbox. Without this bump, the cached image ships without the new aborted-frame channel context the DO now relies on to route completion under concurrent dispatch.

yourbuddyconner requested a review from a team June 12, 2026 04:55

yourbuddyconner temporarily deployed to dev June 12, 2026 04:55 — with GitHub Actions Inactive

yourbuddyconner temporarily deployed to dev June 12, 2026 04:58 — with GitHub Actions Inactive

yourbuddyconner merged commit a31d7d6 into main Jun 12, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: per-channel dispatch state for cross-thread concurrent turns#48

fix: per-channel dispatch state for cross-thread concurrent turns#48
yourbuddyconner merged 2 commits into
mainfrom
fix/per-channel-dispatch-state

yourbuddyconner commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yourbuddyconner commented Jun 12, 2026

Summary

Test plan

Deploy notes

Process

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant