Skip to content

fix(cli): submit fast tool results after stream end#5071

Open
he-yufeng wants to merge 1 commit into
QwenLM:mainfrom
he-yufeng:fix/auto-edit-completed-tool-race
Open

fix(cli): submit fast tool results after stream end#5071
he-yufeng wants to merge 1 commit into
QwenLM:mainfrom
he-yufeng:fix/auto-edit-completed-tool-race

Conversation

@he-yufeng

Copy link
Copy Markdown
Contributor

What this PR does

Fixes the tool-result handoff race where a very fast tool completion could be dropped after the model stream had already ended but before React replaced handleCompletedTools with a fresh callback. The hook now tracks the actual sendMessageStream lifetime in a synchronous ref and uses that for the active-stream guard, while keeping the existing isResponding state for UI rendering.

Why it's needed

#4672 reports AUTO_EDIT / YOLO turns stalling after a read/edit error until the next user instruction. The maintainer root-cause analysis points to handleCompletedTools reading stale isResponding from a React callback closure: the stream finishes, a tool fails immediately, and the old callback still sees isResponding === true, returns early, and never sends the tool error back to the model. AUTO_EDIT / YOLO removes the approval delay, so the race is easier to hit there.

Reviewer Test Plan

How to verify

Run npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx. The new regression test is submits a fast tool result after the stream ended but before React replaces the callback; it saves the callback captured while React state still says responding, ends the held stream, then fires a fast failed read_file result through that stale callback and verifies a ToolResult request is sent.

Evidence (Before & After)

Before: the stale callback path returned at the isResponding guard and dropped the tool result, so the second sendMessageStream call never happened. After: the guard reads the actual stream-lifetime ref, sees that the stream has ended, marks the tool submitted, and sends the failed tool result back to the model.

Tested on

OS Status
🍏 macOS ⚠️ not tested
🪟 Windows ✅ tested
🐧 Linux ⚠️ not tested

Environment (optional)

Windows 11, Node v24.15.0, npm 11.12.1.

Validation run locally:

npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx -t "fast tool result"  # 1 passed
npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx                      # 104 passed
npm run typecheck --workspace=packages/cli                                                        # passed
npm run lint --workspace=packages/cli -- src/ui/hooks/useGeminiStream.ts src/ui/hooks/useGeminiStream.test.tsx  # passed
npm run build --workspace=packages/cli                                                            # passed

Risk & Scope

  • Main risk or tradeoff: the guard now depends on actual stream lifetime instead of React render timing. A small counter is used so overlapping streams, such as /btw, do not clear the guard early.
  • Not validated / out of scope: manual YOLO-mode terminal reproduction on macOS/Linux was not run locally; the regression is covered at the hook level.
  • Breaking changes / migration notes: none.

Linked Issues

Fixes #4672

中文说明

What this PR does

修复一个工具结果回传的竞态:模型流已经结束,但 React 还没来得及把 handleCompletedTools 换成新闭包时,快速完成或快速失败的工具结果可能被旧闭包丢掉。现在 hook 用同步 ref 记录真实的 sendMessageStream 生命周期,并用它做 active-stream guard;原来的 isResponding state 仍然只负责 UI 渲染语义。

Why it's needed

#4672 反馈 AUTO_EDIT / YOLO 模式下读文件或编辑出错后,本轮不会继续更新文件,需要下一次用户指令才恢复。maintainer 已定位根因:handleCompletedTools 从 React callback 闭包里读到旧的 isResponding。时序是模型流先结束,工具马上失败,旧 callback 仍认为 isResponding === true,于是提前 return,模型收不到工具错误,agent loop 卡住。AUTO_EDIT / YOLO 没有 approval 延迟,所以更容易撞到这个窗口。

Reviewer Test Plan

How to verify

运行 npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx。新增回归测试 submits a fast tool result after the stream ended but before React replaces the callback:测试先保存 React state 仍是 responding 时捕获的旧 callback,再结束 held stream,然后让一个快速失败的 read_file 结果走这个旧 callback,验证它仍会发送 ToolResult 请求。

Evidence (Before & After)

修复前:旧 callback 会在 isResponding guard 直接 return,工具结果被丢弃,第二次 sendMessageStream 不会发生。修复后:guard 读取真实 stream-lifetime ref,发现模型流已经结束,于是标记工具已提交,并把失败的工具结果回传给模型。

Tested on

OS Status
🍏 macOS ⚠️ not tested
🪟 Windows ✅ tested
🐧 Linux ⚠️ not tested

Environment (optional)

Windows 11,Node v24.15.0,npm 11.12.1。

Risk & Scope

  • 主要风险/取舍:guard 现在依赖真实 stream 生命周期,不再依赖 React render 时机。这里用了一个小 counter,避免 /btw 这类重叠 stream 提前清掉 guard。
  • 未验证/不在范围内:本地没有在 macOS/Linux 上手动复现 YOLO 终端流程;本 PR 用 hook 级回归测试覆盖该竞态。
  • 破坏性变更/迁移:无。

Linked Issues

Fixes #4672

});

await act(async () => {
await staleOnComplete?.([fastFailedTool]);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] TS2349: staleOnComplete is typed as ((completedTools: TrackedToolCall[]) => Promise<void>) | null, but fastFailedTool is TrackedCompletedToolCall. These types are incompatible — tsc --noEmit fails with "This expression is not callable." Tests pass at runtime because vitest doesn't typecheck.

Suggested change
await staleOnComplete?.([fastFailedTool]);
await (staleOnComplete as ((tools: TrackedCompletedToolCall[]) => Promise<void>) | null)?.([fastFailedTool]);

Or update capturedOnComplete's type annotation to accept TrackedCompletedToolCall[].

— qwen3.7-max via Qwen Code /review

@@ -2409,7 +2418,6 @@ export const useGeminiStream = (
submitQuery(responsesToSend, SendMessageType.ToolResult, prompt_ids[0]);
},
[

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Missing cancellation guard: removing isResponding from the dependency array eliminates the stale-closure that accidentally protected against post-cancellation tool submissions. Now activeModelStreamsRef.current always reads 0 after stream abort, so handleCompletedTools proceeds to submitQuery at line 2418 — sending tool results (potentially containing file contents or shell output) to a new API call the user explicitly tried to cancel via Ctrl+C.

Scenario: Ctrl+C → stream aborts → counter→0 in finally → tool completes → activeModelStreamsRef.current > 0 guard passes → submitQuery fires unwanted API call. The submitQuery function resets turnCancelledRef.current = false, so the cancel state is lost.

Before this PR, the stale isResponding === true closure returned early. Now the ref is always current, making this path deterministic.

Fix: Add a cancellation check before submitQuery at line 2418:

if (turnCancelledRef.current || abortControllerRef.current?.signal.aborted) {
  markToolsAsSubmitted(callIdsToMarkAsSubmitted);
  return;
}

— qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

✅ Local verification report — PR #5071

Verified locally by a maintainer on macOS — the OS this PR's matrix marked ⚠️ not tested. I reproduced every check in the test plan, plus an adversarial test-effectiveness counter-proof and a real-CLI end-to-end run driven through tmux.

Environment

  • macOS (Darwin 25.5.0), Node v22.22.2, npm 10.9.7
  • PR head 8ed1615 (parent 66c6986); fresh git worktree + npm ci + npm run bundle

Results at a glance

# Check Result
1 Code review of the race fix ✅ mechanism sound
2 Full hook suite useGeminiStream.test.tsx (macOS) 104 passed
3 Test-effectiveness counter-proof (revert source, keep test) decisive — only the new test goes red
4 lint / typecheck / build / bundle ✅ all exit 0
5 Real CLI end-to-end (tmux TUI + mock OpenAI, YOLO) ✅ loop continues past the tool error

1. Code review

The fix swaps the early-return guard in handleCompletedTools from if (isResponding) (React state — lags one render, so a captured callback closure reads a stale value) to if (activeModelStreamsRef.current > 0) (a ref — updated synchronously). Confirmed:

  • The ref is +1 before the stream starts and -1 in finally; setIsResponding(false) only fires when the counter reaches 0, so overlapping streams (e.g. /btw) don't clear the guard early. Math.max(0, …) guards against underflow.
  • isResponding is dropped from the useCallback deps and is no longer referenced anywhere inside handleCompletedTools, so the dep removal introduces no new stale closure. Its only remaining consumer is the streamingState memo (UI). Single-stream UI semantics are unchanged; overlapping-stream UI becomes strictly more correct (stays Responding for the whole overlap).

2. Full suite on macOS

npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx
→ Test Files 1 passed (1) | Tests 104 passed (104)

Matches the author's Windows run and fills the macOS gap.

3. Test-effectiveness counter-proof — the decisive evidence

A regression test for a race is only worth as much as its ability to fail on the bug. I reverted only useGeminiStream.ts to the PR's parent 66c6986 (restoring the if (isResponding) return bug) while keeping the PR's test file, then re-ran the suite:

Tests  1 failed | 103 passed (104)

The single failure is the new test submits a fast tool result after the stream ended …, at expect(mockSendMessageStream).toHaveBeenCalledTimes(2) (a waitFor timeout) — i.e. on the buggy source the stale callback hits the isResponding guard, drops the tool result, and the tool-result sendMessageStream never happens. Restoring the PR source → back to 104/104. This proves the new test pinpoints exactly the race it claims to, with zero collateral on the other 103 tests.

4. Static checks

lint (changed files), typecheck (whole cli package), build, and bundle each exit 0. The bundled dist/cli.js contains the activeModelStreamsRef symbol, confirming the artifact used in check #5 is the fixed build.

5. Real CLI end-to-end (tmux + mock OpenAI server, YOLO)

This race lives in the React TUI path (useGeminiStream), not the headless -p path — so I drove the real bundled CLI (dist/cli.js) in an interactive tmux TUI, YOLO mode, pointed at a local mock OpenAI server. Scenario = #4672's shape: the model calls read_file on a missing path → ENOENT → the CLI must hand the error back and continue.

Observed in the TUI:

✦ Reading the requested file now.
x  ReadFile {"file_path":"/tmp/…-DEADBEEF.txt"}
   File not found: /tmp/…-DEADBEEF.txt
✦ CONTINUATION_OK the read_file error was received: the file does not exist…

Mock server log shows the two model requests, the second carrying the tool result back 145 ms after the first:

turn 1   messages=2   lastRole=user   → model emits read_file tool_call
turn 2   messages=4   lastRole=tool   → continuation        (Δ 145 ms)

The existence of turn 2 (lastRole: tool) is the end-to-end signal that the failed tool result was submitted and the agent loop continued — the exact behavior #4672 reported as broken.

Honest scope note: this race is a sub-render-timing window; on the happy path both the fixed and the buggy build can emit turn 2 (the window isn't hit on every run). So this run proves the fixed CLI works end-to-end and the change doesn't regress normal tool-error handling — the deterministic regression guarantee comes from check #3, not from this run.

Verdict

LGTM — recommend merge. Every claim in the PR test plan reproduces on macOS, the new regression test is proven to actually catch the race (red on the parent commit, green on the PR), and the real CLI handles the YOLO tool-error → continue path end-to-end with no regression.

🇨🇳 中文版(点击展开)

✅ 本地验证报告 — PR #5071

由维护者在 macOS 上本地验证 —— 正是本 PR 矩阵里标记 ⚠️ not tested 的系统。我复现了测试计划中的每一项,并额外做了一次"测试有效性反证"和一次通过 tmux 驱动的真实 CLI 端到端运行。

环境

  • macOS (Darwin 25.5.0)、Node v22.22.2、npm 10.9.7
  • PR head 8ed1615(父提交 66c6986);全新 git worktree + npm ci + npm run bundle

结果速览

# 检查项 结果
1 竞态修复的代码审查 ✅ 机制正确
2 完整 hook 测试套件(macOS) 104 passed
3 测试有效性反证(回退源码、保留测试) 决定性 —— 仅新测试变红
4 lint / typecheck / build / bundle ✅ 全部 exit 0
5 真实 CLI 端到端(tmux TUI + mock OpenAI,YOLO) ✅ 工具出错后 loop 继续

1. 代码审查

修复把 handleCompletedTools 里的早退守卫从 if (isResponding)(React state,会滞后一个 render,被捕获的旧闭包读到陈旧值)改为 if (activeModelStreamsRef.current > 0)(ref,同步更新)。确认:

  • ref 在 stream 开始前 +1、在 finally-1;只有计数归零才 setIsResponding(false),因此重叠 stream(如 /btw)不会提前清掉守卫。Math.max(0, …) 防下溢。
  • isRespondinguseCallback 依赖数组移除, handleCompletedTools 函数体内已不再引用它,所以移除依赖不会引入新的 stale closure。它唯一的剩余消费者是 streamingState memo(UI)。单 stream 的 UI 语义不变;重叠 stream 的 UI 反而更正确(整个重叠期持续显示 Responding)。

2. macOS 全套测试

npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx
→ Test Files 1 passed (1) | Tests 104 passed (104)

与作者的 Windows 结果一致,补上了 macOS 的空缺。

3. 测试有效性反证 —— 决定性证据

竞态的回归测试,其价值取决于它"在 bug 上能否变红"。我useGeminiStream.ts 回退到 PR 父提交 66c6986(恢复 if (isResponding) return 这个 bug),同时保留 PR 的测试文件,再跑一遍:

Tests  1 failed | 103 passed (104)

唯一失败的就是新测试 submits a fast tool result after the stream ended …,失败点在 expect(mockSendMessageStream).toHaveBeenCalledTimes(2)(waitFor 超时)—— 即在 bug 版源码上,旧闭包命中 isResponding 守卫、丢弃工具结果,回传工具结果的第二次 sendMessageStream 永远不发生。还原 PR 源码后 → 回到 104/104。这证明新测试精准命中它声称的竞态,且对其余 103 个测试零误伤。

4. 静态检查

lint(改动文件)、typecheck(整个 cli 包)、buildbundle 各自 exit 0。打包出的 dist/cli.js 含有 activeModelStreamsRef 符号,确认第 5 项用的产物就是修复后的构建。

5. 真实 CLI 端到端(tmux + mock OpenAI server,YOLO)

这个竞态存在于 React TUI 路径(useGeminiStream),不在 headless -p 路径 —— 所以我用 tmux 在交互式 TUI 里驱动真实打包的 CLI(dist/cli.js),YOLO 模式,指向一个本地 mock OpenAI server。场景即 #4672 的形态:模型调用 read_file 读一个不存在的路径 → ENOENT → CLI 必须把错误回传并继续。

TUI 中观察到:

✦ Reading the requested file now.
x  ReadFile {"file_path":"/tmp/…-DEADBEEF.txt"}
   File not found: /tmp/…-DEADBEEF.txt
✦ CONTINUATION_OK the read_file error was received: the file does not exist…

mock server 日志显示两次模型请求,第二次在第一次之后 145 ms 把工具结果带了回来:

turn 1   messages=2   lastRole=user   → 模型给出 read_file 工具调用
turn 2   messages=4   lastRole=tool   → 续轮                 (Δ 145 ms)

turn 2(lastRole: tool)的存在就是端到端信号:失败的工具结果被成功提交、agent loop 继续了 —— 正是 #4672 所报告的"坏掉的"行为。

诚实的范围说明: 这个竞态是一个 render 时机内的亚窗口;在正常路径上,修复版和 bug 版可能发出 turn 2(窗口并非每次命中)。所以这次运行证明的是"修复后的 CLI 端到端可用、且未破坏正常的工具错误处理",而确定性的回归保证来自第 3 项,而非这次运行。

结论

LGTM —— 建议合并。 PR 测试计划中的每一项都在 macOS 复现;新回归测试被证明确实能抓住该竞态(在父提交上红、在 PR 上绿);真实 CLI 在 YOLO 工具出错 → 继续的路径上端到端工作,无回归。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

自动接受编辑和YOLO模式下因为读取出错不去更新文件,需要再下一次指令才会更新本地文件,频繁触发。

2 participants