Tags: coder/mux
Tags
🤖 refactor: auto-cleanup (#3543) ## Summary Long-lived auto-cleanup PR. Each pass applies at most one extremely low-risk, behavior-preserving cleanup picked from recently merged `main` commits, then advances the checkpoint below. ## This pass Extracted the Agent Memory sub-experiment list into a `MemorySubExperimentRows` component in `src/browser/features/Settings/Sections/ExperimentsSection.tsx`. The toggles were the only nested config rendered as an inline `.map` in the section render, while every sibling nested config (`AdvisorToolExperimentConfig`, `HeartbeatDefaultsControls`) is an extracted component wrapped in `ExperimentSettingsPanel`. Pulling the map into its own component makes the call site mirror those siblings. - Behavior-preserving: identical DOM output (same `ExperimentSettingsPanel` wrapper and inner divider list); gating on `useExperimentValue(MEMORY)` is unchanged. - Single file; covered by `ExperimentsSection.test.tsx` / `ExperimentsSection.advisor.test.tsx` (13 pass) and the `MemorySettingsEnabled` Storybook play assertions. <details> <summary>Previous passes (still part of this PR's diff)</summary> **Drop redundant array spreads in `deriveTasksSectionAgentGroups`** — Removed three redundant `[...visible]` array spreads in `src/browser/features/Settings/Sections/TasksSection.agents.ts`. Each call was `[...visible].filter(...).sort(...)`, but `Array.prototype.filter` already returns a fresh array, so the subsequent in-place `.sort()` mutates that copy rather than the shared `visible` array. Behavior-preserving; covered by `TasksSection.test.ts` (5 pass). **Dedupe workspace-by-id lookup in host actions** — Deduplicated the repeated "list workspaces and find one by id" pattern that the `workspace.sendMessage` and `workspace.archive` workflow host actions both inlined. Both call sites share a small file-local `findWorkspaceById` helper in `src/node/services/workflows/workspaceHostActions.ts` (documented as distinct from `findWorkspaceByWorkItemKey`, which deliberately reads config instead of `list()`). Behavior-preserving; covered by `workspaceHostActions.test.ts` (33 pass). </details> ## Validation - `make static-check` (eslint, typecheck x2, prettier) — passing. (`shfmt`-based shell check is skipped locally because the binary is unavailable in this environment; no shell files were touched.) - `bun test src/browser/features/Settings/Sections/ExperimentsSection.test.tsx src/browser/features/Settings/Sections/ExperimentsSection.advisor.test.tsx` — 13 pass. ## Risks Negligible. Pure local refactors: an inline render block extracted into a component, plus the prior passes' redundant-clone removal and duplicated-lookup extraction, all with unit coverage over the affected code. --- Auto-cleanup checkpoint: e785422 --- _Generated with `mux` • Model: `anthropic:claude-opus-4-8` • Thinking: `xhigh` • Cost: `n/a`_ <!-- mux-attribution: model=anthropic:claude-opus-4-8 thinking=xhigh costs=n/a --> --------- Co-authored-by: mux-bot[bot] <264182336+mux-bot[bot]@users.noreply.github.com>
🤖 feat: allow model and thinking overrides on sub-agent launch (#3484) ## Summary Adds optional `model` and `thinking` parameters to the `task` (sub-agent launch) tool so a launching agent can override the spawned sub-agent's model and thinking level. Both default to the existing **inherit** semantics, and the param descriptions instruct the model to omit them unless explicitly told otherwise (and to never assume a particular model is available). ## Background Previously a sub-agent's model/thinking were always inherited (parent live runtime env → parent workspace AI settings → per-agent config defaults → global default). There was no way for a launching agent to request a specific model/thinking for a delegated task. The override slots (`TaskCreateArgs.modelString` / `thinkingLevel`) already existed at the top of `resolveTaskAISettings`' precedence chain but were never wired to the tool. ## Implementation - **Schema** (`toolDefinitions.ts`): new `model` and `thinking` fields on the task tool, both `.nullish()` (strict-mode convention). `thinking` is preprocessed to coerce a JSON number to a string before parsing. - **Handler** (`tools/task.ts`): `parseTaskAiOverrides` parses the inputs **reusing the exact UI logic** and forwards them to `taskService.create`. Invalid input throws a descriptive tool error before any task is spawned. The parent runtime fallback hint is still forwarded so unspecified fields keep inheriting. - **DRY model parsing**: relocated `normalizeModelInput` from `src/browser/utils/models/` to `src/common/utils/ai/` (history preserved via `git mv`; its only deps were already in `common`), updated the browser importers, and collapsed the duplicate ACP `normalizeModelForCommand` onto the shared helper. - **DRY thinking parsing**: handler uses `parseThinkingInput` (named levels OR numeric indices). Numeric indices stay deferred as `ParsedThinkingInput` and are resolved against the **chosen** model's policy via `resolveThinkingInput` inside `resolveTaskAISettings` (so `2` means the same thing it does for `/model+level` in the UI). `TaskCreateArgs.thinkingLevel` was widened from `ThinkingLevel` to `ParsedThinkingInput`; existing concrete-level callers are unaffected (named levels pass through unchanged). ## Validation - `make static-check` green (lint, typecheck, docs sync regenerated for the two new tool params). - New tests: task-tool handler forwards an alias + named thinking, forwards a numeric thinking as a deferred index, and rejects an invalid model before spawning; taskService resolves a numeric thinking override against the inherited model's policy. Full `taskService.test.ts` (150) and `task.test.ts` (23) pass. ## Risks Low. Behavior is opt-in; omitting both params preserves the prior inherit path exactly. The `thinkingLevel` type widening is backward-compatible (named levels are a subset of `ParsedThinkingInput` and pass through `resolveThinkingInput` unchanged). The ACP refactor is a straight delegation to an already-tested helper. --- _Generated with `mux` • Model: `anthropic:claude-opus-4-8` • Thinking: `xhigh`_ <!-- mux-attribution: model=anthropic:claude-opus-4-8 thinking=xhigh -->
🤖 feat: add deep-review auto-fix loop mode (#3478) ## Summary Adds looped auto-fix support to `deep-review-workflow`, including iteration budgeting, replay-safe workflow step IDs, validation gating, branch/HEAD drift safety, and clearer workflow task actions that only offer task-workspace navigation when the child workspace still exists. ## Background `deep-review-workflow --fix` previously performed a single review/fix pass. This change adds a `--loop` mode so the workflow can re-review after applied fixes and stop when the code is clean, the fix budget is exhausted, validation fails or is not run, no fix progress is made, or the loop safety cap is reached. It also fixes a workflow-events UX issue where stale task-workspace navigation could be shown after a child task workspace had been deleted. ## Implementation - Adds `--loop` / `--no-loop` parsing and `maxLoopIterations` handling. - Splits the workflow into reusable single-pass and loop orchestration paths with per-iteration step suffixes to avoid stale durable replay results. - Tracks `maxFixes` as a run-wide budget and performs a read-only confirmation pass after budget exhaustion. - Hardens auto-fix preflight for local clean worktrees, reviewed branch/HEAD drift, detached HEAD, ambiguous hex-like refs, and durable checkpoint retries after a patch advances `HEAD`. - Requires post-fix validation to pass before continuing loop iterations; `not-run` now stops with an incomplete loop result. - Adapts scratch workflow ignore handling to the latest lazy scratch-ignore behavior on `main`. - Makes workflow task workspace/open affordances depend on live workspace metadata, while keeping completed task reports accessible after child workspaces are deleted. ## Validation - `bun test src/node/services/workflows/builtInWorkflowDefinitions.test.ts --timeout 20000` - `bun test src/node/services/workflows/WorkflowDefinitionStore.test.ts --timeout 20000` - `bun test src/node/services/workflows/builtInWorkflowDefinitions.test.ts` - `bun test src/node/services/workflows/WorkflowDefinitionStore.test.ts` - `bun test src/browser/features/Tools/WorkflowRunToolCall.test.tsx` - `make typecheck` - `make static-check` ## Risks This touches durable workflow orchestration, parent-workspace patch application safety, scratch workflow ignore behavior, and workflow-run task actions. Regression coverage was added for loop exits, run-wide budgets, replay after applied patches, detached/ambiguous refs, validation not-run handling, scratch gitignore edge cases, and stale/deleted task-workspace actions. ## Pains The branch was rebased after `main` landed lazy scratch workflow ignore handling and inline workflow task structured output, so the final branch explicitly resolves those overlaps and was revalidated after the rebase. --- _Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `$193.56`_ <!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=193.56 --> --------- Signed-off-by: Thomas Kosiewski <tk@coder.com>
🤖 feat: show file edit LoC delta preview (#3472) Summary - Shows a compact line delta preview (for example `+433, -23`) in successful file edit tool headers so it remains visible when the tool call is collapsed. - Counts unified diff payload additions/deletions while ignoring `---`/`+++` file headers. - Adds UI coverage for the collapsed header behavior and delta-count helper coverage. Background - Collapsed file edit tool calls previously showed the target path but not the size of the applied change, making it harder to scan completed write/edit operations. Validation - `bun test src/browser/features/Tools/FileEditToolCall.ui.test.tsx` - `make typecheck` - `make static-check` Risks - Low; the change only derives a display-only summary from the existing diff already rendered by the tool call UI. --- _Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `high` • Cost: `n/a`_ <!-- mux-attribution: model=openai:gpt-5.5 thinking=high costs=n/a -->
🤖 fix: prefer foreground workflow runs (#3462) Summary Update workflow tooling guidance so agents default to foreground `workflow_run` calls and reserve background workflow runs for cases where parallel or independent work can proceed. Background Foreground workflow runs return the completed result directly. Recommending foreground by default avoids an unnecessary `task_await` round-trip when the agent has no useful work to do while the workflow runs. Implementation - Added schema-level guidance to `workflow_run.run_in_background`. - Updated the `workflow_run` tool description and `task_await` warning text. - Updated the workflow authoring skill with foreground-first workflow invocation guidance. - Regenerated tool hook docs and built-in skill content. Validation - `make static-check` - `bun test src/common/utils/tools/toolDefinitions.test.ts` - `make typecheck` Risks Low. This changes tool and skill guidance text only; no workflow execution behavior changes. --- _Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `$2.89`_ <!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=2.89 -->
🤖 fix: filter streamText's synthesized default finish in StreamManager ( #3441) ## Summary Drop `ai`'s synthesized-default `finish` part inside `StreamManager` so that PR #3415's missing-terminal-event guard turns a clean upstream EOF into a retryable `stream_truncated` error for **both** OpenAI and Anthropic providers, instead of silently committing partial output as if the assistant finished cleanly. ## Background PR #3415 added a `receivedTerminalEvent` guard in `StreamManager` that surfaces a missing terminal SSE event as a retryable `stream_truncated` error. That guard only fires when the SDK stream ends without emitting any `finish` part at all. Empirically that branch was unreachable: every real OpenAI and Anthropic stream ends with a `finish` part — but on truncated upstreams the part is a **synthesized default**, not a real terminal signal. The synthesis originates inside the `ai` package's `streamText`. Its internal `runStep` TransformStream initializes: ```js let stepFinishReason = "other"; let stepRawFinishReason = void 0; ``` and unconditionally emits those values from its own `flush()` at end-of-stream — even when the upstream SSE closed before any terminal event arrived. So every adapter ends up looking, at the StreamManager boundary, like it cleanly finished with `(other, undefined)` regardless of whether it actually did. Per-provider truncation behavior, observed in the installed source: - **OpenAI** (`@ai-sdk/openai`): each adapter (Responses, Chat Completions, legacy Completions) initializes its own `finishReason = { unified: "other", raw: undefined }` and emits it from its own `flush()`. `streamText` normalizes that to `(other, undefined)` and forwards it. - **Anthropic** (`@ai-sdk/anthropic`): the adapter has no `flush()` and only emits its `finish` part on a real `message_stop`. On a truncated stream there is no adapter-level finish at all — and `streamText`'s `runStep.flush()` then synthesizes the same `(other, undefined)` part. Same symptom at the StreamManager layer, two different SDK-internal causes. ## Implementation Filter the synthesized default at the `streamText` → `StreamManager` boundary — the layer that actually produces it. In `StreamManager.processStreamWithCleanup`'s `case "finish":` handler, treat a part whose normalized `finishReason === "other"` **and** `rawFinishReason === undefined` as a non-event: do not set `receivedTerminalEvent = true`. The existing `!receivedTerminalEvent` branch below then routes the stream to `handleTruncatedStreamCompletion`, which writes a retryable `stream_truncated` partial with the streamed text preserved. **Why the discriminator is safe (empirical):** - **OpenAI** — `mapOpenAIResponseFinishReason` and `mapOpenAIFinishReason` only return `unified: "other"` from their `default:` branches, which are reached via `isResponseFinishedChunk` / `isResponseFailedChunk`, both of which carry a defined `raw` value. The `(other, undefined)` shape is therefore unreachable as a real OpenAI finish. - **Anthropic** — `mapAnthropicStopReason` only returns `"other"` for the `"compaction"` case and the `default:` fallback. Both call sites in the adapter (`message_delta` and `message_start` handlers) pair the unified reason with `raw: value.message.stop_reason` (a defined string from the API). `(other, undefined)` is unreachable as a real Anthropic finish. - **streamText's own flush** — the only path in this layer that produces `(other, undefined)` is the synthesized default in `runStep`'s end-of-stream flush. So the discriminator distinguishes precisely between "the SDK fabricated a finish to keep the type system happy" and "the model genuinely finished with `other`". A defensive test guards the false-positive surface: a real `(other, "compaction")` finish must pass through as a clean completion. ## Risks Behavioral change is localized to streams that previously committed partial output silently on a clean truncated EOF. After this change those surface as retryable `stream_truncated` errors — the UX PR #3415 originally intended. Regression surface is the synthesized-default discriminator itself: a false positive would treat a legitimate `(other, undefined)` finish as truncated, triggering an unnecessary retry. We mitigate by: 1. Tying the discriminator to the empirically-unreachable `(other, undefined)` shape, verified against the OpenAI and Anthropic mappers (see Implementation). 2. A regression test that asserts real `(other, <raw>)` finishes (e.g. Anthropic's `"compaction"`) still complete cleanly. If a future provider adapter does emit `(other, undefined)` as a real terminal finish, the worst case is a retry — preferable to silently committing partial output as a clean completion. ## Pains The first revision moved this same heuristic into `StreamManager` and was correctly flagged by Codex as theoretically too broad — the public `LanguageModelV2` contract permits any adapter to emit `(other, undefined)` as a legitimate terminal finish. The second revision scoped a similar filter to the `@ai-sdk/openai` adapter callsites, which was contract-safe but did not actually fix the bug: `streamText`'s `runStep.flush()` re-synthesizes the identical part one layer up, and it produced no fix at all for Anthropic where the adapter has no `flush()` to filter in the first place. This revision returns the fix to `StreamManager` but now with concrete evidence — gathered from reading `node_modules/{ai,@ai-sdk/openai,@ai-sdk/anthropic}/dist/index.js` — that the `(other, undefined)` shape is unreachable from the two real adapter mappers we care about, and is uniquely produced by `streamText`'s own flush. The discriminator's safety is a property of the two SDKs in use, not a guarantee of the public V2 contract. --- _Generated with `mux` • Model: `anthropic:claude-opus-4-7` • Thinking: `xhigh` • Cost: `$60.15`_ <!-- mux-attribution: model=anthropic:claude-opus-4-7 thinking=xhigh costs=60.15 -->
🤖 fix: prevent immersive review hunk layout flashes (#3442) ## Summary Fixes layout flashes while iterating through hunks in Immersive Review by synchronizing visual hunk state before paint and keeping the agent-assisted callout from changing the diff column height mid-file. ## Background Immersive Review hunk navigation is keyboard-first, so J/K iteration should feel stable. The previous flow corrected scroll position, cursor selection, and reveal state after paint, and the per-hunk assisted banner could mount/unmount above the diff when moving between flagged and unflagged hunks in the same file. ## Implementation - Moved file reveal gating, selected-hunk cursor/selection sync, and scroll/outline DOM writes to layout effects so they settle before the browser paints. - Added a fixed assisted-banner slot for files containing assisted hunks, so entering or leaving an assisted hunk does not reflow the diff column. - Kept the assisted callout inside the diff column and rendered it as a stable one-line row. ## Validation - `make fmt` - `bun test src/browser/features/RightSidebar/CodeReview/ImmersiveReviewView.test.tsx` - `make typecheck` - `make lint` - `make static-check` - `git diff --check` ## Risks Low-to-medium. The change is scoped to Immersive Review layout/navigation. The main risk is altered timing around scroll/cursor effects, covered by existing Immersive Review tests plus a new regression test for same-file assisted/non-assisted hunk iteration. --- _Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `3826192{MUX_COSTS_USD:-0}`_ <!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=10.47 -->
🤖 feat: immersive review assisted-mode badge + agent status bar (#3432) ## Summary Two gaps in full-screen **Immersive Review** are fixed: there's now a clear indicator when the **Assisted** filter is active, and a top status bar that surfaces the agent's **TODO plan** (a compact horizontal strip) alongside **live chat/streaming status** so reviewers waiting on the agent keep their signal without leaving review. ## Background Immersive review renders into an opaque overlay (`#review-immersive-root`, `absolute inset-0 z-50`) that covers the normal review panel. 1. **No Assisted-mode indication.** The Assisted toggle/badge lives in `ReviewControls`, which is hidden behind that overlay. Once immersive, the diff could be filtered to agent-flagged hunks with zero on-screen indication. (The existing per-hunk assisted *banner* only means "this hunk was flagged" — not "the worklist filter is on".) 2. **No chat status.** A common workflow is reviewing code while waiting for the agent to respond. In immersive mode the transcript, composer streaming barrier, and pinned TODO list are all hidden, so the user had no view of plan progress or streaming state. ## Implementation **Issue 1 — Assisted-mode badge** - `ReviewPanel` now threads `assistedOnly` + `assistedCount` + `assistedUnreadCount` into the immersive portal (previously only the per-hunk `assistedHunkIds`/comments were passed). - `ImmersiveReviewView` renders a header badge (`Sparkles` + `--color-review-accent` + `unread/total` via `counter-nums`) only while `assistedOnly` is on. Kept visually/semantically distinct from the per-hunk assisted banner. **Issue 2 — Agent status bar** (`ImmersiveReviewAgentStatusBar.tsx`, mounted between header and the per-hunk banner) - TODO plan rendered as a single **horizontal** strip via the shared `<TodoList layout="horizontal">` (one row tall, scrolls sideways — minimal review height), collapsible with per-workspace persisted expand state (new `getImmersiveReviewAgentBarExpandedKey`), plus a summary line ("TODO · 2 in progress · 3 pending"). - Compact streaming chip: `Starting…` / `Streaming…` / a prominent "Mux has a question". - **Flash-free & graceful:** - Chip is gated on the *held* phase from `useWorkspaceStreamingStatusPhase` (150ms), so starting↔streaming handoffs don't blink. - Because plans persist across turns, the bar stays mounted between streams and only unmounts once the held phase clears **and** there are no todos — no mid-review flicker. - Data is synchronous from `WorkspaceStore` (no skeleton needed); subscriptions live in this leaf so todo/stream churn doesn't re-render the large diff tree. - Crash-safe: falls back to empty/idle when the workspace isn't registered (renders `null` in tests/stories rather than throwing). **Banner scoping** — the per-hunk assisted comment banner now lives inside the diff column wrapper (not above the whole body), so it spans only the diff width and lines up with the code it refers to instead of stretching across the minimap + notes sidebar. ## Validation - New behavioral tests: `ImmersiveReviewAgentStatusBar.test.tsx` (plan render, idle→null, streaming/starting chips, awaiting-question precedence, collapse persistence) and an assisted-badge gating test in `ImmersiveReviewView.test.tsx`. - New Storybook stories for Chromatic coverage: `ImmersiveWithAssistedMode` (header badge), `ImmersiveWithAgentStatusBar` (horizontal TODO strip + live streaming chip), and `ImmersiveWithStreamingNoTodo` (chip-only state when streaming before any plan is written). The status-bar stories seed the `WorkspaceStore` so the bar has real state. Snapshot budget stays within limit (250/250, no bump). - Snapshot budget: rebased on `main` (current limit 250); the two new immersive stories fit within budget (249/250), so no budget change or story compression was needed. ## Risks Low. The status bar is an additive, self-subscribed leaf that returns `null` when there's nothing to show, so it can't reserve review height or cascade re-renders into the diff. The only change to existing render paths is the new header badge (gated on `assistedOnly`), three new optional props threaded from `ReviewPanel`, and moving the assisted banner inside the diff column. --- _Generated with `mux` • Model: `anthropic:claude-opus-4-8` • Thinking: `xhigh` • Cost: `$47.63`_ <!-- mux-attribution: model=anthropic:claude-opus-4-8 thinking=xhigh costs=47.63 -->
🤖 fix: make mermaid diagram controls zoom (#3433) ## Summary Fixes #3425 by making the Mermaid diagram +/- controls scale the rendered SVG instead of only changing the diagram max-height. ## Background The previous controls changed a persisted max-height value. For diagrams already shorter than that cap, clicking +/- produced no visible change, which made the controls appear broken. ## Implementation - Replace the Mermaid max-height state with clamped persisted zoom state. - Apply the zoom through a CSS variable on rendered Mermaid SVGs so the diagram visibly scales. - Switch Mermaid containers to bidirectional overflow so zoomed diagrams remain scrollable. - Mirror the CSS in the VS Code webview styles because it reuses the shared message renderer. - Add a focused regression test for the zoom controls and persisted value. ## Validation - `bun test src/browser/features/Messages/Mermaid.test.tsx` - `make typecheck` - `make lint` - `nix shell nixpkgs#shfmt nixpkgs#shellcheck nixpkgs#hadolint -c make static-check` ## Risks Low. The change is scoped to Mermaid diagram controls and CSS. Existing SVG sanitization and rendering paths are unchanged. --- _Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `18011{MUX_COSTS_USD:-0.00}`_ <!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=0.00 -->
PreviousNext