Tags: tomaioo/mux

v0.24.1-nightly.3

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
🤖 feat: add OpenAI WebSocket transport opt-in (coder#3241)

## Summary

Adds an opt-in OpenAI WebSocket transport setting for the built-in
OpenAI provider. When `webSocketTransportEnabled` is true and the
effective OpenAI wire format is Responses, eligible streaming Responses
API requests use `@vercel/ai-sdk-openai-websocket-fetch`; existing HTTP
behavior remains the default.

## Background

OpenAI's Responses WebSocket transport can reduce setup overhead for
streaming, multi-step workflows, but Mux previously had no first-class
provider-level opt-in. This keeps the feature scoped to the built-in
OpenAI provider and preserves the saved preference when users
temporarily switch to Chat Completions.

## Implementation

- Adds `webSocketTransportEnabled` to provider config/status schemas and
OpenAI provider settings.
- Shows the WebSocket control only in Responses wire format; hides it
for Chat Completions without clearing the saved value.
- Composes the upstream WebSocket fetch through a small helper that
preserves Mux's existing OpenAI fetch wrapper for non-eligible requests.
- Attaches per-model cleanup via a Mux-owned symbol and runs cleanup
from main stream and workspace title generation paths.
- Updates provider factory, stream lifecycle, and settings tests for
activation, gating, and cleanup behavior.

## Validation

- `make static-check`
- Focused tests for config/status, provider factory activation, helper
behavior, stream cleanup, title cleanup, and Settings UI behavior.
- Dogfooded Settings UI with `agent-browser` for default/off, enabled,
Chat Completions hidden, and Responses restored states.
- Created live test workspaces, sent OpenAI chat messages, and verified
backend-side WebSocket open evidence:
`wss://api.openai.com/v1/responses`.

## Risks

The main risk is provider transport composition regressions. The
implementation pre-filters non-eligible requests so Mux's existing fetch
behavior remains responsible for non-WebSocket HTTP paths, and cleanup
is scoped per model/run to avoid process-wide socket lifetime
complexity.

---

<details>
<summary>📋 Implementation Plan</summary>

# Implementation Plan: OpenAI WebSocket Transport Opt-In

## Goal

Add a non-breaking, optional **OpenAI WebSocket Transport** setting for
the **Built-in OpenAI Provider**. When `webSocketTransportEnabled` is
persisted as `true` and the effective OpenAI wire format is Responses,
eligible streaming Responses API requests use the published OpenAI
WebSocket fetch transport. Existing HTTP behavior remains the default.

## Verified context and constraints

- Product/domain decisions are already captured in `CONTEXT.md` and
  `PRD.md`:
  - canonical setting name: `webSocketTransportEnabled`
  - provider config only; no request-level override
  - exposed in Settings → Providers → OpenAI near Wire Format
  - inactive/disabled for Chat Completions while preserving the saved flag
  - no custom base URL validation
  - no automatic HTTP fallback after WebSocket failures
  - use `@vercel/ai-sdk-openai-websocket-fetch`; do not implement the
    WebSocket protocol locally
  - per-stream connection lifecycle; explicit cleanup on
    completion/error/cancel
  - no ADR for this iteration
- Repo investigation found existing OpenAI-specific provider
  config/status/UI patterns to mirror:
  - `serviceTier`, `wireFormat`, and `store` in provider config/status/UI
  - OpenAI status values are validated before surfacing to the frontend
  - `ProvidersSection.tsx` already has adjacent OpenAI settings for
    Service tier, Wire format, and Response storage
- Repo investigation found the main runtime seams:
  - `providerModelFactory.ts` creates OpenAI models through
    `createOpenAI({ ..., fetch })`
  - the OpenAI branch already wraps fetch for Mux headers, DevTools
    capture/stripping, Codex OAuth normalization/routing, and custom fetch
    handling
  - `streamManager.ts` owns the main guaranteed stream cleanup `finally`
    path
  - `workspaceTitleGenerator.ts` is another `streamText` owner using
    `AIService.createModel()` models
- Upstream AI SDK docs confirm that OpenAI provider instances accept a
  custom `fetch`, `createWebSocketFetch()` is passed to
  `createOpenAI({ fetch })`, the package exposes `.close()`, and only
  streaming `POST /responses` requests use WebSocket while other requests
  fall through to standard fetch.

## Recommended approach

**Approach A: Provider-config opt-in + small WebSocket fetch composition
module + language-model cleanup symbol**

Net product-code LoC estimate: **~230–360 LoC**

Estimated product-code breakdown:
- config/status schemas and provider service surfacing: ~20–35 LoC
- Settings UI control and helpers: ~55–90 LoC
- WebSocket fetch composition helper: ~55–90 LoC
- language-model cleanup helper: ~35–55 LoC
- provider factory integration: ~35–60 LoC
- stream-owner cleanup integration: ~20–30 LoC

Why this approach:
- keeps the existing `createModel()` return API stable
- isolates protocol package composition behind a small deep module
- preserves existing OpenAI fetch behavior instead of naively replacing
fetch
- gives deterministic test seams for enablement and cleanup
- avoids process-wide socket caching, URL validation, fallback retries,
or other speculative complexity

Rejected alternatives:
- **Process-wide cached WebSocket connections**: more latency upside
across separate user messages but requires cache keys, config
invalidation, key rotation handling, and app shutdown cleanup.
Product-code estimate if chosen later: ~180–300 additional LoC.
- **Change `createModel()` to return `{ model, cleanup }`**: explicit
but high-churn across call sites and tests. Product-code estimate:
~120–220 LoC plus broad type/test churn.
- **Implement the WebSocket protocol locally**: maximum control but
duplicates upstream transport behavior and beta protocol maintenance.
Product-code estimate: ~220–400 LoC plus higher maintenance risk.

## Implementation phases

### Phase 0 — Documentation alignment

1. Keep `CONTEXT.md` as the canonical glossary and decision summary for
   this feature.
   - Preserve the terms **Built-in OpenAI Provider**, **Direct OpenAI API
     Key Path**, **OpenAI WebSocket Transport**, and
     `webSocketTransportEnabled`.
   - If implementation uncovers a domain decision that changes the agreed
     semantics, update `CONTEXT.md` in the same change set rather than
     leaving the glossary stale.
2. Keep `PRD.md` aligned with the implemented scope.
   - It should continue to describe the feature as a non-breaking
     provider-config opt-in.
   - Update it if implementation materially changes accepted behavior,
     package name, acceptance criteria, or dogfooding requirements.
3. Do not create an ADR unless implementation introduces a
   hard-to-reverse architectural decision beyond the current per-stream
   cleanup-symbol approach.

Quality gate after Phase 0:
- Confirm `CONTEXT.md` and `PRD.md` mention the current package name,
`@vercel/ai-sdk-openai-websocket-fetch`, before implementation begins.
- Confirm later implementation changes do not contradict the glossary or
PRD acceptance criteria.

### Phase 1 — Dependency and schema/status plumbing

1. Add `@vercel/ai-sdk-openai-websocket-fetch` using Bun.
   - Use `bun add @vercel/ai-sdk-openai-websocket-fetch` so `package.json`
     and lockfile remain consistent.
   - Keep the dependency in normal dependencies, not dev dependencies,
     because runtime provider creation uses it.
2. Add `webSocketTransportEnabled: z.boolean().optional()` to the
   **Built-in OpenAI Provider** config schema.
   - Place it near existing OpenAI-only fields such as `serviceTier`,
     `defaultModel`, `apiVersion`, and other persisted OpenAI settings.
   - Do not add it to request/provider options schemas; this is
     intentionally provider config only.
3. Add `webSocketTransportEnabled?: boolean` to provider-status/oRPC
   schema output.
   - Place it near `wireFormat` and `store` because the settings UI
     consumes these together.
4. Surface valid persisted values from the provider service.
   - Mirror the `store` boolean pattern: only copy the value into provider
     status when `typeof config.webSocketTransportEnabled === "boolean"`.
   - Invalid persisted values should be omitted from status rather than
     surfaced to UI.
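The "only copy valid booleans into status" rule in step 4 could be sketched roughly as follows. The interface names and the `toOpenAIStatus` function are hypothetical stand-ins for illustration, not the actual Mux provider service code; only the `typeof ... === "boolean"` check comes from the plan.

```typescript
// Hypothetical shapes for illustration; the real Mux schemas carry more fields.
interface PersistedOpenAIConfig {
  store?: unknown;
  webSocketTransportEnabled?: unknown;
}

interface OpenAIProviderStatus {
  store?: boolean;
  webSocketTransportEnabled?: boolean;
}

// Copy a persisted value into status only when it is a real boolean.
// Anything else (strings, numbers, null) is omitted instead of surfaced.
function toOpenAIStatus(config: PersistedOpenAIConfig): OpenAIProviderStatus {
  const status: OpenAIProviderStatus = {};
  if (typeof config.store === "boolean") {
    status.store = config.store;
  }
  if (typeof config.webSocketTransportEnabled === "boolean") {
    status.webSocketTransportEnabled = config.webSocketTransportEnabled;
  }
  return status;
}
```

With this shape, a hand-edited config containing `"webSocketTransportEnabled": "yes"` would simply produce a status object without the field, so the UI falls back to its default off state.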

Quality gate after Phase 1:
- Run targeted config/provider tests that cover provider schema and
provider service status.
- Expected tests to extend:
  - provider config schema tests
  - provider status/oRPC schema conformance tests
  - provider service tests for OpenAI-only fields

### Phase 2 — Settings UI control

1. Add the OpenAI provider settings control near Wire Format / Response
   storage.
   - Label: **WebSocket transport**.
   - Use risk-aware helper copy, e.g. "Experimental: uses OpenAI's
     Responses WebSocket transport for streaming Responses API requests.
     Unsupported endpoints may fail."
   - Avoid tests that assert exact prose; the prose can evolve.
2. Persist changes through the existing provider config mutation API.
   - Enable: set `keyPath: ["webSocketTransportEnabled"]`, `value: true`.
   - Disable: prefer setting `value: ""` to remove the field if existing
     provider config mutation semantics treat empty string as delete;
     otherwise set `false` only if that is the established boolean-toggle
     convention. Verify the current `setConfig` behavior before
     implementing this detail.
   - Optimistically update the local provider config state with the chosen
     value so the UI responds immediately.
3. Disable the control while effective OpenAI wire format is Chat
   Completions.
   - Use the same effective default as the existing Wire Format control:
     missing wire format means Responses.
   - Preserve the saved `webSocketTransportEnabled` value while disabled.
   - Show disabled helper text such as "Only available with Responses wire
     format."
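The gating in step 3 can be illustrated with a small pure function. This is only a sketch: the `"responses"` literal, the `OpenAISettings` shape, and both function names are assumptions made for the example (the plan only names `"chatCompletions"` and `webSocketTransportEnabled`).

```typescript
// "chatCompletions" appears in the plan; "responses" is an assumed literal.
type WireFormat = "responses" | "chatCompletions";

// Hypothetical slice of the OpenAI provider settings.
interface OpenAISettings {
  wireFormat?: WireFormat;
  webSocketTransportEnabled?: boolean;
}

interface WebSocketControlState {
  checked: boolean;
  disabled: boolean;
}

// Derive the control state without touching the saved preference.
function getWebSocketControlState(settings: OpenAISettings): WebSocketControlState {
  // Missing wire format means Responses, matching the Wire Format control.
  const effectiveWireFormat: WireFormat = settings.wireFormat ?? "responses";
  return {
    checked: settings.webSocketTransportEnabled === true,
    disabled: effectiveWireFormat === "chatCompletions",
  };
}

// Switching wire format changes only that field; the WebSocket flag is kept.
function withWireFormat(settings: OpenAISettings, wireFormat: WireFormat): OpenAISettings {
  return { ...settings, wireFormat };
}
```

Because the disabled state is derived rather than stored, switching to Chat Completions and back leaves `webSocketTransportEnabled` exactly as the user saved it.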

Quality gate after Phase 2:
- Run targeted Settings UI tests.
- Verify behavior, not copy:
  - control is visible for the built-in OpenAI provider
  - control persists enable/disable through `setProviderConfig`
  - control is disabled when `wireFormat === "chatCompletions"`
  - selecting Chat Completions does not delete the saved WebSocket
    preference

### Phase 3 — Deep module: OpenAI WebSocket fetch composition

Create a small node-side helper module for WebSocket transport
composition.

Responsibilities:
1. Accept the existing Mux OpenAI fetch as its base/fallback behavior.
2. Accept an `enabled` boolean that has already applied runtime
eligibility (`webSocketTransportEnabled === true` and effective wire
format is Responses).
3. When disabled, return the original fetch and a no-op close hook.
4. When enabled, create a WebSocket fetch via `createWebSocketFetch()`
   and return:
   - a fetch compatible with `createOpenAI({ fetch })`
   - a close hook that calls the WebSocket fetch's `.close()` exactly once
5. Preserve existing Mux OpenAI fetch behavior.
   - Existing request shaping/normalization must still run.
   - Existing HTTP fallthrough from the WebSocket package should still
     benefit from Mux's fetch behavior where possible.
   - If preserving the package's HTTP fallthrough requires a wrapper around
     global fetch, keep that wrapper local and heavily tested; do not
     reimplement the WebSocket protocol.
6. Do not catch WebSocket transport failures to retry over HTTP.
   - Let eligible request failures surface naturally.

Important implementation detail to verify while coding:
- The published package falls through to `globalThis.fetch` for
non-WebSocket requests. If using it directly would bypass Mux's base
fetch for HTTP fallthrough, compose a wrapper so non-eligible requests
still call Mux's base fetch. Keep this wrapper simple and test it with
mocked fetches.

Suggested public interface shape:
- `createOpenAIWebSocketTransportFetch({ enabled, baseFetch }): { fetch: typeof fetch; close: () => void }`
- The helper should assert that `close` is callable when enabled and
  should make cleanup idempotent.
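A minimal sketch of the composition helper, under several stated assumptions: the upstream `createWebSocketFetch()` is injected as a parameter (so the sketch does not depend on the real package), requests are modelled as a string URL plus a small init object rather than the full `fetch` signature, and "eligible" is taken to mean a `POST` to a path ending in `/responses` with `stream: true` in a JSON body. None of these details are confirmed by the plan beyond the suggested interface shape.

```typescript
// Minimal stand-ins so the sketch does not depend on DOM or Node typings.
interface RequestInitLike {
  method?: string;
  body?: string;
}
type FetchLike = (url: string, init?: RequestInitLike) => Promise<unknown>;
type WebSocketFetchLike = FetchLike & { close: () => void };

interface TransportOptions {
  enabled: boolean;
  baseFetch: FetchLike;
  // Injected factory standing in for the upstream createWebSocketFetch().
  createWebSocketFetch: () => WebSocketFetchLike;
}

// Assumption: eligible means a streaming POST to a /responses endpoint.
function isEligible(url: string, init?: RequestInitLike): boolean {
  if (init?.method?.toUpperCase() !== "POST") return false;
  const queryStart = url.indexOf("?");
  const path = queryStart === -1 ? url : url.slice(0, queryStart);
  if (!path.endsWith("/responses")) return false;
  try {
    const body: unknown = JSON.parse(init?.body ?? "{}");
    return (
      typeof body === "object" &&
      body !== null &&
      (body as { stream?: unknown }).stream === true
    );
  } catch {
    return false; // Non-JSON bodies are never routed to the WebSocket fetch.
  }
}

function createOpenAIWebSocketTransportFetch(
  options: TransportOptions
): { fetch: FetchLike; close: () => void } {
  const { enabled, baseFetch, createWebSocketFetch } = options;
  if (!enabled) {
    // Disabled: original fetch, no-op close.
    return { fetch: baseFetch, close: () => {} };
  }
  const wsFetch = createWebSocketFetch();
  if (typeof wsFetch.close !== "function") {
    throw new Error("WebSocket fetch must expose close()");
  }
  let closed = false;
  return {
    // Pre-filter: only eligible requests reach the WebSocket fetch, so the
    // base fetch stays responsible for every other request. Failures from
    // wsFetch are returned as-is; there is no HTTP retry.
    fetch: (url, init) =>
      isEligible(url, init) ? wsFetch(url, init) : baseFetch(url, init),
    close: () => {
      if (closed) return;
      closed = true;
      wsFetch.close();
    },
  };
}
```

Pre-filtering before the WebSocket fetch, rather than relying on its own fallthrough, is one way to keep non-eligible requests on the Mux base fetch; whether the real helper does exactly this is not stated here.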

Quality gate after Phase 3:
- Add direct unit tests for the helper using a mocked
`@vercel/ai-sdk-openai-websocket-fetch` package.
- Assert externally observable behavior:
  - disabled returns base-fetch behavior and no-op close
  - enabled delegates eligible requests to the WebSocket fetch
  - non-eligible requests preserve base-fetch behavior
  - close is idempotent and does not throw on repeated calls

### Phase 4 — Deep module: language-model cleanup helper

Create a Mux-owned cleanup helper for provider-created language models.

Responsibilities:
1. Attach cleanup to a model object without changing the provider model
factory return type.
2. Use a private Symbol so the attachment does not collide with AI
SDK/provider fields.
3. Assert the attached cleanup is a function.
4. Run cleanup at most once per model.
5. Swallow/log cleanup exceptions so cleanup failures do not mask the
original stream completion/error.
6. Clear the cleanup after running to avoid retaining closures longer
than necessary.

Suggested public interface shape:
- `attachLanguageModelCleanup(model, cleanup): LanguageModel`
- `runLanguageModelCleanup(model): void`
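One way the two functions above might look, sketched under assumptions: the model is typed as a plain `object` instead of the AI SDK's `LanguageModel`, the symbol description is invented, and the optional `onError` callback is a hypothetical stand-in for whatever logging the real helper uses.

```typescript
// Private symbol so the attachment cannot collide with AI SDK or provider fields.
const LANGUAGE_MODEL_CLEANUP = Symbol("mux.languageModelCleanup");

type Cleanup = () => void;
type WithCleanup = { [LANGUAGE_MODEL_CLEANUP]?: Cleanup };

// Attach cleanup to the model object and return the same object, so the
// factory's return type does not change.
function attachLanguageModelCleanup<T extends object>(model: T, cleanup: Cleanup): T {
  if (typeof cleanup !== "function") {
    throw new Error("cleanup must be a function");
  }
  (model as T & WithCleanup)[LANGUAGE_MODEL_CLEANUP] = cleanup;
  return model;
}

// Run the attached cleanup at most once. Errors are reported through the
// hypothetical onError callback and never rethrown.
function runLanguageModelCleanup(
  model: object,
  onError: (error: unknown) => void = () => {}
): void {
  const holder = model as WithCleanup;
  const cleanup = holder[LANGUAGE_MODEL_CLEANUP];
  if (cleanup === undefined) return; // Models without cleanup are safe.
  // Clear before running so repeated calls are no-ops and the closure is released.
  delete holder[LANGUAGE_MODEL_CLEANUP];
  try {
    cleanup();
  } catch (error) {
    onError(error);
  }
}
```

A stream owner would then call `runLanguageModelCleanup(model)` from its outer `finally` block, which keeps the call safe whether or not the model ever had a WebSocket attached.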

Quality gate after Phase 4:
- Unit tests for the helper:
  - cleanup runs exactly once
  - repeated cleanup is a no-op
  - models without cleanup are safe
  - thrown cleanup errors are handled according to the chosen helper
    contract

### Phase 5 — Provider model factory integration

1. In the OpenAI branch, compute runtime eligibility:
   - persisted/provider config `webSocketTransportEnabled === true`
   - effective wire format is Responses
   - no request-level override support
2. Keep existing config-to-provider-options logic for `serviceTier`,
   `wireFormat`, and `store` unchanged.
3. Compose the existing OpenAI fetch with the WebSocket helper before
   passing `fetch` to `createOpenAI`.
   - Do not bypass existing `fetchWithOpenAICodexNormalization` behavior.
   - Do not add a special Codex OAuth guard beyond the agreed
     Responses-wire-format gating.
   - Do not validate custom base URLs.
4. After creating the model (`provider.responses(modelId)` or
   `provider.chat(modelId)`), attach the close hook only when the helper
   created an active WebSocket cleanup.
5. Ensure DevTools middleware wrapping does not discard cleanup.
   - If cleanup is attached before `wrapLanguageModel`, verify whether
     wrapping preserves object identity/metadata.
   - If wrapping loses the symbol, attach cleanup after final wrapping, or
     copy cleanup from inner to outer model.
   - Add a test for the DevTools-enabled path if this is ambiguous during
     implementation.
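The eligibility rule from step 1 is small enough to write as a single predicate. The config shape, the `baseUrl` field name, and the function name below are hypothetical; the sketch also assumes that any wire format other than `"chatCompletions"` counts as Responses, which extends the "missing means Responses" default and may not match the real validation.

```typescript
// Hypothetical slice of persisted provider config; values are untrusted.
interface OpenAIRuntimeConfig {
  wireFormat?: unknown;
  webSocketTransportEnabled?: unknown;
  baseUrl?: string; // Present to show it is deliberately ignored.
}

function isWebSocketTransportEligible(config: OpenAIRuntimeConfig): boolean {
  // Assumption: anything other than "chatCompletions" is treated as Responses.
  const usesResponses = config.wireFormat !== "chatCompletions";
  // Strict comparison: "true", 1, or other truthy values do not enable it.
  return config.webSocketTransportEnabled === true && usesResponses;
}
```

The predicate never reads `baseUrl`, which mirrors the "do not validate custom base URLs" decision: a custom endpoint neither blocks nor forces activation.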

Quality gate after Phase 5:
- Provider model factory tests:
  - Responses + enabled activates WebSocket composition
  - Responses + missing/false setting does not activate it
  - Chat Completions + enabled does not activate it
  - invalid config value is not treated as enabled
  - custom base URL does not prevent activation when enabled + Responses
  - Codex OAuth is not specially guarded; the code path follows the same
    eligibility rule

### Phase 6 — Stream owner cleanup integration

1. Main streams (`streamManager`): call
   `runLanguageModelCleanup(streamInfo.request.model)` or equivalent model
   reference in the existing guaranteed cleanup `finally` block.
   - Prefer the actual `LanguageModel` object, not the model string.
   - Run cleanup before deleting stream state.
   - Make cleanup safe for retry paths: if a stream is reset for an
     internal retry, do not close the WebSocket before the final stream run
     completes unless a new stream/model is created.
2. Workspace title/name generation: wrap each candidate's `streamText`
   attempt in `try/finally` and call cleanup for that candidate's model.
   - Ensure cleanup runs when the model does not call the expected tool and
     the loop continues.
   - Ensure cleanup runs when `streamText` or `toolResults` throws and the
     loop tries the next candidate.
3. Search for any other `streamText` owners using provider-created
   models before finalizing.
   - Current exploration found main stream manager and workspace title
     generation.
   - If new owners appear, apply the same cleanup pattern.

Quality gate after Phase 6:
- Lifecycle tests:
  - main stream completion closes once
  - main stream error closes once
  - main stream cancellation closes once
  - title generation success closes once
  - title generation failure/retry closes once per candidate model
  - internal multi-step/tool-calling stream does not close between steps

### Phase 7 — Validation and full static checks

Run validation in increasing scope:
1. Targeted tests added/modified in phases 1–6.
2. Typecheck.
3. Lint/fmt checks.
4. Full static check if the targeted suite and typecheck pass.

Suggested commands:
- `bun test src/common/config/schemas/providersConfig.test.ts`
- `bun test src/common/orpc/schemas/api.test.ts`
- `bun test src/node/services/providerService.test.ts`
- `bun test src/node/services/providerModelFactory.test.ts`
- `bun test src/node/services/streamManager.test.ts`
- `bun test src/browser/features/Settings/Sections/ProvidersSection.test.tsx`
- `make typecheck`
- `make lint`
- `make static-check`

Use `run_and_report` when running multiple validation steps in one shell
call, per repo guidance.

## Dogfooding plan

Dogfooding is required before claiming the feature is ready. Live OpenAI
runtime dogfooding is optional if credentials/endpoints are unavailable,
but UI dogfooding should still run.

### Dogfood setup

1. Start an isolated dev-server environment.
   - Prefer `make dev-server-sandbox` for web/settings dogfooding so the
     run uses an isolated `MUX_ROOT` and free ports instead of the default
     `make dev` state.
   - Use `make dev-desktop-sandbox` only if Electron-specific desktop
     behavior must be verified.
2. Configure a test OpenAI provider.
   - If a real OpenAI API key is available, use it for live streaming
     verification.
   - If not, use deterministic UI-only dogfooding plus automated
     tests/mocks for runtime behavior.
3. Use browser/Electron automation to open Settings → Providers →
   OpenAI.
   - Use `agent-browser` or the repo's Electron automation helper.

### Dogfood scenarios

1. **Default state**
   - Confirm WebSocket transport is shown as disabled/off by default.
   - Screenshot: OpenAI settings default state.
2. **Enable in Responses mode**
   - Ensure Wire Format is Responses.
   - Enable WebSocket transport.
   - Confirm the UI persists the setting after refresh/reopen.
   - Screenshot: enabled setting in Responses mode.
3. **Chat Completions gating**
   - Switch Wire Format to Chat Completions.
   - Confirm the WebSocket control is disabled while the saved preference
     remains preserved.
   - Screenshot: disabled control in Chat Completions mode.
4. **Return to Responses**
   - Switch Wire Format back to Responses.
   - Confirm the previously saved WebSocket preference reappears as
     enabled.
   - Screenshot: restored enabled setting.
5. **Live stream, if credentials are available**
   - Send a short prompt with an OpenAI Responses model.
   - Confirm the stream completes or a WebSocket endpoint/proxy failure
     surfaces clearly without automatic HTTP fallback.
   - Interrupt/cancel one stream and then start another to check cleanup
     does not block subsequent streams.
   - Record a short video covering enable → prompt → stream/visible failure
     → Chat Completions disablement.

### Dogfood artifacts

Attach or save:
- screenshots for default, enabled, Chat Completions-disabled, and
restored states
- a short video recording for the end-to-end UI flow
- notes on whether live OpenAI credentials were available and whether
runtime streaming was verified live or by automated mocks only

## Acceptance criteria

- Existing users see no behavior change unless
`webSocketTransportEnabled` is explicitly set true.
- Provider config accepts optional boolean `webSocketTransportEnabled`
for the **Built-in OpenAI Provider**.
- Provider status exposes valid boolean values and omits invalid
persisted values.
- OpenAI settings UI exposes the control near Wire Format with
risk-aware helper copy.
- UI disables the control for Chat Completions and preserves the saved
value.
- Runtime WebSocket activation requires `webSocketTransportEnabled ===
true` and effective Responses wire format.
- Runtime does not validate custom base URLs for WebSocket support.
- Runtime does not retry eligible WebSocket failures over HTTP.
- Existing OpenAI fetch behavior is preserved around the WebSocket
composition seam.
- WebSocket resources close on stream completion, error, and
cancellation for all provider-created-model stream owners.
- Automated tests cover config/status, settings UI, provider factory
activation/gating, helper behavior, and cleanup lifecycle.
- Dogfooding produces screenshots and, when feasible, a video recording.

## Risks and mitigations

- **Risk: WebSocket package HTTP fallthrough bypasses Mux fetch
  wrappers.**
  - Mitigation: test the composition helper with mocked eligible and
    non-eligible requests; ensure non-eligible/fallthrough paths use the
    Mux base fetch.
- **Risk: cleanup symbol is lost when models are wrapped by DevTools
  middleware.**
  - Mitigation: attach cleanup to the final returned model or explicitly
    preserve/copy cleanup through wrapping; add a focused test if needed.
- **Risk: cleanup runs too early during AI SDK multi-step streams.**
  - Mitigation: run cleanup only in outer stream-owner `finally`, not
    inside fetch response completion per step.
- **Risk: cleanup misses title generation or future stream owners.**
  - Mitigation: search all `streamText` call sites that use
    provider-created models and add a helper usage pattern; consider a
    short code comment at the helper call explaining the invariant.
- **Risk: UI tests become tautological.**
  - Mitigation: test behavior and state changes rather than exact prose.
- **Risk: optional live dogfood cannot run without credentials.**
  - Mitigation: make live streaming dogfood optional, but require
    automated mocked runtime tests and UI screenshots.

## Handoff notes for implementation

- Keep changes surgical; do not refactor unrelated provider config or
settings UI code.
- Prefer small deep modules over spreading package-specific logic
through provider factory and stream owners.
- Use defensive assertions in the helper modules for impossible
assumptions, especially cleanup function type and idempotent close
state.
- Do not add request-level
`muxProviderOptions.openai.webSocketTransportEnabled` support in this
iteration.
- Do not add an ADR unless the implementation discovers a
hard-to-reverse architectural choice not covered by this plan.

</details>

---

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `high` •
Cost: `$71.27`_

<!-- mux-attribution: model=openai:gpt-5.5 thinking=high costs=71.27 -->

v0.24.0

release: v0.24.0

v0.23.3-nightly.22

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
🤖 fix: stop scroll-up jitter at bottom + harden auto-scroll ownership (coder#3226)

## Summary

Fixes the small-but-noticeable jitter when the user starts scrolling up
from the very bottom while the chat transcript bottom-lock is engaged.
A large enough wheel/touch delta would eventually "win" against the rAF
settle tick, but the first 1–3 notches of a slow gesture were snapped
back to the bottom, so it felt like the scrollport was fighting the
user.

## Background

`useAutoScroll.handleScroll`'s user-intent branch decided lock state
from a single 8 px geometric threshold. A small wheel notch (typical
mousewheel notch is 3–7 px) landed within that window, so the hook
re-engaged the lock and the 60-frame rAF settle loop wrote `scrollTop =
scrollHeight − clientHeight` on the next frame, snapping the user back
to the bottom. The user could only "win" by accumulating a single tick
of more than 8 px before the next rAF.

## Implementation

Two commits, both behavior-preserving outside the targeted regression
and validated against the existing 22 hook unit tests + the
`bottomLayoutShift` integration suite.

### Commit 1 — `fix: stop scroll-up jitter from the very bottom`

Asymmetric thresholds in `handleScroll`'s user-intent branch:

- **Locked → release** on > 1 px drift (`BOTTOM_LOCK_EPSILON_PX`). The
very first wheel-up notch releases without rAF snap-back.
- **Released → relock** only when the user is moving **toward** the
bottom (`currentScrollTop > previousScrollTop`) AND within 8 px
(`USER_BOTTOM_RELOCK_THRESHOLD_PX`). Direction is tracked with a single
new ref (`lastScrollTopRef`) updated at the top of every `handleScroll`
call.

The no-intent paths (1 px drift correction while locked, geometric
relock at 8 px after the intent window expires) are unchanged. The
existing "scroll back to bottom and the lock re-engages" UX is
preserved.
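The asymmetric thresholds in the user-intent branch can be sketched as a pure function. The two constant names and values come from the description above; the `ScrollSample` shape, the function name, and the choice of inclusive comparisons at the exact thresholds are assumptions, and the real `handleScroll` has additional no-intent paths that are not modelled here.

```typescript
const BOTTOM_LOCK_EPSILON_PX = 1;
const USER_BOTTOM_RELOCK_THRESHOLD_PX = 8;

// Hypothetical snapshot of the values the user-intent branch looks at.
interface ScrollSample {
  locked: boolean;
  previousScrollTop: number; // what lastScrollTopRef held before this event
  currentScrollTop: number;
  scrollHeight: number;
  clientHeight: number;
}

// Returns whether the bottom-lock should be engaged after this scroll event.
function nextLockState(sample: ScrollSample): boolean {
  const distanceFromBottom =
    sample.scrollHeight - sample.clientHeight - sample.currentScrollTop;
  if (sample.locked) {
    // Locked: release as soon as the drift exceeds the 1 px epsilon.
    return distanceFromBottom <= BOTTOM_LOCK_EPSILON_PX;
  }
  // Released: relock only when moving toward the bottom AND within 8 px.
  const movingTowardBottom = sample.currentScrollTop > sample.previousScrollTop;
  return movingTowardBottom && distanceFromBottom <= USER_BOTTOM_RELOCK_THRESHOLD_PX;
}
```

With a 1000 px transcript in a 100 px viewport (maximum `scrollTop` of 900), a 5 px wheel-up from the bottom releases the lock immediately, and a further upward notch while still within 8 px does not relock because the direction check fails.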

### Commit 2 — `fix: harden auto-scroll user-intent ownership`

Audit follow-ups (best-of-5 read-only audit converged on these):

1. **Filter delta-0 wheel events.** Cmd-wheel zoom on macOS, Shift-wheel
for horizontal-only, Bluetooth-mouse jitter, and pinch gestures all
dispatch `wheel` events with `deltaY === 0` (and often `deltaX === 0`).
Without filtering, every phantom wheel cleared `programmaticDisableRef`
and refreshed the 750 ms intent window, weakening every downstream gate
that relied on those refs. New `handleScrollContainerWheel` is exposed
by the hook; ChatPane wires it in place of `markUserScrollIntent`.

2. **Seed `lastScrollTopRef`** inside `disableAutoScroll` and
`jumpToBottom`. The released-branch direction check compares `scrollTop`
against this ref, but neither path always emits a scroll event
(`disableAutoScroll` never does; `jumpToBottom` skips the write when
`scrollTop` is already max). Without the seed, a small wheel-up notch
following an explicit programmatic disable could be misread as "moving
toward bottom" (e.g. `895 > 0`) and spuriously relock the lock that was
just disabled.
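The delta-0 filter from item 1 might look roughly like this. `WheelLike` and `createWheelHandler` are hypothetical names used for the sketch; the real `handleScrollContainerWheel` lives inside the hook and receives a DOM `WheelEvent`.

```typescript
// Minimal stand-in for the fields of a wheel event that the filter reads.
interface WheelLike {
  deltaX: number;
  deltaY: number;
}

// Wrap an intent-marking callback so phantom wheel events are ignored.
function createWheelHandler(markUserScrollIntent: () => void): (event: WheelLike) => void {
  return (event) => {
    // A wheel event with both deltas at 0 cannot move the scrollport,
    // so it must not refresh the intent window.
    if (event.deltaX === 0 && event.deltaY === 0) return;
    markUserScrollIntent();
  };
}
```

Filtering at the handler boundary means the refs that downstream gates rely on are only touched by wheel events that can actually produce scroll motion.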

## Validation

- `bun test src/browser/hooks/useAutoScroll.test.tsx` — 25 / 25 pass (19
prior + 6 new regression tests covering the four scenarios above).
- `bun x jest tests/ui/chat/bottomLayoutShift.test.ts` — passes
(drift-correction / pin-on-resize / send / workspace-switch contracts
unchanged).
- `make static-check` — passes locally end-to-end.

## Risks

- Behavior near the 1 px drift epsilon under hi-DPI / browser zoom is
unchanged from before; `BOTTOM_LOCK_EPSILON_PX` was already used for the
no-intent drift correction. The fix uses the same value for the
locked-intent release path, so any pre-existing subpixel sensitivity is
consistent across paths.
- The wheel filter will not mark intent on a wheel event with both
deltas equal to 0 — by design. Users on assistive input devices that
emit `deltaY = 0` but expect intent marking are unaffected because such
events also do not move the scrollport, and our intent window only
matters when scroll motion follows.
- `lastScrollTopRef` seeding is purely additive: every code path that
previously wrote to it still does; we just close two narrow
staleness windows (`disableAutoScroll` with no follow-up scroll event,
`jumpToBottom` when already at bottom).

## Pains

The audit phase (5 read-only sub-agents in parallel) was the right call
here: 4 of 5 audits independently flagged Findings #1
(`programmaticDisableRef` bypass) and #2 (`lastScrollTopRef`
cold-start), which I would likely have missed reasoning forward from the
initial fix alone. The audits also agreed that the new direction-aware
logic is the right primitive — none recommended walking it back.

Deferred to follow-up PRs (out of scope for "scroll-up jitter"):

- Workspace-switch hydration race (`hasLoadedTranscriptRows` flip
mid-read snaps to bottom).
- ResizeObserver disconnect/reconnect on every `autoScroll` toggle.
- Tab key not in `TRANSCRIPT_SCROLL_KEYS` (keyboard-nav focus-induced
scroll snaps back).
- Parallel patterns in `OutputTab` / `BashToolCall` / `InitMessage`
(different sub-views, each with its own bottom-lock heuristic).

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-7` • Thinking:
`max` • Cost: `$11.30`_

<!-- mux-attribution: model=anthropic:claude-opus-4-7 thinking=max
costs=11.30 -->

v0.23.3-nightly.19

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
🤖 perf: smooth text streaming (kill cascade re-renders, model-aware reveal) (coder#3219)

## Summary

Streamed assistant text (and reasoning) was visibly jittery — periodic
catch-up jumps every few seconds, rate stuck at ~72 chars/sec regardless
of what the model emitted, and a sub-frame of work for the entire chat
list on every delta. This PR makes the cadence smooth in three ordered
fixes plus a TPS-display fix discovered during review: leaf-subscribe
the streaming-stats pill so it stops invalidating `WorkspaceState`,
replace the smoothing engine's hard-snap with a model-aware soft
catch-up, compact streaming parts on append, and floor the TPS
calculator's time span so a new stream's first deltas don't spike the
displayed rate.

## Background

The renderer has had a two-clock smoothing model (`SmoothTextEngine` +
`useSmoothStreamingText`) for a while, but several regressions defeated
it:

1. `WorkspaceState.streamingTokenCount` / `streamingTPS` were computed
inside the `getWorkspaceState` snapshot using `Date.now()`. Every
coalesced delta produced a new snapshot reference, which cascaded
`WorkspaceShell → ChatPane → MessageRenderer` through every row.
`useDeferredValue` was bypassed for the entire stream by
`shouldBypassDeferredMessages`, so reconciliation ran at the ingestion
rate.
2. `getAdaptiveRate(backlog)` ignored the model's actual emission rate.
With a fast model (~120 cps) and `BASE_CHARS_PER_SEC=72`, the visible
cursor fell behind by ~5 chars per ingestion cycle until backlog crossed
`MAX_VISUAL_LAG_CHARS=120`, at which point `enforceMaxVisualLag` snapped
`visible := full - 120` and zeroed the budget — that snap is exactly the
visible "catch-up jump".
3. `requestIdleCallback({ timeout: 100 })` was used for streaming
deltas. The smoothing engine should be the only pacing layer; idle
batching just feeds (2).
4. `handleStreamDelta` appended a fresh `{ type: "text" }` part per
chunk; `mergeAdjacentParts` re-merged on every render. For a 10k-char
reply that's tens of thousands of merges per turn.
5. `calculateTPS` divided by `now - firstDelta.timestamp`. With one
delta that span is typically a few milliseconds, so e.g. `50 tokens /
0.005s = 10000 t/s`. Phase 1's microtask cadence exposed this — where
the prior idle-callback batching used to mask it by sampling later — and
Phase 2 wired TPS into the smoothing engine, amplifying its visibility.
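The arithmetic in item 5, and the effect of flooring the time span at one second (the `MIN_TPS_TIME_SPAN_MS = 1000` floor this PR introduces), can be checked with a short sketch. The two function names are hypothetical and the real `calculateTPS` works from delta records rather than a bare span.

```typescript
const MIN_TPS_TIME_SPAN_MS = 1000;

// Rate with no floor: a tiny span inflates the result.
function tokensPerSecondUnfloored(tokens: number, spanMs: number): number {
  return (tokens * 1000) / spanMs;
}

// Rate with the divisor floored at one second.
function tokensPerSecondFloored(tokens: number, spanMs: number): number {
  return (tokens * 1000) / Math.max(spanMs, MIN_TPS_TIME_SPAN_MS);
}

// 50 tokens observed 5 ms after the first delta:
const spike = tokensPerSecondUnfloored(50, 5); // 10000 t/s
const floored = tokensPerSecondFloored(50, 5); // 50 t/s

// Once the stream is older than one second the floor has no effect:
const steady = tokensPerSecondFloored(300, 2000); // 150 t/s
```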

## Implementation

Four commits, ordered so each phase is verifiable in isolation:

**Phase 1 — leaf-subscribe streaming stats, microtask ingestion
(`775e9023c`)**

- Removed `streamingTokenCount` / `streamingTPS` from `WorkspaceState`.
- Added `WorkspaceStreamingStats` + `streamingStatsStore` (`MapStore`) +
`useWorkspaceStreamingStats(workspaceId)` leaf hook (mirrors the
existing `useWorkspaceStatsSnapshot` pattern at
`WorkspaceStore.ts:4127`).
- Replaced `scheduleIdleStateBump` with `scheduleStreamingStateBump` for
streaming delta types (`stream-delta`, `tool-call-delta`,
`reasoning-delta`). It coalesces on `queueMicrotask` instead of an idle
callback. `init-output` and `bash-output` keep the idle path
(terminal-style throughput).
- Wired `cancelPendingStreamingBump` into stream-end / stream-abort /
replay reset / `removeWorkspace`.
- `StreamingBarrier` now reads via the leaf hook.

**Phase 2 — model-aware smoothing engine, soft catch-up (`85fb141da`)**

- `SmoothTextEngine.update()` accepts an optional `liveCharsPerSec`.
`getAdaptiveRate(backlog, liveCps)` combines a steady-state floor
(`max(BASE, liveCps)`), a soft catch-up ramp that drains lag over
`SOFT_CATCHUP_DRAIN_MS` once it exceeds `SOFT_CATCHUP_LAG_CHARS=60`, and
the legacy backlog-pressure ramp (kept as upper bound).
- Replaced the hard-snap discontinuity with the soft ramp.
`MAX_VISUAL_LAG_CHARS` is now 1024 (was 120) — a defensive safety net
for paused-tab pathological bursts that normal streams never hit.
- Bumped `MIN_FRAME_CHARS` from 1 to 2 so reveals coalesce to ~30 Hz at
the BASE rate (half the markdown re-parse cost; humans can't see the
difference). Tail-end reveal still works because the gate is now
`min(MIN_FRAME_CHARS, backlog)`.
- `useSmoothStreamingText` and `TypewriterMarkdown` thread
`liveCharsPerSec` through; `TypewriterMarkdown` accepts a new
`workspaceId` prop, forwarded from `AssistantMessage` and
`ReasoningMessage` (via `MessageRenderer`).
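
A rough sketch of how the steady-state floor and the soft catch-up ramp
could combine. Only `SOFT_CATCHUP_LAG_CHARS = 60` comes from the
description above; the `BASE_CHARS_PER_SEC` and `SOFT_CATCHUP_DRAIN_MS`
values are placeholders, draining only the excess lag is an assumption,
and the legacy backlog-pressure ramp is omitted because its shape is not
described here.

```ts
const SOFT_CATCHUP_LAG_CHARS = 60; // from the PR description
const BASE_CHARS_PER_SEC = 60; // assumed placeholder value
const SOFT_CATCHUP_DRAIN_MS = 500; // assumed placeholder value

// Hypothetical stand-in for getAdaptiveRate(backlog, liveCps).
function getAdaptiveRateSketch(
  backlogChars: number,
  liveCharsPerSec?: number
): number {
  // Treat a missing or non-finite live rate as "no signal".
  const live =
    liveCharsPerSec !== undefined && Number.isFinite(liveCharsPerSec)
      ? Math.max(0, liveCharsPerSec)
      : 0;

  // Steady-state floor: never reveal slower than BASE or the model's live rate.
  const floor = Math.max(BASE_CHARS_PER_SEC, live);
  if (backlogChars <= SOFT_CATCHUP_LAG_CHARS) return floor;

  // Soft catch-up: rate needed to drain the lag above the threshold
  // within the drain window (assumption: only the excess is drained).
  const excess = backlogChars - SOFT_CATCHUP_LAG_CHARS;
  const catchUp = excess / (SOFT_CATCHUP_DRAIN_MS / 1000);

  return Math.max(floor, catchUp);
}
```

Because the catch-up term starts at zero at the threshold, the rate
ramps up continuously instead of jumping, which is the property the
hard snap lacked.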

**Phase 3 — compact-on-append, clean prop surface (`0a945ed7b`)**

- `StreamingMessageAggregator.handleStreamDelta` /
`handleReasoningDelta` append into the previous adjacent text/reasoning
part in place. For a 10k-char reply this drops `parts.length` from
thousands to one and `mergeAdjacentParts` cost from O(N) to O(1).
Backend persistence (`partial.json`, `chat.jsonl`) is unaffected — those
writers live backend-side; this aggregator's `parts` is pure display
state.
- `TypewriterMarkdown`: replaced the `deltas: string[]` prop with
`content: string`. Callers always passed a fresh `[content]` array
literal, which defeated `React.memo`. Removed the manual `React.memo` and
the inner `useMemo` for the streaming-context value (React Compiler
handles both).
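
Compact-on-append can be sketched as below. The `DisplayPart` shape and
the `appendDelta` name are assumptions for illustration; the real
aggregator's part types are not shown in this description.

```ts
// Hypothetical display-part shapes; the real aggregator types may differ.
interface TextLikePart {
  type: "text" | "reasoning";
  text: string;
}
interface ToolPart {
  type: "tool";
  toolName: string;
}
type DisplayPart = TextLikePart | ToolPart;

// Append a streamed delta, extending the previous part in place when it is
// an adjacent part of the same kind, instead of pushing a new part per chunk.
function appendDelta(
  parts: DisplayPart[],
  kind: "text" | "reasoning",
  delta: string
): void {
  const last = parts[parts.length - 1];
  if (last !== undefined && last.type !== "tool" && last.type === kind) {
    last.text += delta; // in-place append: parts.length stays constant
    return;
  }
  parts.push({ type: kind, text: delta });
}
```

With this shape, an uninterrupted text reply stays a single part no
matter how many chunks arrive, so a later merge pass has nothing left to
merge.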

**Phase 4 — TPS calculator floor + stream-error token cleanup
(`a476613be`)**

- `calculateTPS` now floors the divisor at `MIN_TPS_TIME_SPAN_MS =
1000`. With one delta the rate becomes `tokens / 1s` instead of `tokens
/ 0.005s`. The reported TPS smoothly ramps up over the first second of a
stream instead of spiking and "dropping abruptly". Slight
under-statement during the settling window is the trade-off — strictly
preferable to an order-of-magnitude over-statement.
- The `stream-error` branch in `applyWorkspaceChatEventToAggregator` now
calls `clearTokenState`, matching `stream-end` and `stream-abort`.
Without it, the errored message's `deltaHistory` entry leaks into a
follow-up stream's TPS calculation.
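
The floored divisor can be sketched as follows. `MIN_TPS_TIME_SPAN_MS =
1000` and the negative-span behavior come from this description; the
`DeltaRecord` shape and the `calculateTpsSketch` signature are
assumptions, not the actual `calculateTPS` API.

```ts
const MIN_TPS_TIME_SPAN_MS = 1000; // from the PR description

// Hypothetical delta-history entry.
interface DeltaRecord {
  tokens: number;
  timestamp: number; // ms since epoch
}

function calculateTpsSketch(
  deltas: readonly DeltaRecord[],
  now: number
): number {
  const first = deltas[0];
  if (first === undefined) return 0;

  const rawSpanMs = now - first.timestamp;
  if (rawSpanMs < 0) return 0; // clock skew: refuse to report a rate

  // Floor the divisor so a single early delta cannot produce a huge rate.
  const spanMs = Math.max(rawSpanMs, MIN_TPS_TIME_SPAN_MS);
  const totalTokens = deltas.reduce((sum, d) => sum + d.tokens, 0);
  return totalTokens / (spanMs / 1000);
}
```

For the example from the background section, one 50-token delta sampled
5 ms later yields 50 t/s here rather than 10000 t/s.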

## Validation

- `make typecheck` ✅
- `make lint` ✅
- Targeted streaming surface: 1009+ tests pass / 0 fail across
`SmoothTextEngine`, `useSmoothStreamingText`,
`StreamingMessageAggregator`, `applyWorkspaceChatEventToAggregator`,
`StreamingTPSCalculator`, `TypewriterMarkdown`, `ReasoningMessage`,
`StreamingBarrier{,View}`, `PinnedTodoList`, `WorkspaceStore`, plus the
broader `src/browser/utils/messages/`, `src/browser/features/Messages/`,
`src/browser/stores/`, and `src/browser/hooks/` suites.
- New behavioral tests:
  - `SmoothTextEngine.test.ts`: rate tracks `liveCharsPerSec`; soft
    catch-up engaged for 60–1024 char lags without snap; hard snap still
    fires above the safety threshold.
  - `StreamingTPSCalculator.test.ts`: 1s floor applied for tiny / zero
    spans; raw span used once it exceeds the floor; negative spans (clock
    skew) return 0.
  - `applyWorkspaceChatEventToAggregator.test.ts`: `stream-error` calls
    `clearTokenState`.

## Risks

Localized to the streaming display path; no protocol or persistence
changes.

- **Re-render shape (Phase 1).** Streaming deltas now bump
`WorkspaceState` once per microtask drain instead of once per
`requestIdleCallback`. Net effect under heavy load is *less* work
because the snapshot stops invalidating per-delta TPS, but it's a
behavioral shift — verified via the existing 106-test `WorkspaceStore`
suite plus targeted `StreamingBarrier` tests.
- **Smoothing engine constants (Phase 2).** `MAX_VISUAL_LAG_CHARS`
jumped 120 → 1024 and `MIN_FRAME_CHARS` 1 → 2. Existing test "caps
visual lag when incoming text jumps ahead" still passes against the new
soft-ramp behavior, and the new "hard-snaps when lag exceeds the safety
threshold" test confirms the safety net still functions.
- **Compact-on-append (Phase 3).** Touches the in-memory `parts` array
shape during streaming. The aggregator already had compaction at
stream-end (`compactMessageParts`); we're just doing it eagerly. No
on-disk format change. All `StreamingMessageAggregator` and
`applyWorkspaceChatEventToAggregator` tests pass.
- **TPS floor (Phase 4).** The reported rate during the first second of
a stream now under-counts versus the previous (mathematically broken)
value. Backend `sessionTimingService` also calls `calculateTPS`; same
floor applies there but the backend's window is broader so the visible
effect is smaller. No risk to persisted usage / cost calculations —
those use `usage.outputTokens / duration` from the API, not the
streaming TPS estimator.

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-7` • Thinking:
`xhigh` • Cost: `$23.55`_

<!-- mux-attribution: model=anthropic:claude-opus-4-7 thinking=xhigh
costs=23.55 -->

v0.23.3-nightly.14

🤖 refactor: auto-cleanup (coder#3169)

## Summary

Long-lived auto-cleanup PR that accumulates low-risk,
behavior-preserving refactors picked from recent `main` commits.

## Changes

### Use shared `isAbortError` utility in AuthTokenModal

Replace two inline `error instanceof DOMException && error.name ===
"AbortError"` checks in `AuthTokenModal.tsx` with the existing shared
`isAbortError()` utility from `@/browser/utils/isAbortError`,
deduplicating the abort detection logic.
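
For reference, a minimal sketch of such a helper, assuming the shared
utility performs the same check as the inline expression it replaces
(the actual implementation in `@/browser/utils/isAbortError` is not
shown here, so the name below is deliberately different):

```ts
// Assumed equivalent of the inline check; not the actual Mux utility.
function isAbortErrorSketch(error: unknown): error is DOMException {
  return error instanceof DOMException && error.name === "AbortError";
}

// Hypothetical usage: swallow cancellations, rethrow everything else.
function rethrowUnlessAborted(error: unknown): void {
  if (isAbortErrorSketch(error)) return;
  throw error;
}
```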

### Extract `extractChunkDeltaText` helper to deduplicate advisor chunk parsing

Pull the repeated `switch` over `chunk.type` (extracting `chunk.textDelta`
or `chunk.argsTextDelta`) into a single `extractChunkDeltaText()` helper in
`advisorService.ts`, then call it from both `executeAdvisorStream` and
`executeAdvisorStreamWithRetry`.
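
The helper's shape might look like the sketch below. The chunk `type`
names (`"text-delta"`, `"tool-call-delta"`, `"finish"`) and the
`undefined` return for other chunks are assumptions; only the
`textDelta` / `argsTextDelta` fields come from the description above.

```ts
// Hypothetical chunk union; the real advisor stream chunk types may differ.
type AdvisorChunkSketch =
  | { type: "text-delta"; textDelta: string }
  | { type: "tool-call-delta"; argsTextDelta: string }
  | { type: "finish" };

// Single place that maps a stream chunk to the text it contributes, if any.
function extractChunkDeltaTextSketch(
  chunk: AdvisorChunkSketch
): string | undefined {
  switch (chunk.type) {
    case "text-delta":
      return chunk.textDelta;
    case "tool-call-delta":
      return chunk.argsTextDelta;
    default:
      return undefined; // chunk carries no text delta
  }
}
```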

### Remove unnecessary exports from `skillFileUtils`

Un-export `parseSkillFile`, `serializeSkillFile`, and `SKILL_FILENAME` from
`src/node/services/agentSkills/skillFileUtils.ts`. All three are only used
within the same file, so the `export` keyword was unnecessary.

### Remove dead `getCancelledCompactionKey` storage helper

Remove the `getCancelledCompactionKey` function and its entry in the
`EPHEMERAL_WORKSPACE_KEY_FUNCTIONS` array from `storage.ts`. The only
consumer (`useResumeManager.ts`) was deleted, leaving this as dead code.

### Remove dead `quickReviewNotes` module

Remove `src/browser/utils/review/quickReviewNotes.ts` and its test file
(482 lines). The `buildQuickLineReviewNote` and `buildQuickHunkReviewNote`
functions have never been imported by any production code since their
introduction in PR coder#2448.

### Un-export `isBashOutputTool` in messageUtils

Remove the `export` keyword from `isBashOutputTool` in
`src/browser/utils/messages/messageUtils.ts` — the type guard is only
used within the same file by `computeBashOutputGroupInfos`, so the
export was unnecessary.

### Deduplicate `hasErrorCode` in submoduleSync

Replace inline `NodeJS.ErrnoException`-like error-code checks in
`submoduleSync.ts` with calls to the existing `hasErrorCode` helper,
keeping a single canonical place where error-code narrowing lives.

### Simplify `hasCompletedDescendants` to reuse `listCompletedDescendantAgentTaskIds`

Rewrite `hasCompletedDescendants` to delegate to the existing
`listCompletedDescendantAgentTaskIds` helper instead of re-implementing the
traversal, collapsing the two code paths into one.

### Reuse `anthropicSupportsNativeXhigh` in Anthropic fetch wrapper

Replace the duplicated Opus 4.7+ regex inside
`wrapFetchWithAnthropicCacheControl`
(`src/node/services/providerModelFactory.ts`) with a call to the existing
`anthropicSupportsNativeXhigh` helper from `src/common/types/thinking.ts`.
The helper already performs the same regex check plus provider-prefix
normalization (e.g., `anthropic/claude-opus-4-7` via the `ai-model-id`
gateway header), keeping the wire-level detection and the policy-level
detection in one place.

### Extract `getFetchInputUrl` helper to deduplicate URL extraction

The OpenAI/Codex and Copilot fetch wrappers in `providerModelFactory.ts`
each contained an identical 15-line IIFE that extracted a URL string from
the `fetch` `input` argument (handling string, `URL`, and `Request`-like
shapes). Extract the logic into a single `getFetchInputUrl` helper so both
wrappers share one implementation. Behavior-preserving: the helper returns
the same empty-string fallback on unrecognized inputs, so callers continue
to fall through to normal fetch behavior without throwing.
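
A sketch of what such a helper could look like, based only on the
behavior described above (string, `URL`, and `Request`-like inputs, with
an empty-string fallback). The exact checks in the real
`getFetchInputUrl` are not shown here, so treat the details as
assumptions.

```ts
// Hypothetical reconstruction; not the actual providerModelFactory.ts helper.
function getFetchInputUrlSketch(input: unknown): string {
  if (typeof input === "string") return input;
  if (input instanceof URL) return input.toString();

  // Request-like: any object exposing a string `url` property.
  if (typeof input === "object" && input !== null && "url" in input) {
    const url = (input as { url: unknown }).url;
    if (typeof url === "string") return url;
  }

  // Unrecognized input: empty string lets callers fall through to plain fetch.
  return "";
}
```

Returning `""` instead of throwing keeps the wrapper's URL matching a
pure opt-in: a wrapper that cannot identify the target URL simply does
not apply its special handling.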

### Extract `clonePersistedToolModelUsage` helper in streamManager

The deep-clone pattern for `PersistedToolModelUsage` (spread event, fresh
`usage` object, conditional `providerMetadata`) was duplicated between
`recordToolModelUsage` and the stream-end tool-usage snapshot in
`streamManager.ts`. Extract a single file-local helper so both sites share
the same implementation. Behavior-preserving: both callsites continue to
produce structurally identical clones.

### Reuse `getClosestTranscriptAncestor` in `getTranscriptContextMenuLink`

The new `getTranscriptContextMenuLink` helper (added in coder#3188) inlined
the same "resolve event target → `element.closest(selector)` → require
both to stay within the transcript root" pattern that
`getClosestTranscriptAncestor` — defined a few lines above in the same
file — already implements. Delegate to the shared helper so the
null/contains guards live in one place. Behavior-preserving: the
helper returns null for a null/outside-root target, then
`element.closest("a[href]")`, then null again if the anchor is outside
the transcript root — identical to the previous inline checks. All 22
`transcriptContextMenu` tests continue to pass.

### Remove duplicate `gpt-5.5-pro` thinking-policy test

When coder#3192 renamed `gpt-5.4-pro` → `gpt-5.5-pro` across
`src/common/utils/thinking/policy.test.ts`, it accidentally introduced a
third `returns medium/high/xhigh for gpt-5.5-pro` test that is
byte-identical to the renamed first occurrence (the two remaining tests
are the bare-prefix and `with version suffix` variants; the deleted
block
had no version suffix and no gateway prefix). Drop the duplicate so the
suite has one canonical no-suffix test, one mux-gateway test, and one
version-suffix test. Behavior-preserving — `getThinkingPolicyForModel`
coverage for `gpt-5.5-pro` is unchanged; 63 / 63 tests in
`policy.test.ts`
continue to pass.

### Extract `getAppProxyBasePathFromRequestValue` helper in orpc server

The orpc server's public-base-path detection in
`src/node/orpc/server.ts`
repeated the pattern `parsePathnameFromRequestValue(value) →
getAppProxyBasePathFromPathname(...)` across four callsites (forwarded
headers, the `originalUrl` / `url` loop, the referer header, and the
direct app-proxy handler-prefix calculator). Extract a single
`getAppProxyBasePathFromRequestValue` helper that performs the two-step
normalize-then-classify operation, then call it from every site.
Behavior-preserving: each callsite still produces `null` when the value
is absent or yields an invalid pathname, and otherwise returns the same
parsed app-proxy base path. All 52 tests in
`src/node/orpc/server.test.ts` continue to pass.

### Inline `getRoutePathnameForBaseHref` wrapper in orpc server

The new helper added in coder#3195 was a one-line shim that simply renamed
`getPathnameFromRequestUrl(req.url)` to fit the surrounding "for base
href" naming theme. It was used in only two adjacent functions
(`shouldInjectSlashlessRootRedirect` and `getPublicBaseHref`), and the
existing `getPathnameFromRequestUrl` already conveys the intent at the
callsite. Inline both calls so the request-URL → pathname conversion
lives at the points of use, removing one layer of indirection without
changing behavior. All 52 tests in `src/node/orpc/server.test.ts`
continue to pass.

### Remove dead `AdvisorToolResultSchema` definitions

`AdvisorToolResultSchema` and its three constituent schemas
(`AdvisorToolAdviceResultSchema`, `AdvisorToolLimitResultSchema`,
`AdvisorToolErrorResultSchema`) in
`src/common/utils/tools/toolDefinitions.ts`
were introduced alongside the experimental advisor tool in coder#3157 but
were never imported anywhere — neither by `src/common/types/tools.ts`
(which derives the public advisor result shape from a different type
local to `AdvisorToolCall.tsx`), nor by the advisor tool implementation
itself, nor by any test. Unlike the analogous `TaskToolResultSchema` /
`TaskAwaitToolResultSchema` / `TaskApplyGitPatchToolResultSchema` /
`TaskTerminateToolResultSchema` (all of which are imported via
`z.infer` in `src/common/types/tools.ts`), the advisor variant had no
consumer. Drop the four dead schemas; the file shrinks by ~32 lines and
keeps `AdvisorToolInputSchema` (which is imported by `advisor.ts`)
intact. Behavior-preserving.

### Reuse `getProviderPolicy()` in custom-provider `getConfig()` loop

`ProviderService.getConfig()`'s custom-provider branch inlined the same
"if enforced, look up `providerAccess` entry → narrow to
`{ forcedBaseUrl, allowedModels }`" lookup that the existing private
`getProviderPolicy()` helper already implements (and that other
callsites such as `addCustomOpenAICompatibleProvider` use). Replace the
inline lookup with a call to `getProviderPolicy(providerId)` so the
small policy-shape projection lives in one place. Behavior-preserving:
the only structural difference is that, when policy is not enforced,
`getProviderPolicy()` returns `{}` while the inline form passed
`{ forcedBaseUrl: undefined, allowedModels: null }`, but
`buildCustomProviderConfigInfo` normalizes both via
`policy?.forcedBaseUrl ?? resolveConfigBaseUrl(...)` and
`policy?.allowedModels ?? null`, so the resulting `ProviderConfigInfo`
is byte-identical. All 74 tests in `providerService.test.ts` continue
to pass.

### Collapse task-group parent rail offset into shared helper

After coder#3199 introduced `getTaskGroupMemberDepth` and set
`TASK_GROUP_MEMBER_PARENT_RAIL_OFFSET_PX =
SIDEBAR_LEADING_SLOT_CENTER_OFFSET_PX`,
the `task-group-member` branch of `getSubAgentParentRailX` in
`src/browser/components/sidebarItemLayout.ts` reduced to
`getSidebarLeadingSlotCenterX(depth)`. Replace the inline
`getSidebarItemPaddingLeft(depth) +
TASK_GROUP_MEMBER_PARENT_RAIL_OFFSET_PX`
arithmetic with a call to the existing helper and drop the now-redundant
constant, leaving the leading-slot center offset defined exactly once.
Behavior-preserving: `getSubAgentParentRailX` still returns `38` at
`memberDepth = 2.5`, matching the pinned values in
`sidebarItemLayout.test.ts` (and the equivalent
`getSubAgentChildStatusCenterX` result). All 40 tests in
`sidebarItemLayout.test.ts`, `AgentListItem.test.tsx`, and
`ProjectSidebar.test.tsx` continue to pass.

### Remove unnecessary exports from inline-skill utilities

Un-export four interfaces in the new inline-skill helper files added in
coder#3204 — `InlineSkillSuggestionContext` and
`InlineSkillSuggestionRefreshContext` in
`src/browser/utils/agentSkills/inlineSkillSuggestions.ts`, plus
`InlineSkillCursorMatch` and `InlineSkillResolveOptions` in
`src/browser/utils/agentSkills/inlineSkillReferences.ts`. All four are
only used as parameter types within their defining files: the test
files import the value functions and pass object-literal arguments,
and the consumer call-sites in `ChatInput/index.tsx` only import the
exported functions, never the parameter type names, so the `export`
keyword was unnecessary. Behavior-preserving and type-only: TypeScript
compilation passes for both browser and main configs, and the 49 tests in
`inlineSkillSuggestions.test.ts` and `inlineSkillReferences.test.ts`
continue to pass.

> The earlier "sync thinking-policy doc comments with gpt-5.5 regex"
> cleanup was dropped during rebase: coder#3192 superseded it by retiring
> `gpt-5.4` from those comments entirely, so the comment-only diff
> became redundant.

> The earlier "reuse `hasNonEmptyString` helper for apiKey checks"
> cleanup was dropped during rebase: coder#3202 restructured
> `resolveProviderCredentials` to delegate to a new
> `resolveApiKeyCandidate` helper (subsuming the inline check) and
> already updated `hasAnyConfiguredProvider` to use `hasNonEmptyString`
> directly, so the cleanup diff no longer applied cleanly and was no
> longer needed.

### Replace stale `system-1` reference in telemetry comment

The `ExperimentOverriddenPayload.experimentId` JSDoc in
`src/common/telemetry/payload.ts` used `'system-1'` as an example
experiment ID, but the System 1 feature was removed wholesale in
coder#3207 and that experiment ID no longer exists. Swap the example for
a current entry from `EXPERIMENT_IDS` (`'agent-browser'`) so the
JSDoc points readers at a real experiment. Behavior-preserving —
comment-only change.

### Extract `isLightThemeMode` helper for Shiki theme detection

Three callsites independently encoded
`themeMode === "light" || themeMode.endsWith("-light")` to map a
theme-mode string (including namespaced variants like `flexoki-light`)
to the light Shiki theme:

- `highlightDiffChunk.ts` had a private `isLightTheme(theme: ThemeMode)`
  helper.
- `HighlightedCode.tsx` and `MarkdownComponents.tsx` had it inline (the
  latter with an intermediate `isLight` local).

Promote the predicate to `isLightThemeMode` in
`src/browser/utils/highlighting/shiki-shared.ts` (next to
`SHIKI_DARK_THEME` / `SHIKI_LIGHT_THEME` and `mapToShikiLang`) and route
all three callsites through it. The suffix convention now has a single
source of truth for the light/dark mapping. Behavior-preserving.
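
The shared predicate amounts to the following sketch. The expression is
the one quoted above; the plain `string` parameter type and the
`pickShikiThemeSketch` helper with its placeholder theme names are
assumptions for illustration (the real helper takes a `ThemeMode` and
the actual `SHIKI_DARK_THEME` / `SHIKI_LIGHT_THEME` values are not shown
here).

```ts
// Same predicate as the three former callsites; parameter type simplified.
function isLightThemeModeSketch(themeMode: string): boolean {
  return themeMode === "light" || themeMode.endsWith("-light");
}

// Hypothetical callsite: placeholder names, not the real Shiki theme ids.
function pickShikiThemeSketch(themeMode: string): string {
  return isLightThemeModeSketch(themeMode)
    ? "placeholder-light-theme"
    : "placeholder-dark-theme";
}
```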

### Remove unnecessary exports from `fileRead`

After coder#3208 removed the file explorer / file viewer flow, the only
external consumers of `src/browser/utils/fileRead.ts` are
`ImmersiveReviewView` (`buildReadFileScript`, `processFileContents`)
and the colocated test (`buildReadFileScript`,
`EXIT_CODE_TOO_LARGE`, `processFileContents`).

Un-export the helpers that are now only used inside the module itself
(`MAX_FILE_SIZE`, `shellEscape`, `base64ToUint8Array`,
`detectImageType`, `detectSvg`, `detectBinary`, `parseReadFileOutput`)
so the module surface accurately reflects its public API.
Behavior-preserving.

Auto-cleanup checkpoint: d1c0109

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-7` • Thinking:
`xhigh`_

<!-- mux-attribution: model=anthropic:claude-opus-4-7 thinking=xhigh -->

---------

Co-authored-by: mux-bot[bot] <264182336+mux-bot[bot]@users.noreply.github.com>
Co-authored-by: Mux <noreply@coder.com>
Co-authored-by: mux-bot <mux-bot@coder.com>
Co-authored-by: ammar-agent <ammar+ai@ammar.io>

v0.23.3-nightly.4

🤖 fix: align variant sub-agent connectors (coder#3199)

## Summary

Fixes sidebar connector alignment for expanded variants/best-of
sub-agent groups by rendering grouped members on the same indentation
grid as the task-group header icon.

## Background

Grouped sub-agents use a task-group header row with an extra disclosure
chevron before the group icon. Expanded members were only indented one
depth level below that header, which left the parent-to-child connector
rail visually offset from the grouped parent row.

## Implementation

- Added a shared task-group member depth helper in the sidebar layout
utilities.
- Applied that helper when rendering expanded task-group members from
`ProjectSidebar`.
- Added tests that assert grouped member depth/layout propagation and
rendered connector geometry.

## Validation

- `bun test src/browser/components/sidebarItemLayout.test.ts
src/browser/components/AgentListItem/AgentListItem.test.tsx
src/browser/components/ProjectSidebar/ProjectSidebar.test.tsx`
- `make typecheck`
- `make fmt-check`
- `make lint`
- `make static-check`

## Risks

Low-to-medium risk, scoped to left-sidebar task-group sub-agent row
indentation and connector geometry.

---

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `high` •
Cost: `$8.04`_

<!-- mux-attribution: model=openai:gpt-5.5 thinking=high costs=8.04 -->

v0.23.3-nightly.2

🤖 fix: increase advisor question limit (coder#3200)

## Summary

Increase the advisor tool `question` input limit from 500 to 2000
characters so agents can include enough context for strategic tradeoff
questions while keeping the field bounded.

## Background

The advisor tool is meant for planning ambiguity and architectural
decisions, where a short one-line prompt can omit important constraints.
The previous 500-character cap was tighter than needed for a compact
brief.

## Implementation

Updated the shared advisor tool input schema to allow up to 2000
characters and documented why the bound is intentionally roomier than
before.

## Validation

- `make static-check`

## Risks

Low. This only changes validation for advisor tool input length; the
field remains bounded and the added context is small relative to the
advisor transcript.

---

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` •
Cost: `$0.39`_

<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=0.39 -->

v0.23.3-nightly.0

release: v0.23.2

v0.23.2

release: v0.23.2

v0.23.2-nightly.9

🤖 bench: use GPT-5.5 for tbench (coder#3193)

> Mux working on behalf of Mike.

## Summary
Updates nightly Terminal-Bench defaults to run Opus 4.7 at xhigh
thinking and GPT-5.5 at high thinking while dropping the older GPT Codex
model from the default matrix. Adds leaderboard metadata for Opus 4.7
and GPT-5.5, and refreshes TBench workflow and skill examples.

## Background
GPT-5.5 xhigh runs were timing out in TBench, so the nightly workflow
keeps GPT-5.5 at high while preserving xhigh for Opus 4.7.

## Validation
- `make static-check`
- `python3 -m py_compile
benchmarks/terminal_bench/prepare_leaderboard_submission.py`
- `go run github.com/rhysd/actionlint/cmd/actionlint@v1.7.7
.github/workflows/nightly-terminal-bench.yml
.github/workflows/terminal-bench.yml`
- `/home/coder/.local/bin/uvx ruff format --check
benchmarks/terminal_bench/prepare_leaderboard_submission.py`
- `git diff --check`

---

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` •
Cost: `$16.42`_

<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=16.42
-->