fix(sdk): widen MessageBuilder truncation candidate set and tie budget to maxInputTokens (CLINE-2191) #10747

Open
robinnewhouse wants to merge 3 commits into robin/cline-2185-compaction-cant-shrink-in-flight-turn from robin/cline-2191-widen-messagebuilder-truncation-candidate-set

@robinnewhouse
Contributor

@robinnewhouse robinnewhouse commented May 14, 2026

Related Issue

Issue: CLINE-2191 — Widen MessageBuilder truncation candidate set (Layer A)
Stacked on: #10740 — please merge that first.
Sibling follow-up: CLINE-2192 — Absolute hard guarantee (Layer B). This PR does NOT guarantee the request fits.

Description

After #10739 (allowlist) and #10740 (protected-tail trim), the post-compaction preservation set inside the in-flight turn still contains the typed user prompt, the last assistant message, and any in-flight tool_use verbatim. Each can individually exceed any context window. MessageBuilder.buildForApi already has the middle-truncation infrastructure (truncateMiddleByChars, truncateMiddleToBytes, an aggregate-budget pass at truncateToTotalTextBudget). It just doesn't see enough block types and runs against a hardcoded 6 MB cap rather than the model's actual maxInputTokens.

This PR is the preventive layer — the typical over-budget request stops being a 400 and becomes a truncated-but-shipped request. The brick-wall guarantee for adversarial inputs is sibling CLINE-2192, which stacks on top of this one. We are not over-selling: this PR does NOT yet meet the "Ever EVER" bar.

Changes

In sdk/packages/core/src/session/services/message-builder.ts:

  1. collectTruncationCandidates widened. In addition to the existing tool_result candidates it now collects user text, assistant text, thinking block text, and top-level file block content. redacted_thinking is skipped (placeholder string). tool_use blocks are skipped — a JSON-aware structural truncator that avoids corrupting tool_use_id or breaking the input JSON shape is Layer B's responsibility.
  2. Aggregate budget tied to maxInputTokens. buildForApi(messages, { maxInputTokens }) now derives the cap from the model's actual maxInputTokens (× CHARS_PER_TOKEN = 3). When maxInputTokens is not provided the constructor's hardcoded 6 MB default still applies, so legacy callers behave identically.
  3. Deterministic sort. The largest-first sort gains an insertion-order tiebreaker so equal-byte candidates always truncate in the same order. Layer B relies on the same property.
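
As a rough illustration of items 1 and 3, the sketch below collects shrinkable blocks and sorts them largest-first with an insertion-order tiebreaker. The `Block` shapes, the `Candidate` type, and the `collectCandidates` name are assumptions made for this example, not the SDK's actual types. Sizes are measured in characters as a stand-in for bytes, and thinking blocks are skipped, which is what the review further down this thread reports the code does.

```typescript
// Hypothetical block shapes; the real SDK types are not shown in this PR.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_result"; text: string }
  | { type: "file"; content: string }
  | { type: "thinking"; thinking: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "tool_use"; id: string; input: unknown };

interface Candidate {
  index: number; // insertion order, used as the tiebreaker
  size: number; // characters here; the PR text talks about bytes
}

// Returns the shrinkable text of a block, or undefined when the block
// must be left alone (tool_use, thinking, redacted_thinking).
function shrinkableText(block: Block): string | undefined {
  switch (block.type) {
    case "text":
    case "tool_result":
      return block.text;
    case "file":
      return block.content;
    default:
      return undefined;
  }
}

function collectCandidates(blocks: Block[]): Candidate[] {
  const out: Candidate[] = [];
  blocks.forEach((block, index) => {
    const text = shrinkableText(block);
    if (text !== undefined) out.push({ index, size: text.length });
  });
  // Largest first; equal sizes fall back to insertion order so the
  // result never depends on the sort implementation's stability.
  return out.sort((a, b) => b.size - a.size || a.index - b.index);
}

const sample: Block[] = [
  { type: "text", text: "aa" },
  { type: "file", content: "bbbb" },
  { type: "tool_use", id: "t1", input: { body: "xxxxxxxx" } },
  { type: "text", text: "cccc" },
  { type: "thinking", thinking: "zzzzzzzzzz" },
];
const order = collectCandidates(sample).map((c) => c.index);
console.log(order); // [1, 3, 0]
```

The two equal-size candidates (indices 1 and 3) always come out in insertion order, which is the property item 3 describes.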

In sdk/packages/core/src/runtime/orchestration/session-runtime-orchestrator.ts:

  • createRuntimeHooks and createRuntimePrepareTurn now thread modelInfo.maxInputTokens through prepareMessagesForModelRequest → prepareProviderMessagesForApi → buildForApi. The cap reflects the model that is actually being called, not a 6 MB approximation.

In sdk/packages/shared/src/llms/tokens.ts + index.ts + index.browser.ts:

  • CHARS_PER_TOKEN was previously private. Exported alongside the existing estimateTokens.
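
The budget derivation described in this section can be sketched as follows. This is a minimal illustration, not the shipped code: `resolveTotalTextBudget`, `BuildOptions`, and `DEFAULT_TOTAL_TEXT_BUDGET` are hypothetical names, and the exact value behind the "6 MB" default is assumed to be 6,000,000 here.

```typescript
// Value stated in this PR description.
const CHARS_PER_TOKEN = 3;

// Assumption: the PR only says "6 MB"; the exact constant is not shown.
const DEFAULT_TOTAL_TEXT_BUDGET = 6_000_000;

// Hypothetical options shape mirroring buildForApi's optional second argument.
interface BuildOptions {
  maxInputTokens?: number;
}

function resolveTotalTextBudget(
  options: BuildOptions = {},
  fallback: number = DEFAULT_TOTAL_TEXT_BUDGET,
): number {
  // Legacy callers pass nothing and keep the pre-PR cap.
  if (options.maxInputTokens === undefined) return fallback;
  // Otherwise the cap follows the model that is actually being called.
  return options.maxInputTokens * CHARS_PER_TOKEN;
}

console.log(resolveTotalTextBudget({ maxInputTokens: 100_000 })); // 300000
console.log(resolveTotalTextBudget()); // 6000000
```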

Test Procedure

Seven new cases in sdk/packages/core/src/session/services/message-builder.test.ts:

  1. truncates user text blocks under the aggregate budget (CLINE-2191) — 5 MB user text, 500 KB budget, asserts truncation marker and final size.
  2. truncates assistant text and thinking blocks under the aggregate budget (CLINE-2191) — 2 MB text + 1 MB thinking, 250 KB budget, asserts both truncated.
  3. truncates top-level file blocks under the aggregate budget (CLINE-2191) — 4 MB file block, 200 KB budget.
  4. skips tool_use input bodies (CLINE-2191, deferred to Layer B) — 4 MB tool_use.input.body stays untouched. Doc-test for the deferral.
  5. derives the aggregate budget from maxInputTokens when provided (CLINE-2191) — buildForApi(messages, { maxInputTokens: 100_000 }) → 300 000 byte cap.
  6. falls back to the constructor default budget when maxInputTokens is absent (CLINE-2191) — legacy callers unchanged.
  7. produces deterministic output for equal-byte-length candidates (CLINE-2191) — two equal-byte candidates → byte-identical output across two runs.

Verification:

sdk/packages/core $ bun run typecheck:smoke   # clean
sdk/packages/core $ bun run test:unit -- message-builder.test
  ✓ 16 tests passed (9 existing + 7 new)
sdk/packages/core $ bun run test:unit
  Test Files  102 passed | 1 skipped (103)
       Tests  920 passed | 4 skipped (924)

What this PR still does NOT fix (intentional, deferred to CLINE-2192)

  • tool_use.input truncation. JSON-aware structural truncator that drills into string values without corrupting tool_use_id or input shape.
  • Adversarial inputs. A pathological transcript of 50k small blocks each below the MIN_TOTAL_BUDGET_TOOL_RESULT_BYTES = 8_000 floor would still overshoot — the largest-first heuristic can't shrink below the floor. Layer B adds the final byte-cap enforcement.
  • User-visible status notice and task.emergency_truncation telemetry. Only fires when we enter genuinely degraded mode; Layer B's concern.
  • Hard-error path. Policy is "degrade, never error" — neither layer hard-errors.
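
To make the adversarial-input limitation concrete, here is a simplified model of a largest-first shrink loop with a per-block floor. It works on block sizes only; `shrinkLargestFirst` and `MIN_FLOOR` are names invented for this sketch (the floor value mirrors MIN_TOTAL_BUDGET_TOOL_RESULT_BYTES above), and the real loop in message-builder.ts may differ.

```typescript
// Mirrors MIN_TOTAL_BUDGET_TOOL_RESULT_BYTES = 8_000 from the PR text.
const MIN_FLOOR = 8_000;

// Shrinks the largest blocks first until the total fits the budget,
// never taking any single block below the floor.
function shrinkLargestFirst(
  sizes: number[],
  budget: number,
  floor: number = MIN_FLOOR,
): number[] {
  const result = [...sizes];
  let total = sizes.reduce((sum, n) => sum + n, 0);
  const order = sizes
    .map((size, index) => ({ size, index }))
    .sort((a, b) => b.size - a.size || a.index - b.index);

  for (const { size, index } of order) {
    if (total <= budget) break;
    // Sorted descending, so every remaining block is at or below the floor too.
    if (size <= floor) break;
    const target = Math.max(floor, size - (total - budget));
    total -= size - target;
    result[index] = target;
  }
  return result;
}

const sum = (xs: number[]): number => xs.reduce((s, n) => s + n, 0);

// Typical case: one oversized block absorbs the whole overshoot.
const typical = shrinkLargestFirst([5_000_000, 1_000], 500_000);
console.log(typical, sum(typical)); // [499000, 1000] 500000

// Pathological case: 50k blocks that are each below the floor.
const pathological = shrinkLargestFirst(new Array<number>(50_000).fill(4_000), 300_000);
console.log(sum(pathological)); // 200000000, still far over budget
```

In this model the second call returns its input unchanged, which is the gap the final byte-cap enforcement in Layer B is meant to close.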

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)

Pre-flight Checklist

  • Changes are limited to a single feature, bugfix or chore
  • Tests are passing and code is formatted and linted
  • I have reviewed contributor guidelines

Additional Notes

The orchestrator threading is a small public-API change (MessageBuilder.buildForApi gains an optional second argument). Existing tests construct MessageBuilder directly without passing maxInputTokens and behave unchanged. The 6 MB hardcoded default stays as the fallback floor so any caller that doesn't yet know about maxInputTokens gets the pre-PR behavior.

Diagram

PR #10747 / CLINE-2191
Layer A: heuristic total text budget in MessageBuilder

Provider messages after per-block truncation:

  text blocks
  file blocks
  tool_result text
  other shrinkable text
        |
        v
  measure total text bytes/chars
        |
        v
  budget = maxInputTokens * 3 chars/token
        |
        v
  over budget?
        |
   +----+----+
   |         |
  no        yes
   |         |
   v         v
 return   choose largest shrinkable blocks first
          middle-truncate them
          skip signed/unshrinkable thinking
              |
              v
          provider payload usually fits
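
The "middle-truncate them" step in the diagram can be sketched as below. This is an illustrative stand-in, not the SDK's truncateMiddleByChars or truncateMiddleToBytes: the function name, the marker text, and the head/tail split are all assumptions.

```typescript
// Keeps the head and tail of a string and replaces the middle with a marker,
// so the result is at most maxChars long. Marker text is an assumption.
function truncateMiddle(
  text: string,
  maxChars: number,
  marker: string = "\n...[truncated]...\n",
): string {
  if (text.length <= maxChars) return text;
  // Not enough room for the marker: fall back to a plain head cut.
  if (maxChars <= marker.length) return text.slice(0, maxChars);

  const keep = maxChars - marker.length;
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  // Note: slicing by UTF-16 code units can split a surrogate pair;
  // a production version would need to guard against that.
  const tailText = tail > 0 ? text.slice(text.length - tail) : "";
  return text.slice(0, head) + marker + tailText;
}

const shortened = truncateMiddle("a".repeat(100), 30);
console.log(shortened.length); // 30
```

Keeping both ends preserves the start of a prompt or file and its most recent lines, which is usually where the useful context sits.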

@linear-code

linear-code Bot commented May 14, 2026

CLINE-2191

@greptile-apps
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR widens MessageBuilder's truncation candidate set beyond tool_result blocks to also include top-level text and file blocks, and ties the aggregate text budget to the model's actual maxInputTokens (×3 chars/token) rather than a hardcoded 6 MB cap. modelInfo.maxInputTokens is now threaded from the orchestrator through prepareMessagesForModelRequest → prepareProviderMessagesForApi → buildForApi.

  • thinking and redacted_thinking blocks are intentionally excluded from candidates because truncating their text would invalidate the API signature; they are also removed from countMessageTextBytes so they no longer inflate the overflow calculation.
  • tool_use.input truncation is explicitly deferred to Layer B (CLINE-2192), and CHARS_PER_TOKEN is promoted to a public export so both layers share the same chars-per-token constant.
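
A small sketch of why the count and the candidate set need to agree: if thinking text is counted but can never be shrunk, the overflow may exceed everything the truncation pass is allowed to remove. The `Block` shape and both function names are hypothetical and only model the behavior described in the bullets above.

```typescript
// Hypothetical minimal block shape for illustration only.
interface Block {
  type: "text" | "file" | "tool_result" | "thinking" | "redacted_thinking";
  text: string;
}

const UNSHRINKABLE = new Set<Block["type"]>(["thinking", "redacted_thinking"]);

// Counts every block, including ones the truncator must not touch.
function countAllChars(blocks: Block[]): number {
  return blocks.reduce((total, b) => total + b.text.length, 0);
}

// Counts only text that a later truncation pass is allowed to shrink.
function countShrinkableChars(blocks: Block[]): number {
  return blocks
    .filter((b) => !UNSHRINKABLE.has(b.type))
    .reduce((total, b) => total + b.text.length, 0);
}

const blocks: Block[] = [
  { type: "thinking", text: "t".repeat(1_000) },
  { type: "text", text: "x".repeat(100) },
];
const budget = 500;

// Counting everything reports an overflow of 600 chars, but only 100 chars
// are shrinkable, so the budget goal could never be reached.
console.log(countAllChars(blocks) - budget); // 600
// Counting only shrinkable text reports no overflow at all.
console.log(countShrinkableChars(blocks) <= budget); // true
```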

Confidence Score: 5/5

Safe to merge; both previously-flagged issues (thinking bytes inflating the uncollectable overflow, and thinking-text mutation invalidating the API signature) are cleanly resolved.

The two concrete defects called out in prior review rounds are both addressed: thinking blocks are removed from countMessageTextBytes so they no longer make the budget goal unreachable, and thinking blocks are skipped in collectTruncationCandidates so signatures are never invalidated. The orchestrator threading is mechanical and touches only the call sites. Remaining notes are test-coverage quality and a PR-description inaccuracy, neither of which affects runtime behaviour.

message-builder.test.ts — the file-block aggregate-budget test exercises the per-block truncation path rather than the aggregate path it describes; worth correcting before CLINE-2192 stacks on top.

Important Files Changed

| Filename | Overview |
|---|---|
| sdk/packages/core/src/session/services/message-builder.ts | Core logic changes: countMessageTextBytes no longer counts thinking bytes (fixing the previously-flagged unkillable-bytes issue), collectTruncationCandidates now adds text and file blocks, truncateToTotalTextBudget derives budget from maxInputTokens, and the sort gains a deterministic tiebreaker. |
| sdk/packages/core/src/session/services/message-builder.test.ts | Seven new CLINE-2191 test cases added. The file-block aggregate-budget test uses maxToolResultChars (50 K) smaller than the budget (200 K), so truncation is done by the per-block pass rather than the aggregate pass it intends to cover. |
| sdk/packages/core/src/runtime/orchestration/session-runtime-orchestrator.ts | Threads modelInfo.maxInputTokens into createRuntimeHooks and createRuntimePrepareTurn → prepareMessagesForModelRequest → prepareProviderMessagesForApi → buildForApi. Change is mechanical and correct. |
| sdk/packages/shared/src/llms/tokens.ts | CHARS_PER_TOKEN promoted from module-private to public export; no logic change. |
| sdk/packages/shared/src/index.ts | Re-exports CHARS_PER_TOKEN alongside the existing estimateTokens export. |
| sdk/packages/shared/src/index.browser.ts | Mirrors the index.ts CHARS_PER_TOKEN re-export for browser consumers. |

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
sdk/packages/core/src/session/services/message-builder.test.ts:590-591
The test name promises coverage of the **aggregate** budget path for `file` blocks, but it doesn't actually exercise that path. `MessageBuilder(50_000, undefined, 200_000)` sets `maxToolResultChars = 50_000`, so `transformBlock` → `truncateMiddle` trims the 4 MB file to ≤50 KB during the per-block pass. Because 50 KB < 200 KB budget, `truncateToTotalTextBudget` short-circuits on the early-return guard and never enters the loop. Setting `maxToolResultChars` well above the aggregate budget (e.g., `new MessageBuilder(10_000_000, undefined, 200_000)`) would make the aggregate pass the active code path and give this test its claimed coverage.

```suggestion
	it("truncates top-level file blocks under the aggregate budget (CLINE-2191)", () => {
		// maxToolResultChars must exceed the aggregate budget so the per-block
		// pass does not truncate first; otherwise truncateToTotalTextBudget
		// short-circuits and the aggregate path is never exercised.
		const builder = new MessageBuilder(10_000_000, undefined, 200_000);
```

### Issue 2 of 2
sdk/packages/core/src/session/services/message-builder.ts:893-907
**PR description vs. implementation mismatch on `thinking` blocks**

The "Changes" section of the PR description states the candidate set "now collects … `thinking` block text", but the code (and the new tests) correctly show that all `thinking` blocks — signed or unsigned — are **skipped**. The code comment and test `"truncates assistant text but preserves signed thinking blocks"` both confirm the intentional skip.

This is purely a description inaccuracy, but it could mislead a reviewer or a developer reading the PR later into believing that `thinking` content is being middle-truncated today. Worth correcting in the PR body so the record is accurate for CLINE-2192 reviewers who will stack on top of this.

Reviews (5): Last reviewed commit: "fix(sdk): align layer-a thinking budget ..."

Comment thread sdk/packages/core/src/session/services/message-builder.ts Outdated
@robinnewhouse robinnewhouse force-pushed the robin/cline-2185-compaction-cant-shrink-in-flight-turn branch from d94c054 to 3dae349 on May 14, 2026 20:02
@robinnewhouse robinnewhouse force-pushed the robin/cline-2191-widen-messagebuilder-truncation-candidate-set branch from 5ee594e to 945bd1e on May 14, 2026 20:05
@robinnewhouse
Contributor Author

@greptileai review

@robinnewhouse robinnewhouse force-pushed the robin/cline-2185-compaction-cant-shrink-in-flight-turn branch from 3dae349 to 0438d29 on May 14, 2026 20:26
@robinnewhouse robinnewhouse force-pushed the robin/cline-2191-widen-messagebuilder-truncation-candidate-set branch from 945bd1e to 91f9c04 on May 14, 2026 20:26
@robinnewhouse
Contributor Author

@greptileai review

@robinnewhouse robinnewhouse force-pushed the robin/cline-2191-widen-messagebuilder-truncation-candidate-set branch from 91f9c04 to fd7a192 on May 14, 2026 20:43
@robinnewhouse
Contributor Author

@greptileai review

@robinnewhouse robinnewhouse force-pushed the robin/cline-2191-widen-messagebuilder-truncation-candidate-set branch from fd7a192 to 1101682 on May 14, 2026 21:24
…t to maxInputTokens (CLINE-2191)

CLINE-2191 Layer A. After PR #10739 (allowlist) and PR #10740 (protected-tail trim) the preservation set inside the in-flight turn still included the typed user prompt, the last assistant message, and any in-flight tool_use verbatim. Each can individually exceed any context window. This PR closes the common case.

Changes to MessageBuilder: (1) collectTruncationCandidates also collects user text, assistant text, thinking blocks, and top-level file blocks. tool_use input bodies remain untouched here — a JSON-aware structural truncator that avoids corrupting tool_use_id or breaking input JSON shape is Layer B (CLINE-2192). (2) buildForApi accepts an optional maxInputTokens; when present the aggregate budget becomes maxInputTokens * CHARS_PER_TOKEN, otherwise it falls back to the hardcoded 6 MB default so legacy callers keep their behavior. (3) The largest-first sort gains an insertion-order tiebreaker so the output is deterministic across runs (Layer B will rely on this).

Orchestrator: createRuntimeHooks and createRuntimePrepareTurn now thread modelInfo.maxInputTokens through to prepareProviderMessagesForApi -> buildForApi so the cap reflects the model that is actually being called.

Shared package: CHARS_PER_TOKEN was previously private inside @cline/shared/llms/tokens.ts. Exported alongside the existing estimateTokens.

Tests: 7 new cases in message-builder.test.ts. 913 -> 920 @cline/core unit tests pass. typecheck:smoke clean.

This is the preventive layer. It does NOT yet provide the hard guarantee "the outbound request will never exceed the context window." That guarantee lives in CLINE-2192 (Layer B) which stacks on top of this PR.
@robinnewhouse robinnewhouse force-pushed the robin/cline-2191-widen-messagebuilder-truncation-candidate-set branch from 1101682 to b9ec39c on May 14, 2026 21:31
@robinnewhouse
Contributor Author

@greptileai review
