fix(sdk): widen MessageBuilder truncation candidate set and tie budget to maxInputTokens (CLINE-2191) #10747
Conversation
Greptile Summary
This PR widens the MessageBuilder truncation candidate set and ties the aggregate text budget to maxInputTokens (CLINE-2191).
Confidence Score: 5/5. Safe to merge; both previously flagged issues (thinking bytes inflating the uncollectable overflow, and thinking-text mutation invalidating the API signature) are cleanly resolved. The two concrete defects called out in prior review rounds are both addressed: thinking blocks are removed from countMessageTextBytes so they no longer make the budget goal unreachable, and thinking blocks are skipped in collectTruncationCandidates so signatures are never invalidated. The orchestrator threading is mechanical and touches only the call sites. Remaining notes are test-coverage quality and a PR-description inaccuracy, neither of which affects runtime behaviour. In message-builder.test.ts, the file-block aggregate-budget test exercises the per-block truncation path rather than the aggregate path it describes; worth correcting before CLINE-2192 stacks on top.
| Filename | Overview |
|---|---|
| sdk/packages/core/src/session/services/message-builder.ts | Core logic changes: countMessageTextBytes no longer counts thinking bytes (fixing the previously-flagged unkillable-bytes issue), collectTruncationCandidates now adds text and file blocks, truncateToTotalTextBudget derives budget from maxInputTokens, and the sort gains a deterministic tiebreaker. |
| sdk/packages/core/src/session/services/message-builder.test.ts | Seven new CLINE-2191 test cases added. The file-block aggregate-budget test uses maxToolResultChars (50 K) smaller than the budget (200 K), so truncation is done by the per-block pass rather than the aggregate pass it intends to cover. |
| sdk/packages/core/src/runtime/orchestration/session-runtime-orchestrator.ts | Threads modelInfo.maxInputTokens into createRuntimeHooks and createRuntimePrepareTurn → prepareMessagesForModelRequest → prepareProviderMessagesForApi → buildForApi. Change is mechanical and correct. |
| sdk/packages/shared/src/llms/tokens.ts | CHARS_PER_TOKEN promoted from module-private to public export; no logic change. |
| sdk/packages/shared/src/index.ts | Re-exports CHARS_PER_TOKEN alongside the existing estimateTokens export. |
| sdk/packages/shared/src/index.browser.ts | Mirrors the index.ts CHARS_PER_TOKEN re-export for browser consumers. |
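The budget derivation and deterministic tiebreaker summarized in the table above can be sketched roughly as follows. This is an illustrative sketch only: the constant values come from the PR text, but the function names, candidate shape, and `order` field are assumptions, not the SDK's actual internals.

```typescript
// Assumed constants, per the PR description.
const CHARS_PER_TOKEN = 3; // promoted to a public export by this PR
const DEFAULT_TOTAL_TEXT_BUDGET = 6_000_000; // legacy hardcoded 6 MB fallback

// When maxInputTokens is known, derive the aggregate char budget from it;
// otherwise keep the pre-PR 6 MB default so legacy callers are unchanged.
function resolveBudget(maxInputTokens?: number): number {
  return maxInputTokens !== undefined
    ? maxInputTokens * CHARS_PER_TOKEN
    : DEFAULT_TOTAL_TEXT_BUDGET;
}

interface Candidate {
  bytes: number;
  order: number; // insertion index recorded at collection time (assumed)
}

// Largest-first sort with insertion order as the deterministic tiebreaker,
// so equal-byte candidates always truncate in the same order across runs.
function sortCandidates(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => b.bytes - a.bytes || a.order - b.order);
}

console.log(resolveBudget(100_000)); // 300000, matching the new budget test
console.log(resolveBudget()); // 6000000, the legacy default
const sorted = sortCandidates([
  { bytes: 10, order: 1 },
  { bytes: 10, order: 0 },
  { bytes: 20, order: 2 },
]);
console.log(sorted.map((c) => c.order)); // [ 2, 0, 1 ]
```

The `|| a.order - b.order` tiebreaker is what CLINE-2192 will rely on: without it, equal-byte candidates could truncate in engine-dependent order.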
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
sdk/packages/core/src/session/services/message-builder.test.ts:590-591
The test name promises coverage of the **aggregate** budget path for `file` blocks, but it doesn't actually exercise that path. `MessageBuilder(50_000, undefined, 200_000)` sets `maxToolResultChars = 50_000`, so `transformBlock` → `truncateMiddle` trims the 4 MB file to ≤50 KB during the per-block pass. Because 50 KB < 200 KB budget, `truncateToTotalTextBudget` short-circuits on the early-return guard and never enters the loop. Setting `maxToolResultChars` well above the aggregate budget (e.g., `new MessageBuilder(10_000_000, undefined, 200_000)`) would make the aggregate pass the active code path and give this test its claimed coverage.
```suggestion
it("truncates top-level file blocks under the aggregate budget (CLINE-2191)", () => {
// maxToolResultChars must exceed the aggregate budget so the per-block
// pass does not truncate first; otherwise truncateToTotalTextBudget
// short-circuits and the aggregate path is never exercised.
const builder = new MessageBuilder(10_000_000, undefined, 200_000);
```
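To make the short-circuit concrete, here is a hedged sketch of the interplay the review describes. The function names echo the review (`truncateToTotalTextBudget`, `maxToolResultChars`), but the shapes and bodies are simplified stand-ins, not the SDK's real implementation.

```typescript
interface Block {
  text: string;
}

// Stand-in for the per-block pass (transformBlock -> truncateMiddle):
// clamp each block to maxToolResultChars.
function perBlockPass(blocks: Block[], maxToolResultChars: number): Block[] {
  return blocks.map((b) =>
    b.text.length <= maxToolResultChars
      ? b
      : { text: b.text.slice(0, maxToolResultChars) }
  );
}

// Stand-in for truncateToTotalTextBudget's early-return guard:
// report whether the aggregate loop would run at all.
function aggregatePassRan(blocks: Block[], budget: number): boolean {
  const total = blocks.reduce((n, b) => n + b.text.length, 0);
  if (total <= budget) {
    return false; // early-return guard: the aggregate loop is never entered
  }
  return true;
}

// 4 MB block, maxToolResultChars = 50 K, budget = 200 K:
const small = perBlockPass([{ text: "x".repeat(4_000_000) }], 50_000);
console.log(aggregatePassRan(small, 200_000)); // false (per-block path only)

// maxToolResultChars = 10 M leaves the block big enough to hit the budget:
const large = perBlockPass([{ text: "x".repeat(4_000_000) }], 10_000_000);
console.log(aggregatePassRan(large, 200_000)); // true (aggregate path runs)
```

This is why the suggestion raises `maxToolResultChars` to 10 M: the per-block clamp must not already bring the total under the aggregate budget.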
### Issue 2 of 2
sdk/packages/core/src/session/services/message-builder.ts:893-907
**PR description vs. implementation mismatch on `thinking` blocks**
The "Changes" section of the PR description states the candidate set "now collects … `thinking` block text", but the code (and the new tests) correctly show that all `thinking` blocks — signed or unsigned — are **skipped**. The code comment and test `"truncates assistant text but preserves signed thinking blocks"` both confirm the intentional skip.
This is purely a description inaccuracy, but it could mislead a reviewer or a developer reading the PR later into believing that `thinking` content is being middle-truncated today. Worth correcting in the PR body so the record is accurate for CLINE-2192 reviewers who will stack on top of this.
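The skip behaviour the review confirms can be sketched as follows. The block union and candidate structure here are assumptions for illustration; only the skip rules themselves (thinking, redacted_thinking, and tool_use excluded) come from the review.

```typescript
// Assumed, simplified block shapes; the SDK's actual types differ.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature?: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "tool_use"; id: string; input: unknown };

interface Candidate {
  block: { type: "text"; text: string };
  bytes: number;
}

function collectTruncationCandidates(blocks: ContentBlock[]): Candidate[] {
  const candidates: Candidate[] = [];
  for (const block of blocks) {
    // thinking blocks (signed or unsigned) are skipped: mutating their text
    // would invalidate the API signature. redacted_thinking is an opaque
    // placeholder. tool_use input is deferred to Layer B (CLINE-2192).
    if (block.type !== "text") continue;
    candidates.push({
      block,
      bytes: new TextEncoder().encode(block.text).length,
    });
  }
  return candidates;
}

const out = collectTruncationCandidates([
  { type: "thinking", thinking: "t".repeat(1000), signature: "sig" },
  { type: "text", text: "hello" },
]);
console.log(out.length); // 1: only the text block is a candidate
```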
Reviews (5). Last reviewed commit: "fix(sdk): align layer-a thinking budget ..."
Related Issue
Issue: CLINE-2191 — Widen MessageBuilder truncation candidate set (Layer A)
Stacked on: #10740 — please merge that first.
Sibling follow-up: CLINE-2192 — Absolute hard guarantee (Layer B). This PR does NOT guarantee the request fits.
Description
After #10739 (allowlist) and #10740 (protected-tail trim), the post-compaction preservation set inside the in-flight turn still contains the typed user prompt, the last assistant message, and any in-flight `tool_use` verbatim. Each can individually exceed any context window. `MessageBuilder.buildForApi` already has the middle-truncation infrastructure (`truncateMiddleByChars`, `truncateMiddleToBytes`, and an aggregate-budget pass in `truncateToTotalTextBudget`); it just doesn't see enough block types, and it runs against a hardcoded 6 MB cap rather than the model's actual `maxInputTokens`.
This PR is the preventive layer: the typical over-budget request stops being a 400 and becomes a truncated-but-shipped request. The brick-wall guarantee for adversarial inputs is sibling CLINE-2192, which stacks on top of this one. We are not over-selling: this PR does NOT yet meet the "Ever EVER" bar.
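For readers unfamiliar with the middle-truncation helpers named above, here is a minimal sketch of char-based middle truncation. The marker text and the even head/tail split are both assumptions; the SDK's `truncateMiddleByChars` may differ in detail.

```typescript
// Keep the head and tail of an oversized string and replace the middle
// with a marker, so the result is exactly maxChars long.
function truncateMiddleByChars(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const marker = "\n[... truncated ...]\n"; // assumed marker text
  const keep = maxChars - marker.length;
  if (keep <= 0) return text.slice(0, maxChars); // budget too small for a marker
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  return text.slice(0, head) + marker + text.slice(text.length - tail);
}

const out = truncateMiddleByChars("a".repeat(100), 40);
console.log(out.length); // 40: head and tail preserved, middle replaced
```

Middle truncation (rather than tail truncation) is the sensible default here because both the start and the end of a tool result or file tend to carry signal.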
Changes
In `sdk/packages/core/src/session/services/message-builder.ts`:
- `collectTruncationCandidates` widened. In addition to the existing `tool_result` candidates, it now collects user `text`, assistant `text`, and top-level `file` block content. `thinking` blocks (signed or unsigned) are skipped, since mutating their text would invalidate the API signature, as is `redacted_thinking` (placeholder string). `tool_use` blocks are also skipped: a JSON-aware structural truncator that avoids corrupting `tool_use_id` or breaking the `input` JSON shape is Layer B's responsibility.
- Aggregate budget tied to `maxInputTokens`. `buildForApi(messages, { maxInputTokens })` now derives the cap from the model's actual `maxInputTokens` (× `CHARS_PER_TOKEN` = 3). When `maxInputTokens` is not provided, the constructor's hardcoded 6 MB default still applies, so legacy callers behave identically.
- The largest-first sort gains an insertion-order tiebreaker so output is deterministic across runs (Layer B will rely on this).
In `sdk/packages/core/src/runtime/orchestration/session-runtime-orchestrator.ts`:
- `createRuntimeHooks` and `createRuntimePrepareTurn` now thread `modelInfo.maxInputTokens` through `prepareMessagesForModelRequest` → `prepareProviderMessagesForApi` → `buildForApi`, so the cap reflects the model that is actually being called, not a 6 MB approximation.
In `sdk/packages/shared/src/llms/tokens.ts` + `index.ts` + `index.browser.ts`:
- `CHARS_PER_TOKEN` was previously private. Exported alongside the existing `estimateTokens`.
Test Procedure
Seven new cases in `sdk/packages/core/src/session/services/message-builder.test.ts`:
- truncates user text blocks under the aggregate budget (CLINE-2191): 5 MB user text, 500 KB budget; asserts the truncation marker and final size.
- truncates assistant text but preserves signed thinking blocks (CLINE-2191): 2 MB text + 1 MB thinking, 250 KB budget; asserts text is truncated and thinking is untouched.
- truncates top-level file blocks under the aggregate budget (CLINE-2191): 4 MB file block, 200 KB budget.
- skips tool_use input bodies (CLINE-2191, deferred to Layer B): 4 MB `tool_use.input.body` stays untouched; doc-test for the deferral.
- derives the aggregate budget from maxInputTokens when provided (CLINE-2191): `buildForApi(messages, { maxInputTokens: 100_000 })` → 300 000-byte cap.
- falls back to the constructor default budget when maxInputTokens is absent (CLINE-2191): legacy callers unchanged.
- produces deterministic output for equal-byte-length candidates (CLINE-2191): two equal-byte candidates → byte-identical output across two runs.
Verification: 913 → 920 `@cline/core` unit tests pass; `typecheck:smoke` clean.
What this PR still does NOT fix (intentional, deferred to CLINE-2192)
- `tool_use.input` truncation: a JSON-aware structural truncator that drills into string values without corrupting `tool_use_id` or the `input` shape.
- The `MIN_TOTAL_BUDGET_TOOL_RESULT_BYTES = 8_000` floor can still overshoot: the largest-first heuristic can't shrink below the floor. Layer B adds the final byte-cap enforcement.
- `task.emergency_truncation` telemetry: only fires when we enter genuinely degraded mode; Layer B's concern.
Type of Change
Pre-flight Checklist
Additional Notes
The orchestrator threading is a small public-API change (`MessageBuilder.buildForApi` gains an optional second argument). Existing tests construct `MessageBuilder` directly without passing `maxInputTokens` and behave unchanged. The 6 MB hardcoded default stays as the fallback floor so any caller that doesn't yet know about `maxInputTokens` gets the pre-PR behavior.
Diagram