fix(sdk): trim oldest completed tool pairs from protected tail in basic compaction (CLINE-2185)#10740
Conversation
Greptile Summary: This PR fixes the unbounded protected-tail problem (CLINE-2185) by adding an atomic-pair trim pass over the protected tail in basic compaction.
Confidence Score: 4/5 — safe to merge; the tail-trim path is well-guarded and the new test suite covers all preservation invariants, including the compaction-summary retention case flagged in the previous review thread. That thread raised a concern about silently dropping compaction summaries once the second early bail was removed. The concern is addressed by `reconstructPrefixMessages`, which explicitly re-inserts any `isCompactionSummaryMessage` entry from `compactable` that was not a basic candidate, and a dedicated test verifies the round-trip. The logic is carefully structured: `removeTailCandidatesByPredicate` correctly aborts closures that would pull a preserved candidate into the removal set, and all atomic-pair invariants are verified by the new tests. Key file: sdk/packages/core/src/extensions/context/basic-compaction.ts — specifically the `reconstructPrefixMessages` output-assembly logic and the removal of the `candidates.length === 0` bail.
| Filename | Overview |
|---|---|
| sdk/packages/core/src/extensions/context/basic-compaction.ts | Adds protected-tail trimming via trimProtectedTail/removeTailCandidatesByPredicate, generalises closure helpers to MinimalCandidate, replaces direct candidate-map with reconstructPrefixMessages (preserves compaction summaries), and removes both early bails to let tail-trim run on prefix-less turns. |
| sdk/packages/core/src/extensions/context/compaction.test.ts | Adds 7 new test cases for CLINE-2185 covering tail trim, in-flight tool_use preservation, aggressive budget, unchanged-preservation round-trip, prefix regression, compaction-summary retention, and removed-prefix-candidate non-resurrection. |
Reviews (6): Last reviewed commit: "style(sdk): format prefix reconstruction..."
Follow-up #10747 (stacked on this PR) implements Layer A of the hard-guarantee work: it widens the MessageBuilder truncation candidate set and ties the aggregate budget to the actual modelInfo.maxInputTokens. Layer B (CLINE-2192, brick-wall enforce-budget pass + status notice + telemetry) stacks on #10747.
…t to maxInputTokens (CLINE-2191)

CLINE-2191 Layer A. After PR #10739 (allowlist) and PR #10740 (protected-tail trim) the preservation set inside the in-flight turn still included the typed user prompt, the last assistant message, and any in-flight `tool_use` verbatim. Each can individually exceed any context window. This PR closes the common case.

Changes to MessageBuilder:
(1) `collectTruncationCandidates` also collects user text, assistant text, thinking blocks, and top-level file blocks. `tool_use` input bodies remain untouched here — a JSON-aware structural truncator that avoids corrupting `tool_use_id` or breaking input JSON shape is Layer B (CLINE-2192).
(2) `buildForApi` accepts an optional `maxInputTokens`; when present the aggregate budget becomes `maxInputTokens * CHARS_PER_TOKEN`, otherwise it falls back to the hardcoded 6 MB default so legacy callers keep their behavior.
(3) The largest-first sort gains an insertion-order tiebreaker so the output is deterministic across runs (Layer B will rely on this).

Orchestrator: `createRuntimeHooks` and `createRuntimePrepareTurn` now thread `modelInfo.maxInputTokens` through to `prepareProviderMessagesForApi` -> `buildForApi` so the cap reflects the model that is actually being called.

Shared package: `CHARS_PER_TOKEN` was previously private inside `@cline/shared/llms/tokens.ts`. It is now exported alongside the existing `estimateTokens`.

Tests: 7 new cases in message-builder.test.ts. 913 -> 920 @cline/core unit tests pass. typecheck:smoke clean.

This is the preventive layer. It does NOT yet provide the hard guarantee "the outbound request will never exceed the context window." That guarantee lives in CLINE-2192 (Layer B), which stacks on top of this PR.
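Change (2) above — the budget fallback — can be sketched as follows. `resolveAggregateBudget` and the `CHARS_PER_TOKEN` value of 4 are illustrative assumptions, not the SDK's actual code (the real constant lives in `@cline/shared/llms/tokens.ts`):

```typescript
const CHARS_PER_TOKEN = 4; // assumption: a common chars-per-token heuristic
const DEFAULT_BUDGET_CHARS = 6 * 1024 * 1024; // the legacy hardcoded 6 MB default

// When the caller threads modelInfo.maxInputTokens through, the cap tracks the
// model actually being called; otherwise legacy callers keep the 6 MB behavior.
function resolveAggregateBudget(maxInputTokens?: number): number {
  return maxInputTokens !== undefined
    ? maxInputTokens * CHARS_PER_TOKEN
    : DEFAULT_BUDGET_CHARS;
}
```

The optional-parameter shape keeps the change backward compatible: existing call sites that never pass a token cap are byte-identical in behavior.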
@greptileai review
…ic compaction (CLINE-2185)

CLINE-2136 introduced a deliberate invariant: both `runBasicCompaction` and `runAgenticCompaction` refuse to touch the latest turn. That fix prevented orphaned `tool_use`/`tool_result` pairs, but it also left the in-flight turn unbounded. A turn whose tool calls accumulated large outputs (editor patches, fetch_web_content, etc.) could push a single request past any context window, which is what CLINE-2183 surfaced on Opus 4.7 against OpenRouter. PR #10739 widened `MessageBuilder.buildForApi`'s 6 MB allowlist as a triage fix.

This PR is the structural follow-up: when the protected tail alone exceeds the compaction target, `runBasicCompaction` now runs an atomic-pair removal pass on the tail, dropping oldest completed pairs first. Preservation set inside the tail: (a) the typed user prompt (turn-start), (b) the most recent assistant message, (c) any assistant carrying a `tool_use` whose `tool_result` has not yet arrived (in-flight pair). The atomic-pair closure (`collectAtomicRemovalIndexes`) is reused unchanged.

Implementation notes (deviations from plan.md):
(1) Added a new `removeTailCandidatesByPredicate` that aborts a seed when its atomic closure crosses the preservation set, rather than generic-izing the existing function. The prefix codepath stays byte-identical.
(2) Removed the `compactable.length === 0` early bail so a transcript that is entirely one in-progress turn still runs the tail trim.

Tests: 5 new compaction.test.ts cases. Existing 27 compaction tests pass unchanged. 913 @cline/core unit tests pass. typecheck:smoke clean.
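The preservation floor (a)–(c) described above can be sketched as a predicate over the tail. The message model and the `preservedTailIndexes` name are simplified illustrative assumptions, not the SDK's real types:

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };
type Msg = { role: "user" | "assistant"; content: Block[] };

// Indexes inside the protected tail that must never be removed:
// (a) the turn-start user prompt, (b) the most recent assistant message,
// (c) any assistant carrying a tool_use whose tool_result has not arrived.
function preservedTailIndexes(tail: Msg[]): Set<number> {
  const keep = new Set<number>([0]); // (a) the tail starts at the typed prompt
  const lastAssistant = tail.map((m) => m.role).lastIndexOf("assistant");
  if (lastAssistant >= 0) keep.add(lastAssistant); // (b)

  const resolved = new Set<string>();
  for (const m of tail)
    for (const b of m.content)
      if (b.type === "tool_result") resolved.add(b.tool_use_id);

  tail.forEach((m, i) => {
    const inFlight =
      m.role === "assistant" &&
      m.content.some((b) => b.type === "tool_use" && !resolved.has(b.id));
    if (inFlight) keep.add(i); // (c) in-flight pair
  });
  return keep;
}
```

Everything outside this set is fair game for the oldest-first removal pass.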
Related Issue
Issue: CLINE-2185 — Compaction can't shrink the in-flight turn (protected-tail trim)
Stacked on: #10739 — please merge that first.
Triggering report: CLINE-2183 (Opus 4.7 request can exceed context limit after a few prompts)
Description
This is the structural follow-up to #10739. The triage PR widened `MessageBuilder.buildForApi`'s 6 MB allowlist so the safety net covers the realistic large-output default tools. That bounds the worst case at ~2M tokens — still above a 1M Opus 4.7 ceiling. This PR closes the gap.

The real root cause of CLINE-2183 is the protected-tail carve-out introduced in CLINE-2136:
`runBasicCompaction`'s `splitLatestTurn` carves out `messages.slice(findLastTurnStartIndex(...))` as `protectedTail` and concatenates it verbatim into the result. `runAgenticCompaction`'s `findCutIndex` snaps to the same turn-start boundary. Both are correct — they prevent the orphaned `tool_use`/`tool_result` pairs that providers reject — but together they leave the in-flight turn unbounded. A coding-heavy turn with many large `editor` patches, fetched web content, or skill results can grow the tail past any context window, and basic compaction returns the same bloated tail no matter how aggressive the budget.

The atomic-pair removal infrastructure (`collectAtomicRemovalIndexes`) added in CLINE-2136 already proves the closure invariant: a removal expands across the whole `tool_use_id` graph. We can reuse that to drop oldest completed pairs from inside the tail, as long as we preserve a small floor: the typed user prompt (turn-start), the most recent assistant message, and any assistant carrying a `tool_use` whose `tool_result` has not yet arrived (an "in-flight pair"). The in-flight exception is new; it prevents the "Tool execution was interrupted before a result was produced" failure mode CLINE-2136 fixed for the historical prefix. Everything else inside the tail is a candidate for removal, oldest first.
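A minimal sketch of this tail-trim pass, under a simplified message model: `trimTail`, `atomicClosure`, and `liveSize` are illustrative stand-ins for the real helpers, character counts stand in for tokens, and the closure here is one-hop rather than the real transitive expansion `collectAtomicRemovalIndexes` performs:

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };
type Msg = { role: "user" | "assistant"; content: Block[] };

// Total size of the not-yet-removed messages (chars stand in for tokens).
function liveSize(messages: Msg[], removed: Set<number>): number {
  return messages.reduce(
    (n, m, i) => (removed.has(i) ? n : n + JSON.stringify(m).length),
    0,
  );
}

// One-hop closure over tool_use_id links: removing either half of a tool
// pair pulls in the other half. (The real helper expands transitively
// across the whole graph; one hop suffices for simple pairs.)
function atomicClosure(messages: Msg[], seed: number): Set<number> {
  const ids = new Set<string>();
  for (const b of messages[seed].content) {
    if (b.type === "tool_use") ids.add(b.id);
    if (b.type === "tool_result") ids.add(b.tool_use_id);
  }
  const closure = new Set<number>([seed]);
  messages.forEach((m, i) => {
    const linked = m.content.some(
      (b) =>
        (b.type === "tool_use" && ids.has(b.id)) ||
        (b.type === "tool_result" && ids.has(b.tool_use_id)),
    );
    if (linked) closure.add(i);
  });
  return closure;
}

// Oldest-first removal: a seed commits only when its whole closure avoids
// the preservation set; otherwise the seed is aborted, mirroring the PR's
// removeTailCandidatesByPredicate behavior.
function trimTail(tail: Msg[], preserved: Set<number>, budgetChars: number): Msg[] {
  const removed = new Set<number>();
  for (let i = 0; i < tail.length; i++) {
    if (liveSize(tail, removed) <= budgetChars) break;
    if (preserved.has(i) || removed.has(i)) continue;
    const closure = atomicClosure(tail, i);
    if ([...closure].some((j) => preserved.has(j))) continue; // abort this seed
    closure.forEach((j) => removed.add(j));
  }
  return tail.filter((_, i) => !removed.has(i));
}
```

The abort-on-preserved-contact rule is what keeps a seed `tool_result` from dragging a preserved assistant into the removal set.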
Implementation notes (deviations from plan.md)
Two design choices that surfaced during implementation:
1. New `removeTailCandidatesByPredicate`, not a generic-ized version of the existing one. plan.md suggested generic-izing `removeCandidatesByPredicate` with a `MinimalCandidate` interface. I generic-ized the closure helpers (`collectAtomicRemovalIndexes`, `buildToolPairCandidateIndex`, `collectToolPairIds`) but kept `removeCandidatesByPredicate` for the prefix and added a sibling `removeTailCandidatesByPredicate` for the tail. The new helper has one key difference: if the atomic closure of a seed candidate touches ANY candidate that fails the predicate, the seed is aborted. This is what prevents the closure from dragging the preserved last-assistant into the removal set when its `tool_result` is the seed. The prefix codepath stays byte-identical.

2. Removed the `compactable.length === 0` early bail in `runBasicCompaction`. The CLINE-2183 shape is "one in-progress turn, no historical prefix yet" — exactly the case where `compactable` is empty. The old early bail would skip the tail trim there, at the worst possible time. Now `runBasicCompaction` proceeds with an empty prefix candidate list (the prefix passes are no-ops) and lets the tail-trim path run.

Test Procedure
Five new cases in `sdk/packages/core/src/extensions/context/compaction.test.ts`:

- `trims completed tool pairs inside the protected tail when the tail alone exceeds the trigger (CLINE-2185)` — three completed pairs in a single turn; force a tight budget; assert the oldest two pairs are dropped, the last pair + typed prompt are preserved, and no pairs are orphaned.
- `preserves an in-flight tool_use whose result has not yet arrived (CLINE-2185)` — older completed pair + newer in-flight `tool_use`. Asserts the older pair is dropped and the in-flight pair preserved.
- `preserves the typed prompt and the last assistant message under aggressive tail compaction (CLINE-2185)` — minimal transcript that is entirely preservation set. Asserts unchanged output.
- `returns the tail unchanged when the entire tail is in the preservation set (CLINE-2185)` — typed prompt + one in-flight `tool_use`, no result yet. Asserts no change.
- `still compacts the historical prefix when the tail is small (CLINE-2185)` — regression check: the old `[user, asst, user, ...tail]` shape with a small tail still drops the old assistant message.

Verification: existing 27 compaction tests pass unchanged; 913 @cline/core unit tests pass; typecheck:smoke clean.
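The "no orphaned pairs" invariant these cases share can be sketched as a small checker; the message shape and the `hasOrphanedPairs` name are simplified assumptions, not the real test utilities:

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };
type Msg = { role: "user" | "assistant"; content: Block[] };

// A tool_result whose tool_use is gone is the orphan shape providers reject.
// A tool_use without a result may legitimately be in-flight, so only the
// result side is checked here.
function hasOrphanedPairs(messages: Msg[]): boolean {
  const uses = new Set<string>();
  const results = new Set<string>();
  for (const m of messages) {
    for (const b of m.content) {
      if (b.type === "tool_use") uses.add(b.id);
      if (b.type === "tool_result") results.add(b.tool_use_id);
    }
  }
  return [...results].some((id) => !uses.has(id));
}
```

Asserting this over every compaction output catches exactly the atomicity bugs the atomic-pair closure is designed to prevent.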
What could break:

- `collectAtomicRemovalIndexes` (the function body is unchanged; only the signature widens to `<C extends MinimalCandidate>`). The new `removeTailCandidatesByPredicate` is strictly more conservative than the original — it only commits a removal when the entire closure is removable.
- The `compactable.length === 0` path: previously this returned `undefined`; now it returns either a tail-trim result or `undefined` (when `haveMessagesChanged` reports no change). Callers handle both cases.

What this PR still does NOT fix (intentional non-goals, per goals.md)
- Oversized single blocks (an `editor` patch inside the preserved last assistant). `MessageBuilder.buildForApi` already does per-block truncation; we don't reach inside the message here.
- `thinking` blocks are still invisible to the aggregate budget (`trimCandidatesToBudget` is not touched).
- A `runAgenticCompaction` tail trim. `findCutIndex` has the same protected-tail issue. Agentic compaction is more invasive (it materializes a summary message via the LLM), so it warrants its own PR.

Worst-case bound after this PR: a single in-flight tool pair plus the typed prompt + last assistant message. Anything larger surfacing as the bloat source is the next escalation, and it should land in a future PR rather than be jammed in here.
Type of Change
Pre-flight Checklist
Tests pass (`npm test`) and code is formatted and linted (`npm run format && npm run lint`)
N/A — backend data-shaping change. Test counts above are the evidence.
Additional Notes
The `logger?.debug("Performed basic compaction", ...)` line gains three new fields (`tailTokensBefore`, `tailMessagesBefore`, `tailMessagesAfter`) so the tail path is visible in diagnostics. The `task.compaction_executed` telemetry schema is unchanged.

Diagram