
fix(sdk): trim oldest completed tool pairs from protected tail in basic compaction (CLINE-2185)#10740

Open
robinnewhouse wants to merge 5 commits into robin/cline-2183-opus-47-request-can-exceed-context-limit-after-a-few-prompts from robin/cline-2185-compaction-cant-shrink-in-flight-turn

Conversation

@robinnewhouse
Contributor

@robinnewhouse robinnewhouse commented May 14, 2026

Related Issue

Issue: CLINE-2185 — Compaction can't shrink the in-flight turn (protected-tail trim)
Stacked on: #10739 — please merge that first.
Triggering report: CLINE-2183 (Opus 4.7 request can exceed context limit after a few prompts)

Description

This is the structural follow-up to #10739. The triage PR widened MessageBuilder.buildForApi's 6 MB allowlist so the safety net covers the realistic large-output default tools. That bounds the worst case at ~2M tokens — still above a 1M Opus 4.7 ceiling. This PR closes the gap.

The real root cause of CLINE-2183 is the protected-tail carve-out introduced in CLINE-2136:

  • runBasicCompaction.splitLatestTurn carves out messages.slice(findLastTurnStartIndex(...)) as protectedTail and concatenates it verbatim into the result.
  • runAgenticCompaction's findCutIndex snaps to the same turn-start boundary.

Both are correct — they prevent orphaned tool_use / tool_result pairs that providers reject — but together they leave the in-flight turn unbounded. A coding-heavy turn with many large editor patches, fetched web content, or skill results can grow the tail past any context window, and basic compaction returns the same bloated tail no matter how aggressive the budget.

The atomic-pair removal infrastructure (collectAtomicRemovalIndexes) added in CLINE-2136 already proves the closure invariant: a removal expands across the whole tool_use_id graph. We can reuse that to drop oldest completed pairs from inside the tail, as long as we preserve a small floor:

  • The typed user prompt (always at tail index 0 by construction).
  • The most recent assistant message.
  • Any assistant message carrying a tool_use whose tool_result has not yet arrived (an "in-flight pair"). This is the new exception that prevents the "Tool execution was interrupted before a result was produced" failure mode CLINE-2136 fixed for the historical prefix.

Everything else inside the tail is a candidate for removal, oldest first.
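The preservation floor above can be sketched as a predicate over tail messages. This is an illustrative model, not the SDK's actual API: the types, `inFlightToolUseIds`, and `isPreserved` are hypothetical names, and real tail candidates carry more structure.

```typescript
// Illustrative sketch of the tail preservation floor; all names here are
// hypothetical, not the SDK's actual API.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };

interface Msg {
  role: "user" | "assistant";
  content: Block[];
}

// A tool_use is "in flight" when no tool_result in the tail references its id.
function inFlightToolUseIds(tail: Msg[]): Set<string> {
  const used = new Set<string>();
  const resolved = new Set<string>();
  for (const msg of tail) {
    for (const block of msg.content) {
      if (block.type === "tool_use") used.add(block.id);
      if (block.type === "tool_result") resolved.add(block.tool_use_id);
    }
  }
  return new Set([...used].filter((id) => !resolved.has(id)));
}

// The preservation floor: the typed prompt (tail index 0 by construction),
// the most recent assistant message, and any assistant message carrying an
// in-flight tool_use. Everything else is a removal candidate.
function isPreserved(tail: Msg[], index: number): boolean {
  if (index === 0) return true; // typed user prompt starts the turn
  const lastAssistant = tail.map((m) => m.role).lastIndexOf("assistant");
  if (index === lastAssistant) return true;
  const inFlight = inFlightToolUseIds(tail);
  const msg = tail[index];
  return (
    msg.role === "assistant" &&
    msg.content.some((b) => b.type === "tool_use" && inFlight.has(b.id))
  );
}
```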

Implementation notes (deviations from plan.md)

Two design choices that surfaced during implementation:

  1. A new removeTailCandidatesByPredicate rather than a generalized version of the existing helper. plan.md suggested generalizing removeCandidatesByPredicate over a MinimalCandidate interface. I generalized the closure helpers (collectAtomicRemovalIndexes, buildToolPairCandidateIndex, collectToolPairIds) but kept removeCandidatesByPredicate for the prefix and added a sibling removeTailCandidatesByPredicate for the tail. The new helper has one key difference: if the atomic closure of a seed candidate touches ANY candidate that fails the predicate, the seed is aborted. This is what prevents the closure from dragging the preserved last assistant message into the removal set when its tool_result is the seed. The prefix codepath stays byte-identical.

  2. Removed the compactable.length === 0 early bail in runBasicCompaction. The CLINE-2183 shape is "one in-progress turn, no historical prefix yet" — exactly the case where compactable is empty. The old early-bail would skip the tail trim there, the worst possible time. Now runBasicCompaction proceeds with an empty prefix candidate list (prefix passes are no-ops) and lets the tail-trim path run.
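The abort-on-preserved-closure rule in (1) can be sketched as follows. This is a simplified model under stated assumptions: `closureOf` stands in for the real collectAtomicRemovalIndexes, and the function shape is illustrative rather than the SDK's actual signature.

```typescript
// Simplified model of the closure-abort rule: walking candidates oldest
// first, a seed's atomic closure is removed only when EVERY member of the
// closure is itself removable; otherwise the seed is aborted. `closureOf`
// stands in for collectAtomicRemovalIndexes; all names are illustrative.
function removeTailCandidatesByPredicate<T>(
  items: T[],
  isRemovable: (item: T, index: number) => boolean,
  closureOf: (seed: number) => number[],
): T[] {
  const removed = new Set<number>();
  for (let seed = 0; seed < items.length; seed++) {
    if (removed.has(seed) || !isRemovable(items[seed], seed)) continue;
    const closure = closureOf(seed);
    // Abort the seed if its closure touches any non-removable candidate,
    // e.g. a tool_result whose tool_use lives on the preserved last assistant.
    if (closure.every((i) => isRemovable(items[i], i))) {
      for (const i of closure) removed.add(i);
    }
  }
  return items.filter((_, i) => !removed.has(i));
}

// Pair (1,2) is fully removable; pair (3,4) contains preserved index 3,
// so seeding from its tool_result (index 4) must abort.
const closures = new Map<number, number[]>([
  [1, [1, 2]],
  [2, [1, 2]],
  [3, [3, 4]],
  [4, [3, 4]],
]);
const out = removeTailCandidatesByPredicate(
  ["prompt", "useA", "resA", "useB", "resB"],
  (_item, i) => i === 1 || i === 2 || i === 4, // 0 and 3 preserved
  (seed) => closures.get(seed) ?? [seed],
); // → ["prompt", "useB", "resB"]
```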

Test Procedure

Five new cases in sdk/packages/core/src/extensions/context/compaction.test.ts:

  1. trims completed tool pairs inside the protected tail when the tail alone exceeds the trigger (CLINE-2185) — three completed pairs in a single turn, force a tight budget, assert oldest two pairs dropped, last pair + typed prompt preserved, no orphaned pairs.
  2. preserves an in-flight tool_use whose result has not yet arrived (CLINE-2185) — older completed pair + newer in-flight tool_use. Asserts older pair dropped, in-flight pair preserved.
  3. preserves the typed prompt and the last assistant message under aggressive tail compaction (CLINE-2185) — minimal transcript that is entirely preservation set. Asserts unchanged output.
  4. returns the tail unchanged when the entire tail is in the preservation set (CLINE-2185) — typed prompt + one in-flight tool_use, no result yet. Asserts no change.
  5. still compacts the historical prefix when the tail is small (CLINE-2185) — regression check: old [user, asst, user, ...tail] shape with a small tail still drops the old assistant message.

Verification:

sdk/packages/core $ bun run typecheck:smoke    # clean
sdk/packages/core $ bun run test:unit -- compaction.test
  ✓ 32 tests passed (27 existing + 5 new)
sdk/packages/core $ bun run test:unit
  Test Files  102 passed | 1 skipped (103)
       Tests  913 passed | 4 skipped (917)

What could break:

  • The atomic-pair closure invariant is preserved by reuse of collectAtomicRemovalIndexes (function body unchanged; only signature widens to <C extends MinimalCandidate>). The new removeTailCandidatesByPredicate is strictly more conservative than the original — it only commits a removal when the entire closure is removable.
  • The new compactable.length === 0 path: previously this returned undefined, now it returns either a tail-trim result or undefined (when haveMessagesChanged reports no change). Callers handle both cases.

What this PR still does NOT fix (intentional non-goals, per goals.md)

  • Intra-message truncation of an oversized assistant message (e.g. a single huge editor patch inside the preserved last assistant). MessageBuilder.buildForApi already does per-block truncation; we don't reach inside the message here.
  • Anthropic thinking blocks are still invisible to the aggregate budget.
  • Oversized first-user prompt is handled by the existing prefix trimCandidatesToBudget; not touched.
  • runAgenticCompaction tail trim. findCutIndex has the same protected-tail issue. Agentic compaction is more invasive (materializes a summary message via the LLM) so it warrants its own PR.
  • MCP tool result allowlisting. Names are dynamic; needs a size-based fallback that doesn't exist yet.

Worst-case bound after this PR: a single in-flight tool pair plus the typed prompt + last assistant message. Anything larger surfacing as the bloat source is the next escalation, and it should land in a future PR rather than be jammed in here.

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • ♻️ Refactor Changes
  • 💅 Cosmetic Changes
  • 📚 Documentation update
  • 🏃 Workflow Changes

Pre-flight Checklist

  • Changes are limited to a single feature, bugfix or chore (split larger changes into separate PRs)
  • Tests are passing (npm test) and code is formatted and linted (npm run format && npm run lint)
  • I have reviewed contributor guidelines

Screenshots

N/A — backend data-shaping change. Test counts above are the evidence.

Additional Notes

The logger?.debug("Performed basic compaction", ...) line gains three new fields (tailTokensBefore, tailMessagesBefore, tailMessagesAfter) so the tail path is visible in diagnostics. The task.compaction_executed telemetry schema is unchanged.

Diagram

PR #10740 / CLINE-2185
Layer 1: trim completed tool-call pairs inside the protected tail

Conversation before API request:

  historical prefix        latest in-flight turn / protected tail
  -----------------        -------------------------------------
  old messages             user prompt
  old tool pairs           assistant tool_use A
  old text                 tool_result A
                           assistant tool_use B
                           tool_result B
                           assistant currently in progress

Old behavior:
  compact historical prefix only
        |
        v
  protected tail kept verbatim
        |
        v
  if tail alone is huge, request still too large


New behavior:
  compact historical prefix
        |
        v
  if still too large:
        |
        v
  remove oldest completed tool_use/tool_result pairs from tail
        |
        v
  preserve:
    - typed user prompt
    - latest assistant context
    - in-progress tool_use blocks
    - atomic tool_use/tool_result consistency
        |
        v
  coherent but smaller request

@linear-code

linear-code Bot commented May 14, 2026

CLINE-2185

@greptile-apps
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR fixes the unbounded protected-tail problem (CLINE-2185) by adding trimProtectedTail to runBasicCompaction: when the in-flight turn alone exceeds the compaction budget, oldest completed tool_use/tool_result pairs are dropped from inside the tail while preserving the typed user prompt, the last assistant message, and any assistant carrying an in-flight (result-less) tool_use. A parallel refactor introduces reconstructPrefixMessages, which replaces the direct candidate-map and correctly re-injects compaction summary messages that sanitizeMessageForBasic had previously silently dropped from the output.

  • Tail trimming (trimProtectedTail + removeTailCandidatesByPredicate): aborts any atomic closure that would pull a preserved candidate into the removal set.
  • reconstructPrefixMessages: replaces candidates.map(c => c.message); compaction summary messages in compactable are now faithfully forwarded.
  • Removal of both early bails: compactable.length === 0 bail dropped so tail-trim runs on prefix-less turns; candidates.length === 0 bail superseded by reconstructPrefixMessages.
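The summary-retention behavior attributed to reconstructPrefixMessages can be sketched as a filter that forwards surviving candidates while re-inserting any compaction summary that was never a basic candidate. The message shape and names below are hypothetical, inferred from this description rather than taken from the SDK.

```typescript
// Illustrative sketch of the summary-retention behavior described for
// reconstructPrefixMessages; the message shape and names are hypothetical.
interface PrefixMsg {
  text: string;
  isCompactionSummary?: boolean;
}

function reconstructPrefixMessages(
  compactable: PrefixMsg[],
  surviving: Set<PrefixMsg>,
): PrefixMsg[] {
  // Forward surviving candidates in order, and re-insert any compaction
  // summary message that was never a candidate so it is not silently
  // dropped from the output.
  return compactable.filter(
    (m) => surviving.has(m) || m.isCompactionSummary === true,
  );
}
```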

Confidence Score: 4/5

Safe to merge; the tail-trim path is well-guarded and the new test suite covers all preservation invariants including the compaction-summary retention case flagged in the previous review thread.

The previous review thread raised a concern about silently dropping compaction summaries when the second early bail was removed. That concern is addressed by reconstructPrefixMessages, which explicitly re-inserts any isCompactionSummaryMessage entry from compactable that was not a basic candidate, and a dedicated test verifies the round-trip. The logic is carefully structured: removeTailCandidatesByPredicate correctly aborts closures that would pull a preserved candidate into the removal set, and all atomic-pair invariants are verified by the new tests.

sdk/packages/core/src/extensions/context/basic-compaction.ts — specifically the reconstructPrefixMessages output-assembly logic and the removal of the candidates.length === 0 bail.

Important Files Changed

  • sdk/packages/core/src/extensions/context/basic-compaction.ts: Adds protected-tail trimming via trimProtectedTail/removeTailCandidatesByPredicate, generalises closure helpers to MinimalCandidate, replaces the direct candidate-map with reconstructPrefixMessages (preserving compaction summaries), and removes both early bails so tail-trim runs on prefix-less turns.
  • sdk/packages/core/src/extensions/context/compaction.test.ts: Adds 7 new test cases for CLINE-2185 covering tail trim, in-flight tool_use preservation, aggressive budget, unchanged-preservation round-trip, prefix regression, compaction-summary retention, and removed-prefix-candidate non-resurrection.

Reviews (6). Last reviewed commit: "style(sdk): format prefix reconstruction..."

Comment thread sdk/packages/core/src/extensions/context/basic-compaction.ts Outdated
@robinnewhouse
Contributor Author

Follow-up #10747 (stacked on this PR) implements Layer A of the hard-guarantee work: it widens the MessageBuilder truncation candidate set and ties the aggregate budget to the actual modelInfo.maxInputTokens. Layer B (CLINE-2192, brick-wall enforce-budget pass + status notice + telemetry) stacks on #10747.

@robinnewhouse robinnewhouse force-pushed the robin/cline-2185-compaction-cant-shrink-in-flight-turn branch from d94c054 to 3dae349 Compare May 14, 2026 20:02
robinnewhouse added a commit that referenced this pull request May 14, 2026
…t to maxInputTokens (CLINE-2191)

CLINE-2191 Layer A. After PR #10739 (allowlist) and PR #10740 (protected-tail trim) the preservation set inside the in-flight turn still included the typed user prompt, the last assistant message, and any in-flight tool_use verbatim. Each can individually exceed any context window. This PR closes the common case.

Changes to MessageBuilder: (1) collectTruncationCandidates also collects user text, assistant text, thinking blocks, and top-level file blocks. tool_use input bodies remain untouched here — a JSON-aware structural truncator that avoids corrupting tool_use_id or breaking input JSON shape is Layer B (CLINE-2192). (2) buildForApi accepts an optional maxInputTokens; when present the aggregate budget becomes maxInputTokens * CHARS_PER_TOKEN, otherwise it falls back to the hardcoded 6 MB default so legacy callers keep their behavior. (3) The largest-first sort gains an insertion-order tiebreaker so the output is deterministic across runs (Layer B will rely on this).

Orchestrator: createRuntimeHooks and createRuntimePrepareTurn now thread modelInfo.maxInputTokens through to prepareProviderMessagesForApi -> buildForApi so the cap reflects the model that is actually being called.

Shared package: CHARS_PER_TOKEN was previously private inside @cline/shared/llms/tokens.ts. Exported alongside the existing estimateTokens.

Tests: 7 new cases in message-builder.test.ts. 913 -> 920 @cline/core unit tests pass. typecheck:smoke clean.

This is the preventive layer. It does NOT yet provide the hard guarantee "the outbound request will never exceed the context window." That guarantee lives in CLINE-2192 (Layer B) which stacks on top of this PR.
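The budget selection described in change (2) of the commit message above amounts to a small fallback rule. A sketch under stated assumptions: the function name is illustrative, and CHARS_PER_TOKEN = 4 is an assumed value standing in for the real constant exported from @cline/shared.

```typescript
// Sketch of the aggregate-budget selection from change (2). CHARS_PER_TOKEN = 4
// is an ASSUMED value; the real constant lives in @cline/shared/llms/tokens.ts.
// The function name is illustrative, not the SDK's actual API.
const CHARS_PER_TOKEN = 4;
const DEFAULT_BUDGET_CHARS = 6 * 1024 * 1024; // legacy hardcoded 6 MB default

function aggregateBudgetChars(maxInputTokens?: number): number {
  // When the model's input cap is known, size the budget to it; otherwise
  // keep the legacy 6 MB behavior so existing callers are unaffected.
  return maxInputTokens !== undefined
    ? maxInputTokens * CHARS_PER_TOKEN
    : DEFAULT_BUDGET_CHARS;
}
```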
@robinnewhouse
Contributor Author

@greptileai review

…ic compaction (CLINE-2185)

CLINE-2136 introduced a deliberate invariant: both runBasicCompaction and runAgenticCompaction refuse to touch the latest turn. That fix prevented orphaned tool_use/tool_result pairs, but it also left the in-flight turn unbounded. A turn whose tool calls accumulated large outputs (editor patches, fetch_web_content, etc.) could push a single request past any context window, which is what CLINE-2183 surfaced on Opus 4.7 against OpenRouter.

PR #10739 widened MessageBuilder.buildForApi's 6 MB allowlist as a triage fix. This PR is the structural follow-up: when the protected tail alone exceeds the compaction target, runBasicCompaction now runs an atomic-pair removal pass on the tail, dropping oldest completed pairs first.

Preservation set inside the tail: (a) the typed user prompt (turn-start), (b) the most recent assistant message, (c) any assistant carrying a tool_use whose tool_result has not yet arrived (in-flight pair). The atomic-pair closure (collectAtomicRemovalIndexes) is reused unchanged.

Implementation notes (deviations from plan.md): (1) Added a new removeTailCandidatesByPredicate that aborts a seed when its atomic closure crosses the preservation set, rather than generic-izing the existing function. Prefix codepath stays byte-identical. (2) Removed the compactable.length === 0 early bail so a transcript that is entirely one in-progress turn still runs the tail trim.

Tests: 5 new compaction.test.ts cases. Existing 27 compaction tests pass unchanged. 913 @cline/core unit tests pass. typecheck:smoke clean.
@robinnewhouse robinnewhouse force-pushed the robin/cline-2185-compaction-cant-shrink-in-flight-turn branch from 3dae349 to 0438d29 Compare May 14, 2026 20:26

Comment thread sdk/packages/core/src/extensions/context/basic-compaction.ts

