test(inkless:switch): Add invariant tests for sealed partition recovery#609
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new parameterized unit test suite to validate recovery invariants for “classic → diskless switch” partitions across different on-disk high-watermark checkpoint states, ensuring sealed partitions behave correctly after applyDelta.
Changes:
- Introduces parameterized scenarios for sealed leaders to validate high-watermark behavior after restart with stale/partial/fresh checkpoints.
- Introduces parameterized scenarios for sealed followers to validate that a catch-up fetcher is scheduled when the local HW is below the seal offset.
- Adds local helpers to construct a minimal
ReplicaManager, seed logs to a chosen LEO, and persist an on-disk HW checkpoint.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // No switch (-1): partition is sealed but HW is not advanced (no seal offset to target). | ||
| Arguments.of(0L: java.lang.Long, NO_CLASSIC_TO_DISKLESS_START_OFFSET: java.lang.Long, 0L: java.lang.Long), | ||
| Arguments.of(5L: java.lang.Long, NO_CLASSIC_TO_DISKLESS_START_OFFSET: java.lang.Long, 0L: java.lang.Long), |
There was a problem hiding this comment.
I'm not sure about these cases: if classicToDisklessStartOffset == NO_CLASSIC_TO_DISKLESS_START_OFFSET then sealing does not even happen
There was a problem hiding this comment.
Probably not realistic but still valid based on the implementation. If eventually diskless without TS start using local logs as caching, this can become a valid scenario. It can help to document this invariant here.
Adding a comment -- unless we prefer to drop them for now?
There was a problem hiding this comment.
Let's just add a comment
Parameterized tests that verify invariants across a matrix of checkpoint states (stale/partial/fresh) for both leader and follower roles after restart: - Sealed leader: HW must be >= seal offset regardless of checkpoint - Sealed follower: catch-up fetcher must be scheduled when HW < seal These invariants catch bugs where makeLeader reloads a stale HW from the on-disk checkpoint and seal() permanently prevents natural advancement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-switch seal states Adds scenarios for CLASSIC_TO_DISKLESS_SWITCH_PENDING (-2) and NO_CLASSIC_TO_DISKLESS_START_OFFSET (-1) to both leader and follower invariant tests, matching the documented test matrix. Key findings codified: - Pending/none seals: partition is sealed but HW is not advanced (the post-restart path only advances HW when sealOffset >= 0) - Pending seal follower: fetcher starts on new leader epoch - No-switch follower: no fetcher needed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e reference - Assert partition.isLeader == false in follower test to guard against accidental leader assignment making fetcher expectations vacuous - Replace hard-coded "line 3050" with method name reference (applyLocalLeadersDelta) that won't drift with code changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These cases aren't realistic today but document the defensive behavior of the post-restart path which seals unconditionally. They become relevant if diskless topics eventually use local logs as a read cache. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
a7c7ec0 to
d74e2e7
Compare
…ry (#609) * test(inkless:switch): Add invariant tests for sealed partition recovery Parameterized tests that verify invariants across a matrix of checkpoint states (stale/partial/fresh) for both leader and follower roles after restart: - Sealed leader: HW must be >= seal offset regardless of checkpoint - Sealed follower: catch-up fetcher must be scheduled when HW < seal These invariants catch bugs where makeLeader reloads a stale HW from the on-disk checkpoint and seal() permanently prevents natural advancement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(inkless:switch): Expand invariant matrix to cover pending and no-switch seal states Adds scenarios for CLASSIC_TO_DISKLESS_SWITCH_PENDING (-2) and NO_CLASSIC_TO_DISKLESS_START_OFFSET (-1) to both leader and follower invariant tests, matching the documented test matrix. Key findings codified: - Pending/none seals: partition is sealed but HW is not advanced (the post-restart path only advances HW when sealOffset >= 0) - Pending seal follower: fetcher starts on new leader epoch - No-switch follower: no fetcher needed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(inkless:switch): Add follower role assertion and fix brittle line reference - Assert partition.isLeader == false in follower test to guard against accidental leader assignment making fetcher expectations vacuous - Replace hard-coded "line 3050" with method name reference (applyLocalLeadersDelta) that won't drift with code changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(inkless:switch): Document rationale for no-switch (-1) scenarios These cases aren't realistic today but document the defensive behavior of the post-restart path which seals unconditionally. They become relevant if diskless topics eventually use local logs as a read cache. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ry (#609) * test(inkless:switch): Add invariant tests for sealed partition recovery Parameterized tests that verify invariants across a matrix of checkpoint states (stale/partial/fresh) for both leader and follower roles after restart: - Sealed leader: HW must be >= seal offset regardless of checkpoint - Sealed follower: catch-up fetcher must be scheduled when HW < seal These invariants catch bugs where makeLeader reloads a stale HW from the on-disk checkpoint and seal() permanently prevents natural advancement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(inkless:switch): Expand invariant matrix to cover pending and no-switch seal states Adds scenarios for CLASSIC_TO_DISKLESS_SWITCH_PENDING (-2) and NO_CLASSIC_TO_DISKLESS_START_OFFSET (-1) to both leader and follower invariant tests, matching the documented test matrix. Key findings codified: - Pending/none seals: partition is sealed but HW is not advanced (the post-restart path only advances HW when sealOffset >= 0) - Pending seal follower: fetcher starts on new leader epoch - No-switch follower: no fetcher needed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(inkless:switch): Add follower role assertion and fix brittle line reference - Assert partition.isLeader == false in follower test to guard against accidental leader assignment making fetcher expectations vacuous - Replace hard-coded "line 3050" with method name reference (applyLocalLeadersDelta) that won't drift with code changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(inkless:switch): Document rationale for no-switch (-1) scenarios These cases aren't realistic today but document the defensive behavior of the post-restart path which seals unconditionally. They become relevant if diskless topics eventually use local logs as a read cache. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Parameterized tests that verify invariants across a matrix of checkpoint states (stale/partial/fresh) for both leader and follower roles after restart:
These invariants catch bugs where makeLeader reloads a stale HW from the on-disk checkpoint and seal() permanently prevents natural advancement.