Skip to content

test(inkless:switch): Add invariant tests for sealed partition recovery#609

Merged
giuseppelillo merged 4 commits into
mainfrom
jeqo/topic-switch-invariant
May 22, 2026
Merged

test(inkless:switch): Add invariant tests for sealed partition recovery#609
giuseppelillo merged 4 commits into
mainfrom
jeqo/topic-switch-invariant

Conversation

@jeqo

@jeqo jeqo commented May 22, 2026

Copy link
Copy Markdown
Contributor

Parameterized tests that verify invariants across a matrix of checkpoint states (stale/partial/fresh) for both leader and follower roles after restart:

  • Sealed leader: HW must be >= seal offset regardless of checkpoint
  • Sealed follower: catch-up fetcher must be scheduled when HW < seal

These invariants catch bugs where makeLeader reloads a stale HW from the on-disk checkpoint and seal() permanently prevents natural advancement.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new parameterized unit test suite to validate recovery invariants for “classic → diskless switch” partitions across different on-disk high-watermark checkpoint states, ensuring sealed partitions behave correctly after applyDelta.

Changes:

  • Introduces parameterized scenarios for sealed leaders to validate high-watermark behavior after restart with stale/partial/fresh checkpoints.
  • Introduces parameterized scenarios for sealed followers to validate that a catch-up fetcher is scheduled when the local HW is below the seal offset.
  • Adds local helpers to construct a minimal ReplicaManager, seed logs to a chosen LEO, and persist an on-disk HW checkpoint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread core/src/test/scala/unit/kafka/server/DisklessSwitchInvariantsTest.scala Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment thread core/src/test/scala/unit/kafka/server/DisklessSwitchInvariantsTest.scala Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread core/src/test/scala/unit/kafka/server/DisklessSwitchInvariantsTest.scala Outdated
@jeqo jeqo marked this pull request as ready for review May 22, 2026 12:27
@jeqo jeqo requested a review from giuseppelillo May 22, 2026 12:35
Comment on lines +279 to +281
// No switch (-1): partition is sealed but HW is not advanced (no seal offset to target).
Arguments.of(0L: java.lang.Long, NO_CLASSIC_TO_DISKLESS_START_OFFSET: java.lang.Long, 0L: java.lang.Long),
Arguments.of(5L: java.lang.Long, NO_CLASSIC_TO_DISKLESS_START_OFFSET: java.lang.Long, 0L: java.lang.Long),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about these cases: if classicToDisklessStartOffset == NO_CLASSIC_TO_DISKLESS_START_OFFSET then sealing does not even happen

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not realistic but still valid based on the implementation. If eventually diskless without TS start using local logs as caching, this can become a valid scenario. It can help to document this invariant here.

Adding a comment -- unless we prefer to drop them for now?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just add a comment

jeqo and others added 4 commits May 22, 2026 16:07
Parameterized tests that verify invariants across a matrix of
checkpoint states (stale/partial/fresh) for both leader and follower
roles after restart:

- Sealed leader: HW must be >= seal offset regardless of checkpoint
- Sealed follower: catch-up fetcher must be scheduled when HW < seal

These invariants catch bugs where makeLeader reloads a stale HW from
the on-disk checkpoint and seal() permanently prevents natural
advancement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-switch seal states

Adds scenarios for CLASSIC_TO_DISKLESS_SWITCH_PENDING (-2) and
NO_CLASSIC_TO_DISKLESS_START_OFFSET (-1) to both leader and follower
invariant tests, matching the documented test matrix.

Key findings codified:
- Pending/none seals: partition is sealed but HW is not advanced
  (the post-restart path only advances HW when sealOffset >= 0)
- Pending seal follower: fetcher starts on new leader epoch
- No-switch follower: no fetcher needed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e reference

- Assert partition.isLeader == false in follower test to guard against
  accidental leader assignment making fetcher expectations vacuous
- Replace hard-coded "line 3050" with method name reference
  (applyLocalLeadersDelta) that won't drift with code changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These cases aren't realistic today but document the defensive behavior
of the post-restart path which seals unconditionally. They become
relevant if diskless topics eventually use local logs as a read cache.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jeqo jeqo force-pushed the jeqo/topic-switch-invariant branch from a7c7ec0 to d74e2e7 Compare May 22, 2026 13:15
@giuseppelillo giuseppelillo merged commit 81cc74c into main May 22, 2026
4 checks passed
@giuseppelillo giuseppelillo deleted the jeqo/topic-switch-invariant branch May 22, 2026 13:49
giuseppelillo pushed a commit that referenced this pull request May 29, 2026
…ry (#609)

* test(inkless:switch): Add invariant tests for sealed partition recovery

Parameterized tests that verify invariants across a matrix of
checkpoint states (stale/partial/fresh) for both leader and follower
roles after restart:

- Sealed leader: HW must be >= seal offset regardless of checkpoint
- Sealed follower: catch-up fetcher must be scheduled when HW < seal

These invariants catch bugs where makeLeader reloads a stale HW from
the on-disk checkpoint and seal() permanently prevents natural
advancement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(inkless:switch): Expand invariant matrix to cover pending and no-switch seal states

Adds scenarios for CLASSIC_TO_DISKLESS_SWITCH_PENDING (-2) and
NO_CLASSIC_TO_DISKLESS_START_OFFSET (-1) to both leader and follower
invariant tests, matching the documented test matrix.

Key findings codified:
- Pending/none seals: partition is sealed but HW is not advanced
  (the post-restart path only advances HW when sealOffset >= 0)
- Pending seal follower: fetcher starts on new leader epoch
- No-switch follower: no fetcher needed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(inkless:switch): Add follower role assertion and fix brittle line reference

- Assert partition.isLeader == false in follower test to guard against
  accidental leader assignment making fetcher expectations vacuous
- Replace hard-coded "line 3050" with method name reference
  (applyLocalLeadersDelta) that won't drift with code changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(inkless:switch): Document rationale for no-switch (-1) scenarios

These cases aren't realistic today but document the defensive behavior
of the post-restart path which seals unconditionally. They become
relevant if diskless topics eventually use local logs as a read cache.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
giuseppelillo pushed a commit that referenced this pull request May 29, 2026
…ry (#609)

* test(inkless:switch): Add invariant tests for sealed partition recovery

Parameterized tests that verify invariants across a matrix of
checkpoint states (stale/partial/fresh) for both leader and follower
roles after restart:

- Sealed leader: HW must be >= seal offset regardless of checkpoint
- Sealed follower: catch-up fetcher must be scheduled when HW < seal

These invariants catch bugs where makeLeader reloads a stale HW from
the on-disk checkpoint and seal() permanently prevents natural
advancement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(inkless:switch): Expand invariant matrix to cover pending and no-switch seal states

Adds scenarios for CLASSIC_TO_DISKLESS_SWITCH_PENDING (-2) and
NO_CLASSIC_TO_DISKLESS_START_OFFSET (-1) to both leader and follower
invariant tests, matching the documented test matrix.

Key findings codified:
- Pending/none seals: partition is sealed but HW is not advanced
  (the post-restart path only advances HW when sealOffset >= 0)
- Pending seal follower: fetcher starts on new leader epoch
- No-switch follower: no fetcher needed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(inkless:switch): Add follower role assertion and fix brittle line reference

- Assert partition.isLeader == false in follower test to guard against
  accidental leader assignment making fetcher expectations vacuous
- Replace hard-coded "line 3050" with method name reference
  (applyLocalLeadersDelta) that won't drift with code changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(inkless:switch): Document rationale for no-switch (-1) scenarios

These cases aren't realistic today but document the defensive behavior
of the post-restart path which seals unconditionally. They become
relevant if diskless topics eventually use local logs as a read cache.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants