Skip to content

fix(inkless:switch): bump leader epoch on classic-to-diskless switch#626

Merged
viktorsomogyi merged 1 commit into
mainfrom
jeqo/fix-leader-epoch-on-switch
Jun 3, 2026
Merged

fix(inkless:switch): bump leader epoch on classic-to-diskless switch#626
viktorsomogyi merged 1 commit into
mainfrom
jeqo/fix-leader-epoch-on-switch

Conversation

@jeqo

@jeqo jeqo commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

markClassicToDisklessSwitchStarted emits a PartitionChangeRecord without setting the leader field, so PartitionRegistration.merge() sees NO_LEADER_CHANGE and skips the leaderEpoch bump. Brokers only call makeLeader on epoch changes, leaving the partition stuck in PENDING.

This was latent since the method was introduced, but was masked by BrokerMetadataPublisher.sealExistingLeadersOfTopicsSwitchedToDiskless() which triggered the switch via config-delta scan — independent of epoch. PR #604 (ad647a4) removed that path and moved all switch logic into applyLocalLeadersDelta, which requires isNewLeaderEpoch == true. The controller was never updated to provide it.

Fix: set .setLeader(partition.leader) to force an epoch bump. The broker sees the new epoch, enters makeLeader, seals the log, and registers with InitDisklessLogManager. Also warn when the partition has no leader at switch time — the PENDING state persists and completes once a leader is elected.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes classic-to-diskless topic switches getting stuck in CLASSIC_TO_DISKLESS_SWITCH_PENDING by ensuring the controller emits PartitionChangeRecords that force a leader-epoch bump (which brokers use to enter makeLeader and complete sealing/registration).

Changes:

  • Set PartitionChangeRecord.leader in markClassicToDisklessSwitchStarted so PartitionRegistration.merge() bumps leaderEpoch.
  • Add a warning when a partition has NO_LEADER at switch time (switch stays pending until a leader is elected).
  • Extend the controller unit test to validate that emitted records include a leader and that leaderEpoch increments after replay.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java Forces leaderEpoch bump by setting leader on the emitted PartitionChangeRecord; logs when no leader exists.
metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java Asserts leader is set on the emitted record(s) and validates the leaderEpoch bump after replay.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

markClassicToDisklessSwitchStarted emits a PartitionChangeRecord without
setting the leader field, so PartitionRegistration.merge() sees
NO_LEADER_CHANGE and skips the leaderEpoch bump. Brokers only call
makeLeader on epoch changes, leaving the partition stuck in PENDING.

This was latent since the method was introduced, but was masked by
BrokerMetadataPublisher.sealExistingLeadersOfTopicsSwitchedToDiskless()
which triggered the switch via config-delta scan — independent of epoch.
PR #604 (ad647a4) removed that path and moved all switch logic into
applyLocalLeadersDelta, which requires isNewLeaderEpoch == true. The
controller was never updated to provide it.

Fix: set .setLeader(partition.leader) to force an epoch bump. The broker
sees the new epoch, enters makeLeader, seals the log, and registers with
InitDisklessLogManager. Also warn when the partition has no leader at
switch time — the PENDING state persists and completes once a leader is
elected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jeqo jeqo force-pushed the jeqo/fix-leader-epoch-on-switch branch from 626d6f2 to 98f77b0 Compare June 2, 2026 17:35
@jeqo jeqo requested a review from Copilot June 2, 2026 17:36

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@jeqo jeqo marked this pull request as ready for review June 2, 2026 17:40
@jeqo jeqo requested a review from viktorsomogyi June 2, 2026 17:41

@viktorsomogyi viktorsomogyi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! And it is actually needed for my leader epoch issue as well 🙂
(LGTM, didn't find any issues, this is a small and focused PR)

@viktorsomogyi viktorsomogyi merged commit 709c5bf into main Jun 3, 2026
10 checks passed
@viktorsomogyi viktorsomogyi deleted the jeqo/fix-leader-epoch-on-switch branch June 3, 2026 08:14
giuseppelillo pushed a commit that referenced this pull request Jun 4, 2026
…626)

markClassicToDisklessSwitchStarted emits a PartitionChangeRecord without
setting the leader field, so PartitionRegistration.merge() sees
NO_LEADER_CHANGE and skips the leaderEpoch bump. Brokers only call
makeLeader on epoch changes, leaving the partition stuck in PENDING.

This was latent since the method was introduced, but was masked by
BrokerMetadataPublisher.sealExistingLeadersOfTopicsSwitchedToDiskless()
which triggered the switch via config-delta scan — independent of epoch.
PR #604 (ad647a4) removed that path and moved all switch logic into
applyLocalLeadersDelta, which requires isNewLeaderEpoch == true. The
controller was never updated to provide it.

Fix: set .setLeader(partition.leader) to force an epoch bump. The broker
sees the new epoch, enters makeLeader, seals the log, and registers with
InitDisklessLogManager. Also warn when the partition has no leader at
switch time — the PENDING state persists and completes once a leader is
elected.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
gqmelo pushed a commit that referenced this pull request Jun 4, 2026
…626)

markClassicToDisklessSwitchStarted emits a PartitionChangeRecord without
setting the leader field, so PartitionRegistration.merge() sees
NO_LEADER_CHANGE and skips the leaderEpoch bump. Brokers only call
makeLeader on epoch changes, leaving the partition stuck in PENDING.

This was latent since the method was introduced, but was masked by
BrokerMetadataPublisher.sealExistingLeadersOfTopicsSwitchedToDiskless()
which triggered the switch via config-delta scan — independent of epoch.
PR #604 (ad647a4) removed that path and moved all switch logic into
applyLocalLeadersDelta, which requires isNewLeaderEpoch == true. The
controller was never updated to provide it.

Fix: set .setLeader(partition.leader) to force an epoch bump. The broker
sees the new epoch, enters makeLeader, seals the log, and registers with
InitDisklessLogManager. Also warn when the partition has no leader at
switch time — the PENDING state persists and completes once a leader is
elected.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants