feat(diskless): add managed replicas routing to metadata transformer #504

Draft
jeqo wants to merge 3 commits into jeqo/pod-2001-controller-topic-metrics from jeqo/pod-2121-transformer-replicas
@jeqo jeqo commented Feb 9, 2026

Adds AZ-aware routing support for diskless topics with managed replicas (RF > 1). The transformer now routes metadata requests based on replica assignments and remote storage configuration.

Changes

  • Refactor: Change MetadataView.getAliveBrokerNodes() to return List<Node> instead of Iterable<Node> for a consistent API
  • Routing Logic: Add isRemoteStorageEnabled check to determine routing constraints:
    • Tiered mode (remote storage enabled): same-AZ replica → cross-AZ replica → unavailable
    • Diskless-only mode: same-AZ replica → same-AZ any broker → cross-AZ replica → cross-AZ any broker
  • Metrics: Add transformer metrics for observability:
    • fallback-total: Count of fallbacks to non-replica brokers
    • offline-replicas-routed-around: Routing decisions when some replicas are offline
    • cross-az-routing-total: Requests routed to a different AZ than the client
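The three counters listed above could be kept roughly as follows. This is a minimal sketch using plain JDK counters, not the metrics API the transformer actually uses; the class `TransformerMetricsSketch`, the method `recordDecision`, and its three boolean inputs are hypothetical names introduced for illustration. Only the metric names come from this PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: plain JDK counters standing in for the transformer's real metrics.
final class TransformerMetricsSketch {
    static final String FALLBACK_TOTAL = "fallback-total";
    static final String OFFLINE_REPLICAS_ROUTED_AROUND = "offline-replicas-routed-around";
    static final String CROSS_AZ_ROUTING_TOTAL = "cross-az-routing-total";

    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Records one routing decision. The three flags are assumed to be
    // computed by the routing step; they are not names from the PR.
    void recordDecision(boolean chosenIsReplica, boolean someReplicaOffline, boolean chosenInClientAz) {
        if (!chosenIsReplica) increment(FALLBACK_TOTAL);
        if (someReplicaOffline) increment(OFFLINE_REPLICAS_ROUTED_AROUND);
        if (!chosenInClientAz) increment(CROSS_AZ_ROUTING_TOTAL);
    }

    // Returns the current value of a counter, or 0 if it was never incremented.
    long count(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0L : adder.sum();
    }

    private void increment(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }
}
```

A single decision can bump more than one counter: a diskless-only fallback to a cross-AZ non-replica broker would count toward both fallback-total and cross-az-routing-total in this sketch.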

Routing Priority

| Mode | Priority |
| --- | --- |
| Tiered | same-AZ replica → cross-AZ replica → unavailable |
| Diskless-only | same-AZ replica → same-AZ any broker → cross-AZ replica → cross-AZ any broker |
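The priority order above can be read as an ordered list of filters over the alive brokers. The sketch below illustrates that reading under some assumptions: `Broker` is a hypothetical stand-in for Kafka's `Node`, `ManagedReplicaRouting.select` is an illustrative name, and picking the first match within a tier is a simplification (the PR does not say how ties within a tier are broken).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for a broker node (id plus availability zone).
record Broker(int id, String az) {}

final class ManagedReplicaRouting {
    // Returns the broker to route to, or empty when the partition is unavailable.
    static Optional<Broker> select(List<Broker> aliveBrokers, Set<Integer> replicaIds,
                                   String clientAz, boolean remoteStorageEnabled) {
        Predicate<Broker> isReplica = b -> replicaIds.contains(b.id());
        Predicate<Broker> sameAz = b -> b.az().equals(clientAz);
        Predicate<Broker> any = b -> true;

        // Tiers are tried in order; a later tier only matters if earlier ones found nothing.
        List<Predicate<Broker>> tiers = new ArrayList<>();
        tiers.add(isReplica.and(sameAz));   // same-AZ replica
        if (remoteStorageEnabled) {
            tiers.add(isReplica);           // cross-AZ replica, then unavailable
        } else {
            tiers.add(sameAz);              // same-AZ any broker
            tiers.add(isReplica);           // cross-AZ replica
            tiers.add(any);                 // cross-AZ any broker
        }

        for (Predicate<Broker> tier : tiers) {
            // Assumption: first match wins; a real implementation may balance within a tier.
            Optional<Broker> match = aliveBrokers.stream().filter(tier).findFirst();
            if (match.isPresent()) {
                return match;
            }
        }
        return Optional.empty();
    }
}
```

With alive brokers 1 (az-a), 2 (az-b), 3 (az-c), replicas {2, 3}, and a client in az-a, tiered mode selects broker 2 (a cross-AZ replica), while diskless-only mode selects broker 1 (a same-AZ non-replica).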

For unmanaged replicas (RF = 1), the legacy hash-based selection from all brokers is preserved.
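For illustration only, a hash-based selection over all alive brokers could look like the sketch below. The PR does not describe what the legacy code hashes or which hash function it uses, so hashing the topic name and partition index with `Objects.hash` is an assumption, as are the names `LegacyHashSelectionSketch` and `selectBrokerId`.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of deterministic, hash-based broker selection.
final class LegacyHashSelectionSketch {
    // Maps (topic, partition) to one of the alive broker IDs.
    // Assumes aliveBrokerIds is in a stable order, otherwise the choice can change between calls.
    static int selectBrokerId(List<Integer> aliveBrokerIds, String topic, int partition) {
        if (aliveBrokerIds.isEmpty()) {
            throw new IllegalArgumentException("no alive brokers");
        }
        int hash = Objects.hash(topic, partition);
        // floorMod keeps the index non-negative even when the hash is negative.
        return aliveBrokerIds.get(Math.floorMod(hash, aliveBrokerIds.size()));
    }
}
```

The same topic-partition maps to the same broker as long as the set and order of alive brokers do not change, which spreads partitions across brokers without consulting replica assignments.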

Test plan

  • Unit tests for managed replica routing logic
  • Unit tests for tiered vs diskless-only mode behavior
  • Unit tests for transformer metrics
  • Manual testing with Docker compose (3-AZ cluster)

jeqo force-pushed the jeqo/pod-2001-controller-topic-metrics branch from 6ddf9d8 to 742bb4f on February 9, 2026 14:48
jeqo force-pushed the jeqo/pod-2121-transformer-replicas branch from deb173b to 2d516d0 on February 9, 2026 14:48
jeqo force-pushed the jeqo/pod-2001-controller-topic-metrics branch from 742bb4f to 4de2bb4 on February 9, 2026 19:36
jeqo added 3 commits February 9, 2026 21:37
… List

Change return type from Iterable<Node> to List<Node> for simpler downstream usage. The underlying KRaftMetadataCache already returns a List, so this removes unnecessary abstraction.
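As a rough illustration of why a List is simpler for callers than an Iterable, compare the two helpers below. They use plain strings as stand-ins for nodes, and the class and method names are hypothetical; only the Iterable-versus-List distinction comes from the commit message.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical sketch: the same filter written against Iterable and against List.
final class AliveBrokersSketch {
    // With Iterable, streaming requires going through StreamSupport and a spliterator.
    static List<String> filterIterable(Iterable<String> nodes, String azPrefix) {
        return StreamSupport.stream(nodes.spliterator(), false)
            .filter(n -> n.startsWith(azPrefix))
            .collect(Collectors.toList());
    }

    // With List, stream(), size(), isEmpty(), and get(i) are available directly.
    static List<String> filterList(List<String> nodes, String azPrefix) {
        return nodes.stream()
            .filter(n -> n.startsWith(azPrefix))
            .collect(Collectors.toList());
    }
}
```

Both helpers return the same result; the List version simply avoids the adapter code and also gives callers index access and a cheap size check.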

Add support for managed replicas (RF > 1) in diskless topic metadata transformation:

- Add MetadataView.isRemoteStorageEnabled() to check topic tiering config
- For managed replicas, routing priority depends on remote storage:
  - Tiered (remote storage enabled): same-AZ replica > cross-AZ replica > unavailable
  - Diskless-only: same-AZ replica > same-AZ any broker > cross-AZ replica > cross-AZ any broker
- For unmanaged replicas (RF=1), preserve legacy hash-based selection

Update existing tests to use RF=1 to preserve original unmanaged behavior.

Add metrics to track routing behavior in diskless topic metadata transformation:

- fallback-total: Count of fallbacks to non-replica brokers (diskless-only)
- offline-replicas-routed-around: Routing decisions when some replicas are offline
- cross-az-routing-total: Requests routed to a different AZ than the client

These metrics help operators monitor:
- How often routing falls back from assigned replicas to any available broker
- Impact of replica availability on routing decisions
- Cross-AZ traffic patterns for capacity planning
jeqo force-pushed the jeqo/pod-2121-transformer-replicas branch from 2d516d0 to 8763a1f on February 9, 2026 19:37