Use cluster default shard transfer method for fallback #9120

Merged: generall merged 3 commits into dev from fix-shard-transfer-fallback-default-method on May 21, 2026
@qdrant-cloud-bot

Contributor

What

When a WalDelta automatic transfer fails, the driver falls back to the method passed in fallback_method (see transfer_shard_fallback_default in lib/collection/src/shards/transfer/driver.rs). Until now this was hard-coded in Collection::send_shard to:

let fallback_method = if self.is_prevent_unoptimized().await {
    ShardTransferMethod::Snapshot
} else {
    ShardTransferMethod::StreamRecords
};

This is inconsistent with the change in #8784 that made Snapshot the default transfer method for Qdrant 1.18.0+, and it also ignores any cluster-level default_shard_transfer_method override. As a result, when a WAL delta auto-recovery fails (e.g. recovery point has no clocks to resolve delta for), users on a clean 1.18 install see:

WARN collection::shards::transfer::driver: Failed to do shard diff transfer, falling back to default method StreamRecords: ...

…even though the cluster default is Snapshot.

Change

Use Collection::default_shard_transfer_method() for the fallback. This returns the configured default_shard_transfer_method if one is set, and otherwise Snapshot on 1.18.0+.

prevent_unoptimized still pins to Snapshot to preserve deferred point state exactly (raw segment copy); StreamRecords would send deferred points but they would not be deferred on the target.
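
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the enum is a simplified stand-in for Qdrant's `ShardTransferMethod`, and `pick_fallback` and its parameters are hypothetical names introduced only for this example. The real logic lives in `Collection::send_shard` and is async.

```rust
/// Simplified stand-in for Qdrant's `ShardTransferMethod`;
/// only the variants named in this PR are listed (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShardTransferMethod {
    StreamRecords,
    Snapshot,
    WalDelta,
}

/// Hypothetical helper: choose the fallback method for a failed
/// WAL delta transfer.
pub fn pick_fallback(
    prevent_unoptimized: bool,
    configured_default: Option<ShardTransferMethod>,
) -> ShardTransferMethod {
    if prevent_unoptimized {
        // Pin to Snapshot so deferred points stay deferred on the target.
        return ShardTransferMethod::Snapshot;
    }
    // Otherwise follow the cluster default; assume Snapshot when nothing
    // is configured, matching the 1.18.0+ default described above.
    configured_default.unwrap_or(ShardTransferMethod::Snapshot)
}
```

With these assumptions, `pick_fallback(false, None)` yields `Snapshot`, while an explicit `StreamRecords` override is honoured unless `prevent_unoptimized` is set.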

The existing safety in transfer_shard_fallback_default already refuses to fall back to the same method, so the case of WalDelta → fallback WalDelta is correctly handled (it aborts the fallback).
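
For readers unfamiliar with that safety check, the idea can be sketched like this. It is an assumption-laden illustration only: `resolve_fallback`, its signature, and the error text are hypothetical, and the actual check in `transfer_shard_fallback_default` may be structured differently.

```rust
/// Simplified stand-in for Qdrant's `ShardTransferMethod` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShardTransferMethod {
    StreamRecords,
    Snapshot,
    WalDelta,
}

/// Hypothetical guard: return the method to retry with, or an error
/// when the fallback is the same method that just failed.
pub fn resolve_fallback(
    failed: ShardTransferMethod,
    fallback: ShardTransferMethod,
) -> Result<ShardTransferMethod, String> {
    if failed == fallback {
        // Retrying the identical method would fail the same way, so abort.
        return Err(format!(
            "refusing to fall back from {failed:?} to the same method"
        ));
    }
    Ok(fallback)
}
```

In this sketch a `WalDelta` failure with a `WalDelta` fallback returns an error, while a `Snapshot` fallback is passed through unchanged.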

Test plan

  • CI green
  • Manual: trigger an automatic recovery on a 1.18 cluster with a WAL gap; confirm the fallback log now says Snapshot (or the configured default) instead of StreamRecords

All Submissions:

  • Contributions should target the dev branch.

Made with Cursor

Cursor Agent and others added 2 commits May 21, 2026 13:04
When a WAL delta automatic transfer fails, the driver falls back to the
method passed via `fallback_method`. This was hard-coded to
`StreamRecords` (unless `prevent_unoptimized` was enabled), which is
inconsistent with the 1.18.0+ default of `Snapshot` and ignores any
configured `default_shard_transfer_method`.

Use `Collection::default_shard_transfer_method()` instead, so the
fallback matches the cluster default. With `prevent_unoptimized` we
still pin to `Snapshot` to preserve deferred point state exactly (raw
segment copy); stream_records would send deferred points but they
would not be deferred on the target.

Co-authored-by: Cursor <cursoragent@cursor.com>

If the cluster default transfer method is wal_delta, the same-method
fallback would be refused by the driver. Use snapshot as a safe fallback
in that case; snapshot is also the 1.18.0+ default.

Update test_shard_wal_delta_transfer_fallback to assert the new
snapshot fallback (was stream_records).

Co-authored-by: Cursor <cursoragent@cursor.com>
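
The wal_delta avoidance described in this commit message might look roughly like the sketch below. The enum is a simplified stand-in and `fallback_for_default` is a hypothetical name; the actual change may be written differently.

```rust
/// Simplified stand-in for Qdrant's `ShardTransferMethod` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShardTransferMethod {
    StreamRecords,
    Snapshot,
    WalDelta,
}

/// Hypothetical helper: map the cluster default to a usable fallback
/// for a failed WAL delta transfer.
pub fn fallback_for_default(default: ShardTransferMethod) -> ShardTransferMethod {
    match default {
        // A wal_delta fallback would be refused as a same-method retry,
        // so substitute Snapshot, which is also the 1.18.0+ default.
        ShardTransferMethod::WalDelta => ShardTransferMethod::Snapshot,
        // Every other variant is listed explicitly instead of using `_`.
        ShardTransferMethod::StreamRecords => ShardTransferMethod::StreamRecords,
        ShardTransferMethod::Snapshot => ShardTransferMethod::Snapshot,
    }
}
```

Listing each variant rather than using a wildcard arm means that adding a new transfer method later produces a compile error here until it is handled, which is the style clippy's `wildcard_enum_match_arm` lint asks for.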
@qdrant deleted a comment from coderabbitai[bot] on May 21, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
@timvisee added the bug and release:1.18.1 labels on May 21, 2026
@generall merged commit c735c11 into dev on May 21, 2026
15 checks passed
@generall deleted the fix-shard-transfer-fallback-default-method branch on May 21, 2026 14:27
generall pushed a commit that referenced this pull request May 22, 2026
* Use cluster default shard transfer method for fallback

When a WAL delta automatic transfer fails, the driver falls back to the
method passed via `fallback_method`. This was hard-coded to
`StreamRecords` (unless `prevent_unoptimized` was enabled), which is
inconsistent with the 1.18.0+ default of `Snapshot` and ignores any
configured `default_shard_transfer_method`.

Use `Collection::default_shard_transfer_method()` instead, so the
fallback matches the cluster default. With `prevent_unoptimized` we
still pin to `Snapshot` to preserve deferred point state exactly (raw
segment copy); stream_records would send deferred points but they
would not be deferred on the target.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Avoid wal_delta fallback, update fallback test

If the cluster default transfer method is wal_delta, the same-method
fallback would be refused by the driver. Use snapshot as a safe fallback
in that case; snapshot is also the 1.18.0+ default.

Update test_shard_wal_delta_transfer_fallback to assert the new
snapshot fallback (was stream_records).

Co-authored-by: Cursor <cursoragent@cursor.com>

* Address clippy wildcard_enum_match_arm

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor Agent <agent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>