Skip to content

fix: reject source-superset schema mismatch at SegmentBuilder#9110

Merged
generall merged 5 commits into
devfrom
fix/serialize-optimize-schema-change
May 21, 2026
Merged

fix: reject source-superset schema mismatch at SegmentBuilder#9110
generall merged 5 commits into
devfrom
fix/serialize-optimize-schema-change

Conversation

@agourlay

@agourlay agourlay commented May 20, 2026

Copy link
Copy Markdown
Member

Alternative to #9053.

Bug

SegmentBuilder::update iterates the target's vector_data and ignores any source vector not in target. Combined with the optimizer-vs-CreateVectorName(V) race, this silently drops V's data and produces a merged segment without V at version ≥ V_opnum.

The next optimization round picks that segment as a source and fails with ServiceError "missing vector name V", flipping the shard to RED until manual intervention. The symmetric DeleteVectorName race triggers the existing "missing vector name" check, also as ServiceError.

What this PR does

  • Adds a source-superset check at the start of SegmentBuilder::update so the corrupting merge cannot occur.
  • Changes both the new check and the pre-existing symmetric check to return OperationError::Cancelled instead of ServiceError.

Behavior change

Race Before After
CreateVectorName(V) mid-optimization Silent drop → broken segment C → future optimizations fail with ServiceError → RED status, stuck. New check aborts the optimization with Cancelled before C is produced. Debug log, no RED. Next round with refreshed config merges cleanly.
DeleteVectorName(V) mid-optimization Existing check fires with ServiceError → RED, stuck. Same check returns Cancelled. Debug log, no RED. Next round merges cleanly.

What this does not do

  • Does not fix the race itself. The optimizer's frozen target_config can still diverge from source segments. We catch the divergence at the merge boundary instead of letting it produce a broken segment.
  • Does not heal pre-existing damage. Segments already broken on disk from prior versions of the bug stay broken; this fix is forward-only. fix: segment merge tolerates source missing a named vector #9053's tolerance approach is what would heal them.

Trade-off vs #9053: this PR rejects the mismatch and relies on recreate_optimizers_blocking to make the retry succeed. Smaller surface, no new code path in the builder. If field damage from past versions is a concern, #9053 (or its tolerance behavior on top of this PR as a one-release heal-on-upgrade) is the better fit.

Why not a lock (Tim's suggestion in #9053)

Tried it; reverted. Two reasons:

  1. Only closed the window where V arrived during proxy install — not where V's apply_segments completed before the optimizer's lock acquisition.
  2. Deadlocked test_continuous_snapshot. snapshot::proxy_all_segments_and_apply holds upgradable_read for the snapshot duration and only takes acquire_updates_lock tactically inside; the lock extension required the opposite order. Reversing snapshot's order would hold updates_lock for the entire snapshot, blocking writes — not viable.

The schema check catches both race windows and has no lock-order surface.

Tests

  • test_segment_builder_rejects_target_with_extra_vector_name — target has a vector source lacks (existing path, included for symmetry).
  • test_segment_builder_rejects_source_with_extra_vector_name — the new check.

Test plan

  • cargo check --workspace --all-targets
  • cargo test -p segment --test integration (122/122)
  • cargo test -p shard --lib (72/72)
  • cargo test -p collection --lib (197/197)
  • cargo test -p collection --test integration test_continuous_snapshot (was deadlocked with the lock approach; passes now)

🤖 Generated with Claude Code

execute_optimization captures `target_config` from the optimizer's frozen
config and then wraps source segments in proxies. Between those two points,
`CollectionUpdater::update` can apply a `CreateVectorName(V)` to the source
segments via `apply_segments`, leaving the optimizer with sources that have
V but a target_config that does not. The optimization then produces a
merged segment without V, and a follow-up optimization (running with the
refreshed config that includes V) fails to use that segment as a source:
"Cannot update from other segment because it is missing vector name X".

Close the race by extending the scope of the existing
`LockedSegmentHolder::acquire_updates_lock` to cover the proxy install
window. `CollectionUpdater::update` already takes this lock before
processing any shard update, so concurrent writers wait until proxies are
in place — at which point further mutations hit the proxies (recorded as
intent and propagated to the merged segment in `finish_optimization`)
instead of the originals. The guard is dropped right after proxy install so
the slow build phase does not extend it.

Tests:
- Three `SegmentBuilder::update` tests document the precondition the lock
  now guarantees: with a target schema that adds a named vector the source
  lacks, update errors with "missing vector name X". Quantized and
  mixed-source variants exercise the same error path.
- `test_optimize_blocks_proxy_install_on_updates_lock` asserts the
  invariant directly: while the updates lock is held, proxies are not yet
  installed. Verified to fail when the new guard is removed (otherwise it
  passes because `finish_optimization` also takes the same lock, so a
  naive "did optimize finish?" check would not catch a missing proxy-install
  guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds vector schema validation to SegmentBuilder::update: it first scans source segments' vector storage keys and aborts with OperationError::cancelled if any source includes a vector name missing from the target schema. It also changes per-vector lookup failures (missing vector on a source) to return OperationError::cancelled with a retry-after-optimizer-config-refresh message. Two integration tests cover rejection when the target has an extra vector and when the source has an extra vector.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: rejecting source-superset schema mismatches in SegmentBuilder, which is the core fix described throughout the PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description clearly explains the bug, the specific fix implemented, the behavior changes, test coverage, and design trade-offs with alternative approaches.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/serialize-optimize-schema-change

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
lib/segment/tests/integration/segment_builder_test.rs (1)

674-674: ⚡ Quick win

Use explicit point-id conversion at Line 674.

Replace the implicit .into() with an explicit type conversion to match repository Rust style in new code.

Suggested diff
-            .upsert_point(10 + i, (100 + i).into(), vectors, &hw_counter)
+            .upsert_point(
+                10 + i,
+                segment::types::PointIdType::from(100 + i),
+                vectors,
+                &hw_counter,
+            )

As per coding guidelines: "**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/tests/integration/segment_builder_test.rs` at line 674, The call
to upsert_point uses an implicit conversion `(100 + i).into()` for the point id;
replace that with an explicit from conversion using the crate's point-id type
(e.g., PointId::from(100 + i) or the actual PointId type used in this module) so
the call becomes upsert_point(10 + i, PointIdType::from(100 + i), vectors,
&hw_counter); update the type name to the concrete point-id type used in this
crate and ensure imports/paths are adjusted accordingly.
lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs (1)

827-831: 💤 Low value

Consider adding a comment about timing sensitivity for future maintainers.

The 250ms sleep is a reasonable approach for this concurrency test, but could be flaky on slow or heavily-loaded CI runners. If this test becomes unreliable, consider either increasing the sleep duration or adding a brief comment noting the timing dependency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs`
around lines 827 - 831, Add a short explanatory comment above the
std::thread::sleep(Duration::from_millis(250)) in the concurrency test that
explains this 250ms wait is timing-sensitive and may be flaky on slow CI, and
suggest options for future maintainers (e.g., increase the duration, replace
with a synchronization primitive, or tune for CI). Reference the optimize_done
atomic and the assert! that follows (which checks
optimize_done.load(Ordering::SeqCst)) so readers know the sleep exists to avoid
the race that would make the assert flaky.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs`:
- Around line 827-831: Add a short explanatory comment above the
std::thread::sleep(Duration::from_millis(250)) in the concurrency test that
explains this 250ms wait is timing-sensitive and may be flaky on slow CI, and
suggest options for future maintainers (e.g., increase the duration, replace
with a synchronization primitive, or tune for CI). Reference the optimize_done
atomic and the assert! that follows (which checks
optimize_done.load(Ordering::SeqCst)) so readers know the sleep exists to avoid
the race that would make the assert flaky.

In `@lib/segment/tests/integration/segment_builder_test.rs`:
- Line 674: The call to upsert_point uses an implicit conversion `(100 +
i).into()` for the point id; replace that with an explicit from conversion using
the crate's point-id type (e.g., PointId::from(100 + i) or the actual PointId
type used in this module) so the call becomes upsert_point(10 + i,
PointIdType::from(100 + i), vectors, &hw_counter); update the type name to the
concrete point-id type used in this crate and ensure imports/paths are adjusted
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff67ebad-d708-433e-935b-e16ce79f06ef

📥 Commits

Reviewing files that changed from the base of the PR and between 76bb521 and 2e8da6c.

📒 Files selected for processing (3)
  • lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs
  • lib/segment/tests/integration/segment_builder_test.rs
  • lib/shard/src/optimize.rs

Address review of #9110:

1. `finish_optimization` was acquiring `upgradable_read` before
   `acquire_updates_lock`, while the new guard at the start of
   `execute_optimization` acquires them in the reverse order. With two
   optimizer threads in flight, thread A in `finish_optimization` could
   hold `upgradable_read` and wait on `updates_lock` while thread B at the
   top of `execute_optimization` held `updates_lock` and waited on
   `upgradable_read` (parking_lot allows only one upgradable reader),
   deadlocking. Swap `finish_optimization` to take `updates_lock` first so
   both halves agree.

2. Drop the quantized and mixed-source variants of the inverted
   `SegmentBuilder` unit test — all three asserted the same error path
   (the mismatch check fires before quantization training or per-source
   branching), so only one is useful as documentation of the precondition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/shard/src/optimize.rs (1)

695-705: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't hold updates_lock across preflight work and callbacks.

Line 704 now blocks collection updates until Line 812, and that window includes check_segments_size() plus the caller-provided on_successful_start(). That widens the "serialize proxy install" critical section to arbitrary I/O/callback latency, and the callback can deadlock if it re-enters a path that also needs the same lock. Please narrow the locked region to the schema/COW/proxy-install steps, or at least move the callback outside the guard.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/shard/src/optimize.rs` around lines 695 - 705, The updates_lock (held via
segment_holder.acquire_updates_lock() in proxy_install_guard) is being held too
long — across check_segments_size() and the caller-provided
on_successful_start() — which can block CollectionUpdater::update and cause
deadlocks; release the proxy_install_guard immediately after performing only the
schema/COW/proxy-install steps that must be serialized (the actual proxy
creation/installation and any copy-on-write adjustments), and move
check_segments_size() and the on_successful_start() callback to execute after
the guard is dropped so the critical section only encloses the minimal
proxy-installation work (ensure finish_optimization behavior remains unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@lib/shard/src/optimize.rs`:
- Around line 695-705: The updates_lock (held via
segment_holder.acquire_updates_lock() in proxy_install_guard) is being held too
long — across check_segments_size() and the caller-provided
on_successful_start() — which can block CollectionUpdater::update and cause
deadlocks; release the proxy_install_guard immediately after performing only the
schema/COW/proxy-install steps that must be serialized (the actual proxy
creation/installation and any copy-on-write adjustments), and move
check_segments_size() and the on_successful_start() callback to execute after
the guard is dropped so the critical section only encloses the minimal
proxy-installation work (ensure finish_optimization behavior remains unchanged).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0f8950a4-7095-49fa-8d17-3ca5c2c9ae85

📥 Commits

Reviewing files that changed from the base of the PR and between 2e8da6c and a7ccc0f.

📒 Files selected for processing (2)
  • lib/segment/tests/integration/segment_builder_test.rs
  • lib/shard/src/optimize.rs
💤 Files with no reviewable changes (1)
  • lib/segment/tests/integration/segment_builder_test.rs

Drop the lock approach (deadlocked test_continuous_snapshot) and fix the bug
at the merge layer instead.

Snapshot's proxy_all_segments_and_apply acquires the segment_holder
upgradable_read first and then takes acquire_updates_lock tactically inside
the snapshot operation. The previous commits' lock-extension acquired
updates_lock before upgradable_read, so a snapshot in flight and an
optimization just entering execute_optimization could deadlock holding each
other's required next lock. Snapshot cannot easily reverse its order — that
would hold updates_lock for the entire snapshot duration, blocking all
writes.

Move the fix to where the actual harm happens: SegmentBuilder::update
iterates the target's vector_data and silently drops source vectors that
aren't in target. That silent drop is what produces the broken merged
segment in the CreateVectorName-vs-optimizer race. Add a check that every
source vector name is in the target schema; the optimization aborts cleanly
on mismatch and the next round (with refreshed config) merges correctly.

This is strictly stronger than the lock: the lock only closed the window
where V arrived *during* the proxy-install region. The schema check catches
both that window and the window where V's apply_segments completed before
the optimizer's lock acquisition.

Diff is contained to lib/segment; no locking changes, no cross-crate
plumbing. test_continuous_snapshot passes again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@agourlay agourlay changed the title fix: serialize optimization proxy install against shard updates fix: reject source-superset schema mismatch at SegmentBuilder May 20, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
lib/segment/tests/integration/segment_builder_test.rs (1)

625-625: ⚡ Quick win

Use explicit from conversion instead of .into() for point IDs.

Line 625 uses implicit .into() when converting the point ID. Replace with explicit ExtendedPointId::from(100 + i) for consistency with the coding guidelines.

Proposed change
-            .upsert_point(10 + i, (100 + i).into(), vectors, &hw_counter)
+            .upsert_point(10 + i, segment::types::ExtendedPointId::from(100 + i), vectors, &hw_counter)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/segment/tests/integration/segment_builder_test.rs` at line 625, Replace
the implicit .into() conversion used for point IDs in the upsert call with an
explicit ExtendedPointId::from(...) to follow coding guidelines: locate the
upsert_point call (the .upsert_point(10 + i, (100 + i).into(), vectors,
&hw_counter) invocation) and change the second argument to
ExtendedPointId::from(100 + i) so the point ID conversion is explicit and
consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@lib/segment/tests/integration/segment_builder_test.rs`:
- Line 625: Replace the implicit .into() conversion used for point IDs in the
upsert call with an explicit ExtendedPointId::from(...) to follow coding
guidelines: locate the upsert_point call (the .upsert_point(10 + i, (100 +
i).into(), vectors, &hw_counter) invocation) and change the second argument to
ExtendedPointId::from(100 + i) so the point ID conversion is explicit and
consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 18900485-9dfc-4aa8-ab2b-2f21d423694e

📥 Commits

Reviewing files that changed from the base of the PR and between a7ccc0f and 937423c.

📒 Files selected for processing (2)
  • lib/segment/src/segment_constructor/segment_builder.rs
  • lib/segment/tests/integration/segment_builder_test.rs

agourlay and others added 2 commits May 20, 2026 13:11
…a mismatch

ServiceError flips the shard to RED status (via `report_optimizer_error` →
`segments.optimizer_errors`) and stays sticky until the next
`recreate_optimizers_blocking` clears it. That's the right shape for
hardware/IO failures but wrong for the schema-mismatch case here, which is
an expected, recoverable race outcome — the next optimizer round with a
refreshed target_config merges the same originals cleanly.

`Cancelled` is the variant the optimization worker treats as a recoverable
cancellation: logged at debug, tracker marked Cancelled, no
`report_optimizer_error` call, no RED status.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rset error

The existing "missing vector name" check at the start of the merge loop
also fires during a race — specifically the optimizer-vs-DeleteVectorName
shape, where V is removed from originals before J wraps proxies but J's
frozen target_config still has V. Like the new source-superset check, this
is an expected, recoverable race outcome, so use Cancelled instead of
ServiceError to avoid flipping the shard to RED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@generall generall merged commit 386e222 into dev May 21, 2026
15 checks passed
@generall generall deleted the fix/serialize-optimize-schema-change branch May 21, 2026 15:25
generall pushed a commit that referenced this pull request May 22, 2026
* fix: serialize optimization proxy install against shard updates

execute_optimization captures `target_config` from the optimizer's frozen
config and then wraps source segments in proxies. Between those two points,
`CollectionUpdater::update` can apply a `CreateVectorName(V)` to the source
segments via `apply_segments`, leaving the optimizer with sources that have
V but a target_config that does not. The optimization then produces a
merged segment without V, and a follow-up optimization (running with the
refreshed config that includes V) fails to use that segment as a source:
"Cannot update from other segment because it is missing vector name X".

Close the race by extending the scope of the existing
`LockedSegmentHolder::acquire_updates_lock` to cover the proxy install
window. `CollectionUpdater::update` already takes this lock before
processing any shard update, so concurrent writers wait until proxies are
in place — at which point further mutations hit the proxies (recorded as
intent and propagated to the merged segment in `finish_optimization`)
instead of the originals. The guard is dropped right after proxy install so
the slow build phase does not extend it.

Tests:
- Three `SegmentBuilder::update` tests document the precondition the lock
  now guarantees: with a target schema that adds a named vector the source
  lacks, update errors with "missing vector name X". Quantized and
  mixed-source variants exercise the same error path.
- `test_optimize_blocks_proxy_install_on_updates_lock` asserts the
  invariant directly: while the updates lock is held, proxies are not yet
  installed. Verified to fail when the new guard is removed (otherwise it
  passes because `finish_optimization` also takes the same lock, so a
  naive "did optimize finish?" check would not catch a missing proxy-install
  guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(optimize): finish_optimization lock order; drop redundant tests

Address review of #9110:

1. `finish_optimization` was acquiring `upgradable_read` before
   `acquire_updates_lock`, while the new guard at the start of
   `execute_optimization` acquires them in the reverse order. With two
   optimizer threads in flight, thread A in `finish_optimization` could
   hold `upgradable_read` and wait on `updates_lock` while thread B at the
   top of `execute_optimization` held `updates_lock` and waited on
   `upgradable_read` (parking_lot allows only one upgradable reader),
   deadlocking. Swap `finish_optimization` to take `updates_lock` first so
   both halves agree.

2. Drop the quantized and mixed-source variants of the inverted
   `SegmentBuilder` unit test — all three asserted the same error path
   (the mismatch check fires before quantization training or per-source
   branching), so only one is useful as documentation of the precondition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: reject source-superset schema mismatch at SegmentBuilder

Drop the lock approach (deadlocked test_continuous_snapshot) and fix the bug
at the merge layer instead.

Snapshot's proxy_all_segments_and_apply acquires the segment_holder
upgradable_read first and then takes acquire_updates_lock tactically inside
the snapshot operation. The previous commits' lock-extension acquired
updates_lock before upgradable_read, so a snapshot in flight and an
optimization just entering execute_optimization could deadlock holding each
other's required next lock. Snapshot cannot easily reverse its order — that
would hold updates_lock for the entire snapshot duration, blocking all
writes.

Move the fix to where the actual harm happens: SegmentBuilder::update
iterates the target's vector_data and silently drops source vectors that
aren't in target. That silent drop is what produces the broken merged
segment in the CreateVectorName-vs-optimizer race. Add a check that every
source vector name is in the target schema; the optimization aborts cleanly
on mismatch and the next round (with refreshed config) merges correctly.

This is strictly stronger than the lock: the lock only closed the window
where V arrived *during* the proxy-install region. The schema check catches
both that window and the window where V's apply_segments completed before
the optimizer's lock acquisition.

Diff is contained to lib/segment; no locking changes, no cross-crate
plumbing. test_continuous_snapshot passes again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(segment_builder): use Cancelled instead of ServiceError for schema mismatch

ServiceError flips the shard to RED status (via `report_optimizer_error` →
`segments.optimizer_errors`) and stays sticky until the next
`recreate_optimizers_blocking` clears it. That's the right shape for
hardware/IO failures but wrong for the schema-mismatch case here, which is
an expected, recoverable race outcome — the next optimizer round with a
refreshed target_config merges the same originals cleanly.

`Cancelled` is the variant the optimization worker treats as a recoverable
cancellation: logged at debug, tracker marked Cancelled, no
`report_optimizer_error` call, no RED status.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(segment_builder): also use Cancelled for the existing target-superset error

The existing "missing vector name" check at the start of the merge loop
also fires during a race — specifically the optimizer-vs-DeleteVectorName
shape, where V is removed from originals before J wraps proxies but J's
frozen target_config still has V. Like the new source-superset check, this
is an expected, recoverable race outcome, so use Cancelled instead of
ServiceError to avoid flipping the shard to RED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants