fix(vdf): limit run-ahead via reset-seed confirmation gate#1449
fix(vdf): limit run-ahead via reset-seed confirmation gate#1449glottologist wants to merge 3 commits into
Conversation
Limit VDF Run-Ahead via Seed-Confirmation Gate — Design PlanMode: Bug Root CauseThe VDF mining loop ( let is_too_far_ahead = (global_step_number + 1).is_multiple_of(vdf_reset_frequency)
&& global_step_number + 1 > canonical_global_step_number + vdf_reset_frequency;This opens the gate the instant the canonical tip reaches The seed-pinning design is intended to be fork-safe by pinning one window early, but Fix ApproachReplace the zero-confirmation boundary condition with a confirmation-depth gate: a Mechanism — the reset seed changes only at rotation blocks, and
The gate predicate is extracted into a pure function so it can be proptested in Effect (verified)
Residual (out of scope)A seed becomes truly irreversible only at finalization (the prune boundary,
Files Affected
Risk Assessment
VerificationAcceptance criteria:
External Review (Codex — Design)Date: 2026-06-15 Codex validated the core shape: Amendments Made
Rejected Findings
External Review (Codex — Implementation, design-level corrections)The Phase-7 implementation-plan review surfaced two issues that also corrected this design
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIntroduces ChangesVDF Reset-Boundary Seed-Confirmation Gate
Sequence Diagram(s)sequenceDiagram
rect rgba(100, 180, 255, 0.5)
Note over run_vdf,BlockTree: Snapshot refresh on each VDF step
end
participant run_vdf
participant BlockStatusProvider
participant BlockTree
run_vdf->>BlockStatusProvider: latest_canonical_vdf_info()
BlockStatusProvider->>BlockTree: get latest canonical entry header → vdf_limiter_info
BlockStatusProvider->>BlockTree: confirmed_canonical_step(block_migration_depth)
BlockTree-->>BlockStatusProvider: confirmed_global_step_number (skips non-Onchain tail)
BlockStatusProvider-->>run_vdf: CanonicalVdfSnapshot { vdf_info, confirmed_global_step_number }
run_vdf->>run_vdf: compute confirmation_lag = canonical_gsn - confirmed_gsn
run_vdf->>run_vdf: select gate_step (canonical or confirmed)
rect rgba(255, 160, 80, 0.5)
Note over run_vdf,is_reset_boundary_blocked: At each candidate reset boundary
end
run_vdf->>is_reset_boundary_blocked: (next_global_step, reset_frequency, confirmed_gsn)
alt blocked = true
is_reset_boundary_blocked-->>run_vdf: park / sleep
else blocked = false
is_reset_boundary_blocked-->>run_vdf: advance past boundary
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Benchmark results: https://irys-xyz.github.io/irys/dev/bench/jason%2Fvdf_run_ahead/index.html |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vdf/src/vdf.rs (1)
119-126:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftApply the reset-boundary gate to fast-forward steps too.
The fast-forward loop runs before the boundary gate and discards
confirmed_global_step_number, so a proposed FF step at a reset boundary can be stored and thenprocess_resetappliesnext_reset_seedeven when the confirmed chain has not reached that boundary’s rotation point. That preserves the same seed-poisoning path for FF-driven catch-up. Gateproposed_ff_step.global_step_numberbeforestore_step/process_reset, using the same confirmed snapshot state as local stepping.Suggested direction
if let Some(CanonicalVdfSnapshot { vdf_info, - confirmed_global_step_number: _, + confirmed_global_step_number: confirmed_step, }) = block_provider.latest_canonical_vdf_info() { next_reset_seed = vdf_info.next_seed; canonical_global_step_number = vdf_info.global_step_number; + confirmed_global_step_number = confirmed_step; } + + let confirmation_lag = + canonical_global_step_number.saturating_sub(confirmed_global_step_number); + let gate_step = if confirmation_lag.saturating_mul(2) >= vdf_reset_frequency { + canonical_global_step_number + } else { + confirmed_global_step_number + }; + if is_reset_boundary_blocked( + proposed_ff_step.global_step_number, + vdf_reset_frequency, + gate_step, + ) { + warn!( + proposed = proposed_ff_step.global_step_number, + canonical = canonical_global_step_number, + confirmed = confirmed_global_step_number, + gate = gate_step, + "Fast-forward step gated at reset boundary" + ); + break; + }Also applies to: 157-162, 210-214
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/vdf/src/vdf.rs` around lines 119 - 126, The fast-forward loop is processing and storing steps without applying the reset-boundary gate, which allows a proposed FF step at a reset boundary to be stored and have process_reset apply next_reset_seed even when the confirmed chain has not reached that boundary's rotation point. Apply the reset-boundary gate to proposed_ff_step.global_step_number before calling store_step and process_reset at the anchor site (crates/vdf/src/vdf.rs lines 119-126) and its sibling sites (lines 157-162 and 210-214), using the same confirmed snapshot state (vdf_info) as reference, similar to how the local stepping path validates against the boundary. This requires checking proposed_ff_step.global_step_number against the canonical_global_step_number and reset boundary conditions before proceeding with storage and reset operations at all three locations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/vdf/src/vdf.rs`:
- Around line 203-209: The gate_step logic currently falls back to
canonical_global_step_number when confirmation lag is large (i.e., when
confirmation_lag.saturating_mul(2) >= vdf_reset_frequency), which allows
boundaries to open before the confirmed chain reaches the required point and
drops the k-block confirmation requirement. To fail closed on consensus instead
of opening to the forkable canonical tip, invert the conditional logic so that
gate_step uses confirmed_global_step_number when confirmation lag is large and
canonical_global_step_number only when confirmation lag is small. If this
fallback is essential for test scenarios, gate it behind a test-only
configuration flag or declare the configuration invalid for production use
rather than silently reintroducing unconfirmed seed consumption.
---
Outside diff comments:
In `@crates/vdf/src/vdf.rs`:
- Around line 119-126: The fast-forward loop is processing and storing steps
without applying the reset-boundary gate, which allows a proposed FF step at a
reset boundary to be stored and have process_reset apply next_reset_seed even
when the confirmed chain has not reached that boundary's rotation point. Apply
the reset-boundary gate to proposed_ff_step.global_step_number before calling
store_step and process_reset at the anchor site (crates/vdf/src/vdf.rs lines
119-126) and its sibling sites (lines 157-162 and 210-214), using the same
confirmed snapshot state (vdf_info) as reference, similar to how the local
stepping path validates against the boundary. This requires checking
proposed_ff_step.global_step_number against the canonical_global_step_number and
reset boundary conditions before proceeding with storage and reset operations at
all three locations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9212840f-c599-46fd-9230-de18e09b0cec
📒 Files selected for processing (6)
crates/chain/src/chain.rscrates/domain/src/models/block_tree.rscrates/p2p/src/block_status_provider.rscrates/p2p/src/tests/util.rscrates/types/src/block_provider.rscrates/vdf/src/vdf.rs
8379620 to
a9c90a3
Compare
|
@CodeRabbit full review |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vdf/src/vdf.rs (1)
207-229:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winThrottle the expected boundary-gate warning.
A normal confirmation wait can remain gated across multiple blocks, but this path logs
warn!every 200ms while parked. Keep the first warning, then rate-limit repeats to avoid noisy logs during expected gate closures.Proposed fix
let vdf_reset_frequency = config.reset_frequency as u64; + let mut last_gate_warning_at: Option<Instant> = None; loop { @@ if !is_mining_enabled.load(std::sync::atomic::Ordering::Relaxed) || is_too_far_ahead { if is_too_far_ahead { - warn!( - "VDF gated at reset boundary: next step {} (canonical {}, confirmed {}); waiting", - global_step_number + 1, - canonical_global_step_number, - confirmed_global_step_number - ); + let should_warn = last_gate_warning_at + .map(|last| last.elapsed() >= Duration::from_secs(30)) + .unwrap_or(true); + if should_warn { + warn!( + "VDF gated at reset boundary: next step {} (canonical {}, confirmed {}); waiting", + global_step_number + 1, + canonical_global_step_number, + confirmed_global_step_number + ); + last_gate_warning_at = Some(Instant::now()); + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/vdf/src/vdf.rs` around lines 207 - 229, The `warn!` log message for "VDF gated at reset boundary" inside the `if is_too_far_ahead` block executes repeatedly every 200ms while the condition persists, creating noisy logs during expected gate closures. Implement rate limiting for this warning to log it once initially and then throttle subsequent repeats at a longer interval. Consider using a mechanism to track the last time this warning was logged and only emit it again after a reasonable cooldown period has elapsed, so users see the first occurrence but not the flood of identical messages during extended gate closures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/vdf/src/vdf.rs`:
- Around line 207-229: The `warn!` log message for "VDF gated at reset boundary"
inside the `if is_too_far_ahead` block executes repeatedly every 200ms while the
condition persists, creating noisy logs during expected gate closures. Implement
rate limiting for this warning to log it once initially and then throttle
subsequent repeats at a longer interval. Consider using a mechanism to track the
last time this warning was logged and only emit it again after a reasonable
cooldown period has elapsed, so users see the first occurrence but not the flood
of identical messages during extended gate closures.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 560d2512-3a72-479d-a9b8-12dce3a509f0
📒 Files selected for processing (7)
crates/chain-tests/src/block_production/reset_seed.rscrates/chain/src/chain.rscrates/domain/src/models/block_tree.rscrates/p2p/src/block_status_provider.rscrates/p2p/src/tests/util.rscrates/types/src/block_provider.rscrates/vdf/src/vdf.rs
|
@CodeRabbit full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/vdf/src/vdf.rs`:
- Around line 1072-1074: The documentation comment for the boundary rotation
logic in the vdf.rs file contains outdated wording that references a fallback to
canonical tip behavior that has been removed from the implementation. Update the
comment (around lines 1072-1074) to remove the phrase about "falling back to the
canonical tip" and clarify that the gate simply reads the confirmed step when
the confirmation lag is below the reset_frequency threshold, ensuring the
comment accurately reflects the current fail-closed invariant behavior.
- Around line 207-215: The warn! log message "VDF gated at reset boundary..." is
being emitted too frequently (approximately 5 times per second) when the gate is
closed due to the 200ms loop, causing excessive log spam. Either change the
warn! macro to debug! when logging the reset boundary condition, or implement
rate-limiting for this specific warning to reduce its frequency. Ensure the
solution does not suppress all occurrences but rather throttles the repeated
messages while still capturing the event for debugging purposes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 600f056a-e4f3-4d25-82b2-0341cab39993
📒 Files selected for processing (7)
crates/chain-tests/src/block_production/reset_seed.rscrates/chain/src/chain.rscrates/domain/src/models/block_tree.rscrates/p2p/src/block_status_provider.rscrates/p2p/src/tests/util.rscrates/types/src/block_provider.rscrates/vdf/src/vdf.rs
Summary
Before:
The VDF loop crossed a reset boundary the instant the rotation block that pinned that boundary's seed became the canonical tip — at zero confirmations. A node running a full window ahead applied a reset seed from a brand-new block; if that block then lost a fork, the node mixed a non-canonical seed into every later VDF step and wedged (issue #1447, observed on testnet peer-4 across a ~1-second same-height fork).
After:
Reset-boundary crossing is gated on confirmation depth: a node crosses boundary
Bonly once the rotation block that pinnedB's seed is at leastblock_migration_depth(6) blocks deep on canonical. The existing readiness guard is retained, not replaced. The startup seed is exempt, so a fresh or genesis node does not deadlock at its first boundary.Changes
VDF stepping loop (
crates/vdf/src/vdf.rs)min_seed_confirmationsblocks deep. The original readiness guard (next_step > canonical_gsn + reset_frequency) is kept —next_seedis carried forward unchanged between rotations, so confirmation depth alone cannot replace it.(active_reset_seed, seed_pin_height); record the pin height at the tip whennext_seedtransitions while the loop runs. Confirmation depth istip_height − seed_pin_height, measured in blocks — immune to a stall followed by one large catch-up block.min_seed_confirmationsblocks. Only a seed transition observed while running ahead is aged.is_reset_boundary_blocked, proptested in isolation.BlockProvider trait (
crates/types/src/block_provider.rs)latest_canonical_vdf_infonow returnsCanonicalVdfSnapshot { vdf_info, tip_height }, carrying the tip block height alongside the limiter info from the same tip read, so seed and height stay consistent.P2P block status provider (
crates/p2p/src/block_status_provider.rs)tip_heightwithvdf_infofrom the same canonical-tip snapshot.Chain wiring (
crates/chain/src/chain.rs)config.consensus.block_migration_depthintorun_vdfasmin_seed_confirmations. No new config field is introduced.Tests
crates/vdf/src/vdf.rs: proptest for the pure gate predicate; a regression test that a freshly-pinned (zero-confirmation) seed at a boundary parks the loop until its source isKblocks deep, and a guard that the exempt startup seed crosses its first boundary without parking (no deadlock).crates/chain-tests/src/block_production/reset_seed.rs: end-to-end regression (213 lines).Technical Notes
reset_frequencytoreset_frequency − K × steps_per_block. Mainnet ~600 → ~528 steps (~9 min); testnet ~1200 → ~1140 steps (~19 min).block_migration_depthis the confirmed marker, weaker than finalized (the prune boundary). A confirmed-but-not-finalized seed source can still be reversed by a partition-recovery reorg; closing that tail needs reorg-triggered VDF-buffer reconciliation, which is out of scope. This change is prevention only — recovery and self-heal are not included.Testing
Related
Summary by CodeRabbit
Release Notes