refactor: extract non-pool responsibilities from the mempool#1159
Conversation
…Guard directly Replace all MempoolServiceMessage::GetBlockHeader usages with direct block_header_lookup::get_block_header calls. This removes a pure proxy that routed block header lookups through the mempool's message queue despite having nothing to do with mempool state. Key changes: - New block_header_lookup module with get_block_header(tree, db, hash, include_chunk) - Refactored get_assigned_ingress_proofs to remove generic closure parameter - Added BlockTreeReadGuard to RethService (replacing mempool dependency) - Removed GetBlockHeader variant, handler, and facade method
The removal of MempoolFacade::get_block_header in e8291c2 left the test fixtures storing block headers in the now-unreachable MempoolStub.blocks field. Persist headers to MDBX instead so block_header_lookup's DB fallback path can find them.
… HashMap Remove GetCommitmentTxs, GetBestMempoolTxs, GetDataTxs, and GetMempoolStatus message variants from MempoolServiceMessage. All callers now use MempoolReadGuard or AtomicMempoolState directly: - Test utilities use mempool_guard.get_commitment_txs() and TxSelector::select_best_txs() directly - API routes use mempool_guard.atomic_state().get_status() and mempool_guard.get_tx_metadata() directly - Add get_data_txs_best_effort() free function in mempool_guard for mempool-first, DB-fallback semantics - Thread ChunkIngressState into ApiState for direct status access Also convert valid_submit_ledger_tx and valid_commitment_tx from BTreeMap to HashMap (no ordering dependency) with deterministic tie-breakers on tx id / address for eviction consistency. Net: -258 lines (148 added, 406 removed)
📝 WalkthroughWalkthroughAdds centralized anchor-resolution/validation and a new tx-selection pipeline; replaces mempool IPC header lookups with BlockTree+DB fallback; refactors mempool state to HashMap with batch APIs and reorg primitives; wires chunk-ingress and reth adapter into block production; updates tests to DB-backed and guard-based APIs. Changes
Sequence Diagram(s)sequenceDiagram
participant Producer as BlockProducer
participant Selector as TxSelector
participant Mempool as MempoolState
participant BlockTree as BlockTreeService
participant DB as Database
participant Reth as RethNode
Producer->>Selector: select_best_txs(parent_hash, timestamp, ctx)
Selector->>BlockTree: get parent header / canonical check
BlockTree-->>Selector: parent header
Selector->>Mempool: fetch candidate txs
Mempool-->>Selector: candidate txs
loop per-candidate
Selector->>BlockTree: get_anchor_height(anchor, canonical?)
alt in-memory hit
BlockTree-->>Selector: anchor_height
else fallback
Selector->>DB: canonical_block_height_by_hash(anchor)
DB-->>Selector: height or None
end
Selector->>Reth: fetch_balances_for_transactions(block_id, txs)
Reth-->>Selector: balances
Selector->>Selector: check_funding / fee calculations
end
Selector->>DB: fetch ingress proofs / data roots
DB-->>Selector: proofs / roots
Selector-->>Producer: MempoolTxs (commitment, submit, publish)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…validation Prevents stale recent_valid_tx entries from causing is_known_commitment_in_mempool to skip re-ingress when process_pending_pledges_for_new_stake runs. Also restricts write_for_reorg visibility to pub(crate) and adds doc comments clarifying revalidation scope and why pending_pledges is not consulted.
Keep our structural refactoring (direct block_header_lookup calls, tx_selector bypassing mempool queue, mempool_guard direct access) while accepting master's Traced<T> channel wrappers and send_traced for messages that still go through the service bus.
…xs-extraction # Conflicts: # crates/actors/src/block_producer.rs # crates/actors/src/block_validation.rs # crates/actors/src/mempool_service.rs # crates/actors/src/mempool_service/commitment_txs.rs # crates/actors/src/mempool_service/data_txs.rs # crates/actors/src/mempool_service/facade.rs # crates/actors/src/mempool_service/lifecycle.rs # crates/actors/src/reth_service.rs
…z/irys into rob/block-builder-txs-extraction
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/actors/src/mempool_service.rs (2)
2355-2417: 🧹 Nitpick | 🔵 TrivialAdd tie-case coverage for the new deterministic eviction rules.
These tests only cover strictly higher/lower fee and value comparisons. The new behavior in
find_lowest_fee_data_tx()andfind_lowest_value_address()changes on equal-fee/equal-value inputs, so the determinism guarantee is still untested.A small table-driven
rstestfor equal-fee and equal-total-value cases would lock this down. As per coding guidelines,Use rstest for parameterized tests.Also applies to: 2500-2581
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/actors/src/mempool_service.rs` around lines 2355 - 2417, Add parameterized rstest cases to cover tie scenarios for the deterministic eviction rules in find_lowest_fee_data_tx() and find_lowest_value_address(): create table-driven tests (using rstest) that exercise equal-fee ties (new tx fee == lowest existing fee) and equal-total-value ties (evicted address total value == candidate) referencing the existing tests test_data_tx_evicts_lowest_fee_when_full and test_data_tx_rejects_lower_fee_when_full as templates; for each case assert the deterministic outcome (which tx/address is evicted and which are preserved) according to the implemented tie-break rules so equal-fee/equal-value behavior is locked down.
129-151:⚠️ Potential issue | 🟠 MajorValidate commitment value before hitting RPC.
Lines 129-151 call
validate_funding()before the localvalidate_value()check. That means obviously malformed commitment txs still trigger a balance RPC, which is avoidable load on the reth adapter and mempool workers under spam.♻️ Proposed change
- // Funding - validate_funding(reth_adapter, commitment_tx, parent_evm_block_id) - .await - .map_err(|e| { - warn!( - tx.id = ?commitment_tx.id(), - tx.signer = ?commitment_tx.signer(), - tx.error = ?e, - "Commitment tx funding validation failed" - ); - e - })?; - // Value commitment_tx.validate_value(consensus).map_err(|e| { warn!( tx.id = ?commitment_tx.id(), tx.signer = ?commitment_tx.signer(), tx.error = ?e, "Commitment tx value validation failed" ); TxIngressError::from(e) })?; + + // Funding + validate_funding(reth_adapter, commitment_tx, parent_evm_block_id) + .await + .map_err(|e| { + warn!( + tx.id = ?commitment_tx.id(), + tx.signer = ?commitment_tx.signer(), + tx.error = ?e, + "Commitment tx funding validation failed" + ); + e + })?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/actors/src/mempool_service.rs` around lines 129 - 151, The current flow calls validate_funding(reth_adapter, commitment_tx, parent_evm_block_id) before the local commitment_tx.validate_value(consensus), causing malformed transactions to hit the reth_adapter RPC; swap the checks so call commitment_tx.validate_value(consensus) first and return on error (preserving the existing map_err warning/TxIngressError conversion), then call validate_funding(...) only if value validation succeeds; keep the same logging fields (tx.id, tx.signer, tx.error) and error propagation behavior for both validation steps.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/actors/src/mempool_service.rs`:
- Around line 1730-1733: put_recent_invalid currently only inserts into
recent_invalid_tx, leaving entries in recent_valid_tx which causes
mempool_data_tx_status and mempool_commitment_tx_status (which check
recent_valid_tx first) to still report ValidSeen; update
put_recent_invalid(tx_id: H256) to also remove tx_id from recent_valid_tx (e.g.,
call recent_valid_tx.remove(&tx_id) or equivalent) before or after inserting
into recent_invalid_tx so the two sets remain mutually exclusive; reference the
methods/fields recent_invalid_tx, recent_valid_tx, put_recent_invalid,
mempool_data_tx_status and mempool_commitment_tx_status when making the change.
---
Outside diff comments:
In `@crates/actors/src/mempool_service.rs`:
- Around line 2355-2417: Add parameterized rstest cases to cover tie scenarios
for the deterministic eviction rules in find_lowest_fee_data_tx() and
find_lowest_value_address(): create table-driven tests (using rstest) that
exercise equal-fee ties (new tx fee == lowest existing fee) and
equal-total-value ties (evicted address total value == candidate) referencing
the existing tests test_data_tx_evicts_lowest_fee_when_full and
test_data_tx_rejects_lower_fee_when_full as templates; for each case assert the
deterministic outcome (which tx/address is evicted and which are preserved)
according to the implemented tie-break rules so equal-fee/equal-value behavior
is locked down.
- Around line 129-151: The current flow calls validate_funding(reth_adapter,
commitment_tx, parent_evm_block_id) before the local
commitment_tx.validate_value(consensus), causing malformed transactions to hit
the reth_adapter RPC; swap the checks so call
commitment_tx.validate_value(consensus) first and return on error (preserving
the existing map_err warning/TxIngressError conversion), then call
validate_funding(...) only if value validation succeeds; keep the same logging
fields (tx.id, tx.signer, tx.error) and error propagation behavior for both
validation steps.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3391d00e-6a24-43cc-81e2-b5717167d25c
📒 Files selected for processing (1)
crates/actors/src/mempool_service.rs
| } | ||
|
|
||
| /// Adds a tx ID to the recent invalid set. | ||
| pub fn put_recent_invalid(&mut self, tx_id: H256) { |
There was a problem hiding this comment.
@coderabbitai can you make a tracking issue to redesign how the recent valid/invalid tx tracking works
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
gh issue create \
--repo Irys-xyz/irys \
--title "refactor: redesign recent valid/invalid tx tracking in mempool" \
--assignee JesseTheRobot \
--body "## Background
The mempool currently tracks transaction validity state via three separate LRU caches in \`MempoolState\` (crates/actors/src/mempool_service.rs):
\`\`\`rust
pub recent_invalid_tx: LruCache<H256, ()>,
pub recent_invalid_payload_fingerprints: LruCache<H256, ()>,
pub recent_valid_tx: LruCache<H256, ()>,
\`\`\`
This design was identified as a candidate for redesign during review of PR `#1159` (https://github.com/Irys-xyz/irys/pull/1159#discussion_r2896314880), requested by `@JesseTheRobot`.
## Problems with the Current Approach
1. **Scattered management**: The valid/invalid state is manually kept in sync across many methods — \`mark_tx_as_invalid\`, \`batch_prune_data_txs\`, \`batch_prune_commitment_txs\`, \`insert_tx_and_mark_valid\`, \`put_recent_invalid\`, \`wipe_blacklists\`, \`remove_blacklisted_txids\`, \`apply_block_confirmed_updates\`, etc. Any new code path risks forgetting to update one of the caches.
2. **Conflated semantics of \`recent_valid_tx\`**: This cache serves multiple purposes simultaneously:
- Deduplication (skip already-seen valid txs on re-ingress)
- Tracking confirmed-but-pruned txs (so they aren't re-ingested after confirmation)
- Promotion/inclusion status signalling
The dual use makes it hard to reason about correctness.
3. **Three-cache inconsistency risk**: A tx can theoretically be in both \`recent_valid_tx\` and \`recent_invalid_tx\` if updates are applied out of order or a code path is missed.
4. **Fragile disk restore**: \`wipe_blacklists()\` must be called after \`restore_mempool_from_disk()\` to reset the invalid cache, indicating the caches carry stale state across restarts that must be explicitly cleared.
5. **No explicit tx lifecycle**: There is no formal model of the states a tx moves through (unknown → pending → valid → included → promoted / unknown → invalid). The LRU caches are an implicit, untyped approximation.
## Proposed Direction
Consider replacing the three LRU caches with a single, typed tx-status map or a small state machine per tx ID, e.g.:
\`\`\`rust
enum TxLifecycleState {
Valid, // In mempool, not yet confirmed
Included(u64), // Confirmed at block height
Promoted(u64), // Promoted (publish ledger) at block height
Invalid(String),// Rejected, with reason
}
recent_tx_status: LruCache<H256, TxLifecycleState>,
\`\`\`
This would:
- Provide a single source of truth for tx status
- Make transitions explicit and auditable
- Eliminate the need for \`wipe_blacklists()\` on restart (invalid entries simply expire from LRU)
- Reduce the number of lock acquisitions needed to update tx state
## Scope
- [ ] Audit all write sites for \`recent_valid_tx\`, \`recent_invalid_tx\`, and \`recent_invalid_payload_fingerprints\`
- [ ] Define a typed \`TxLifecycleState\` enum covering all observed states
- [ ] Replace the three caches with a single \`recent_tx_status\` LRU map
- [ ] Update all read sites (\`is_recent_valid_tx\`, \`mempool_data_tx_status\`, \`mempool_commitment_tx_status\`, \`is_known_commitment_in_mempool\`, etc.)
- [ ] Remove \`wipe_blacklists()\` or narrow its scope
- [ ] Add tests for state transitions and re-ingress after confirmation
"
Length of output: 44
@JesseTheRobot, I've opened the tracking issue here: #1191
It covers the key pain points in the current three-cache design (recent_valid_tx, recent_invalid_tx, recent_invalid_payload_fingerprints):
- Scattered manual sync across ~10+ methods
- Conflated semantics of
recent_valid_tx(deduplication, confirmation tracking, and promotion signalling all in one cache) - Risk of a tx appearing in both valid and invalid caches simultaneously
- Fragile
wipe_blacklists()requirement after disk restore - No formal tx lifecycle model
The proposed direction is to replace the three caches with a single LruCache<H256, TxLifecycleState> with explicit state transitions (Valid → Included(u64) → Promoted(u64) / Invalid(String)), and I've included a scoped checklist of all the read/write sites that would need updating.
🧠 Learnings used
Learnt from: DanMacDonald
Repo: Irys-xyz/irys PR: 1104
File: crates/actors/src/mempool_service/data_txs.rs:31-37
Timestamp: 2026-01-24T02:18:04.459Z
Learning: In `crates/actors/src/mempool_service/data_txs.rs`, the precheck for DataLedger intentionally rejects OneYear and ThirtyDay ledgers during the staged rollout. Downstream match arms for these ledgers are placeholders for future implementation and should not be flagged as unreachable during this development phase.
Learnt from: roberts-pumpurs
Repo: Irys-xyz/irys PR: 1159
File: crates/api-server/src/routes/tx.rs:293-304
Timestamp: 2026-03-05T11:31:20.677Z
Learning: In the Irys codebase (crates/api-server/src/routes/tx.rs and related code), the **DB is authoritative** for tx promotion metadata (`promoted_height`, `included_height`), not the mempool. The correct ordering is:
1. `BlockTreeService::on_block_validation_finished` calls `persist_metadata(...)` to write metadata to the DB **first** (synchronously and immediately on block confirmation).
2. Only afterwards does it send `MempoolServiceMessage::BlockConfirmed` to the mempool, which then updates in-memory state.
3. `BlockMigrationService` delayed migration (gated by `block_migration_depth`) applies only to full block/header migration, NOT to tx metadata — metadata is persisted inline on each confirmed block.
4. On restart, the mempool explicitly reconstructs metadata from the DB (documented at mempool_service.rs line ~500 as DB-authoritative).
5. The only transient staleness window is between the block tree advancing internally and `persist_metadata()` completing — after that, DB is ahead of mempool.
Therefore, in `get_tx_promotion_status` (and similar code paths), using DB metadata unconditionally when present is correct. Do not flag "DB before mempool" as a bug in this codebase.
Learnt from: roberts-pumpurs
Repo: Irys-xyz/irys PR: 1159
File: crates/api-server/src/routes/tx.rs:293-304
Timestamp: 2026-03-05T11:30:53.088Z
Learning: In the Irys codebase, the **DB is authoritative** for tx promotion metadata (`promoted_height`, `included_height`), not the mempool. The correct write order is:
1. `BlockTreeService::on_block_validation_finished` calls `persist_metadata(...)` to write tx metadata (included_height, promoted_height) to the DB synchronously and inline on each confirmed block.
2. Only *after* that DB write does it send `MempoolServiceMessage::BlockConfirmed` to the mempool, which triggers the in-memory update via `apply_block_confirmed_updates`.
DB is therefore always updated before the mempool receives its in-memory update.
On restart, `reconstruct_metadata_from_db()` in `mempool_service.rs` (around line 500) explicitly reloads metadata from DB, and the code documents DB as authoritative.
The delayed `BlockMigrationService` path applies to full block/header migration into the block index — it does NOT govern tx metadata persistence, which is inline and immediate.
The only transient window where both DB and mempool are stale is the brief moment after the block tree advances internally but before `persist_metadata()` runs.
Therefore, in `crates/api-server/src/routes/tx.rs`, using DB metadata unconditionally first (as the current code does) is correct. File: crates/api-server/src/routes/tx.rs, crates/actors/src/block_tree_service.rs, crates/actors/src/mempool_service.rs.
Learnt from: antouhou
Repo: Irys-xyz/irys PR: 1097
File: crates/p2p/src/block_pool.rs:834-867
Timestamp: 2026-02-11T21:49:10.701Z
Learning: In crates/p2p/src/block_pool.rs, the spawned background task for mempool ingestion (handle_commitment_transaction_ingress_gossip and handle_data_transaction_ingress_gossip) intentionally uses a fire-and-forget pattern without panic handling because failures at this stage don't matter—block validation will re-ingest the transactions anyway.
Learnt from: roberts-pumpurs
Repo: Irys-xyz/irys PR: 1159
File: crates/api-server/src/routes/tx.rs:293-304
Timestamp: 2026-03-05T11:30:53.088Z
Learning: In the Irys codebase, `BlockMigrationService` handles two distinct responsibilities: (1) tx metadata persistence (included_height, promoted_height) which happens synchronously and inline on each confirmed block, and (2) full block/header migration into the block index which is delayed by `block_migration_depth` (default ~6 confirmations). These must not be conflated. The deferred migration depth applies only to (2), not (1). File: crates/actors/src/block_migration_service.rs.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/actors/src/tx_selector/mod.rs`:
- Around line 1084-1087: The current overlay uses
metadata().merge(db_header.metadata()) which preserves mempool fields when DB
fields are None, letting stale promoted_height/included_height survive; update
the code handling (Some(mut mempool_header), Some(db_header)) to still merge for
general fields but then explicitly copy the authoritative promotion metadata
from db_header: after creating merged =
mempool_header.metadata().merge(db_header.metadata()), set
merged.promoted_height = db_header.metadata().promoted_height (and
merged.included_height = db_header.metadata().included_height) before calling
mempool_header.set_metadata(merged), so DB values for
promoted_height/included_height are used verbatim.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8732e5ce-6b1d-46a7-8a9f-6d832ca62c0b
📒 Files selected for processing (2)
crates/actors/src/mempool_service.rscrates/actors/src/tx_selector/mod.rs
| (Some(mut mempool_header), Some(db_header)) => { | ||
| trace!("Got tx {} from mempool; overlaying DB metadata", tx_id); | ||
| let merged = mempool_header.metadata().merge(db_header.metadata()); | ||
| mempool_header.set_metadata(merged); |
There was a problem hiding this comment.
Use the DB values verbatim for authoritative metadata.
merge() preserves the mempool fields when the DB fields are None, so stale included_height / promoted_height can survive this overlay. In this path the DB is authoritative, so those fields should be copied from db_header directly; otherwise a tx can keep looking already included/promoted and get filtered out incorrectly.
🔧 Suggested fix
(Some(mut mempool_header), Some(db_header)) => {
trace!("Got tx {} from mempool; overlaying DB metadata", tx_id);
- let merged = mempool_header.metadata().merge(db_header.metadata());
- mempool_header.set_metadata(merged);
+ mempool_header.metadata_mut().included_height =
+ db_header.metadata().included_height;
+ mempool_header.metadata_mut().promoted_height =
+ db_header.metadata().promoted_height;
found_txs.push(Some(mempool_header));
}Based on learnings: DB is authoritative for tx promotion metadata (promoted_height, included_height), and it is updated before the mempool’s in-memory state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/actors/src/tx_selector/mod.rs` around lines 1084 - 1087, The current
overlay uses metadata().merge(db_header.metadata()) which preserves mempool
fields when DB fields are None, letting stale promoted_height/included_height
survive; update the code handling (Some(mut mempool_header), Some(db_header)) to
still merge for general fields but then explicitly copy the authoritative
promotion metadata from db_header: after creating merged =
mempool_header.metadata().merge(db_header.metadata()), set
merged.promoted_height = db_header.metadata().promoted_height (and
merged.included_height = db_header.metadata().included_height) before calling
mempool_header.set_metadata(merged), so DB values for
promoted_height/included_height are used verbatim.
| // Track fees for this address regardless of whether this specific transaction is included | ||
| fees_spent_per_address | ||
| .entry(signer) | ||
| .and_modify(|val| *val = val.checked_add(fee).unwrap_or(U256::MAX)) | ||
| .or_insert(fee); |
There was a problem hiding this comment.
@coderabbitai please make a tracking issue to investigate this behaviour (the tracking regardless of inclusion)
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
gh issue create \
--repo Irys-xyz/irys \
--assignee JesseTheRobot \
--title "Investigate: fee tracking in check_funding accumulates costs for excluded transactions" \
--body "## Summary
In \`crates/actors/src/tx_selector/helpers.rs\`, the \`check_funding\` function accumulates fees in \`fees_spent_per_address\` for a transaction **regardless of whether that transaction is actually included** in the block being built. This occurs even when \`has_funds\` is \`false\` (i.e., the transaction is excluded due to insufficient balance).
## Concern
If a transaction from a given signer is rejected for insufficient funds, its fee is still added to the cumulative \`fees_spent_per_address\` tally for that signer. This could cause subsequent transactions from the same address — which might otherwise have sufficient balance — to be **incorrectly rejected** during the same block selection pass.
## Relevant Code
File: \`crates/actors/src/tx_selector/helpers.rs\`
\`\`\`rust
// Track fees for this address regardless of whether this specific transaction is included
fees_spent_per_address
.entry(signer)
.and_modify(|val| *val = val.checked_add(fee).unwrap_or(U256::MAX))
.or_insert(fee);
\`\`\`
## Questions to Investigate
1. Is this intentional? (e.g., to conservatively reserve budget and avoid over-selection)
2. If a tx is excluded, should its fee be excluded from the cumulative tally for that signer?
3. Does this cause valid transactions to be incorrectly dropped during block production?
4. Are there ordering dependencies that make this safe in practice?
## References
- PR: https://github.com/Irys-xyz/irys/pull/1159
- Comment: https://github.com/Irys-xyz/irys/pull/1159#discussion_r2896411610
- Raised by: `@JesseTheRobot`"
Length of output: 44
@JesseTheRobot, I've opened a tracking issue for this: #1192 — Investigate: fee tracking in check_funding accumulates costs for excluded transactions.
The issue captures the core concern: when a transaction is rejected due to insufficient funds, its fee is still accumulated in fees_spent_per_address, which could cause subsequent transactions from the same signer to be incorrectly excluded during the same block selection pass. The issue includes the relevant code snippet and a set of investigation questions (e.g., is this intentional as a conservative budget reservation, or is it a bug?).
🧠 Learnings used
Learnt from: roberts-pumpurs
Repo: Irys-xyz/irys PR: 1159
File: crates/actors/src/tx_selector/mod.rs:0-0
Timestamp: 2026-03-05T18:29:26.436Z
Learning: In `crates/actors/src/tx_selector/mod.rs` (and the broader Irys codebase), the ingress proof acceptance gate intentionally uses the raw config-specified `number_of_ingress_proofs_total_at(timestamp)` value WITHOUT clamping to the number of currently staked miners (`total_miners`). This matches the behavior of the validation service and block producer on master, which also do not clamp. Do not suggest clamping `proofs_per_tx` to `min(configured_total, total_miners)` in the final acceptance check — it would create an asymmetry between block production and validation, causing valid blocks to be rejected. Dynamic proof count support is a future work item.
Learnt from: roberts-pumpurs
Repo: Irys-xyz/irys PR: 1159
File: crates/api-server/src/routes/tx.rs:293-304
Timestamp: 2026-03-05T11:31:20.677Z
Learning: In the Irys codebase (crates/api-server/src/routes/tx.rs and related code), the **DB is authoritative** for tx promotion metadata (`promoted_height`, `included_height`), not the mempool. The correct ordering is:
1. `BlockTreeService::on_block_validation_finished` calls `persist_metadata(...)` to write metadata to the DB **first** (synchronously and immediately on block confirmation).
2. Only afterwards does it send `MempoolServiceMessage::BlockConfirmed` to the mempool, which then updates in-memory state.
3. `BlockMigrationService` delayed migration (gated by `block_migration_depth`) applies only to full block/header migration, NOT to tx metadata — metadata is persisted inline on each confirmed block.
4. On restart, the mempool explicitly reconstructs metadata from the DB (documented at mempool_service.rs line ~500 as DB-authoritative).
5. The only transient staleness window is between the block tree advancing internally and `persist_metadata()` completing — after that, DB is ahead of mempool.
Therefore, in `get_tx_promotion_status` (and similar code paths), using DB metadata unconditionally when present is correct. Do not flag "DB before mempool" as a bug in this codebase.
Learnt from: roberts-pumpurs
Repo: Irys-xyz/irys PR: 1159
File: crates/api-server/src/routes/tx.rs:293-304
Timestamp: 2026-03-05T11:30:53.088Z
Learning: In the Irys codebase, the **DB is authoritative** for tx promotion metadata (`promoted_height`, `included_height`), not the mempool. The correct write order is:
1. `BlockTreeService::on_block_validation_finished` calls `persist_metadata(...)` to write tx metadata (included_height, promoted_height) to the DB synchronously and inline on each confirmed block.
2. Only *after* that DB write does it send `MempoolServiceMessage::BlockConfirmed` to the mempool, which triggers the in-memory update via `apply_block_confirmed_updates`.
DB is therefore always updated before the mempool receives its in-memory update.
On restart, `reconstruct_metadata_from_db()` in `mempool_service.rs` (around line 500) explicitly reloads metadata from DB, and the code documents DB as authoritative.
The delayed `BlockMigrationService` path applies to full block/header migration into the block index — it does NOT govern tx metadata persistence, which is inline and immediate.
The only transient window where both DB and mempool are stale is the brief moment after the block tree advances internally but before `persist_metadata()` runs.
Therefore, in `crates/api-server/src/routes/tx.rs`, using DB metadata unconditionally first (as the current code does) is correct. File: crates/api-server/src/routes/tx.rs, crates/actors/src/block_tree_service.rs, crates/actors/src/mempool_service.rs.
Learnt from: roberts-pumpurs
Repo: Irys-xyz/irys PR: 1118
File: crates/actors/src/mempool_service/ingress_proofs.rs:286-299
Timestamp: 2026-02-02T14:41:28.988Z
Learning: In ingress proof handling (e.g., `crates/actors/src/mempool_service/ingress_proofs.rs`), the staking checks are intentionally asymmetric: proof ingestion (`handle_ingest_ingress_proof`) accepts proofs from nodes staked in either `epoch_snapshot` or `commitment_snapshot` to minimize fork issues, but proof generation (`generate_and_store_ingress_proof`, `reanchor_and_store_ingress_proof`) only checks `epoch_snapshot` to ensure nodes generate proofs only when definitively staked in the current epoch.
Learnt from: CR
Repo: Irys-xyz/irys PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T16:43:55.969Z
Learning: Encode protocol-level actions (rewards, storage fees, staking) as shadow transactions sent to `SHADOW_TX_DESTINATION_ADDR` and execute via `IRYS_SHADOW_EXEC` precompile
Learnt from: roberts-pumpurs
Repo: Irys-xyz/irys PR: 1159
File: crates/api-server/src/routes/tx.rs:293-304
Timestamp: 2026-03-05T11:30:53.088Z
Learning: In the Irys codebase, `BlockMigrationService` handles two distinct responsibilities: (1) tx metadata persistence (included_height, promoted_height) which happens synchronously and inline on each confirmed block, and (2) full block/header migration into the block index which is delayed by `block_migration_depth` (default ~6 confirmations). These must not be conflated. The deferred migration depth applies only to (2), not (1). File: crates/actors/src/block_migration_service.rs.
Learnt from: antouhou
Repo: Irys-xyz/irys PR: 1097
File: crates/p2p/src/block_pool.rs:834-867
Timestamp: 2026-02-11T21:49:10.701Z
Learning: In crates/p2p/src/block_pool.rs, the spawned background task for mempool ingestion (handle_commitment_transaction_ingress_gossip and handle_data_transaction_ingress_gossip) intentionally uses a fire-and-forget pattern without panic handling because failures at this stage don't matter—block validation will re-ingest the transactions anyway.
Learnt from: DanMacDonald
Repo: Irys-xyz/irys PR: 1104
File: crates/actors/src/mempool_service/data_txs.rs:31-37
Timestamp: 2026-01-24T02:18:04.459Z
Learning: In `crates/actors/src/mempool_service/data_txs.rs`, the precheck for DataLedger intentionally rejects OneYear and ThirtyDay ledgers during the staged rollout. Downstream match arms for these ledgers are placeholders for future implementation and should not be flagged as unreachable during this development phase.
|
@coderabbitai make a tracking issue for investigating this line in our codebase irys/crates/actors/src/tx_selector/helpers.rs Line 115 in 314f227 |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me first read the relevant line to understand what needs to be tracked. 🏁 Script executed: Length of output: 1086 Let me get a bit more context before creating the issue. 🏁 Script executed: Length of output: 2326 Now I have a clear picture of the code. Let me create the tracking issue. 🏁 Script executed: Length of output: 44
The investigation covers two specific concerns I identified on and around line 115 of
🧠 Learnings used |
Mempool Simplification — Extract Non-Pool Responsibilities
The mempool had grown into a god object handling 6 distinct roles across 3,570 lines. This PR extracts non-pool responsibilities, fixes lock contention, and streamlines the message API. Net result: -276 lines across 30 files,
mempool_service.rsshrinks by ~1,600 lines.Why
GetBlockHeaderforwarded toBlockTreeReadGuardthrough the mempool message queue despite having nothing to do with mempool stateInner: Fee calculation, anchor validation, and signature validation took&selfonly to access config/block tree — no mempool state neededBTreeMap<H256, _>whereHashMapsuffices — H256 has no meaningful orderingWhat Changed
handle_block_confirmed_messageandprune_pending_txs— single write lock per operation instead of per-txtx_selectormodule — statelessselect_best_txs()called directly by block producer, bypassing the mempool message queue entirelyanchor_validationmodule — shared by mempool andChunkIngressServiceblock_header_lookupmodule — replaces 7 call sites that routed through mempool;RethServicenow holdsBlockTreeReadGuarddirectlyMempoolServiceMessagevariants (21 → 13): callers useMempoolReadGuard/AtomicMempoolStatedirectlyBTreeMap→HashMapforvalid_submit_ledger_txandvalid_commitment_txwith deterministic eviction tie-breakersArchitecture Before/After
Impact
mempool_service.rsMempoolServiceMessagevariantsvalid_submit_ledger_txlookupChecklist
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests