fix: size cache poisoning and tx offset validation by glottologist · Pull Request #1045 · Irys-xyz/irys

glottologist · 2025-12-04T10:57:52Z

Describe the changes
Before:
Chunk ingress rejected chunks when the cached data_size was smaller than the chunk's claimed size, even if the cached value was unverified. Attackers could poison the cache with small data_size transactions, causing legitimate chunks to be rejected. Pre-header cache limiting used tx_offset directly, allowing attackers to evict legitimate entries by submitting high-offset chunks.

After:
Tracks data_size_confirmed flag to distinguish verified vs unverified cache entries. Unconfirmed size mismatches park chunks rather than rejecting them. Pre-header cache uses count-based limiting instead of offset-based. Adds Merkle proof verification of rightmost chunks to confirm data_size.

Changes

Chunk Validation (`crates/types/src/chunk.rs`)

Added max_valid_offset() to compute maximum valid tx_offset for a data_size
Added is_valid_offset() for bounds checking tx_offset against data_size
Added end_byte_offset_checked() with overflow protection
Added rstest parameterised tests for offset validation edge cases
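
A minimal sketch of how these three helpers could fit together, based on the `div_ceil` calculation quoted in the review thread. The free-function signatures, the plain `u64` offset type, and the `Option` return values are assumptions for illustration, not the signatures in `chunk.rs`.

```rust
/// Highest valid chunk index (tx_offset) for `data_size` bytes split into
/// `chunk_size`-byte chunks. Returns `None` when there are no chunks at all.
/// (Hypothetical signature; the PR's version may differ.)
pub fn max_valid_offset(data_size: u64, chunk_size: u64) -> Option<u64> {
    if data_size == 0 || chunk_size == 0 {
        return None;
    }
    // div_ceil >= 1 here, so the subtraction cannot underflow.
    Some(data_size.div_ceil(chunk_size) - 1)
}

/// Bounds check: is `tx_offset` a chunk index that can exist for `data_size`?
pub fn is_valid_offset(tx_offset: u64, data_size: u64, chunk_size: u64) -> bool {
    matches!(max_valid_offset(data_size, chunk_size), Some(max) if tx_offset <= max)
}

/// Index of the last byte covered by chunk `tx_offset`, or `None` if the
/// offset is out of bounds or the arithmetic would overflow.
/// Assumption: the final chunk may be short, so it ends at `data_size - 1`.
pub fn end_byte_offset_checked(tx_offset: u64, data_size: u64, chunk_size: u64) -> Option<u64> {
    let max = max_valid_offset(data_size, chunk_size)?;
    if tx_offset > max {
        return None;
    }
    if tx_offset == max {
        return Some(data_size - 1);
    }
    // Checked arithmetic keeps the function total even for extreme inputs.
    tx_offset.checked_add(1)?.checked_mul(chunk_size)?.checked_sub(1)
}
```

Checking the offset before computing byte ranges means an out-of-range `tx_offset` is rejected without ever reaching the multiplication.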

Mempool Chunk Ingress (`crates/actors/src/mempool_service/chunks.rs`)

Track data_size_confirmed from cached data root
Fall back to storage modules for data_size lookup, distinguishing publish and submit ledgers: publish sizes are trusted, submit sizes are verified where possible.
Park chunks when unconfirmed data_size is smaller than the chunk's claimed size
Added verify_data_size_from_storage_modules() for merkle proof-based verification
Validate tx_offset bounds before merkle path validation
Change pre-header limiting from offset-based to count-based
Added InvalidOffset(String) error variant with context
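
The park-versus-reject rule described above can be sketched roughly as follows. `CachedDataRoot`, `SizeDecision`, and `check_claimed_size` are hypothetical names, and treating a cached size that is at least the claimed size as "proceed" is an assumption; only the `data_size_confirmed` flag and the park/reject distinction come from the PR description.

```rust
/// Outcome of comparing a chunk's claimed data_size with the cached one.
/// (Illustrative type, not part of the PR.)
#[derive(Debug, PartialEq, Eq)]
pub enum SizeDecision {
    /// Sizes are compatible; continue with offset and merkle validation.
    Proceed,
    /// Cached size is smaller but unverified; hold the chunk for later.
    Park,
    /// Cached size is smaller and verified; the chunk's claim is wrong.
    Reject,
}

/// Minimal stand-in for a cached data root entry.
pub struct CachedDataRoot {
    pub data_size: u64,
    pub data_size_confirmed: bool,
}

pub fn check_claimed_size(cached: &CachedDataRoot, claimed_data_size: u64) -> SizeDecision {
    if cached.data_size >= claimed_data_size {
        return SizeDecision::Proceed;
    }
    // Only a confirmed size is strong enough evidence to reject outright.
    // An unconfirmed (possibly attacker-supplied) size merely parks the chunk.
    if cached.data_size_confirmed {
        SizeDecision::Reject
    } else {
        SizeDecision::Park
    }
}
```

Under this rule a poisoned, unconfirmed entry can delay a legitimate chunk but can no longer cause it to be dropped.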

Mempool State (`crates/actors/src/mempool_service.rs`, `pending_chunks.rs`, `mempool_guard.rs`)

Added pending_chunk_count_for_data_root() method
Added get() method to PriorityPendingChunks for read-only access
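
As a rough illustration of count-based pre-header limiting, the sketch below admits a chunk based on how many chunks are already parked for its data root, never on the value of `tx_offset`. Only the method name `pending_chunk_count_for_data_root` comes from this PR; the `PendingChunks` struct, its fields, and `try_park` are assumptions made for the example.

```rust
use std::collections::HashMap;

pub type DataRoot = [u8; 32];

/// Hypothetical pre-header cache: chunks parked per data root, keyed by tx_offset.
pub struct PendingChunks {
    max_per_root: usize,
    by_root: HashMap<DataRoot, HashMap<u32, Vec<u8>>>,
}

impl PendingChunks {
    pub fn new(max_per_root: usize) -> Self {
        Self { max_per_root, by_root: HashMap::new() }
    }

    /// Number of chunks currently parked for `root`.
    pub fn pending_chunk_count_for_data_root(&self, root: &DataRoot) -> usize {
        self.by_root.get(root).map_or(0, |chunks| chunks.len())
    }

    /// Parks a chunk unless the per-root cap is reached. The decision depends
    /// only on the count, so a very high tx_offset gets no special treatment
    /// and cannot push out entries that are already parked.
    pub fn try_park(&mut self, root: DataRoot, tx_offset: u32, bytes: Vec<u8>) -> bool {
        if self.pending_chunk_count_for_data_root(&root) >= self.max_per_root {
            return false;
        }
        self.by_root.entry(root).or_default().insert(tx_offset, bytes);
        true
    }
}
```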

Tests (`crates/chain/tests/`)

Updated test_overlapping_data_sizes to expect parking instead of rejection
Renamed preheader_rejects_out_of_cap_tx_offset to preheader_rejects_when_cache_full
Test now verifies count-based limiting behaviour

Checklist

Tests have been added/updated for the changes.
Documentation has been updated for the changes (if applicable).
The code follows Rust's style guidelines.

JesseTheRobot

overall looks good, but I do have a couple blocking change requests

JesseTheRobot · 2025-12-09T09:54:32Z

+                    for info in &infos.0 {
+                        let current_max = sm_data_size.unwrap_or(0);
+                        if info.data_size > current_max {
+                            sm_data_size = Some(info.data_size);


there is a flaw here: if a submit slot has a larger data_size for a data_root, and then we find a data_size on a publish ledger SM, we will trust the larger invalid size

fixed in a39be0a
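
One way to express the intended precedence, as a sketch only: a size found on a publish-ledger storage module always wins over any size seen on a submit ledger, however large the submit value is. `Ledger`, `FoundSize`, and `select_data_size` are hypothetical names, and taking the maximum within a single ledger kind is an assumption rather than what a39be0a necessarily does.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ledger {
    Publish,
    Submit,
}

/// A data_size observed for one data_root on some storage module.
pub struct FoundSize {
    pub ledger: Ledger,
    pub data_size: u64,
}

/// Returns `(data_size, from_publish_ledger)`.
/// Publish entries are trusted and take priority; submit entries are only a
/// fallback and still need verification by the caller.
pub fn select_data_size(found: &[FoundSize]) -> Option<(u64, bool)> {
    let publish_max = found
        .iter()
        .filter(|f| f.ledger == Ledger::Publish)
        .map(|f| f.data_size)
        .max();
    if let Some(size) = publish_max {
        return Some((size, true));
    }
    found.iter().map(|f| f.data_size).max().map(|size| (size, false))
}
```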

JesseTheRobot · 2025-12-09T09:59:26Z

+                let mut sm_data_size: Option<u64> = None;
+                let mut from_publish_ledger = false;
+
+                for sm in storage_modules.iter() {


we should probably run these operations in parallel - on the larger nodes (with 40+ drives) doing these checks drive-by-drive might cause notable latency, and each drive is its own I/O domain so there shouldn't be any issues with doing this. we should also parallelise verify_data_size_from_storage_modules for the same reason.

(I am fine with this being a followup if required, but I would request we instrument this iteration & collect_data_root_infos in this PR so we can see the timing info in our telemetry and tell if it starts becoming an issue)

addressed in 26025d6
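
For illustration, a per-module fan-out with timing could look like the sketch below. It uses `std::thread::scope` from the standard library purely as a stand-in; the concurrency primitive and telemetry used in 26025d6 are not shown on this page, and `scan_parallel` is a hypothetical helper name.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `lookup` against every module on its own thread and returns the hits
/// (in module order) together with the wall-clock time for the whole scan.
pub fn scan_parallel<M, T, F>(modules: &[M], lookup: F) -> (Vec<T>, Duration)
where
    M: Sync,
    T: Send,
    F: Fn(&M) -> Option<T> + Sync,
{
    let started = Instant::now();
    let lookup = &lookup;
    let hits: Vec<T> = thread::scope(|scope| {
        // One thread per module: each drive is its own I/O domain.
        let handles: Vec<_> = modules
            .iter()
            .map(|module| scope.spawn(move || lookup(module)))
            .collect();
        handles
            .into_iter()
            .filter_map(|handle| handle.join().expect("lookup thread panicked"))
            .collect()
    });
    // The elapsed time is what one would report to telemetry.
    (hits, started.elapsed())
}
```

The inner per-entry loop stays sequential inside `lookup`; only the outer per-module iteration is fanned out.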

JesseTheRobot · 2025-12-09T10:03:30Z

        }

+        let num_chunks = data_size.div_ceil(chunk_size);
+        let max_valid_offset = num_chunks.saturating_sub(1);


I think this calculation is correct, but it might be worth double-checking/extracting this to a single source-of-truth function

oh you did move it to a function - can you extract it from UnpackedChunk and then change this code to use it?

addressed in da52a52

JesseTheRobot · 2025-12-09T10:07:55Z

+                _ => continue,
+            };
+
+            for info in &infos.0 {


this inner loop we can leave as-is, the outer loop is the only one that would massively benefit from parallelisation

JesseTheRobot · 2025-12-09T10:08:48Z

+
+                let partition_offset =
+                    irys_types::PartitionChunkOffset::from(relative_offset as u32);
+                let chunk = match sm.generate_full_chunk(partition_offset) {


nit(perf): we just need the data_path, which we can resolve without generating the full chunk (or at least without having to read in the bytes)

feel free to extract that metadata logic from generate_full_chunk into a separate helper function so that you can use it here (can be a followup)

fixed in e5ff113

JesseTheRobot · 2025-12-09T10:12:09Z

                )
                    .into()),
+                CriticalChunkIngressError::InvalidOffset(ref msg) => {
+                    Ok(HttpResponse::build(StatusCode::BAD_REQUEST)


we need to use ApiStatusResponse / ApiError, as it gives canonical JSON body formatting for errors (applies to all the HttpResponses in this file)

fixed in 4e28ddd

JesseTheRobot

LGTM

glottologist changed the title ~~Jason/data size cache poisoning rework~~ fix: size cache poisoning and tx offset validation Dec 4, 2025

This was referenced Dec 4, 2025

fix: additional tx_offset validation to prevent overflows #1032

Closed

fix(mempool): prevent data_size cache poisoning #1034

Closed

glottologist marked this pull request as ready for review December 4, 2025 12:04

glottologist requested review from DanMacDonald, JesseTheRobot and antouhou and removed request for JesseTheRobot December 4, 2025 12:04

glottologist added 7 commits December 9, 2025 08:43

fix(mempool): prevent data_size cache poisoning

48af3fd

fix(mempool): use chunk count instead of tx_offset

8e31994

fix: additional tx_offset validation to prevent overflows

b753c09

fix(mempool): add proof-based data_size verification for chunk ingress

ac0088d

chore(mempool/chunks): fixup comments

1c9c9d8

chore(mempool/chunks): enrich log message

0dbb12a

fix(mempool/chunks): use saturating add for relative offset

f1dbfa4

glottologist force-pushed the jason/data_size_cache_poisoning_rework branch from 9f3eaee to f1dbfa4 Compare December 9, 2025 09:13

JesseTheRobot requested changes Dec 9, 2025

glottologist added 5 commits December 9, 2025 12:47

fix(mempool/chunks): trust the publish confrimed size

a39be0a

perf(mempool/chunks): parallelize storage module size confirmations

26025d6

refactor(types/chunk): extract max_chunk_offset to standalone function

da52a52

perf(domain/storage): add get_chunk_metadata to avoid reading chunk bytes

e5ff113

fix(api_server/chunks): use ApiStatusResponse

4e28ddd

glottologist requested a review from JesseTheRobot December 9, 2025 15:29

glottologist and others added 3 commits December 9, 2025 15:47

Merge branch 'master' into jason/data_size_cache_poisoning_rework

3eca4e4

chore: refine API formatting a bit

4761f0d

chore: internal error -> internal advisory error

8f998b9

JesseTheRobot approved these changes Dec 9, 2025

glottologist merged commit 420b137 into master Dec 9, 2025
17 checks passed

glottologist deleted the jason/data_size_cache_poisoning_rework branch December 9, 2025 18:40
