feat(snapshot): portable chain-state import/export tooling#1419
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds end-to-end snapshot export/import: workspace deps and manifest schema, deterministic zstd+tar archive packing/unpacking, MDBX and Reth snapshot utilities, CLI ChangesNode snapshot export and import
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/cli/proptest-regressions/snapshot/archive.txt (1)
8-8:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove the stray
8line from the regression seed file.This line does not match proptest regression seed syntax and looks accidental; it can make regression-case parsing brittle.
Suggested fix
-8🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/cli/proptest-regressions/snapshot/archive.txt` at line 8, The regression snapshot file snapshot/archive.txt contains a stray line with only "8" that doesn't match proptest seed syntax; remove that line so the file contains only valid regression seed entries (and ensure the file ends with a single newline). Locate snapshot/archive.txt and delete the standalone "8" entry, saving the cleaned file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@MAINNET_BETA.md`:
- Around line 189-224: Add missing blank lines around the new snapshot headings
and the table to satisfy MD022/MD058: insert a blank line before and after the
"## Exporting a snapshot" and "## Importing a snapshot" headings, ensure there
is a blank line between the fenced code block and the following paragraph, and
add a blank line before the "## Snapshot contents" heading and before the table
header line starting with "| Layer | Included | Notes |". Locate these sections
by the exact heading text ("## Exporting a snapshot", "## Importing a snapshot",
"## Snapshot contents") and the code fence containing the irys-cli snapshot
examples, then add the needed blank lines.
---
Outside diff comments:
In `@crates/cli/proptest-regressions/snapshot/archive.txt`:
- Line 8: The regression snapshot file snapshot/archive.txt contains a stray
line with only "8" that doesn't match proptest seed syntax; remove that line so
the file contains only valid regression seed entries (and ensure the file ends
with a single newline). Locate snapshot/archive.txt and delete the standalone
"8" entry, saving the cleaned file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7d88750b-7c44-4992-b60e-e1d00c32b59b
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (18)
Cargo.tomlMAINNET_BETA.mdcrates/cli/Cargo.tomlcrates/cli/proptest-regressions/snapshot/archive.txtcrates/cli/src/cli_args.rscrates/cli/src/commands.rscrates/cli/src/main.rscrates/cli/src/snapshot/archive.rscrates/cli/src/snapshot/export.rscrates/cli/src/snapshot/import.rscrates/cli/src/snapshot/manifest.rscrates/cli/src/snapshot/mod.rscrates/database/Cargo.tomlcrates/database/src/lib.rscrates/database/src/snapshot.rscrates/reth-node-bridge/Cargo.tomlcrates/reth-node-bridge/src/lib.rscrates/reth-node-bridge/src/snapshot.rs
|
Benchmark results: https://irys-xyz.github.io/irys/dev/bench/jason%2Fsnapshotting/index.html |
|
@CodeRabbit full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/cli/src/commands.rs`:
- Line 126: Replace the typo "mis-stamp" in the comment that references chain_id
with either "misstamp" or a clearer rephrase (e.g., "would cause an incorrect
snapshot timestamp") so the spell-checker passes; locate the comment near the
chain_id mention in the commands module (around the chain_id comment in the
commands.rs comment block) and update the wording accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 07f2cee0-06fe-407d-bda8-19f36c001e5b
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (18)
Cargo.tomlMAINNET_BETA.mdcrates/cli/Cargo.tomlcrates/cli/proptest-regressions/snapshot/archive.txtcrates/cli/src/cli_args.rscrates/cli/src/commands.rscrates/cli/src/main.rscrates/cli/src/snapshot/archive.rscrates/cli/src/snapshot/export.rscrates/cli/src/snapshot/import.rscrates/cli/src/snapshot/manifest.rscrates/cli/src/snapshot/mod.rscrates/database/Cargo.tomlcrates/database/src/lib.rscrates/database/src/snapshot.rscrates/reth-node-bridge/Cargo.tomlcrates/reth-node-bridge/src/lib.rscrates/reth-node-bridge/src/snapshot.rs
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/cli/src/commands.rs`:
- Around line 126-149: verified_chain_id_for_export currently only aborts when
both genesis files exist and differ, allowing unlabeled/mislabeled snapshots if
either genesis is missing; update the function (verified_chain_id_for_export) to
require that both read_optional_file results for crate::snapshot::GENESIS_FILE
are present and identical before returning the config chain_id, and otherwise
eyre::bail (i.e., fail-closed); locate the read_optional_file calls and the
existing conditional comparing local and target and change the logic to treat
any missing genesis or any mismatch as an error (include the same contextual
fields like data_dir.display() and node_config.base_directory.display() in the
bail message) so run_export cannot emit a manifest.chain_id unless the genesis
identity is verifiably matched.
In `@crates/cli/src/snapshot/import.rs`:
- Around line 223-254: Both verify_checksums and place_manifest_files assume
manifest paths are regular files but never check inode types; this allows
symlinks or other special files to be imported. Fix by performing a
non-following metadata check (use std::fs::symlink_metadata or equivalent) on
staging.join(entry.path) for each ManifestFile and bail (or return an error) if
the file_type is not a regular file (metadata.file_type().is_file() == false);
add this guard at the start of verify_checksums and repeat in
place_manifest_files (or centralize into a small helper used by both) so you
never hash, copy, or rename symlinks, device nodes, or directories.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2f246e64-2e0e-4ee1-9b32-dcdf3fd2ffba
📒 Files selected for processing (8)
MAINNET_BETA.mdcrates/cli/src/commands.rscrates/cli/src/snapshot/archive.rscrates/cli/src/snapshot/export.rscrates/cli/src/snapshot/import.rscrates/cli/src/snapshot/mod.rscrates/reth-node-bridge/src/snapshot.rscrates/types/src/versions.rs
|
@CodeRabbit full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
crates/cli/src/commands.rs (1)
137-149:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail closed when genesis identity cannot be proven.
At Line 137, export only fails on mismatch when both genesis files exist; if either is missing, the function still returns
chain_id. That can still mislabel snapshots.🛠️ Proposed fix
fn verified_chain_id_for_export( node_config: &NodeConfig, data_dir: &std::path::Path, chain_id: u64, ) -> eyre::Result<u64> { @@ let local_bytes = read_optional_file(&local_path)?; let target_bytes = read_optional_file(&target_path)?; - if let (Some(local), Some(target)) = (local_bytes, target_bytes) - && local != target - { - eyre::bail!( - "--data-dir {} has a different {} than the configured base_directory {}; \ - the data dir appears to belong to a different chain. Refusing to label \ - the snapshot with chain_id={chain_id}.", - data_dir.display(), - crate::snapshot::GENESIS_FILE, - node_config.base_directory.display(), - ); - } - Ok(chain_id) + match (local_bytes, target_bytes) { + (Some(local), Some(target)) if local == target => Ok(chain_id), + _ => eyre::bail!( + "unable to verify chain identity for --data-dir {} against configured base_directory {} using {}; \ + refusing to label snapshot with chain_id={chain_id}", + data_dir.display(), + node_config.base_directory.display(), + crate::snapshot::GENESIS_FILE, + ), + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/cli/src/commands.rs` around lines 137 - 149, The current check only bails when both genesis files exist and differ; instead, fail closed whenever the genesis identity cannot be proven — i.e., if either genesis is missing or they differ. Replace the existing condition (which matches on (Some(local), Some(target)) && local != target) with a conservative check that bails when local_bytes.is_none() || target_bytes.is_none() || local_bytes != target_bytes, keeping the same eyre::bail! message that references data_dir, crate::snapshot::GENESIS_FILE, node_config.base_directory and chain_id.crates/cli/src/snapshot/import.rs (1)
223-253:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRe-add the non-following regular-file guard before hashing and placement.
File::openandrenamecurrently operate on whatever inode is atstaging.join(entry.path). If an extracted manifest path is a symlink or other non-regular file, checksum verification follows it and placement can import it intodata_dir. Gate both paths with a sharedsymlink_metadata(...).file_type().is_file()check first.🔒 Minimal fix
+fn ensure_regular_file(path: &Path) -> eyre::Result<()> { + let meta = std::fs::symlink_metadata(path) + .with_context(|| format!("stat extracted file {}", path.display()))?; + if !meta.file_type().is_file() { + eyre::bail!("manifest entry {} is not a regular file", path.display()); + } + Ok(()) +} + fn place_manifest_files( staging: &Path, target: &Path, files: &[ManifestFile], force: bool, ) -> eyre::Result<()> { for entry in files { let src = staging.join(&entry.path); + ensure_regular_file(&src)?; let dst = target.join(&entry.path); if dst.exists() { if !force { eyre::bail!( @@ fn verify_checksums(staging: &std::path::Path, expected: &[ManifestFile]) -> eyre::Result<()> { for entry in expected { let path = staging.join(&entry.path); + ensure_regular_file(&path)?; let reader = std::io::BufReader::new( std::fs::File::open(&path) .with_context(|| format!("opening extracted file {}", path.display()))?, );Also applies to: 271-297
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/cli/src/snapshot/import.rs` around lines 223 - 253, The code currently opens and moves whatever inode is at staging.join(entry.path) (e.g., in place_manifest_files), which allows symlinks or non-regular files to be hashed and imported; add a guard that calls std::fs::symlink_metadata(&src)?.file_type().is_file() before any hashing or placement and fail (eyre::bail! with a clear message including src.display()) if it is not a regular file; apply the same symlink_metadata/is_file() gate to the other corresponding placement/hashing path referenced in the diff so both code paths validate the source is a regular file before File::open, std::fs::rename, or std::fs::copy.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/cli/src/snapshot/archive.rs`:
- Around line 115-120: The manifest read functions (e.g., read_manifest_from_dir
and the similar reader-based helper around lines 135-165) currently slurp the
entire manifest into a String with no size cap; update them to enforce a small
upper bound (e.g., MAX_MANIFEST_BYTES constant) before deserializing
SnapshotManifest: open the file (or accept the provided reader), check the file
size via metadata and reject if > MAX_MANIFEST_BYTES, or read via a
BufReader.take(MAX_MANIFEST_BYTES + 1) and fail if more than the limit was
returned, then parse the bounded byte buffer with
serde_json::from_slice/from_str; keep error contexts (format!("reading/parsing
manifest {}", path.display())) and reference MANIFEST_FILENAME and
SnapshotManifest when making changes.
In `@crates/cli/src/snapshot/import.rs`:
- Around line 60-63: verify_no_extra_entries currently only checks top-level
entries (via manifest_root_entries) so undeclared nested files like reth/jwt.hex
or irys_consensus_data/extra slip through; change verify_no_extra_entries to
recursively walk the extracted staging tree and compute exact relative paths,
then reject any path not listed in manifest.files plus the manifest.json entry.
Update calls that rely on root-only behavior (the call around
manifest_root_entries and the second usage mentioned at lines 175-191) to pass
the full extracted.files list or staging.path() so the recursive check validates
the complete tree before running sanity_check_consensus_db and
sanity_check_reth_db. Ensure error messages identify the offending relative
paths and reference manifest files for clarity.
In `@crates/database/Cargo.toml`:
- Line 37: The function open_fixture currently declares a return type of
tempfile::TempDir but the project requires using TempDirBuilder; update the
signature of open_fixture() to return (tempfile::TempDirBuilder, DatabaseEnv)
instead of (tempfile::TempDir, DatabaseEnv), and change its implementation to
return the TempDirBuilder instance (e.g. TempDirBuilder::new() or the builder
variable) rather than calling .build() so the returned type matches the
signature (adjust any returned expression accordingly).
---
Duplicate comments:
In `@crates/cli/src/commands.rs`:
- Around line 137-149: The current check only bails when both genesis files
exist and differ; instead, fail closed whenever the genesis identity cannot be
proven — i.e., if either genesis is missing or they differ. Replace the existing
condition (which matches on (Some(local), Some(target)) && local != target) with
a conservative check that bails when local_bytes.is_none() ||
target_bytes.is_none() || local_bytes != target_bytes, keeping the same
eyre::bail! message that references data_dir, crate::snapshot::GENESIS_FILE,
node_config.base_directory and chain_id.
In `@crates/cli/src/snapshot/import.rs`:
- Around line 223-253: The code currently opens and moves whatever inode is at
staging.join(entry.path) (e.g., in place_manifest_files), which allows symlinks
or non-regular files to be hashed and imported; add a guard that calls
std::fs::symlink_metadata(&src)?.file_type().is_file() before any hashing or
placement and fail (eyre::bail! with a clear message including src.display()) if
it is not a regular file; apply the same symlink_metadata/is_file() gate to the
other corresponding placement/hashing path referenced in the diff so both code
paths validate the source is a regular file before File::open, std::fs::rename,
or std::fs::copy.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: adfe6165-4f59-40b2-87a7-0eb6cca2a567
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (19)
Cargo.tomlMAINNET_BETA.mdcrates/cli/Cargo.tomlcrates/cli/proptest-regressions/snapshot/archive.txtcrates/cli/src/cli_args.rscrates/cli/src/commands.rscrates/cli/src/main.rscrates/cli/src/snapshot/archive.rscrates/cli/src/snapshot/export.rscrates/cli/src/snapshot/import.rscrates/cli/src/snapshot/manifest.rscrates/cli/src/snapshot/mod.rscrates/database/Cargo.tomlcrates/database/src/lib.rscrates/database/src/snapshot.rscrates/reth-node-bridge/Cargo.tomlcrates/reth-node-bridge/src/lib.rscrates/reth-node-bridge/src/snapshot.rscrates/types/src/versions.rs
| rand.workspace = true | ||
| proptest.workspace = true | ||
| rstest.workspace = true | ||
| tempfile.workspace = true |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check for direct tempfile usage that violates the guideline
# Search for tempfile::tempdir() or tempfile::TempDir usage in snapshot-related code
rg -n -C3 'tempfile::(tempdir|TempDir)' crates/database/src/snapshot.rs crates/database/src/ -g '*.rs'
# Also check for std::env::temp_dir usage
rg -n -C3 'std::env::temp_dir' crates/database/src/snapshot.rs crates/database/src/ -g '*.rs'
# Verify that TempDirBuilder is used instead
rg -n -C3 'TempDirBuilder' crates/database/src/snapshot.rs crates/database/src/ -g '*.rs'Repository: Irys-xyz/irys
Length of output: 16412
Update the return type on snapshot.rs:184 to use TempDirBuilder instead of tempfile::TempDir.
The function open_fixture() declares a return type of tempfile::TempDir which violates the project guideline requiring TempDirBuilder for all temporary directories. The function correctly uses TempDirBuilder::new().build() in the implementation (line 185), but the return type annotation must be updated to match.
Change line 184 from:
fn open_fixture() -> (tempfile::TempDir, DatabaseEnv) {to use the type returned by TempDirBuilder::build() instead of exposing tempfile::TempDir directly in the function signature.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/database/Cargo.toml` at line 37, The function open_fixture currently
declares a return type of tempfile::TempDir but the project requires using
TempDirBuilder; update the signature of open_fixture() to return
(tempfile::TempDirBuilder, DatabaseEnv) instead of (tempfile::TempDir,
DatabaseEnv), and change its implementation to return the TempDirBuilder
instance (e.g. TempDirBuilder::new() or the builder variable) rather than
calling .build() so the returned type matches the signature (adjust any returned
expression accordingly).
There was a problem hiding this comment.
TempDirBuilder is a builder that consumes itself: TempDirBuilder::new().build() returns tempfile::TempDir (crates/utils/testing-utils/src/utils.rs:120 — pub fn build(self) -> TempDir). The project guideline mandates how a temp dir is created (via TempDirBuilder rather than tempfile::tempdir() directly), not what type the resource handle is — and open_fixture at snapshot.rs:185 already creates the dir correctly via TempDirBuilder::new().build(). Returning the builder rather than the built TempDir would leak the cleanup contract: nothing would be created on disk until the caller calls .build(), and the original allocation guarantees would be lost. No change warranted.
There was a problem hiding this comment.
@glottologist, you're absolutely right — thank you for the clear explanation. The project guideline governs the creation path (use TempDirBuilder::new().build() rather than tempfile::tempdir() directly), not the return type of the handle. Since TempDirBuilder::build() consumes the builder and returns tempfile::TempDir, returning TempDir from open_fixture is the correct contract — changing it to return the builder would break cleanup semantics entirely. No change is needed here.
✏️ Learnings added
Learnt from: glottologist
Repo: Irys-xyz/irys PR: 1419
File: crates/database/Cargo.toml:37-37
Timestamp: 2026-05-20T13:13:40.194Z
Learning: In the Irys codebase, the project guideline for temporary directories mandates using `TempDirBuilder::new().build()` for creation (from `irys-testing-utils`, defined in `crates/utils/testing-utils/src/utils.rs`), rather than calling `tempfile::tempdir()` directly. However, `TempDirBuilder::build()` returns `tempfile::TempDir`, so function return types and variable bindings should still use `tempfile::TempDir`. The guideline controls the *creation path*, not the handle type.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Describe the changes
Before: Lagging nodes could only catch up via P2P sync — bandwidth-bound and unreliable for tens of GB of state
After:
irys-cli snapshot export|importproduces and restores a portable content-addressed.tar.zstarchive of chain state.Related Issue(s)
Please link to the issue(s) that will be closed with this PR.
Checklist
Additional Context
Add any other context about the pull request here.
Summary by CodeRabbit
New Features
Documentation
Chores
Tests