feat: improved multiversion tests#1404
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR updates P2P gossip wire-format handling (custom RejectionReason serde, v1/v2 response shaping) and adds a multiversion test harness: per-version config templates, data-transaction HTTP helpers, cluster upgrade/rollback flows, strict cross-node tx/block assertions, tests, and docs. ChangesP2P Gossip Wire Protocol V1/V2 Compatibility
Multiversion Test Harness Cross-Version Validation Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/p2p/src/gossip_client.rs`:
- Around line 734-752: Add a regression test that exercises the
ProtocolVersionsRepr::Single branch by mocking the peer `/protocol_version`
response to return HTTP 200 with body "1" (an integer, not an array) and
asserting the client parses it into Vec<u32> == vec![1]; place the test
alongside the existing in-file tests for gossip parsing so it hits the same code
path that calls response.json().await (the deserialization into
ProtocolVersionsRepr) to ensure the Single(u32) case is covered and does not
regress.
In `@crates/tooling/multiversion-tests/examples/readme.md`:
- Around line 1-12: The README example lacks proper Markdown structure (missing
H1, blank lines around the code fence, and fence language) which causes
markdownlint errors; update the file to add a top-level heading (e.g.,
"Multiversion test example"), add blank lines before and after the paragraph and
code block, and annotate the fenced block with the language (bash) so the
example renders cleanly—apply these changes to the content that contains the
example command and its surrounding text in the readme.md.
In `@crates/tooling/multiversion-tests/src/cluster.rs`:
- Around line 442-445: The code currently masks a missing or non-u64
blockMigrationDepth by using read_consensus_u64(...).unwrap_or(6), which can
under-wait; change this to fail fast: call read_consensus_u64(&consensus,
"blockMigrationDepth") and propagate an error if it returns None or a parse
failure instead of falling back to 6, returning an Err from the surrounding
function (so probe.get_network_config / read_consensus_u64 failures bubble up);
then use the obtained migration_depth to call wait_for_height_at_least(height +
migration_depth + 2, timeout). Ensure you reference the same symbols
(probe.get_network_config, read_consensus_u64, "blockMigrationDepth",
migration_depth, wait_for_height_at_least) when making the change.
- Around line 867-869: The initial peer config generation calls
patch_peer_consensus(..., None) which omits the per-kind template_overlay used
by upgrade_node(); modify the config-generation path so the same template
overlay selected for each peer is passed into
patch_peer_configs/patch_peer_consensus instead of None: propagate the
template_overlay value from the template selection code into the loop over
peer_specs (where config_path is built) and change
patch_peer_configs/patch_peer_consensus signatures/call sites to accept and
apply the overlay (use the same template_overlay variable name as in
upgrade_node to locate the logic).
In `@crates/tooling/multiversion-tests/src/config.rs`:
- Around line 174-204: Add unit tests that directly exercise
extract_consensus_custom_from_template and overlay_template_onto_consensus:
build a base consensus custom map (as toml::Value/table) and an overlay template
string containing nested tables and scalar fields, call
extract_consensus_custom_from_template on the overlay to obtain the overlay map,
call overlay_template_onto_consensus(&mut target_map, &overlay_map), and assert
(1) nested-table keys are merged rather than replaced, (2) scalar fields present
in the overlay replace the target scalar values, and (3) fields like
expected_genesis_hash that should remain sourced from the live chain are
preserved on the target when the overlay does not provide them (and overwritten
only if overlay supplies them). Place tests in the same module/test block near
the functions so they exercise both extraction and recursive merge logic.
In `@fixtures/gossip_fixtures.json`:
- Around line 209-219: One of the fixtures collapsed to the unit-string form so
the legacy object-shaped path in RejectionReason::deserialize() is no longer
exercised; update fixtures/gossip_fixtures.json by keeping at least one of the
"gossip_response_rejected_handshake_required_*" entries as the object-shaped
form (i.e., set the value to an object like {"Rejected": {"HandshakeRequired":
...}} instead of the unit-string) so the legacy deserialization branch for
RejectionReason::deserialize() is covered.
In `@xtask/src/main.rs`:
- Around line 764-792: The code currently only sets environment variables when
base_config_old, base_config_new, or run_config are Some(...), which leaves any
previously inherited env vars intact when the flags are omitted; update the None
branches for base_config_old, base_config_new, and run_config so you explicitly
clear the env before spawning tests (prefer calling
sh.remove_var("IRYS_BASE_CONFIG_OLD"/"IRYS_BASE_CONFIG_NEW"/"IRYS_TEST_RUN_CONFIG")
if available, otherwise call sh.set_var(..., "") to blank them) while keeping
the existing canonicalization and set_var(...) logic in the Some(...) branches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8dfa521b-6e58-456e-b613-4db3088eee54
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (19)
crates/p2p/src/gossip_client.rscrates/p2p/src/types.rscrates/tooling/multiversion-tests/Cargo.tomlcrates/tooling/multiversion-tests/README.mdcrates/tooling/multiversion-tests/examples/base-config-new.tomlcrates/tooling/multiversion-tests/examples/base-config-old.tomlcrates/tooling/multiversion-tests/examples/readme.mdcrates/tooling/multiversion-tests/examples/run-config-d071fc03.tomlcrates/tooling/multiversion-tests/src/binary.rscrates/tooling/multiversion-tests/src/cluster.rscrates/tooling/multiversion-tests/src/config.rscrates/tooling/multiversion-tests/src/data_tx.rscrates/tooling/multiversion-tests/src/lib.rscrates/tooling/multiversion-tests/src/run_config.rscrates/tooling/multiversion-tests/src/tests/e2e.rscrates/tooling/multiversion-tests/src/tests/helpers.rscrates/tooling/multiversion-tests/src/tests/upgrade.rsfixtures/gossip_fixtures.jsonxtask/src/main.rs
| // Tolerate both shapes for v1 backwards-compat: | ||
| // * `[1, 2]` — current shape (array of supported versions) | ||
| // * `1` — older v1 peers' `/protocol_version` endpoint, which | ||
| // returns the single u32 it understands. Rather than 404'ing on | ||
| // the route, those peers serve `Ok(1)`, so the path above (which | ||
| // looks at HTTP status) never fires for them. | ||
| #[derive(serde::Deserialize)] | ||
| #[serde(untagged)] | ||
| enum ProtocolVersionsRepr { | ||
| Many(Vec<u32>), | ||
| Single(u32), | ||
| } | ||
| let parsed: ProtocolVersionsRepr = response.json().await.map_err(|error| { | ||
| GossipClientError::GetJsonResponsePayload(peer.gossip.to_string(), error.to_string()) | ||
| })?; | ||
| let versions = match parsed { | ||
| ProtocolVersionsRepr::Many(v) => v, | ||
| ProtocolVersionsRepr::Single(v) => vec![v], | ||
| }; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Add a regression test for the single-integer response shape.
ProtocolVersionsRepr::Single is the new backward-compat path, but the current in-file tests only cover [1, 2]. Please add a case that serves 200 with body 1 so this branch stays protected.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/p2p/src/gossip_client.rs` around lines 734 - 752, Add a regression
test that exercises the ProtocolVersionsRepr::Single branch by mocking the peer
`/protocol_version` response to return HTTP 200 with body "1" (an integer, not
an array) and asserting the client parses it into Vec<u32> == vec![1]; place the
test alongside the existing in-file tests for gossip parsing so it hits the same
code path that calls response.json().await (the deserialization into
ProtocolVersionsRepr) to ensure the Single(u32) case is covered and does not
regress.
| example set of configurations for running multiversion tests from a significantly older version (9f074ccf35319cbb046b150a2e37e4ab37feb66b) to HEAD (fd107a5e9280498ae84fcccc784ffceeeb6f8fa8) | ||
| full command: | ||
| ``` | ||
| cargo xtask multiversion-test \ | ||
| --profile dev \ | ||
| --old-ref 9f074ccf35319cbb046b150a2e37e4ab37feb66b \ | ||
| --new-ref CURRENT \ | ||
| --base-config-old ./crates/tooling/multiversion-tests/examples/base-config-old.toml \ | ||
| --base-config-new ./crates/tooling/multiversion-tests/examples/base-config-new.toml \ | ||
| --run-config ./crates/tooling/multiversion-tests/examples/run-config-d071fc03.toml \ | ||
| -- --no-fail-fast | ||
| ``` |
There was a problem hiding this comment.
Fix the Markdown structure so the example renders cleanly.
This file is already tripping markdownlint on the missing H1, missing blank lines around the fence, and missing fence language. A small cleanup here will keep the example readable in docs tooling.
Suggested patch
- example set of configurations for running multiversion tests from a significantly older version (9f074ccf35319cbb046b150a2e37e4ab37feb66b) to HEAD (fd107a5e9280498ae84fcccc784ffceeeb6f8fa8)
- full command:
- ```
+# Multiversion test example
+
+Example set of configurations for running multiversion tests from a significantly older version (`9f074ccf35319cbb046b150a2e37e4ab37feb66b`) to HEAD (`fd107a5e9280498ae84fcccc784ffceeeb6f8fa8`).
+
+Full command:
+
+```bash
cargo xtask multiversion-test \
--profile dev \
--old-ref 9f074ccf35319cbb046b150a2e37e4ab37feb66b \
--new-ref CURRENT \
--base-config-old ./crates/tooling/multiversion-tests/examples/base-config-old.toml \
--base-config-new ./crates/tooling/multiversion-tests/examples/base-config-new.toml \
--run-config ./crates/tooling/multiversion-tests/examples/run-config-d071fc03.toml \
-- --no-fail-fast</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 1-1: First line in a file should be a top-level heading
(MD041, first-line-heading, first-line-h1)
---
[warning] 3-3: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
---
[warning] 3-3: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In @crates/tooling/multiversion-tests/examples/readme.md around lines 1 - 12,
The README example lacks proper Markdown structure (missing H1, blank lines
around the code fence, and fence language) which causes markdownlint errors;
update the file to add a top-level heading (e.g., "Multiversion test example"),
add blank lines before and after the paragraph and code block, and annotate the
fenced block with the language (bash) so the example renders cleanly—apply these
changes to the content that contains the example command and its surrounding
text in the readme.md.
</details>
<!-- fingerprinting:phantom:medusa:grasshopper:b7542b9e-4e78-4900-b1b9-df9d3b9b2f26 -->
<!-- d98c2f50 -->
<!-- This is an auto-generated comment by CodeRabbit -->
| for node_spec in peer_specs { | ||
| let config_path = temp_base.join(format!("config-{}.toml", node_spec.name)); | ||
| crate::config::patch_peer_consensus(&config_path, &consensus_json, &genesis_hash)?; | ||
| crate::config::patch_peer_consensus(&config_path, &consensus_json, &genesis_hash, None)?; |
There was a problem hiding this comment.
Initial mixed-version peers still skip the new consensus overlay.
upgrade_node() correctly extracts template_overlay from the target template, but the initial startup path always calls patch_peer_consensus(..., None) here. That leaves a NEW peer booting against an OLD genesis without the same backfilled [consensus.Custom] fields that upgrades rely on, so cross-version clusters can still fail before the first upgrade step. Please thread the per-kind overlay into patch_peer_configs() using the same template selected during config generation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/tooling/multiversion-tests/src/cluster.rs` around lines 867 - 869, The
initial peer config generation calls patch_peer_consensus(..., None) which omits
the per-kind template_overlay used by upgrade_node(); modify the
config-generation path so the same template overlay selected for each peer is
passed into patch_peer_configs/patch_peer_consensus instead of None: propagate
the template_overlay value from the template selection code into the loop over
peer_specs (where config_path is built) and change
patch_peer_configs/patch_peer_consensus signatures/call sites to accept and
apply the overlay (use the same template_overlay variable name as in
upgrade_node to locate the logic).
| /// Returns the `[consensus.Custom]` sub-table from a base-config template, or | ||
| /// `None` if the template uses bare `consensus = "Testing"` / `"Mainnet"` / | ||
| /// has no `consensus` key. Used by [`patch_peer_consensus`] as the | ||
| /// cross-version overlay source. | ||
| pub fn extract_consensus_custom_from_template( | ||
| template: &str, | ||
| ) -> Option<toml::map::Map<String, Value>> { | ||
| let parsed: Value = template.parse().ok()?; | ||
| let consensus = parsed.as_table()?.get("consensus")?; | ||
| let custom = consensus.as_table()?.get("Custom")?; | ||
| custom.as_table().cloned() | ||
| } | ||
|
|
||
| /// Recursively merges `overlay` into `target`, with overlay values winning | ||
| /// for shared keys. Tables on both sides are merged in-place; any other value | ||
| /// (or table-vs-non-table mismatch) is replaced by the overlay value. | ||
| fn overlay_template_onto_consensus( | ||
| target: &mut toml::map::Map<String, Value>, | ||
| overlay: &toml::map::Map<String, Value>, | ||
| ) { | ||
| for (key, overlay_value) in overlay { | ||
| match (target.get_mut(key), overlay_value) { | ||
| (Some(Value::Table(target_table)), Value::Table(overlay_table)) => { | ||
| overlay_template_onto_consensus(target_table, overlay_table); | ||
| } | ||
| (_, _) => { | ||
| target.insert(key.clone(), overlay_value.clone()); | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Add direct tests for the new consensus overlay semantics.
extract_consensus_custom_from_template() and overlay_template_onto_consensus() are now the compatibility layer for new-only consensus fields, but the test module does not exercise either path yet. A couple of focused tests for nested-table merge, scalar override, and expected_genesis_hash still coming from the live chain would lock this behavior down.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/tooling/multiversion-tests/src/config.rs` around lines 174 - 204, Add
unit tests that directly exercise extract_consensus_custom_from_template and
overlay_template_onto_consensus: build a base consensus custom map (as
toml::Value/table) and an overlay template string containing nested tables and
scalar fields, call extract_consensus_custom_from_template on the overlay to
obtain the overlay map, call overlay_template_onto_consensus(&mut target_map,
&overlay_map), and assert (1) nested-table keys are merged rather than replaced,
(2) scalar fields present in the overlay replace the target scalar values, and
(3) fields like expected_genesis_hash that should remain sourced from the live
chain are preserved on the target when the overlay does not provide them (and
overwritten only if overlay supplies them). Place tests in the same module/test
block near the functions so they exercise both extraction and recursive merge
logic.
| "gossip_response_rejected_handshake_required_none": { | ||
| "Rejected": { | ||
| "HandshakeRequired": null | ||
| } | ||
| "Rejected": "HandshakeRequired" | ||
| }, | ||
| "gossip_response_rejected_handshake_required_not_in_peer_list": { | ||
| "Rejected": { | ||
| "HandshakeRequired": "RequestOriginIsNotInThePeerList" | ||
| } | ||
| "Rejected": "HandshakeRequired" | ||
| }, | ||
| "gossip_response_rejected_handshake_required_origin_mismatch": { | ||
| "Rejected": { | ||
| "HandshakeRequired": "RequestOriginDoesNotMatchExpected" | ||
| } | ||
| "Rejected": "HandshakeRequired" | ||
| }, | ||
| "gossip_response_rejected_handshake_required_unknown_miner": { | ||
| "Rejected": { | ||
| "HandshakeRequired": "MinerAddressIsUnknown" | ||
| } | ||
| "Rejected": "HandshakeRequired" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Keep at least one legacy object-shaped handshake fixture.
These four fixtures now all collapse to the same unit-string payload, so the fixture suite no longer exercises the backward-compat path that accepts legacy {"HandshakeRequired": ...} objects. Since RejectionReason::deserialize() explicitly supports both shapes, please retain an object-form fixture here or add an equivalent dedicated test for it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@fixtures/gossip_fixtures.json` around lines 209 - 219, One of the fixtures
collapsed to the unit-string form so the legacy object-shaped path in
RejectionReason::deserialize() is no longer exercised; update
fixtures/gossip_fixtures.json by keeping at least one of the
"gossip_response_rejected_handshake_required_*" entries as the object-shaped
form (i.e., set the value to an object like {"Rejected": {"HandshakeRequired":
...}} instead of the unit-string) so the legacy deserialization branch for
RejectionReason::deserialize() is covered.
| let base_config_old = base_config_old | ||
| .map(|p| { | ||
| std::fs::canonicalize(&p).map_err(|e| { | ||
| eyre::eyre!("base_config_old: failed to canonicalize `{p}`: {e}") | ||
| }) | ||
| }) | ||
| .transpose()?; | ||
| let base_config_new = base_config_new | ||
| .map(|p| { | ||
| std::fs::canonicalize(&p).map_err(|e| { | ||
| eyre::eyre!("base_config_new: failed to canonicalize `{p}`: {e}") | ||
| }) | ||
| }) | ||
| .transpose()?; | ||
| if let Some(ref path) = base_config_old { | ||
| sh.set_var("IRYS_BASE_CONFIG_OLD", path); | ||
| } | ||
| if let Some(ref path) = base_config_new { | ||
| sh.set_var("IRYS_BASE_CONFIG_NEW", path); | ||
| } | ||
| let run_config = run_config | ||
| .map(|p| { | ||
| std::fs::canonicalize(&p) | ||
| .map_err(|e| eyre::eyre!("run_config: failed to canonicalize `{p}`: {e}")) | ||
| }) | ||
| .transpose()?; | ||
| if let Some(ref path) = run_config { | ||
| sh.set_var("IRYS_TEST_RUN_CONFIG", path); | ||
| } |
There was a problem hiding this comment.
Clear inherited multiversion env vars when the new flags are omitted.
If the caller already has IRYS_BASE_CONFIG_OLD, IRYS_BASE_CONFIG_NEW, or IRYS_TEST_RUN_CONFIG in their shell, omitting the corresponding xtask flag here does not disable it — the inherited value still reaches the harness, so the run silently uses the wrong template/run-config. Please blank or remove these vars in the None branch before spawning cargo test.
💡 Suggested fix
if let Some(ref path) = base_config_old {
sh.set_var("IRYS_BASE_CONFIG_OLD", path);
+ } else {
+ sh.set_var("IRYS_BASE_CONFIG_OLD", "");
}
if let Some(ref path) = base_config_new {
sh.set_var("IRYS_BASE_CONFIG_NEW", path);
+ } else {
+ sh.set_var("IRYS_BASE_CONFIG_NEW", "");
}
let run_config = run_config
.map(|p| {
std::fs::canonicalize(&p)
.map_err(|e| eyre::eyre!("run_config: failed to canonicalize `{p}`: {e}"))
})
.transpose()?;
if let Some(ref path) = run_config {
sh.set_var("IRYS_TEST_RUN_CONFIG", path);
+ } else {
+ sh.set_var("IRYS_TEST_RUN_CONFIG", "");
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let base_config_old = base_config_old | |
| .map(|p| { | |
| std::fs::canonicalize(&p).map_err(|e| { | |
| eyre::eyre!("base_config_old: failed to canonicalize `{p}`: {e}") | |
| }) | |
| }) | |
| .transpose()?; | |
| let base_config_new = base_config_new | |
| .map(|p| { | |
| std::fs::canonicalize(&p).map_err(|e| { | |
| eyre::eyre!("base_config_new: failed to canonicalize `{p}`: {e}") | |
| }) | |
| }) | |
| .transpose()?; | |
| if let Some(ref path) = base_config_old { | |
| sh.set_var("IRYS_BASE_CONFIG_OLD", path); | |
| } | |
| if let Some(ref path) = base_config_new { | |
| sh.set_var("IRYS_BASE_CONFIG_NEW", path); | |
| } | |
| let run_config = run_config | |
| .map(|p| { | |
| std::fs::canonicalize(&p) | |
| .map_err(|e| eyre::eyre!("run_config: failed to canonicalize `{p}`: {e}")) | |
| }) | |
| .transpose()?; | |
| if let Some(ref path) = run_config { | |
| sh.set_var("IRYS_TEST_RUN_CONFIG", path); | |
| } | |
| if let Some(ref path) = base_config_old { | |
| sh.set_var("IRYS_BASE_CONFIG_OLD", path); | |
| } else { | |
| sh.set_var("IRYS_BASE_CONFIG_OLD", ""); | |
| } | |
| if let Some(ref path) = base_config_new { | |
| sh.set_var("IRYS_BASE_CONFIG_NEW", path); | |
| } else { | |
| sh.set_var("IRYS_BASE_CONFIG_NEW", ""); | |
| } | |
| let run_config = run_config | |
| .map(|p| { | |
| std::fs::canonicalize(&p) | |
| .map_err(|e| eyre::eyre!("run_config: failed to canonicalize `{p}`: {e}")) | |
| }) | |
| .transpose()?; | |
| if let Some(ref path) = run_config { | |
| sh.set_var("IRYS_TEST_RUN_CONFIG", path); | |
| } else { | |
| sh.set_var("IRYS_TEST_RUN_CONFIG", ""); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@xtask/src/main.rs` around lines 764 - 792, The code currently only sets
environment variables when base_config_old, base_config_new, or run_config are
Some(...), which leaves any previously inherited env vars intact when the flags
are omitted; update the None branches for base_config_old, base_config_new, and
run_config so you explicitly clear the env before spawning tests (prefer calling
sh.remove_var("IRYS_BASE_CONFIG_OLD"/"IRYS_BASE_CONFIG_NEW"/"IRYS_TEST_RUN_CONFIG")
if available, otherwise call sh.set_var(..., "") to blank them) while keeping
the existing canonicalization and set_var(...) logic in the Some(...) branches.
0f80812 to
a660cb3
Compare
Two on-the-wire shapes drifted between d071fc0 and HEAD in ways that break cross-version gossip on the receive side: * `RejectionReason::HandshakeRequired` was a unit variant on older nodes (`"HandshakeRequired"`) but became a newtype variant carrying an `Option<HandshakeRequirementReason>` on HEAD (`{"HandshakeRequired": null}`). The diagnostic payload is purely advisory — older peers cannot deserialize the newtype shape. Replace the derived `Serialize`/`Deserialize` for `RejectionReason` with custom impls. Serialize emits the v1 unit-string form for `HandshakeRequired` (dropping the `Option` payload on the wire), keeps the newtype shape for `UnsupportedProtocolVersion(u32)` whose payload is load-bearing, and emits all other variants as unit strings. Deserialize accepts both unit-string and single-key object forms, with an `IgnoredAny` tolerance for legacy emissions of object-shaped unit variants. * `/v1/protocol_version` returned a single `u32` on older nodes but HEAD's `get_protocol_versions` deserializes the response body as `Vec<u32>`. Add an untagged `ProtocolVersionsRepr` enum that accepts either `[1, 2]` or bare `1` and normalizes to a `Vec<u32>`, preserving the `MAX_PROTOCOL_VERSIONS` DDoS guard. Regenerate `fixtures/gossip_fixtures.json` to reflect the v1 wire form of the four `gossip_response_rejected_handshake_required_*` fixtures.
…, data tx flow, and promotion verification The harness used to verify cross-version upgrades by checking that nodes booted, gossiped, and converged. That gave a thin signal: a binary swap could leave silent data-shape corruption on disk and the tests would still pass. This change extends the harness so every test path actually drives transactions through the cluster and strictly validates what every node serves back, across binary boundaries. Cross-version cluster configuration ----------------------------------- * `BinaryKind` (`Old` | `New`) is attached to every `ResolvedBinary`, letting the cluster route per-node config generation through the right base template when OLD and NEW disagree on `NodeConfig` schema. * `ClusterSpec` carries `base_config_old` + `base_config_new` instead of a single template. `Cluster::upgrade_node` regenerates the on-disk config from the new template before respawn, so the new binary never sees the old-shaped file. For peer nodes the new template's `[consensus.Custom]` block (when present) is overlaid on top of what the running genesis serves at `/v1/network/config` via `patch_peer_consensus(template_overlay=...)`, letting users backfill new-only fields without losing the genesis-provided values for shared fields. * `xtask multiversion-test --base-config-old/--base-config-new` point at TOML templates, exported as `IRYS_BASE_CONFIG_OLD/NEW` env vars to the test process. Sample templates for the d071fc0 ↔ HEAD span are committed under `examples/`. Run config (`--run-config`) --------------------------- * New `run_config.rs` module parses an optional TOML run config with three sections: [tx_header] aliases + skip lists for the strict tx-header diff [block_header] same shape, applied to the cross-node block consistency check [tx_build] `keep_default` list of header fields to leave at default at signing time (for spans where a non-default value would change the canonical signature prehash on the older side). * Surfaced via `--run-config` and the `IRYS_TEST_RUN_CONFIG` env var. Default is the empty config — the right baseline for adjacent-release tests where renames are rare. The d071fc0 ↔ HEAD example config lives at `examples/run-config-d071fc03.toml`. Data tx submission and strict verification ------------------------------------------ * New `data_tx.rs` module submits signed Publish-ledger data transactions over plain HTTP (`/v1/anchor`, `/v1/price`, `/v1/tx`, `/v1/tx/{id}`). It uses HEAD's `irys-types` for signing — the wire-shape compatibility of the request/response body is what we're actually testing. * `submit_data_tx` sets `header_size` to a non-default sentinel before signing so the on-disk `Compact` encoding actually exercises non-zero payload bytes. `metadata_format` (the renamed field) is opt-in non-default via `tx_build.keep_default`. * `assert_tx_matches_original` does a strict full-header round trip against `/v1/tx/{id}` on every node, comparing every field in the JSON object via `compare_full_object`. Aliased rename pairs (`bundleFormat` ↔ `metadataFormat`) check presence only; value comparison is skipped because the rename also changes types. * `Cluster::assert_block_index_consistent` enumerates `/v1/block-index?height=0&limit=500` from the genesis, then for each block hash fetches `/v1/block/{hash}` from every node and diffs the responses against the genesis's view via `compare_full_object`. Catches PoA / ledger-metadata / signature drift independently of the tx-header layout. Chunk upload and promotion verification --------------------------------------- * `upload_chunks_for_tx` POSTs every `UnpackedChunk` to `/v1/chunk` so storage nodes can produce ingress proofs and promote the tx from Submit to Publish. * `wait_for_promotion` polls `/v1/tx/{id}/promotion-status` until `promotion_height` is set, on both OLD and NEW nodes (same response shape on both). * `Cluster::submit_promote_and_verify` strings the whole flow together: submit, upload chunks, wait for genesis to promote, then wait for the chain to advance past `block_migration_depth + 2` so every peer's `block_migration_service` durably commits the promotion to `IrysDataTxMetadata`. * `assert_tx_promoted_on_all_nodes` polls every running node for non-`None` `promotion_height` — catches loss of promotion state across binary swaps. Test wiring ----------- * All three e2e tests now submit + promote + verify, not just produce blocks. * All four upgrade tests submit + promote + verify before the first transition; after each transition they re-verify the full tx history is visible everywhere; the strict full-header round-trip and block-index sweep run after every binary swap; promotion preservation is asserted as well. * `rolling_upgrade_all_nodes` does its end-to-end promotion check once at the end of the loop — per-step checks raced the just-restarted miner. * `rollback_after_upgrade` waits past `block_migration_depth + 4` blocks before each binary swap so on-disk records are actually populated when the migrations run. Dependency additions -------------------- * `irys-types` (with `test-utils`) for `IrysSigner`, `BoundedFee`, `DataTransaction`, `H256`, `U256`. * `irys-api-client` for typed wire shapes shared with chain-tests. * `k256` for constructing the signer from the funded dev key. * `hex`, `eyre` for support.
…ture improvements Bring the README in line with the new harness shape: * Add `## Cross-Version Configuration` covering the `--base-config-old` / `--base-config-new` template flow and the `--run-config` TOML structure (tx-header / block-header alias and skip lists, `tx_build.keep_default`). Includes a fully-worked invocation for the d071fc0 ↔ HEAD span and the simpler adjacent-release case. * Document the three-layer cross-version assertion stack (visibility, strict tx-header round-trip, cross-node block-header consistency, plus end-to-end tx promotion). * Update the env-vars table to include `IRYS_BASE_CONFIG_OLD/NEW` and `IRYS_TEST_RUN_CONFIG`, and pair every variable with its matching xtask flag. * Refresh the architecture tree (new `examples/` dir, new `data_tx.rs` and `run_config.rs` modules, tests now live under `src/tests/` rather than a separate `tests/` directory). * Refresh the test-suite tables to reflect that every test now submits + promotes + verifies, and explicitly call out the rollback test's `block_migration_depth` wait and end-of-loop promotion check on rolling upgrades. * Add `## Future Improvements` at the bottom enumerating concrete follow-ups: chunk-level read-back, non-default values for renamed fields on adjacent-release runs, long-running rollback populations, true gossip-isolated rollback (requires a bootstrap-from-disk flag on the node binary), partition + upgrade combinations, commitment-tx coverage, parallelization, build cache reuse, multi-mismatch reporting in `compare_full_object`, and a per-tx block-signature round-trip.
1fbba88 to
1ed43a6
Compare
Describe the changes
This PR improves/expands the multiversion test harness as a side-effect of testing the mainnet -> master upgrade jump.
Checklist
Summary by CodeRabbit