feat: Add failure tracking & test filtering to xtask test by JesseTheRobot · Pull Request #1093 · Irys-xyz/irys

JesseTheRobot · 2026-01-05T18:29:23Z

NOTE: WRITTEN (mostly) BY CLAUDE WITH GUIDANCE

Describe the changes
This PR changes xtask test to run through a nextest wrapper binary that can track test success/failures and log them to a file. This file can then be used to re-run just the failing tests, automatically.
New args:
--rerun-failures - uses the saved failure information from the last run to filter out all tests that passed
--fresh - wipes the saved failure information, and runs the full test suite
--no-update-failures - does not update the saved failure information. Useful when working through a batch of failing tests that a fix may only partially resolve.
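
To make the interaction of the three flags concrete, here is a minimal sketch of how a run could be planned from them. `TestFlags`, `RunPlan`, and `plan_run` are hypothetical names for illustration, not the types in this PR, and the precedence of `--fresh` over `--rerun-failures` is an assumption; the fallback to a full run when nothing is saved follows the "if failures exist" note in the review walkthrough.

```rust
use std::collections::BTreeSet;

/// Hypothetical mirror of the three new flags; the real xtask types may differ.
#[derive(Debug, Clone, Copy, Default)]
pub struct TestFlags {
    pub rerun_failures: bool,
    pub fresh: bool,
    pub no_update_failures: bool,
}

/// What a run should do, derived from the flags and the saved state.
#[derive(Debug, PartialEq, Eq)]
pub struct RunPlan {
    /// `None` means "run the full suite"; `Some` restricts the run to these tests.
    pub only: Option<BTreeSet<String>>,
    /// Whether the saved failure information is wiped before the run.
    pub wipe_saved: bool,
    /// Whether results are written back to the saved state after the run.
    pub update_saved: bool,
}

pub fn plan_run(flags: TestFlags, saved_failures: &BTreeSet<String>) -> RunPlan {
    // Assumption: --fresh wins over --rerun-failures, and an empty saved set
    // falls back to the full suite instead of running nothing.
    let only = if flags.rerun_failures && !flags.fresh && !saved_failures.is_empty() {
        Some(saved_failures.clone())
    } else {
        None
    };
    RunPlan {
        only,
        wipe_saved: flags.fresh,
        update_saved: !flags.no_update_failures,
    }
}
```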

Summary by CodeRabbit

New Features
- Persistent test-failure tracking across runs with per-test result recording, automatic updates, and status reporting.
- New test flags to rerun only failures, reset runs, and skip updating failure state; optional coverage support.
- Generates a wrapper and temporary config to integrate failure-tracking and apply failure-based filtering.
Tests
- Added unit tests for argument parsing, test-name extraction, failure recording, and result aggregation.
Documentation
- Added "re-run only failing tests" usage to Testing docs.


Checklist

Tests have been added/updated for the changes.
Documentation has been updated for the changes (if applicable).
The code follows Rust's style guidelines.



coderabbitai · 2026-01-05T18:29:30Z

Caution

Review failed

The pull request is closed.

Walkthrough

Adds cross-run test-failure tracking to xtask: a new failures module for persisted state, a nextest wrapper binary that records per-test results, Cargo manifest updates to declare library and binaries, and integration into xtask's test command with flags to rerun/refresh/skip updates.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Manifest Configuration `xtask/Cargo.toml` | Added `default-run = "xtask"`, declared `[lib]` and two `[[bin]]` targets (`xtask`, `nextest-failure-tracker`); added workspace-scoped `serde`/`serde_json` and `tempfile` dependency. |
| Failure Tracking Infrastructure `xtask/src/failures.rs` | New module providing `FailuresFile`, `TestResult`, `RunResults`; path helpers for failures/results, JSON/JSONL load/save/clear, append-with-locking, `ensure_dir`, `generate_nextest_config`, and `build_failure_filter`. |
| Library Entry Point `xtask/src/lib.rs` | New crate root exposing `pub mod failures;` and crate documentation. |
| Nextest Wrapper Binary `xtask/src/bin/nextest-failure-tracker.rs` | New binary that runs a test binary, extracts test name, records pass/fail via `append_result`, logs errors, and returns the test exit code; includes unit tests for name extraction. |
| Test Workflow Integration `xtask/src/main.rs` | Expanded `Commands::Test` to add `rerun_failures`, `fresh`, `no_update_failures`; builds wrapper, generates Nextest config, optionally filters rerun set, invokes cargo-nextest with wrapper, and updates `FailuresFile` / `RunResults` post-run. |
| Docs `README.md` | Added instructions showing how to re-run only failing tests via xtask test flags. |
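
As an illustration of the failure-based filtering summarized above, the following is a minimal sketch of turning a list of failed test names into a nextest filterset expression. The function name `failure_filter_sketch` is hypothetical, and the PR's actual `build_failure_filter` may take different inputs and produce a different expression; the sketch only relies on nextest's `test(=name)` exact-match predicate and the `|` union operator.

```rust
/// Builds a nextest filterset expression selecting exactly the given tests.
/// Returns `None` when there are no recorded failures, so the caller can
/// fall back to an unfiltered run.
pub fn failure_filter_sketch(failed: &[String]) -> Option<String> {
    if failed.is_empty() {
        return None;
    }
    // `test(=name)` matches a test name exactly; `|` is the union operator.
    let clauses: Vec<String> = failed
        .iter()
        .map(|name| format!("test(={})", name))
        .collect();
    Some(clauses.join(" | "))
}
```

The resulting string would be passed to nextest as a filterset. Test names containing characters that are special in filterset syntax would need escaping, which this sketch does not attempt.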

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Xtask as xtask (main)
    participant Nextest as cargo-nextest
    participant Wrapper as nextest-failure-tracker
    participant Test as Test Binary
    participant Storage as failures storage

    rect rgb(232,240,255)
    Note over User,Xtask: User runs `cargo xtask test` (flags: rerun_failures / fresh / no_update_failures)
    end

    User->>Xtask: invoke test command
    Xtask->>Storage: load FailuresFile / RunResults
    alt rerun requested
        Xtask->>Xtask: build_failure_filter (if failures exist)
    end
    Xtask->>Xtask: build wrapper binary & generate_nextest_config
    Xtask->>Nextest: invoke with config & wrapper
    Nextest->>Wrapper: spawn wrapper per-test
    Wrapper->>Test: execute test binary
    Test-->>Wrapper: exit code
    alt failed
        Wrapper->>Storage: append_result(name, passed=false)
        Wrapper-->>Nextest: return nonzero
    else passed
        Wrapper->>Storage: append_result(name, passed=true)
        Wrapper-->>Nextest: return zero
    end
    Nextest-->>Xtask: run complete
    rect rgb(232,255,232)
    Note over Xtask,Storage: Post-run aggregation and update of FailuresFile / RunResults
    end
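
The post-run aggregation step at the end of the diagram can be sketched as set arithmetic over the recorded per-test results. This is an assumption-laden illustration: `Outcome`, `into_sets`, and `update_failures` here are stand-ins and not the PR's `TestResult` / `RunResults` API, and treating a test with any failing record as failed is an assumed policy.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-test record, as the wrapper might append it.
#[derive(Debug, Clone)]
pub struct Outcome {
    pub name: String,
    pub passed: bool,
}

/// Splits outcomes into (passed, failed). A test with any failing record
/// counts as failed (assumed policy).
pub fn into_sets(outcomes: &[Outcome]) -> (BTreeSet<String>, BTreeSet<String>) {
    let failed: BTreeSet<String> = outcomes
        .iter()
        .filter(|o| !o.passed)
        .map(|o| o.name.clone())
        .collect();
    let passed: BTreeSet<String> = outcomes
        .iter()
        .filter(|o| o.passed && !failed.contains(&o.name))
        .map(|o| o.name.clone())
        .collect();
    (passed, failed)
}

/// New saved failures = (previous failures minus tests that now pass)
/// plus tests that failed in this run.
pub fn update_failures(previous: &BTreeSet<String>, outcomes: &[Outcome]) -> BTreeSet<String> {
    let (passed, failed) = into_sets(outcomes);
    previous.difference(&passed).cloned().chain(failed).collect()
}
```

A previously failing test that was not part of this run stays in the saved set, and a test can only leave the set if its passing result was recorded, which is why the wrapper appends results for passes as well as failures.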

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: Add failure tracking & test filtering to xtask test' directly and clearly summarizes the main changes: implementing failure tracking and test filtering for the xtask test command, which aligns perfectly with the substantial changes across multiple new files and modules. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

📜 Recent review details

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2872abb and 4a54648.

📒 Files selected for processing (1)

README.md


JesseTheRobot · 2026-01-05T18:29:43Z

@coderabbitai full review

coderabbitai · 2026-01-05T18:29:49Z

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 7

Fix all issues with AI Agents 🤖

In @xtask/src/bin/nextest-failure-tracker.rs:
- Around line 72-92: Add a test documenting the empty-args behavior by creating
a new test function named like test_extract_test_name_empty_args that calls
extract_test_name(&Vec::<String>::new()) and annotating it with #[should_panic]
so the panic behavior is asserted; place it in the existing tests module
alongside test_extract_test_name_basic and
test_extract_test_name_with_leading_flags to clearly document the current
behavior.
- Around line 46-54: Recording only failures causes RunResults.into_sets() to
never see passed tests, so previously failing tests aren't cleared; call
append_result(&result) for every test result (not only when !passed) and handle
any Err(e) the same way currently done, i.e., remove the surrounding if !passed
{ ... } guard around the append_result(&result) call so append_result(&result)
is invoked unconditionally when processing each Result for the
nextest-failure-tracker; keep the existing error logging behavior for
append_result failures.
- Around line 63-70: The function extract_test_name currently panics when it
cannot find a non-flag argument; change it to return a Result<String, String>
(or Option<String>) instead of calling panic!, e.g., have
extract_test_name(args: &[String]) -> Result<String, String> and return Err with
a clear error message when no test name is found, then update callers (notably
main) to handle the error by printing a user-friendly message and exiting
gracefully (or falling back to a default) rather than letting the wrapper crash.
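
A minimal sketch of the `Result`-returning variant suggested in the last item above. The rule used here (take the first argument that does not start with `-`) is an assumption based on the "non-flag argument" wording; the PR's actual extraction logic is not shown in this thread and may differ.

```rust
/// Returns the first argument that does not look like a flag, or an error
/// message describing the arguments that were searched.
pub fn extract_test_name(args: &[String]) -> Result<String, String> {
    args.iter()
        // Assumption: anything starting with '-' is a flag, not a test name.
        .find(|arg| !arg.starts_with('-'))
        .cloned()
        .ok_or_else(|| format!("no test name found in arguments: {:?}", args))
}
```

With this shape, `main` could print the error and exit with a non-zero code instead of panicking, and the empty-args case becomes an ordinary `is_err()` assertion in place of a `#[should_panic]` test.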

In @xtask/src/failures.rs:
- Around line 284-291: The current string-based check using
config_content.contains("experimental") and contains("wrapper-scripts") is
fragile and can match comments or string values; parse and manipulate the TOML
AST instead (e.g., use the toml or toml_edit crate) to reliably detect and
modify the experimental table/array and append "wrapper-scripts" if missing, or
at minimum perform a safer line-based/key-aware check (match an unquoted key
like /^\s*experimental\s*=/ or locate the experimental table/array entry)
before modifying config_content so you don't accidentally touch comments or
unrelated strings.
- Around line 203-254: Remove the two redundant local imports of the Write trait
inside append_result_to by deleting the lines `use std::io::Write as _;` that
appear before the two writeln! calls; the Write trait is already imported at the
top of the file so no additional imports are necessary—keep the writeln! calls
as-is in the append_result_to function.
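
The "safer line-based/key-aware check" mentioned in the first item above could look roughly like the sketch below, using only the standard library. `has_experimental_key` is a hypothetical helper, not code from the PR, and it is deliberately weaker than parsing the TOML: it would still be fooled by a matching line inside a multi-line string, and it does not recognize quoted or dotted key forms.

```rust
/// Returns true if some line assigns the bare key `experimental`.
/// Comment lines and string values containing the word do not match,
/// because the line must start with the key followed by `=`.
pub fn has_experimental_key(config: &str) -> bool {
    config.lines().any(|line| {
        let line = line.trim_start();
        match line.strip_prefix("experimental") {
            // Only whitespace may separate the key from `=`; this rejects
            // longer keys such as `experimental-foo`.
            Some(rest) => rest.trim_start().starts_with('='),
            None => false,
        }
    })
}
```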

In @xtask/src/main.rs:
- Around line 196-202: The args vector nextest_args contains a redundant
"--tests" entry alongside "--all-targets"; remove the "--tests". Locate the
nextest_args initialization (the vec! with "nextest", "run", "--workspace",
"--tests", "--all-targets") and delete the "--tests". Ensure no other code
relies on "--tests" being present and run the command locally to verify behavior
remains unchanged.
- Around line 8-11: The import list in xtask::failures includes an unused symbol
build_failure_filter; remove build_failure_filter from the use statement so only
actually used items (failures, generate_nextest_config, get_failures_file_path,
FailuresFile, RunResults) are imported. Locate the use xtask::failures::{ ... }
declaration and delete build_failure_filter from that list (keep
generate_nextest_config as it internally calls build_failure_filter).

📜 Review details

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f5d8ef and 832171f.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (5)

xtask/Cargo.toml
xtask/src/bin/nextest-failure-tracker.rs
xtask/src/failures.rs
xtask/src/lib.rs
xtask/src/main.rs

🧰 Additional context used

🧬 Code graph analysis (2)

xtask/src/bin/nextest-failure-tracker.rs (2)

xtask/src/failures.rs (1)

append_result (204-206)

xtask/src/main.rs (1)

main (564-569)

xtask/src/main.rs (1)

xtask/src/failures.rs (8)

build_failure_filter (321-327)

generate_nextest_config (265-318)

get_failures_file_path (61-63)

clear (112-118)

clear (173-184)

ensure_dir (257-259)

load (78-80)

load (146-148)

🪛 GitHub Check: cargo check

xtask/src/main.rs