feat: Add failure tracking & test filtering to xtask test#1093
Conversation
NOTE: WRITTEN (mostly) BY CLAUDE WITH GUIDANCE
|
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughAdds cross-run test-failure tracking to xtask: a new Changes
Sequence Diagram(s)sequenceDiagram
participant User as User
participant Xtask as xtask (main)
participant Nextest as cargo-nextest
participant Wrapper as nextest-failure-tracker
participant Test as Test Binary
participant Storage as failures storage
rect rgb(232,240,255)
Note over User,Xtask: User runs `cargo xtask test` (flags: rerun_failures / fresh / no_update_failures)
end
User->>Xtask: invoke test command
Xtask->>Storage: load FailuresFile / RunResults
alt rerun requested
Xtask->>Xtask: build_failure_filter (if failures exist)
end
Xtask->>Xtask: build wrapper binary & generate_nextest_config
Xtask->>Nextest: invoke with config & wrapper
Nextest->>Wrapper: spawn wrapper per-test
Wrapper->>Test: execute test binary
Test-->>Wrapper: exit code
alt failed
Wrapper->>Storage: append_result(name, passed=false)
Wrapper-->>Nextest: return nonzero
else passed
Wrapper->>Storage: append_result(name, passed=true)
Wrapper-->>Nextest: return zero
end
Nextest-->>Xtask: run complete
rect rgb(232,255,232)
Note over Xtask,Storage: Post-run aggregation and update of FailuresFile / RunResults
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 7
Fix all issues with AI Agents 🤖
In @xtask/src/bin/nextest-failure-tracker.rs:
- Around line 72-92: Add a test documenting the empty-args behavior by creating
a new test function named like test_extract_test_name_empty_args that calls
extract_test_name(&Vec::<String>::new()) and annotating it with #[should_panic]
so the panic behavior is asserted; place it in the existing tests module
alongside test_extract_test_name_basic and
test_extract_test_name_with_leading_flags to clearly document the current
behavior.
- Around line 46-54: Recording only failures causes RunResults.into_sets() to
never see passed tests, so previously failing tests aren't cleared; call
append_result(&result) for every test result (not only when !passed) and handle
any Err(e) the same way currently done, i.e., remove the surrounding if !passed
{ ... } guard around the append_result(&result) call so append_result(&result)
is invoked unconditionally when processing each Result for the
nextest-failure-tracker; keep the existing error logging behavior for
append_result failures.
- Around line 63-70: The function extract_test_name currently panics when it
cannot find a non-flag argument; change it to return a Result<String, String>
(or Option<String>) instead of calling panic!, e.g., have
extract_test_name(args: &[String]) -> Result<String, String> and return Err with
a clear error message when no test name is found, then update callers (notably
main) to handle the error by printing a user-friendly message and exiting
gracefully (or falling back to a default) rather than letting the wrapper crash.
In @xtask/src/failures.rs:
- Around line 284-291: The current string-based check using
config_content.contains("experimental") and contains("wrapper-scripts") is
fragile and can match comments or string values; parse and manipulate the TOML
AST instead (e.g., use the toml or toml_edit crate) to reliably detect and
modify the experimental table/array and append "wrapper-scripts" if missing, or
at minimum perform a safer line-based/key-aware check (match an unquoted key
like /^\\s*experimental\\s*=/ or locate the experimental table/array entry)
before modifying config_content so you don't accidentally touch comments or
unrelated strings.
- Around line 203-254: Remove the two redundant local imports of the Write trait
inside append_result_to by deleting the lines `use std::io::Write as _;` that
appear before the two writeln! calls; the Write trait is already imported at the
top of the file so no additional imports are necessary—keep the writeln! calls
as-is in the append_result_to function.
In @xtask/src/main.rs:
- Around line 196-202: The args vector nextest_args contains a redundant
"--tests" entry alongside "--all-targets"; remove the "--tests". Locate the
nextest_args initialization (the vec! with "nextest", "run", "--workspace",
"--tests", "--all-targets") and delete the "--tests". Ensure no other code
relies on "--tests" being present and run the command locally to verify behavior
remains unchanged.
- Around line 8-11: The import list in xtask::failures includes an unused symbol
build_failure_filter; remove build_failure_filter from the use statement so only
actually used items (failures, generate_nextest_config, get_failures_file_path,
FailuresFile, RunResults) are imported. Locate the use xtask::failures::{ ... }
declaration and delete build_failure_filter from that list (keep
generate_nextest_config as it internally calls build_failure_filter).
📜 Review details
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
xtask/Cargo.tomlxtask/src/bin/nextest-failure-tracker.rsxtask/src/failures.rsxtask/src/lib.rsxtask/src/main.rs
🧰 Additional context used
🧬 Code graph analysis (2)
xtask/src/bin/nextest-failure-tracker.rs (2)
xtask/src/failures.rs (1)
append_result(204-206)xtask/src/main.rs (1)
main(564-569)
xtask/src/main.rs (1)
xtask/src/failures.rs (8)
build_failure_filter(321-327)generate_nextest_config(265-318)get_failures_file_path(61-63)clear(112-118)clear(173-184)ensure_dir(257-259)load(78-80)load(146-148)
🪛 GitHub Check: cargo check
xtask/src/main.rs
[failure] 9-9:
unused import: build_failure_filter
🪛 GitHub Check: cargo clippy
xtask/src/main.rs
[failure] 9-9:
unused import: build_failure_filter
🪛 GitHub Check: cargo doc
xtask/src/main.rs
[failure] 9-9:
unused import: build_failure_filter
🪛 GitHub Check: cargo fmt
xtask/src/main.rs
[failure] 9-9:
unused import: build_failure_filter
🪛 GitHub Check: cargo test
xtask/src/main.rs
[failure] 9-9:
unused import: build_failure_filter
🪛 GitHub Check: cargo unused-deps
xtask/src/main.rs
[failure] 9-9:
unused import: build_failure_filter
🔇 Additional comments (9)
xtask/Cargo.toml (1)
5-27: LGTM!The manifest changes are well-structured:
default-runensures the main binary is the default, the lib target properly exposes shared code, and the new binary for failure tracking is correctly declared. The workspace-scoped serde dependencies and tempfile are appropriate for the JSON persistence and config generation needs.xtask/src/lib.rs (1)
1-6: LGTM!Clean library entry point that properly exposes the
failuresmodule for cross-binary usage.xtask/src/main.rs (2)
95-117: LGTM!The
build_wrapperfunction correctly builds the failure tracker binary and locates it via cargo metadata. The hardcodeddebugpath matches the build command which doesn't specify--release.
227-266: Result processing logic is correct, assuming the wrapper fix is applied.The flow properly updates the failures file by removing newly-passed tests and adding new failures. Note that this depends on the wrapper recording all test results (not just failures) as flagged in the wrapper review.
xtask/src/failures.rs (5)
21-68: LGTM!Path resolution logic is sound: it correctly prioritizes
CARGO_TARGET_DIR, then walks up to find the workspace root viaCargo.lock, with a sensible fallback to./target.
70-127: LGTM!
FailuresFileimplementation is clean with proper error handling. Graceful degradation on parse/read errors with warnings is appropriate for a dev tool.
138-201: LGTM!
RunResultscorrectly handles JSONL parsing with resilience to malformed lines and proper cleanup of lock files.
329-387: LGTM!Tests cover the core roundtrip functionality with proper isolation using
TempDir. Consider adding tests for error cases (corrupt JSON, concurrent writes) if the module becomes more critical.
320-327: LGTM with a note on special characters.The filter construction is correct for standard Rust test names. Test names containing special characters needing nextest escaping (e.g., parentheses, commas, backslashes) might break the filter expression, but Rust's naming conventions for test paths (using
::separators, underscores, hyphens) don't allow such characters, making this a non-issue in practice.
There was a problem hiding this comment.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @xtask/src/failures.rs:
- Around line 319-326: The build_failure_filter currently inserts raw test names
into the expression which can break if names contain special characters; update
build_failure_filter to escape each test name before formatting. Implement or
call a helper like escape_test_name (or use regex::escape from the regex crate)
to escape characters such as parentheses, pipes, quotes and backslashes, then
use format!("test(={})", escaped_name) when building filter_parts; add the regex
dependency if using regex::escape and ensure the escaped strings are joined as
before.
♻️ Duplicate comments (3)
xtask/src/failures.rs (2)
283-290: String-basedexperimentalkey detection is fragile but acceptable.As noted in a previous review,
contains("experimental")could match false positives in comments or string values. For a dev tool this is acceptable, but if issues arise, consider using a TOML parser liketoml_edit.
245-253: RedundantWriteimport in fallback path.The
use std::io::Write as _;at line 251 is redundant sinceWriteis already imported at the module level (line 7).🔎 Proposed fix
// Fallback: just try to write anyway eprintln!("Warning: Could not acquire lock for results file, writing anyway"); let line = serde_json::to_string(result).map_err(std::io::Error::other)?; let mut file = OpenOptions::new().create(true).append(true).open(path)?; - use std::io::Write as _; writeln!(file, "{}", line) }xtask/src/main.rs (1)
195-201:--testsis redundant with--all-targets.As noted in a previous review,
--all-targetsalready includes test targets, making--testssuperfluous. Not harmful, but could be removed for clarity.🔎 Proposed fix
let mut nextest_args = vec![ "nextest".to_string(), "run".to_string(), "--workspace".to_string(), - "--tests".to_string(), "--all-targets".to_string(), ];
📜 Review details
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
xtask/src/bin/nextest-failure-tracker.rsxtask/src/failures.rsxtask/src/main.rs
🧰 Additional context used
🧬 Code graph analysis (2)
xtask/src/main.rs (1)
xtask/src/failures.rs (7)
generate_nextest_config(264-317)get_failures_file_path(61-63)clear(112-118)clear(173-184)ensure_dir(256-258)load(78-80)load(146-148)
xtask/src/bin/nextest-failure-tracker.rs (1)
xtask/src/failures.rs (1)
append_result(204-206)
🔇 Additional comments (8)
xtask/src/bin/nextest-failure-tracker.rs (2)
10-57: LGTM! Critical issue from previous review has been addressed.The code now correctly records all test results (both passes and failures) unconditionally at lines 49-54, which fixes the previously identified issue where
RunResults.into_sets()would always return an emptypassedset. Theextract_test_namefunction now gracefully returnsOption<String>instead of panicking, with appropriate fallback handling inmain().
72-107: Tests are comprehensive and cover all edge cases.Good coverage including the empty args and only-flags scenarios that were previously noted as missing.
xtask/src/main.rs (4)
8-10: Import list has been cleaned up.The unused
build_failure_filterimport flagged in the previous review has been removed.
94-116:build_wrapperassumes debug build target.The wrapper path is hardcoded to
target/debug/nextest-failure-tracker(line 106). If the xtask is ever invoked in release mode or with a custom profile, this will fail to find the binary. For an internal dev tool this is acceptable, but consider adding a note or making it configurable if needed.
226-265: Failure tracking logic is well-implemented.The post-run processing correctly:
- Removes tests that now pass from the failures list
- Adds new failures with deduplication
- Sorts for consistent output
- Provides appropriate user feedback
374-384: LocalChecks correctly integrates with the expanded Test command.The new fields (
rerun_failures,fresh,no_update_failures) are properly initialized to maintain the default behavior when running tests as part of local checks.xtask/src/failures.rs (2)
21-53: Path resolution is robust with multiple fallbacks.The
get_target_dir()function correctly handles multiple scenarios:
CARGO_TARGET_DIRenvironment variable- Workspace root detection via
Cargo.lock- Fallback to
./targetThis ensures the failure tracking files are placed consistently regardless of where the command is invoked.
328-386: Unit tests provide good coverage of core functionality.The tests cover JSON and JSONL roundtrips, proper aggregation into passed/failed sets, and basic state checks. Using
TempDirensures test isolation.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 5
Fix all issues with AI Agents 🤖
In @xtask/Cargo.toml:
- Around line 25-27: The Cargo.toml dependency entry uses a loose version for
tempfile; change the tempfile entry from tempfile = "3" to a more specific
pinned range such as tempfile = "3.24" (keeping serde.workspace = true and
serde_json.workspace = true as-is) so the xtask/Cargo.toml lists serde,
serde_json as workspace-scoped and tempfile with the tighter version constraint.
In @xtask/src/bin/nextest-failure-tracker.rs:
- Around line 30-42: The match arm for status currently casts the i32 exit code
to u8 with "code as u8", which can produce incorrect values for negative or
out-of-range codes; update the logic in the Ok(status) branch (the variables
passed, exit_code, and local code) to convert safely by attempting a try_into()
to u8 and falling back to 1 on error (or clamp to 0..=255), then use that safe
value for exit_code while keeping passed = code == 0; replace the direct cast
with this safe conversion.
In @xtask/src/failures.rs:
- Around line 21-32: The find_workspace_root function only looks for Cargo.lock
which fails for repos where Cargo.lock is gitignored or not generated; update
find_workspace_root to also look for Cargo.toml that contains a [workspace]
table as a fallback (parse the file or search for the literal “[workspace]”)
when Cargo.lock is absent, returning the directory containing that Cargo.toml,
and if you prefer not to change behavior instead add a clear comment/docstring
on find_workspace_root describing the assumption that Cargo.lock must exist and
suggesting the [workspace] fallback as an alternative.
- Around line 217-243: The spin-lock using lock_path and attempts can leave an
orphaned lock if the process dies after creating the lock file but before
remove_file; update the acquire loop (around the
OpenOptions::new().create_new(true).open(&lock_path) block and the Err(_)
branch) to detect and clean stale locks by writing the current PID and timestamp
into the lock file on creation, and on Err(_) read lock_path to parse
PID/timestamp and, if the timestamp is older than a stale threshold (e.g., 1s)
or the PID does not exist, remove the stale lock and retry; ensure writes are
flushed/atomic and any remove_file errors are handled so the function still
returns Ok/Err correctly.
- Around line 172-184: Stale lock file cleanup in clear() currently swallows
errors from fs::remove_file(lock_path); change this to check the Result and only
ignore NotFound while logging other failures: call fs::remove_file(lock_path)
and match on the Err variant, if err.kind() == std::io::ErrorKind::NotFound then
ignore, otherwise emit a warning (e.g., using log::warn! or tracing::warn!)
including the lock_path and the error; keep the rest of clear() behavior
unchanged.
♻️ Duplicate comments (4)
xtask/src/failures.rs (3)
283-290: String-based detection ofexperimentalkey is fragile.As noted in a previous review,
contains("experimental")could match comments or string values. For a dev tool this is acceptable, but the warning message appropriately alerts users to potential issues.
319-326: Test names are not escaped in filter expression.As noted previously, special characters in test names could break the filter. For typical Rust test names this is fine.
251-252: Redundantuse std::io::Write as _import.This import is already present at line 7. Remove this redundant local import.
🔎 Proposed fix
let mut file = OpenOptions::new().create(true).append(true).open(path)?; - use std::io::Write as _; writeln!(file, "{}", line)xtask/src/main.rs (1)
195-201:--testsis redundant with--all-targets.As noted in previous review,
--all-targetsalready includes test targets, making--testssuperfluous.
📜 Review details
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
xtask/Cargo.tomlxtask/src/bin/nextest-failure-tracker.rsxtask/src/failures.rsxtask/src/lib.rsxtask/src/main.rs
🧰 Additional context used
🧬 Code graph analysis (2)
xtask/src/main.rs (1)
xtask/src/failures.rs (6)
generate_nextest_config(264-317)get_failures_file_path(61-63)clear(112-118)clear(173-184)load(78-80)load(146-148)
xtask/src/bin/nextest-failure-tracker.rs (3)
crates/c/build/main.rs (1)
env(75-79)xtask/src/failures.rs (1)
append_result(204-206)xtask/src/main.rs (1)
main(563-568)
🔇 Additional comments (19)
xtask/src/failures.rs (4)
43-45: Potential panic oncurrent_dirfailure is handled gracefully.The fallback chain (
CARGO_MANIFEST_DIR→current_dir()→".") is reasonable for a dev tool.
70-127: FailuresFile implementation is clean and functional.The load/save pattern with graceful error handling (warnings instead of panics) is appropriate for a dev tool. The
as_set()helper is useful for set operations in the main workflow.
186-200:into_setscorrectly handles duplicate test names.If a test appears multiple times in results (e.g., due to retries), the last result wins due to
HashSet::insertbehavior. This is appropriate—the final outcome is what matters.
328-386: Unit tests provide good coverage for core functionality.The roundtrip tests for
FailuresFileandRunResultsvalidate the serialization/deserialization paths. Consider adding a test for concurrentappend_result_tocalls to validate the locking mechanism.xtask/src/lib.rs (1)
1-6: LGTM!Clean module documentation and appropriate public exposure of the
failuresmodule for use by binaries within the crate.xtask/Cargo.toml (1)
5-16: LGTM!The crate structure correctly defines:
- A library target for shared code (
failuresmodule)- The main
xtaskbinary- The
nextest-failure-trackerwrapper binarySetting
default-run = "xtask"ensurescargo runworks as expected.xtask/src/bin/nextest-failure-tracker.rs (4)
1-4: Documentation header is informative.The comment clearly explains the binary's purpose and invocation pattern.
10-26: Argument handling is correct and graceful.The check for minimum args (line 14) and graceful fallback for missing test name (lines 22-25) properly address previous review feedback. Using
"unknown"as fallback allows the wrapper to continue functioning while logging the issue.
44-57: Result recording correctly captures all outcomes.All test results (pass and fail) are recorded via
append_result, which addresses the previous critical review comment about tracking only failures. This enables the main workflow to correctly identify tests that now pass.
72-107: Test coverage addresses previously raised concern.The
test_extract_test_name_emptyandtest_extract_test_name_only_flagstests document theNonereturn behavior for edge cases, addressing the previous review suggestion about testing empty args.xtask/src/main.rs (9)
8-10: Import list is correct after previous fix.The unused
build_failure_filterimport has been removed as suggested in previous review.
24-41: CLI flags are well-documented and logically organized.The new flags (
rerun_failures,fresh,no_update_failures) have clear help text. The flag combinations are intuitive:
--freshclears history and runs all--rerun-failuresruns only previously failed tests--no-update-failuresdisables tracking entirely
94-116:build_wrapperfunction is clean and correct.The function builds the wrapper binary and locates it via cargo metadata. The existence check (line 108) provides a clear error if the build artifact isn't found.
Note: This always builds in debug mode. For large test suites, a release build might be faster, but for a wrapper that just spawns processes, debug is fine.
148-158: Flag interaction betweenfreshand file clearing is correct.When
--freshis passed, the failures file is cleared (lines 149-152). The results file is always cleared before a run unless--no-update-failuresis set (lines 155-158). This ensures clean state for new runs.
160-179:rerun_failureswithfreshinteraction is handled correctly.The condition
if rerun_failures && !fresh(line 161) ensures that--freshtakes precedence—if both flags are passed, all tests run fresh. The warning when no failures exist (lines 164-169) is helpful.
219-223: Tempfile lifetime management is correct.The
config_fileis dropped aftertest_resultis obtained (line 223), ensuring the temp file remains available during test execution. Good practice to explicitly document this with a comment.
226-265: Result processing logic is sound.The logic correctly:
- Loads run results from the JSONL file
- Splits into passed/failed sets
- Retains existing failures when rerunning, starts fresh otherwise
- Removes now-passing tests from failures
- Adds new failures without duplicates
- Sorts for deterministic output
This ensures the failures file accurately reflects the current state of test failures.
267-268: Test result propagation preserves exit code.The
test_result?at line 268 ensures the command exits with an error if tests failed, after updating the failures file. This is the correct order—state is saved before propagating the error.
374-384: LocalChecks correctly integrates new flags.The
Commands::Testconstruction includes all new fields with sensible defaults for the local checks workflow (no rerun, no fresh, update failures enabled).
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
NOTE: WRITTEN (mostly) BY CLAUDE WITH GUIDANCE
Describe the changes
This PR changes xtask test to run through a nextest wrapper binary that can track test success/failures and log them to a file. This file can then be used to re-run just the failing tests, automatically.
new args:
--rereun-failures- uses the saved failure information from the last run to filter out all tests that passed--fresh- wipes the saved failure information, and runs the full test suite--no-update-failures- does not update the saved failure information. useful for a batch of tests that may partially respond to fixes.Summary by CodeRabbit
New Features
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.
Checklist
Additional Context
Add any other context about the pull request here.