nh-clean: cleanup; improve UX#647
Conversation
…ss-filesystems` Signed-off-by: NotAShelf <raf@notashelf.dev> Change-Id: I8265c8f4dfc6aea938a0739f47e1fe476a6a6964
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughWalk the entire /nix/var/nix/gcroots tree with WalkDir, classify direnv and /nix/store-direct-child gcroots, collect orphaned/broken gcroots, add --keep-one and --cross-filesystems flags, and remove tagged and orphaned gcroots during execution. ChangesGcroot Detection Refactor
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/nh-clean/src/clean.rs`:
- Around line 215-234: The symlink target (dst) from read_link() can be
relative, so resolve relative targets against the symlink's parent
(src.parent()) before using exists()/metadata()/is_symlink() etc.; update the
logic that builds resolved_dst (and any following checks that use dst) to: if
dst.is_absolute() use dst, otherwise let resolved_dst =
src.parent().unwrap_or_else(|| Path::new("/").to_path_buf()).join(&dst) (or
equivalent) so all subsequent checks operate on the path relative to the
symlink's directory rather than the process CWD, preventing false orphaning and
wrong-file cleanup (apply this change around the variables src, dst,
resolved_dst and where orphan_gcroots is populated).
- Around line 236-272: The code currently treats a resolved /nix/store target
(dst) as the removable gcroot; instead, when
is_nix_store_direct_child(&resolved_dst) is true you must not queue the store
object for deletion—queue the gcroot symlink (src) instead and perform the
access and age checks against the symlink itself. Update the logic around
faccessat/faccess checks and the duration calculation to use src (and
src.symlink_metadata()/modified()) when resolved_dst is a nix store child, and
always insert src into gcroots_tagged (not dst); keep the existing behavior of
inserting src into orphan_gcroots unchanged for orphan handling. Ensure
functions/variables referenced: is_nix_store_direct_child, faccessat, dirfd,
dst, src, resolved_dst, gcroots_tagged, orphan_gcroots, and the
symlink_metadata/modified calls are adjusted accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 92a19914-1b40-49b4-96c5-b894135e69c7
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
Cargo.tomlcrates/nh-clean/Cargo.tomlcrates/nh-clean/src/args.rscrates/nh-clean/src/clean.rs
| let src = entry.path().to_path_buf(); | ||
| let dst = src.read_link().wrap_err("Reading symlink destination")?; | ||
| let span = span!(Level::TRACE, "gcroot detection", ?dst); | ||
| let _entered = span.enter(); | ||
| debug!(?src); | ||
|
|
||
| if !dst.is_symlink() && !dst.exists() { | ||
| debug!( | ||
| ?src, | ||
| "gcroot is orphaned (dst missing), tagging for removal" | ||
| ); | ||
| orphan_gcroots.push(src); | ||
| continue; | ||
| } | ||
|
|
||
| let resolved_dst = if dst.is_symlink() { | ||
| dst.read_link().unwrap_or_else(|_| dst.clone()) | ||
| } else { | ||
| dst.clone() | ||
| }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "Relevant lines:"
sed -n '210,270p' "$file"
echo
echo "Path operations using dst directly:"
rg -n 'read_link\(|exists\(\)|faccessat\(|metadata\(\)|symlink_metadata\(\)|remove_path_nofail\(' "$file"Repository: nix-community/nh
Length of output: 2791
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "=== Context around line 216 (dirfd initialization and setup) ==="
sed -n '180,220p' "$file"
echo
echo "=== Full faccessat usage and subsequent operations ==="
sed -n '245,280p' "$file"
echo
echo "=== Check if dst is passed to remove functions ==="
sed -n '250,270p' "$file"Repository: nix-community/nh
Length of output: 3337
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "=== How gcroots_tagged is used (search for gcroots_tagged usage) ==="
rg -n 'gcroots_tagged' "$file" -A 2
echo
echo "=== Check if dst values are passed to remove_path_nofail ==="
sed -n '380,400p' "$file"
echo
echo "=== Test file to check for relative symlink tests ==="
fd -e rs test "$file" | head -5
rg -n 'symlink|relative' "$file" -C 2 | head -40Repository: nix-community/nh
Length of output: 1731
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Find test files ==="
fd -e rs test 'crates/nh-clean/src'
echo
echo "=== Look for gcroot-related tests ==="
fd -e rs 'test' 'crates/nh-clean/src' -x cat {}
echo
echo "=== Check if there are any examples or docs about gcroots ==="
rg -l 'gcroot' --type rsRepository: nix-community/nh
Length of output: 231
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search for tests in clean.rs file ==="
sed -n '600,$p' "crates/nh-clean/src/clean.rs" | head -100
echo
echo "=== Check Nix gcroots documentation ==="
rg 'gcroot' --type rust
echo
echo "=== Check if relative path handling exists anywhere ==="
rg 'is_absolute|join.*parent' --type rust | head -20Repository: nix-community/nh
Length of output: 5765
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check the dirfd opening and usage pattern ==="
sed -n '198,210p' "crates/nh-clean/src/clean.rs"
echo
echo "=== Check full faccessat call context ==="
sed -n '245,250p' "crates/nh-clean/src/clean.rs"
echo
echo "=== Verify what operations use dst directly ==="
sed -n '250,275p' "crates/nh-clean/src/clean.rs"Repository: nix-community/nh
Length of output: 1607
Resolve relative gcroot destinations against the symlink's parent.
read_link() returns the raw link target without resolving relative paths. If a gcroot points to a relative target, exists(), faccessat(), metadata(), and symlink_metadata() will all interpret dst relative to the current working directory instead of relative to src.parent(). This causes misclassification of live gcroots as orphaned and can point cleanup at the wrong file.
Suggested fix
- let src = entry.path().to_path_buf();
- let dst = src.read_link().wrap_err("Reading symlink destination")?;
+ let src = entry.path().to_path_buf();
+ let raw_dst = src.read_link().wrap_err("Reading symlink destination")?;
+ let dst = if raw_dst.is_absolute() {
+ raw_dst
+ } else {
+ src
+ .parent()
+ .context("gcroot has no parent")?
+ .join(raw_dst)
+ };📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let src = entry.path().to_path_buf(); | |
| let dst = src.read_link().wrap_err("Reading symlink destination")?; | |
| let span = span!(Level::TRACE, "gcroot detection", ?dst); | |
| let _entered = span.enter(); | |
| debug!(?src); | |
| if !dst.is_symlink() && !dst.exists() { | |
| debug!( | |
| ?src, | |
| "gcroot is orphaned (dst missing), tagging for removal" | |
| ); | |
| orphan_gcroots.push(src); | |
| continue; | |
| } | |
| let resolved_dst = if dst.is_symlink() { | |
| dst.read_link().unwrap_or_else(|_| dst.clone()) | |
| } else { | |
| dst.clone() | |
| }; | |
| let src = entry.path().to_path_buf(); | |
| let raw_dst = src.read_link().wrap_err("Reading symlink destination")?; | |
| let dst = if raw_dst.is_absolute() { | |
| raw_dst | |
| } else { | |
| src | |
| .parent() | |
| .context("gcroot has no parent")? | |
| .join(raw_dst) | |
| }; | |
| let span = span!(Level::TRACE, "gcroot detection", ?dst); | |
| let _entered = span.enter(); | |
| debug!(?src); | |
| if !dst.is_symlink() && !dst.exists() { | |
| debug!( | |
| ?src, | |
| "gcroot is orphaned (dst missing), tagging for removal" | |
| ); | |
| orphan_gcroots.push(src); | |
| continue; | |
| } | |
| let resolved_dst = if dst.is_symlink() { | |
| dst.read_link().unwrap_or_else(|_| dst.clone()) | |
| } else { | |
| dst.clone() | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/nh-clean/src/clean.rs` around lines 215 - 234, The symlink target
(dst) from read_link() can be relative, so resolve relative targets against the
symlink's parent (src.parent()) before using exists()/metadata()/is_symlink()
etc.; update the logic that builds resolved_dst (and any following checks that
use dst) to: if dst.is_absolute() use dst, otherwise let resolved_dst =
src.parent().unwrap_or_else(|| Path::new("/").to_path_buf()).join(&dst) (or
equivalent) so all subsequent checks operate on the path relative to the
symlink's directory rather than the process CWD, preventing false orphaning and
wrong-file cleanup (apply this change around the variables src, dst,
resolved_dst and where orphan_gcroots is populated).
| if !regexes | ||
| .iter() | ||
| .any(|next| next.is_match(&dst.to_string_lossy())) | ||
| && !is_nix_store_direct_child(&resolved_dst) | ||
| { | ||
| debug!("dst doesn't match any gcroot regex, skipping"); | ||
| debug!("dst doesn't match any gcroot filter, skipping"); | ||
| continue; | ||
| } | ||
|
|
||
| // Create a file descriptor for the current working directory | ||
| let dirfd = nix::fcntl::open( | ||
| ".", | ||
| nix::fcntl::OFlag::O_DIRECTORY, | ||
| nix::sys::stat::Mode::empty(), | ||
| )?; | ||
|
|
||
| // Use .exists to not travel symlinks | ||
| if match faccessat( | ||
| match faccessat( | ||
| &dirfd, | ||
| &dst, | ||
| AccessFlags::F_OK | AccessFlags::W_OK, | ||
| AtFlags::AT_SYMLINK_NOFOLLOW, | ||
| ) { | ||
| Ok(()) => true, | ||
| Err(errno) => { | ||
| match errno { | ||
| Errno::EACCES | Errno::ENOENT => false, | ||
| _ => { | ||
| bail!( | ||
| eyre!("Checking access for gcroot {:?}, unknown error", dst) | ||
| .wrap_err(errno) | ||
| ) | ||
| }, | ||
| Ok(()) => { | ||
| if dst.metadata().is_err() { | ||
| debug!(?dst, "gcroot target already GC'd, tagging for removal"); | ||
| gcroots_tagged.insert(dst, true); | ||
| } else { | ||
| let dur = now.duration_since( | ||
| dst | ||
| .symlink_metadata() | ||
| .wrap_err("Reading gcroot metadata")? | ||
| .modified()?, | ||
| ); | ||
| debug!(?dur); | ||
| match dur { | ||
| Err(err) => { | ||
| warn!(?err, ?now, "Failed to compare time!"); | ||
| }, | ||
| Ok(val) if val <= args.keep_since.into() => { | ||
| gcroots_tagged.insert(dst, false); | ||
| }, | ||
| Ok(_) => { | ||
| gcroots_tagged.insert(dst, true); | ||
| }, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "Filter + tagging block:"
sed -n '232,276p' "$file"
echo
echo "Removal sites:"
sed -n '385,395p' "$file"
echo
echo "Direct-child helper:"
sed -n '551,556p' "$file"Repository: nix-community/nh
Length of output: 1870
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "=== Broader context (lines 220-280) ==="
sed -n '220,280p' "$file"
echo
echo "=== Loop iteration setup (lines 200-235) ==="
sed -n '200,235p' "$file"Repository: nix-community/nh
Length of output: 3014
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "=== remove_path_nofail function ==="
rg -A 10 "fn remove_path_nofail" "$file"
echo
echo "=== Check store path context ==="
rg -B 5 -A 5 "nix/store" "$file" | head -40Repository: nix-community/nh
Length of output: 1361
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "=== Function signature and comments (lines 150-210) ==="
sed -n '150,210p' "$file"
echo
echo "=== Check what happens in orphan_gcroots (lines 280-300) ==="
sed -n '280,310p' "$file"Repository: nix-community/nh
Length of output: 2685
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "=== Look at orphan_gcroots handling vs gcroots_tagged ==="
sed -n '1,50p' "$file" | head -30
echo
echo "=== Check if src is ever inserted into gcroots_tagged ==="
rg "gcroots_tagged.insert" "$file"
echo
echo "=== Look at orphan removal flow ==="
sed -n '390,415p' "$file"Repository: nix-community/nh
Length of output: 1696
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "=== Check test coverage for store direct child paths ==="
rg -A 20 "store_direct_child" "$file"
echo
echo "=== Understand the intent: should we remove the gcroot link or the store target? ==="
echo "Lines around where orphan_gcroots.push(src) happens:"
sed -n '220,235p' "$file"
echo
echo "=== Compare: orphan_gcroots stores src, so gcroots_tagged should also store src when dealing with store paths ==="Repository: nix-community/nh
Length of output: 4435
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="crates/nh-clean/src/clean.rs"
echo "=== Check if store paths should ever be removed directly ==="
sed -n '405,425p' "$file"
echo
echo "=== Verify: should age check be on the symlink or the target? ==="
echo "Context around symlink_metadata() call:"
sed -n '254,264p' "$file"
echo
echo "=== Understand: does AT_SYMLINK_NOFOLLOW make sense here? ==="
rg -B 2 -A 5 "AT_SYMLINK_NOFOLLOW" "$file"Repository: nix-community/nh
Length of output: 1538
Don't queue /nix/store targets for deletion.
When is_nix_store_direct_child(&resolved_dst) passes the filter, the code inserts dst (the store object) into gcroots_tagged for removal. This is incorrect—the gcroot symlink (src) should be removed, not the store target. Store objects are immutable and should only be garbage collected via nix store gc, not directly removed. Additionally, the access and age checks are performed on the wrong path: they should apply to the gcroot symlink itself, not the target it points to.
The pattern is inconsistent with how orphaned gcroots are handled (line 230), where src is inserted into orphan_gcroots for removal.
Suggested fix
- if !regexes
+ let is_store_direct_child = is_nix_store_direct_child(&resolved_dst);
+ if !regexes
.iter()
.any(|next| next.is_match(&dst.to_string_lossy()))
- && !is_nix_store_direct_child(&resolved_dst)
+ && !is_store_direct_child
{
debug!("dst doesn't match any gcroot filter, skipping");
continue;
}
+
+ let tracked_path = if is_store_direct_child { &src } else { &dst };
match faccessat(
&dirfd,
- &dst,
+ tracked_path,
AccessFlags::F_OK | AccessFlags::W_OK,
AtFlags::AT_SYMLINK_NOFOLLOW,
) {
Ok(()) => {
- if dst.metadata().is_err() {
+ if tracked_path.metadata().is_err() {
debug!(?dst, "gcroot target already GC'd, tagging for removal");
- gcroots_tagged.insert(dst, true);
+ gcroots_tagged.insert(tracked_path.to_path_buf(), true);
} else {
let dur = now.duration_since(
- dst
+ tracked_path
.symlink_metadata()
.wrap_err("Reading gcroot metadata")?
.modified()?,
);
debug!(?dur);
match dur {
Err(err) => {
warn!(?err, ?now, "Failed to compare time!");
},
Ok(val) if val <= args.keep_since.into() => {
- gcroots_tagged.insert(dst, false);
+ gcroots_tagged.insert(tracked_path.to_path_buf(), false);
},
Ok(_) => {
- gcroots_tagged.insert(dst, true);
+ gcroots_tagged.insert(tracked_path.to_path_buf(), true);
},
}
}
},🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/nh-clean/src/clean.rs` around lines 236 - 272, The code currently
treats a resolved /nix/store target (dst) as the removable gcroot; instead, when
is_nix_store_direct_child(&resolved_dst) is true you must not queue the store
object for deletion—queue the gcroot symlink (src) instead and perform the
access and age checks against the symlink itself. Update the logic around
faccessat/faccess checks and the duration calculation to use src (and
src.symlink_metadata()/modified()) when resolved_dst is a nix store child, and
always insert src into gcroots_tagged (not dst); keep the existing behavior of
inserting src into orphan_gcroots unchanged for orphan handling. Ensure
functions/variables referenced: is_nix_store_direct_child, faccessat, dirfd,
dst, src, resolved_dst, gcroots_tagged, orphan_gcroots, and the
symlink_metadata/modified calls are adjusted accordingly.
ecea5a2 to
af8e9fa
Compare
You can use |
Co-authored-by: Josip Vulic <jovulic@gmail.com> Signed-off-by: NotAShelf <raf@notashelf.dev> Change-Id: I3a0c8804ff903f6d0f8afc9e545c1dfc6a6a6964
af8e9fa to
eaf591b
Compare
|
We use this for the experimental features check, but I'm honestly not a big fan of it. There is a bit of nuance to GC that I don't want to just discard by bumrushing. |
RESULT_REGEXwithis_nix_store_direct_child(), which checks for/nix/storedirect children structurally.read_dirof/autowith aWalkDirover all of/nix/var/nix/gcroots, gated by--cross-filesystems/-xfor filesystem boundary control.Largely inspired by https://github.com/blitz/nix-link-cleanup. One solution I haven't fixed is non-standard Nix store paths. AFAIK Nix can have the Nix store at a non-standard path outside of
/nixand while most systems will not do this, it's an issue I haven't tackled. There is no good way to get the correct path without parsingnix.conf, which I don't want to do quite honestly.Change-Id: I8265c8f4dfc6aea938a0739f47e1fe476a6a6964
Add a 👍 reaction to pull requests you find important.