-
Notifications
You must be signed in to change notification settings - Fork 44
cargo: Change heuristic to use Cargo.lock for Rust root detection #1105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The previous heuristic using Cargo.toml files proved insufficient for
packages with multiple Cargo.toml files.
Using Cargo.lock provides more accurate root directory detection since
it always exists alongside the root Cargo.toml and better represents
the actual Rust project structure. This improves reliability when
processing Python packages containing Rust components.
Tested with simple project with this pyproject.toml:
```
[project]
name = "test"
version = "0.1.0"
dependencies = [
"cryptography>=45.0.7",
"hf-xet>=1.1.9",
]
```
and exported requirements.txt file.
Signed-off-by: Michal Šoltis <msoltis@redhat.com>
📝 WalkthroughWalkthroughSwitched Rust detection from Cargo.toml scanning to Cargo.lock scanning. Updated _get_rust_root_dir to accept Iterable[Path] of Cargo.lock files and return the parent of the top-most lockfile. Adjusted call sites and variable/type names to use cargo_lockfiles. Behavior otherwise unchanged; packages are still emitted as CargoPackageInput. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Scanner as Pip Rust Scanner
participant FS as FileSystem
participant Resolver as Rust Root Resolver
participant Collector as Package Collector
Scanner->>FS: Find Cargo.lock files
alt Any Cargo.lock found?
FS-->>Scanner: List<Lockfile Paths>
Scanner->>Resolver: _get_rust_root_dir(lockfiles)
Resolver-->>Scanner: Root dir (parent of top-most lockfile)
Scanner->>Collector: Add CargoPackageInput(type="cargo", path=relative root)
Collector-->>Scanner: Acknowledged
else None found
FS-->>Scanner: Empty
Scanner-->>Scanner: Skip package
end
note over Resolver: Logic changed to use Cargo.lock hierarchy
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Pre-merge checks (2 passed, 1 warning)❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
hermeto/core/package_managers/pip/rust.py (2)
53-66: Depth heuristic and docstring nits
- Use parents-count with a deterministic tiebreaker for stability across absolute/relative paths.
- Soften the “always” claim in the docstring.
def _get_rust_root_dir(cargo_lockfiles: Iterable[Path]) -> Path: """Find the top-most Cargo.lock in the package. NOTE: This is not necessarily the most accurate solution, but this heuristic has proven to work on the most popular python packages with rust dependencies. A repository may have multiple Cargo.toml files (e.g. workspace), but we are mostly interested in Cargo.lock file. - Also, on the same level as Cargo.lock file, there is always a Cargo.toml file. + Also, on the same level as Cargo.lock file, there is typically a Cargo.toml file. If this solution is not accurate enough, then we would probably need to check for the manifest path in pyproject.toml config section for maturin or setuptools-rust. """ - return min(cargo_lockfiles, key=lambda x: len(x.parts)).parent + # Prefer the shallowest lockfile; break ties deterministically. + return min(cargo_lockfiles, key=lambda p: (len(p.parents), str(p))).parent
88-90: Scan for Cargo.lock after confirming Rust deps; constrain search to source_dirThis avoids unnecessary filesystem walks and reduces false positives from unrelated lockfiles elsewhere in the tarball.
- cargo_lockfiles = list(extract_dir.rglob("Cargo.lock")) - if not cargo_lockfiles: - shutil.rmtree(extract_dir, ignore_errors=True) - continue @@ if not _depends_on_rust(source_dir): shutil.rmtree(extract_dir, ignore_errors=True) continue + # Only search within the detected project root to avoid stray lockfiles. + cargo_lockfiles = list(source_dir.rglob("Cargo.lock")) + if not cargo_lockfiles: + log.debug("No Cargo.lock under %s; skipping.", source_dir) + shutil.rmtree(extract_dir, ignore_errors=True) + continuePlease verify this reordering doesn’t miss any packages you expect to handle (e.g., libs that ship Rust but omit Cargo.lock).
Also applies to: 102-106
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hermeto/core/package_managers/pip/rust.py(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build container image and run integration tests on it
- GitHub Check: Konflux kflux-prd-rh03 / on-pull-request
🔇 Additional comments (1)
hermeto/core/package_managers/pip/rust.py (1)
105-105: Call-site update to new signature looks good
_get_rust_root_dir(cargo_lockfiles)is correct assuming the non-empty precondition holds from the prior check.
a-ovchinnikov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some non-blocking questions.
eskultety
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the second (or third even?) time we're changing the heuristic? I'm wondering how long this one will last until we have to sit down and finally do what the comment suggests we ought to do :D .
|
I hope we cover 99% of use cases 🙏 |
Maintainers will complete the following section
Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test(as is the standard for Pipelines as Code)