Skip to content

Conversation

@slimreaper35
Copy link
Member

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

The previous heuristic using Cargo.toml files proved insufficient for
packages with multiple Cargo.toml files.

Using Cargo.lock provides more accurate root directory detection since
it always exists alongside the root Cargo.toml and better represents
the actual Rust project structure. This improves reliability when
processing Python packages containing Rust components.

Tested with simple project with this pyproject.toml:

```
[project]
name = "test"
version = "0.1.0"
dependencies = [
    "cryptography>=45.0.7",
    "hf-xet>=1.1.9",
]
```

and exported requirements.txt file.

Signed-off-by: Michal Šoltis <msoltis@redhat.com>
@coderabbitai
Copy link

coderabbitai bot commented Sep 10, 2025

📝 Walkthrough

Walkthrough

Switched Rust detection from Cargo.toml scanning to Cargo.lock scanning. Updated _get_rust_root_dir to accept Iterable[Path] of Cargo.lock files and return the parent of the top-most lockfile. Adjusted call sites and variable/type names to use cargo_lockfiles. Behavior otherwise unchanged; packages are still emitted as CargoPackageInput.

Changes

Cohort / File(s) Summary
Rust root resolution via Cargo.lock
hermeto/core/package_managers/pip/rust.py
Replace Cargo.toml-based detection with Cargo.lock-based detection; update _get_rust_root_dir(cargo_lockfiles: Iterable[Path]) signature and docstring; rename variables from cargo_files to cargo_lockfiles; update call-site in filter_packages_with_rust_code; skip package when no Cargo.lock is found.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Scanner as Pip Rust Scanner
    participant FS as FileSystem
    participant Resolver as Rust Root Resolver
    participant Collector as Package Collector

    Scanner->>FS: Find Cargo.lock files
    alt Any Cargo.lock found?
        FS-->>Scanner: List<Lockfile Paths>
        Scanner->>Resolver: _get_rust_root_dir(lockfiles)
        Resolver-->>Scanner: Root dir (parent of top-most lockfile)
        Scanner->>Collector: Add CargoPackageInput(type="cargo", path=relative root)
        Collector-->>Scanner: Acknowledged
    else None found
        FS-->>Scanner: Empty
        Scanner-->>Scanner: Skip package
    end

    note over Resolver: Logic changed to use Cargo.lock hierarchy
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The current pull request description is just the unmodified template and lacks any actual information about what the change does, why it was made, how it was tested, or any confirmations against the listed checklist items. Please fill out the description by summarizing the changes and their motivation, detailing any new or updated tests and documentation, and verifying each checklist item for commit messages, coverage, and docs before merging.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title “cargo: Change heuristic to use Cargo.lock for Rust root detection” clearly and concisely communicates the main change of the pull request by indicating that the cargo detection heuristic now relies on Cargo.lock for identifying the Rust root.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
hermeto/core/package_managers/pip/rust.py (2)

53-66: Depth heuristic and docstring nits

  • Use parents-count with a deterministic tiebreaker for stability across absolute/relative paths.
  • Soften the “always” claim in the docstring.
 def _get_rust_root_dir(cargo_lockfiles: Iterable[Path]) -> Path:
     """Find the top-most Cargo.lock in the package.

     NOTE: This is not necessarily the most accurate solution, but this heuristic has proven to work
     on the most popular python packages with rust dependencies. A repository may have multiple
     Cargo.toml files (e.g. workspace), but we are mostly interested in Cargo.lock file.

-    Also, on the same level as Cargo.lock file, there is always a Cargo.toml file.
+    Also, on the same level as Cargo.lock file, there is typically a Cargo.toml file.

     If this solution is not accurate enough, then we would probably need to check for the manifest
     path in pyproject.toml config section for maturin or setuptools-rust.
     """
-    return min(cargo_lockfiles, key=lambda x: len(x.parts)).parent
+    # Prefer the shallowest lockfile; break ties deterministically.
+    return min(cargo_lockfiles, key=lambda p: (len(p.parents), str(p))).parent

88-90: Scan for Cargo.lock after confirming Rust deps; constrain search to source_dir

This avoids unnecessary filesystem walks and reduces false positives from unrelated lockfiles elsewhere in the tarball.

-        cargo_lockfiles = list(extract_dir.rglob("Cargo.lock"))
-        if not cargo_lockfiles:
-            shutil.rmtree(extract_dir, ignore_errors=True)
-            continue
@@
         if not _depends_on_rust(source_dir):
             shutil.rmtree(extract_dir, ignore_errors=True)
             continue
+        # Only search within the detected project root to avoid stray lockfiles.
+        cargo_lockfiles = list(source_dir.rglob("Cargo.lock"))
+        if not cargo_lockfiles:
+            log.debug("No Cargo.lock under %s; skipping.", source_dir)
+            shutil.rmtree(extract_dir, ignore_errors=True)
+            continue

Please verify this reordering doesn’t miss any packages you expect to handle (e.g., libs that ship Rust but omit Cargo.lock).

Also applies to: 102-106

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 46f7432 and 55b8f2c.

📒 Files selected for processing (1)
  • hermeto/core/package_managers/pip/rust.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build container image and run integration tests on it
  • GitHub Check: Konflux kflux-prd-rh03 / on-pull-request
🔇 Additional comments (1)
hermeto/core/package_managers/pip/rust.py (1)

105-105: Call-site update to new signature looks good

_get_rust_root_dir(cargo_lockfiles) is correct assuming the non-empty precondition holds from the prior check.

Copy link
Contributor

@a-ovchinnikov a-ovchinnikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some non-blocking questions.

Copy link
Member

@eskultety eskultety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the second (or third even?) time we're changing the heuristic? I'm wondering how long this one will last until we have to sit down and finally do what the comment suggests we ought to do :D .

@slimreaper35
Copy link
Member Author

I hope we cover 99% of use cases 🙏

@slimreaper35 slimreaper35 added this pull request to the merge queue Sep 11, 2025
Merged via the queue into hermetoproject:main with commit d85f378 Sep 11, 2025
14 checks passed
@slimreaper35 slimreaper35 deleted the cargo-location branch September 11, 2025 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants