Skip to content

Conversation

@johanneskoester
Copy link
Contributor

@johanneskoester johanneskoester commented Dec 5, 2025

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of grouped workflow tasks when retrieving storage inputs, ensuring proper aggregation across all tasks within a group rather than treating them as a single unit.
    • Enhanced type safety with explicit parameter validation.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 5, 2025

📝 Walkthrough

Walkthrough

Refactors retrieve_storage_inputs in the DAG class to properly handle GroupJob containers by iterating over their inner jobs' inputs instead of treating them as single jobs. Adds explicit type hints for the jobs parameter. The filtering logic for storage and external inputs is maintained while aggregating inputs across grouped jobs.

Changes

Cohort / File(s) Summary
DAG retrieve_storage_inputs refactoring
src/snakemake/dag.py
Updated retrieve_storage_inputs method to handle GroupJob by iterating inner jobs' inputs. Added explicit type annotation: jobs: List[Union[Job, GroupJob]]. Control flow now branches to process grouped jobs separately before proceeding with existing storage retrieval logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify that the branching logic correctly identifies and iterates GroupJob containers without breaking existing behavior for single jobs
  • Confirm that the aggregation of inputs across grouped jobs preserves the intended filtering semantics for storage and external inputs
  • Validate type annotations are correct and call sites are updated appropriately

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: fixing group handling in storage input retrieval, which aligns with the core refactoring to properly iterate over inner_job.input when encountering GroupJob.
Description check ✅ Passed The PR description includes both required checklist items marked as complete: test case is included and documentation has been addressed where necessary.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/retrieve_group_job_input

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd265bb and 33de3f1.

📒 Files selected for processing (1)
  • src/snakemake/dag.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/dag.py
🧠 Learnings (1)
📚 Learning: 2024-10-04T16:12:18.927Z
Learnt from: lczech
Repo: snakemake/snakemake PR: 3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.

Applied to files:

  • src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (1)
src/snakemake/io/__init__.py (3)
  • is_storage (422-423)
  • exists_in_storage (169-170)
  • exists_in_storage (534-537)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (46)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (6, macos-latest, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (5, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: apidocs
🔇 Additional comments (1)
src/snakemake/dag.py (1)

407-409: Type hint addition looks correct.

The explicit type annotation List[Union[Job, GroupJob]] accurately reflects the method's capability to handle both individual jobs and grouped jobs.

Comment on lines +416 to +434
if isinstance(job, GroupJob):
inner_jobs = job.jobs
else:
inner_jobs = [job]
for inner_job in inner_jobs:
for f in inner_job.input:
if (
f.is_storage
and not job.is_norun
and (
# if f exists in storage, retrieve below will check if it is
# newer than an eventual local copy
(also_missing_internal and await f.exists_in_storage())
or self.is_external_input(
f, job, not_needrun_is_external=True
)
)
):
to_retrieve[f].append(access_pattern(f))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Incorrect job reference in GroupJob handling.

When iterating over inner_jobs from a GroupJob, the code should reference inner_job properties, not the outer job (which is the GroupJob container):

  1. Line 424: job.is_norun should be inner_job.is_norun — we need to check whether the specific inner job is norun, not the GroupJob container.
  2. Line 430: The second parameter to is_external_input() should be inner_job, not job — the method checks if the file is in the job's dependencies (lines 3080-3094), so it needs the specific inner job whose input file we're examining.

These errors will cause incorrect retrieval decisions for grouped jobs.

Apply this diff to fix both issues:

             for inner_job in inner_jobs:
                 for f in inner_job.input:
                     if (
                         f.is_storage
-                        and not job.is_norun
+                        and not inner_job.is_norun
                         and (
                             # if f exists in storage, retrieve below will check if it is
                             # newer than an eventual local copy
                             (also_missing_internal and await f.exists_in_storage())
                             or self.is_external_input(
-                                f, job, not_needrun_is_external=True
+                                f, inner_job, not_needrun_is_external=True
                             )
                         )
                     ):
🤖 Prompt for AI Agents
In src/snakemake/dag.py around lines 416 to 434, the GroupJob inner-loop
incorrectly references the outer `job` when making decisions about each inner
job; change references so checks use `inner_job` instead: replace `job.is_norun`
with `inner_job.is_norun`, and pass `inner_job` (not `job`) as the second
argument to `is_external_input()`, ensuring retrieval logic evaluates each inner
job's properties and dependencies correctly.

@johanneskoester johanneskoester merged commit 8c818f4 into main Dec 5, 2025
60 checks passed
@johanneskoester johanneskoester deleted the fix/retrieve_group_job_input branch December 5, 2025 12:34
johanneskoester pushed a commit that referenced this pull request Dec 6, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.14.2](v9.14.1...v9.14.2)
(2025-12-05)


### Bug Fixes

* correctly handle groups when retrieving storage input
([#3861](#3861))
([8c818f4](8c818f4))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
…#3861)

<!--Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved handling of grouped workflow tasks when retrieving storage
inputs, ensuring proper aggregation across all tasks within a group
rather than treating them as a single unit.
  * Enhanced type safety with explicit parameter validation.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.14.2](snakemake/snakemake@v9.14.1...v9.14.2)
(2025-12-05)


### Bug Fixes

* correctly handle groups when retrieving storage input
([snakemake#3861](snakemake#3861))
([8c818f4](snakemake@8c818f4))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant