-
Notifications
You must be signed in to change notification settings - Fork 617
fix: correctly handle groups when retrieving storage input #3861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughRefactors Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/snakemake/dag.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/dag.py
🧠 Learnings (1)
📚 Learning: 2024-10-04T16:12:18.927Z
Learnt from: lczech
Repo: snakemake/snakemake PR: 3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.
Applied to files:
src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (1)
src/snakemake/io/__init__.py (3)
is_storage(422-423)exists_in_storage(169-170)exists_in_storage(534-537)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (46)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (10, windows-2022, py313)
- GitHub Check: tests (9, windows-2022, py313)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (5, windows-2022, py313)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (8, windows-2022, py313)
- GitHub Check: tests (7, windows-2022, py313)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (6, windows-2022, py313)
- GitHub Check: tests (6, macos-latest, py313)
- GitHub Check: tests (4, windows-2022, py313)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (5, macos-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (4, macos-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (1, macos-latest, py313)
- GitHub Check: tests (2, macos-latest, py313)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (3, windows-2022, py313)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (2, windows-2022, py313)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (1, windows-2022, py313)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: apidocs
🔇 Additional comments (1)
src/snakemake/dag.py (1)
407-409: Type hint addition looks correct.The explicit type annotation
List[Union[Job, GroupJob]]accurately reflects the method's capability to handle both individual jobs and grouped jobs.
| if isinstance(job, GroupJob): | ||
| inner_jobs = job.jobs | ||
| else: | ||
| inner_jobs = [job] | ||
| for inner_job in inner_jobs: | ||
| for f in inner_job.input: | ||
| if ( | ||
| f.is_storage | ||
| and not job.is_norun | ||
| and ( | ||
| # if f exists in storage, retrieve below will check if it is | ||
| # newer than an eventual local copy | ||
| (also_missing_internal and await f.exists_in_storage()) | ||
| or self.is_external_input( | ||
| f, job, not_needrun_is_external=True | ||
| ) | ||
| ) | ||
| ): | ||
| to_retrieve[f].append(access_pattern(f)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Incorrect job reference in GroupJob handling.
When iterating over inner_jobs from a GroupJob, the code should reference inner_job properties, not the outer job (which is the GroupJob container):
- Line 424:
job.is_norunshould beinner_job.is_norun— we need to check whether the specific inner job is norun, not the GroupJob container. - Line 430: The second parameter to
is_external_input()should beinner_job, notjob— the method checks if the file is in the job's dependencies (lines 3080-3094), so it needs the specific inner job whose input file we're examining.
These errors will cause incorrect retrieval decisions for grouped jobs.
Apply this diff to fix both issues:
for inner_job in inner_jobs:
for f in inner_job.input:
if (
f.is_storage
- and not job.is_norun
+ and not inner_job.is_norun
and (
# if f exists in storage, retrieve below will check if it is
# newer than an eventual local copy
(also_missing_internal and await f.exists_in_storage())
or self.is_external_input(
- f, job, not_needrun_is_external=True
+ f, inner_job, not_needrun_is_external=True
)
)
):🤖 Prompt for AI Agents
In src/snakemake/dag.py around lines 416 to 434, the GroupJob inner-loop
incorrectly references the outer `job` when making decisions about each inner
job; change references so checks use `inner_job` instead: replace `job.is_norun`
with `inner_job.is_norun`, and pass `inner_job` (not `job`) as the second
argument to `is_external_input()`, ensuring retrieval logic evaluates each inner
job's properties and dependencies correctly.
🤖 I have created a release *beep* *boop* --- ## [9.14.2](v9.14.1...v9.14.2) (2025-12-05) ### Bug Fixes * correctly handle groups when retrieving storage input ([#3861](#3861)) ([8c818f4](8c818f4)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
…#3861) <!--Add a description of your PR here--> ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved handling of grouped workflow tasks when retrieving storage inputs, ensuring proper aggregation across all tasks within a group rather than treating them as a single unit. * Enhanced type safety with explicit parameter validation. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [9.14.2](snakemake/snakemake@v9.14.1...v9.14.2) (2025-12-05) ### Bug Fixes * correctly handle groups when retrieving storage input ([snakemake#3861](snakemake#3861)) ([8c818f4](snakemake@8c818f4)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.