Skip to content

Conversation

@kdm9
Copy link
Contributor

@kdm9 kdm9 commented May 6, 2025

This PR makes checkpoints.get() behave similarly to expand(), taking the Cartesian product of all wildcard values. This gets around issues with having to call .get() multiple times, like #3475. There is a bit more detail in a comment I left in that issue

I have tested this locally on a workflow and it appears to behave as expected, but I would appreciate any feedback on the logic here from core devs, because I am only 90% sure I understand the way incomplete checkpoints are handled.

best,
Kevin

Summary by CodeRabbit

  • New Features
    • Improved handling of wildcards in checkpoints, supporting multiple wildcard values and their combinations.
  • Bug Fixes
    • Enhanced exception handling for incomplete checkpoints to report multiple missing outputs with clearer error messages.
  • Refactor
    • Updated internal logic to process and report multiple checkpoint target files consistently during workflow expansions.

@kdm9 kdm9 requested a review from johanneskoester as a code owner May 6, 2025 01:17
@coderabbitai
Copy link
Contributor

coderabbitai bot commented May 6, 2025

📝 Walkthrough

"""

Walkthrough

The changes introduce support for handling multiple checkpoint outputs in Snakemake's workflow management. A new utility function generates Cartesian products of wildcard lists, and the checkpoint logic is updated to process all combinations, accumulating jobs or missing outputs as needed. Exception handling and related methods are updated to work with collections of files instead of single files. Internal methods in rules are adjusted to handle multiple missing checkpoint files consistently.

Changes

File(s) Change Summary
src/snakemake/checkpoints.py Added dictproduct function to yield all combinations of wildcard lists. Updated Checkpoint.get to convert wildcards to lists, iterate over all output combinations, accumulate CheckpointJob instances for complete outputs, or collect missing outputs, raising an exception if incomplete. Added wildcards attribute to CheckpointJob.
src/snakemake/exceptions.py Modified IncompleteCheckpointException constructor to accept and store a list of target files instead of a single file. Updated instance attribute and error message construction accordingly.
src/snakemake/rules.py Updated internal exception handling in expand_input and expand_params to process multiple target files from IncompleteCheckpointException. Adjusted logic to check and report all missing checkpoint files rather than a single file.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Rule
    participant Checkpoint
    participant Exception

    User->>Rule: expand_input / expand_params
    Rule->>Checkpoint: get(**wildcards)
    Checkpoint->>Checkpoint: dictproduct(wildcard_lists)
    loop For each wildcard combination
        Checkpoint->>Checkpoint: expand outputs
        Checkpoint->>Checkpoint: check if outputs exist
        alt Outputs complete
            Checkpoint->>Rule: accumulate CheckpointJob
        else Outputs missing
            Checkpoint->>Checkpoint: collect missing outputs
        end
    end
    alt No missing outputs
        Checkpoint->>Rule: return list of CheckpointJobs
    else Missing outputs exist
        Checkpoint->>Exception: raise IncompleteCheckpointException(missing_outputs)
        Exception-->>Rule: exception.targetfiles
    end
    Rule->>User: Handle jobs or missing outputs
Loading

Suggested reviewers

  • johanneskoester
    """
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@kdm9 kdm9 changed the title DRAFT: pluralise checkpoints.get() WIP fix: pluralise checkpoints.get() May 6, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/snakemake/exceptions.py (1)

551-569: Plural-aware attribute but still singular wording & potential backward-compat break

The constructor now accepts targetfiles and stores them on self.targetfiles, but
 • the explanatory error text still speaks of a single “checkpoint output”; consider making it plural for clarity.
 • callers that were accessing the previous exception.targetfile will now AttributeError unless all call-sites were updated (we already saw fixes in rules.py, but there may be more).

A low-cost way to keep old extensions working is to provide a deprecated alias:

@@
-        self.targetfiles = [checkpoint_target(targetfile) for targetfile in targetfiles]
+        self.targetfiles = [
+            checkpoint_target(targetfile) for targetfile in targetfiles
+        ]
+
+        # Backwards-compat shim (to be removed in a future major release)
+        if self.targetfiles:
+            self.targetfile = self.targetfiles[0]

You may also want to s/“output is”/“outputs are”/ in the message string.

src/snakemake/checkpoints.py (1)

50-56: nit: prefer def over assigned lambda & remove assert side-effect

Style guideline (E731) – avoid assigning a lambda to a variable.

-def dictproduct(listdict):
-    """itertools.product() for dict of lists that yields a dict for each combination of values"""
-    import itertools
-    assert(all(isinstance(v, list) for v in listdict.values()))
-    for x in itertools.product(*listdict.values()):
-        yield dict(zip(listdict.keys(), x))
+def dictproduct(listdict):
+    """
+    Cartesian product over a *dict of lists*.
+    Yields a new dict for every combination of values.
+    """
+    import itertools
+
+    if not all(isinstance(v, list) for v in listdict.values()):
+        raise ValueError("All wildcard values must be lists to build a cartesian product.")
+
+    for combination in itertools.product(*listdict.values()):
+        yield dict(zip(listdict.keys(), combination))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44b9536 and 2199369.

📒 Files selected for processing (3)
  • src/snakemake/checkpoints.py (2 hunks)
  • src/snakemake/exceptions.py (2 hunks)
  • src/snakemake/rules.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/exceptions.py
  • src/snakemake/rules.py
  • src/snakemake/checkpoints.py
🧬 Code Graph Analysis (2)
src/snakemake/exceptions.py (1)
src/snakemake/io/__init__.py (1)
  • checkpoint_target (1297-1298)
src/snakemake/rules.py (1)
src/snakemake/common/tbdstring.py (1)
  • TBDString (3-96)
🪛 Ruff (0.8.2)
src/snakemake/checkpoints.py

71-71: Do not assign a lambda expression, use a def

Rewrite listify as a def

(E731)

⏰ Context from checks skipped due to timeout of 90000ms (36)
  • GitHub Check: tests (10, macos-latest, py312)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, macos-latest, py312)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, macos-latest, py312)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: apidocs
🔇 Additional comments (1)
src/snakemake/rules.py (1)

867-871: expand_input: returning a list may violate downstream type assumptions

handle_incomplete_checkpoint() now returns exception.targetfiles, i.e. a list.
_apply_wildcards() will subsequently iterate over that list and treat each entry like a regular path string, which is fine, but only if every consumer of the returned value is ready for multiple elements. Please double-check any logic that expected a single checkpoint target (e.g. duplicate-detection, ancient-flag propagation).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/snakemake/checkpoints.py (1)

81-98: ⚠️ Potential issue

Logic bug: missing_outputs always populated → checkpoint never considered complete

Because missing_outputs.append(...) is executed unconditionally on line 95, the list is non-empty even when all outputs are present, so an IncompleteCheckpointException will always be raised.

             if self.checkpoints.created_output:
                 missing_output = set(output) - set(self.checkpoints.created_output)
                 if not missing_output:
                     complete_jobs.append(CheckpointJob(self.rule, output))
                 else:
                     logger.debug(
                         f"Missing checkpoint output for {self.rule.name} "
                         f"(wildcards: {wc}): {','.join(missing_output)} of {','.join(output)}"
                     )
-            missing_outputs.append(checkpoint_target(output[0]))
+                    # only record when something is actually missing
+                    missing_outputs.append(checkpoint_target(output[0]))

This fix will ensure that only actual missing outputs are tracked, allowing the function to properly return complete jobs when all required outputs exist.

🧹 Nitpick comments (3)
src/snakemake/checkpoints.py (3)

74-80: Rewrite lambda as a proper function

Instead of assigning a lambda expression to a variable, define a proper function. This improves readability and follows Python best practices.

-        listify = lambda x: (
-            [
-                x,
-            ]
-            if not isinstance(x, list)
-            else x
-        )
+        def listify(x):
+            return [x] if not isinstance(x, list) else x
🧰 Tools
🪛 Ruff (0.8.2)

74-80: Do not assign a lambda expression, use a def

Rewrite listify as a def

(E731)


81-98: Consider adding explicit condition checking for empty checkpoint outputs

The current logic doesn't handle the case where self.checkpoints.created_output is empty clearly. It would be better to have explicit handling for this case to make the logic more robust.

         for wc in dictproduct(listified):
             output, _ = self.rule.expand_output(wc)
             if self.checkpoints.created_output:
                 missing_output = set(output) - set(self.checkpoints.created_output)
                 if not missing_output:
                     complete_jobs.append(CheckpointJob(self.rule, output))
                 else:
                     logger.debug(
                         f"Missing checkpoint output for {self.rule.name} "
                         f"(wildcards: {wc}): {','.join(missing_output)} of {','.join(output)}"
                     )
+                    missing_outputs.append(checkpoint_target(output[0]))
+            else:
+                # If there are no created outputs yet, all outputs are missing
+                logger.debug(
+                    f"No checkpoint outputs created yet for {self.rule.name} (wildcards: {wc})"
+                )
+                missing_outputs.append(checkpoint_target(output[0]))
-            missing_outputs.append(checkpoint_target(output[0]))

96-98: Add a sanity check for the case when all outputs are complete

This would be a good opportunity to add a check to immediately return complete jobs if all combinations of wildcards produce complete outputs, before attempting to handle missing outputs.

         for wc in dictproduct(listified):
             # ... existing code ...
             
+        # If nothing is missing across *all* combinations, return immediately
+        if not missing_outputs:
+            return complete_jobs
+            
         if not missing_outputs:
             return complete_jobs
         raise IncompleteCheckpointException(self.rule, missing_outputs)

This redundancy ensures correct behavior even if the logic changes in the future.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2199369 and 1b88c36.

📒 Files selected for processing (1)
  • src/snakemake/checkpoints.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/checkpoints.py
🧬 Code Graph Analysis (1)
src/snakemake/checkpoints.py (3)
src/snakemake/io/__init__.py (1)
  • checkpoint_target (1297-1298)
src/snakemake/rules.py (5)
  • output (301-302)
  • output (1367-1368)
  • expand_output (993-1004)
  • name (129-130)
  • name (133-134)
src/snakemake/exceptions.py (1)
  • IncompleteCheckpointException (550-568)
🪛 Ruff (0.8.2)
src/snakemake/checkpoints.py

74-80: Do not assign a lambda expression, use a def

Rewrite listify as a def

(E731)

⏰ Context from checks skipped due to timeout of 90000ms (33)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, macos-latest, py312)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (6, macos-latest, py312)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, macos-latest, py312)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
🔇 Additional comments (2)
src/snakemake/checkpoints.py (2)

51-58: Good implementation of dictionary Cartesian product utility

This new utility function provides a clean implementation for generating combinations of dictionary values, which is necessary for the pluralized checkpoints.get() function. The implementation is clear and well-documented, using Python's built-in itertools to efficiently generate all combinations.


51-98: Verify pluralization behavior with error-handling tests

The implementation for pluralizing checkpoints.get() looks good overall, but should be thoroughly tested with various edge cases:

  1. Multiple wildcards with list values
  2. Mix of single and list values for wildcards
  3. Empty lists for wildcard values
  4. Missing outputs for some combinations but not others

Do you have or can you develop test cases that verify these scenarios? This would help ensure the implementation correctly handles all potential use cases, especially around error conditions.

🧰 Tools
🪛 Ruff (0.8.2)

74-80: Do not assign a lambda expression, use a def

Rewrite listify as a def

(E731)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/snakemake/checkpoints.py (1)

74-80: Replace lambda with regular function definition

Using a lambda for function assignment is against Python best practices. Consider rewriting listify as a regular function.

-    listify = lambda x: (
-        [
-            x,
-        ]
-        if not isinstance(x, list)
-        else x
-    )
+    def listify(x):
+        if not isinstance(x, list):
+            return [x]
+        return x
🧰 Tools
🪛 Ruff (0.8.2)

74-80: Do not assign a lambda expression, use a def

Rewrite listify as a def

(E731)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b88c36 and d6257b2.

📒 Files selected for processing (2)
  • src/snakemake/checkpoints.py (2 hunks)
  • src/snakemake/rules.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/snakemake/rules.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/checkpoints.py
🧬 Code Graph Analysis (1)
src/snakemake/checkpoints.py (3)
src/snakemake/io/__init__.py (1)
  • checkpoint_target (1297-1298)
src/snakemake/rules.py (5)
  • output (301-302)
  • output (1367-1368)
  • expand_output (993-1004)
  • name (129-130)
  • name (133-134)
src/snakemake/exceptions.py (1)
  • IncompleteCheckpointException (550-568)
🪛 Ruff (0.8.2)
src/snakemake/checkpoints.py

74-80: Do not assign a lambda expression, use a def

Rewrite listify as a def

(E731)

⏰ Context from checks skipped due to timeout of 90000ms (35)
  • GitHub Check: tests (10, macos-latest, py312)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, macos-latest, py312)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, macos-latest, py312)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: apidocs
🔇 Additional comments (2)
src/snakemake/checkpoints.py (2)

51-58: Implementation of dictproduct looks good!

This utility function correctly generates the Cartesian product of dictionary values, effectively enabling the checkbox behavior to match expand() as described in the PR objectives. The implementation is clean and uses the efficient itertools.product function.


81-98: Logic for handling multiple wildcard combinations looks good

This implementation properly handles the Cartesian product of wildcard values, checks for completeness of outputs, and accumulates jobs or missing outputs as needed. The fix properly addresses the issue where missing outputs were always populated - now they're only added when something is actually missing.

The overall implementation aligns well with the PR objective of making checkpoints.get() behave similarly to expand() by taking the Cartesian product of all wildcard values.

@kdm9
Copy link
Contributor Author

kdm9 commented May 6, 2025

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented May 6, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/snakemake/checkpoints.py (3)

74-76: Consider defining listify as a regular function instead of a lambda

The helper function is correctly implemented but using a nested function definition with def would be more standard than a lambda assignment.

-        def listify(x):
-            return [x] if not isinstance(x, list) else x
+        def listify(x):
+            if not isinstance(x, list):
+                return [x]
+            return x

91-91: Consider adding a comment explaining the single-output tracking strategy

This line tracks only the first missing output per wildcard combination. Based on the retrieved learning, this is intentional and matches the previous behavior, but it might be helpful to add a comment clarifying this design decision for future developers.

-                    missing_outputs.append(checkpoint_target(output[0]))
+                    # Track only the first output - if one output is missing, the checkpoint is considered incomplete
+                    missing_outputs.append(checkpoint_target(output[0]))

93-93: Same tracking strategy in the "no outputs" case

As with line 91, this also tracks only the first output. The same comment recommendation applies here for consistency.

-                missing_outputs.append(checkpoint_target(output[0]))
+                # Track only the first output - if one output is missing, the checkpoint is considered incomplete
+                missing_outputs.append(checkpoint_target(output[0]))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84b6c7d and 1ccd404.

📒 Files selected for processing (1)
  • src/snakemake/checkpoints.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/checkpoints.py
🧠 Learnings (1)
src/snakemake/checkpoints.py (1)
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.375Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
⏰ Context from checks skipped due to timeout of 90000ms (36)
  • GitHub Check: tests (10, macos-latest, py312)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, macos-latest, py312)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, macos-latest, py312)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: apidocs
  • GitHub Check: tests (1, ubuntu-latest, py311)
🔇 Additional comments (2)
src/snakemake/checkpoints.py (2)

51-58: Well-implemented utility function for Cartesian product generation

This dictproduct function elegantly handles generating Cartesian products while maintaining the dictionary structure. It correctly implements the core functionality needed for the pluralized checkpoints.get() behavior described in the PR objectives.


77-96: Good implementation of pluralized checkpoint.get() with proper handling of wildcard combinations

The implementation correctly handles multiple wildcard combinations as intended by the PR objective. It:

  1. Transforms all wildcard values into lists
  2. Processes each combination using the new dictproduct function
  3. Correctly tracks complete and missing outputs
  4. Returns multiple complete checkpoint jobs when possible

The approach aligns with the behavior described in the retrieved learning - tracking only the first missing output for each checkpoint combination is sufficient since if one output is missing, all outputs for that combination are considered incomplete.

@kdm9 kdm9 changed the title WIP fix: pluralise checkpoints.get() fix: pluralise checkpoints.get() May 8, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/snakemake/checkpoints.py (1)

51-58: Well-implemented Cartesian product utility function

The dictproduct function provides a clean implementation to generate Cartesian products from dictionary values. It aligns well with the PR objective to make checkpoints.get() handle multiple wildcard values.

While the assertion ensures all values are lists, consider if other iterable types (tuples, sets) might be valid inputs in the future:

-    assert all(isinstance(v, list) for v in listdict.values())
+    assert all(hasattr(v, "__iter__") and not isinstance(v, str) for v in listdict.values())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ccd404 and 117490a.

📒 Files selected for processing (3)
  • src/snakemake/checkpoints.py (2 hunks)
  • src/snakemake/exceptions.py (2 hunks)
  • src/snakemake/rules.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/snakemake/exceptions.py
  • src/snakemake/rules.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/checkpoints.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
src/snakemake/checkpoints.py (1)
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
⏰ Context from checks skipped due to timeout of 90000ms (35)
  • GitHub Check: tests (10, macos-latest, py312)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, macos-latest, py312)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, macos-latest, py312)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
🔇 Additional comments (1)
src/snakemake/checkpoints.py (1)

74-96:

Details

❓ Verification inconclusive

Good implementation of pluralized get() method for checkpoints

The modified get() method now handles wildcards with list values by taking the Cartesian product of all combinations, similar to Snakemake's expand() function. This successfully addresses the issue of having to call .get() multiple times.

I notice you're maintaining the behavior of only tracking the first missing output per combination at line 91 and 93, which is consistent with the existing Snakemake behavior as mentioned in your feedback on a previous review.


🏁 Script executed:

#!/bin/bash
# Look for existing callers of checkpoints.get() to ensure they can handle multiple jobs
rg -A 3 "checkpoints\.[a-zA-Z_]+\.get\(" --glob "*.py" --glob "!src/snakemake/checkpoints.py"

Length of output: 95


I didn’t find any matches with the initial pattern, so let’s broaden the search to catch all potential calls to checkpoints.<name>.get():


🏁 Script executed:

#!/bin/bash
# Refined search for any checkpoints.<NAME>.get() usage across the repo
rg -n "checkpoints\.[[:alnum:]_]*\.get\(" --glob "*.py"

Length of output: 57


Please verify that all checkpoints.<rule>.get() calls handle multiple jobs

I didn’t find any internal usages of checkpoints.<name>.get() in the codebase—please manually confirm that any existing callers (including user-defined checkpoint invocations in Snakefiles or other workflow code) correctly process a list of CheckpointJob instances instead of a single job.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/snakemake/checkpoints.py (3)

51-58: dictproduct unnecessarily restricted to list – relax to any iterable & drop runtime assert

The helper breaks if a caller passes a tuple, set, or range, even though itertools.product() can handle any iterable.
By enforcing list, you create avoidable friction for users and clutter the call-site with list() casts that bring no semantic benefit.

-def dictproduct(listdict):
-    """itertools.product() for dict of lists that yields a dict for each combination of values"""
-    import itertools
-
-    assert all(isinstance(v, list) for v in listdict.values())
-    for x in itertools.product(*listdict.values()):
-        yield dict(zip(listdict.keys(), x))
+def dictproduct(mapping):
+    """Cartesian product of an arbitrary iterable-valued mapping.
+
+    Yields one dict per combination, preserving key order.
+    """
+    import itertools
+
+    for combo in itertools.product(*mapping.values()):
+        yield dict(zip(mapping.keys(), combo))

This change removes the overly tight contract while keeping behaviour identical for current callers.


74-78: Local listify() duplicates utility logic – consider extracting once

listify() is re-defined every call to get(). Moving it to a private module-level helper (or reusing an existing utility if one exists in the codebase) avoids the micro-overhead and improves discoverability.
Not critical, but worth tidying while you are touching this area.


80-96: Duplicate entries possible in missing_outputs – de-duplicate before raising

When several wildcard combinations miss the same first output file, that path is appended repeatedly.
This can bloat the IncompleteCheckpointException message without adding value.

-        missing_outputs = []
+        missing_outputs = set()
 ...
-                    missing_outputs.append(checkpoint_target(output[0]))
+                    missing_outputs.add(checkpoint_target(output[0]))
 ...
-                missing_outputs.append(checkpoint_target(output[0]))
+                missing_outputs.add(checkpoint_target(output[0]))
 ...
-        if not missing_outputs:
+        if not missing_outputs:
             return complete_jobs
-        raise IncompleteCheckpointException(self.rule, missing_outputs)
+        raise IncompleteCheckpointException(self.rule, sorted(missing_outputs))

Switching to a set keeps the user-facing diagnostics concise while preserving existing semantics.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 117490a and 1785a2e.

📒 Files selected for processing (3)
  • src/snakemake/checkpoints.py (2 hunks)
  • src/snakemake/exceptions.py (2 hunks)
  • src/snakemake/rules.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/snakemake/exceptions.py
  • src/snakemake/rules.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/checkpoints.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
src/snakemake/checkpoints.py (1)
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
🪛 Pylint (3.3.7)
src/snakemake/checkpoints.py

[refactor] 99-99: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (35)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (6, macos-latest, py312)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, macos-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: apidocs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/snakemake/checkpoints.py (3)

51-58: Consider broadening dictproduct input contract and move import to top-level

Limiting values strictly to list prevents passing tuples, sets or other iterables that behave the same. In practice wildcard collections are often created with tuples. Dropping the hard assert (or relaxing it to collections.abc.Iterable) would make the helper more generally useful and avoid surprising AssertionErrors.

-assert all(isinstance(v, list) for v in listdict.values())
-for x in itertools.product(*listdict.values()):
+from collections.abc import Iterable
+assert all(isinstance(v, Iterable) and not isinstance(v, str) for v in listdict.values())
+for x in itertools.product(*(list(v) for v in listdict.values())):
     yield dict(zip(listdict.keys(), x))

This keeps behaviour identical for lists while accepting any non-string iterable.


74-78: Minor: local listify() could be inlined

listify is a one-liner used once to normalise wildcard values. Inlining it saves an indirection and avoids nesting an extra function inside get() (ruff(E731) complaints are gone already, but the helper is still arguably overkill).

-        def listify(x):
-            return [x] if not isinstance(x, list) else x
-
-        listified = {k: listify(v) for k, v in wildcards.items()}
+        listified = {k: v if isinstance(v, list) else [v] for k, v in wildcards.items()}

101-108: Expose wildcards in CheckpointJob.__slots__ but not in __repr__/equality

wildcards is now stored but CheckpointJob still lacks a customised __repr__, __eq__, etc.
This makes debugging harder and can lead to subtle bugs if two jobs with identical output but
different wildcard combinations compare equal.

Consider implementing __repr__ (and, if relied upon, __hash__/__eq__) that includes
wildcards for clarity.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 101-101: Too few public methods (0/2)

(R0903)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1785a2e and 694d047.

📒 Files selected for processing (1)
  • src/snakemake/checkpoints.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/checkpoints.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
src/snakemake/checkpoints.py (1)
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
🧬 Code Graph Analysis (1)
src/snakemake/checkpoints.py (3)
src/snakemake/io/__init__.py (1)
  • checkpoint_target (1319-1320)
src/snakemake/rules.py (5)
  • output (301-302)
  • output (1383-1384)
  • expand_output (993-1004)
  • name (129-130)
  • name (133-134)
src/snakemake/exceptions.py (1)
  • IncompleteCheckpointException (550-568)
🪛 Pylint (3.3.7)
src/snakemake/checkpoints.py

[refactor] 101-101: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (35)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (8, macos-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, macos-latest, py312)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: apidocs
🔇 Additional comments (1)
src/snakemake/checkpoints.py (1)

95-98:

Details

✅ Verification successful

Return-type change (Checkpoint.get() may now return list[CheckpointJob]) – audit callers

Callers that previously assumed a single CheckpointJob (e.g. accessed .output directly) will now
receive a list when multiple wildcard combinations are complete. This silent API broadening can
cause TypeError: 'list' object has no attribute 'output'.

Please grep for .get( usages and ensure they handle both cases (or always a list). Example script:


🏁 Script executed:

#!/bin/bash
# Find potential call sites expecting a scalar result.
rg --python -n "\.get\([^)]*\)" | rg -v "src/snakemake/checkpoints.py"

Length of output: 481


🏁 Script executed:

#!/bin/bash
# Re-run grep for .get( calls in Python files, excluding the checkpoints implementation.
rg -n "\.get\(" -g "*.py" | rg -v "src/snakemake/checkpoints.py"

Length of output: 9239


No Checkpoint.get() usages detected – no audit required

I searched for calls to Checkpoint.get() outside of its own implementation and found none in the codebase. There are no external callers that would now receive a list instead of a single CheckpointJob.

(fixed version after coderabbit halucination)
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/snakemake/checkpoints.py (3)

51-58: Use WorkflowError and avoid per-call imports

dictproduct() currently:

  1. Raises a bare AssertionError when values aren’t lists – this leaks an internal detail to users.
  2. Imports itertools every time the function is called, which is unnecessary overhead.
-    import itertools
-    assert all(isinstance(v, list) for v in listdict.values())
-    for x in itertools.product(*listdict.values()):
+    from itertools import product
+
+    if not all(isinstance(v, list) for v in listdict.values()):
+        raise WorkflowError(
+            "All wildcard values passed to checkpoints.get() must be lists."
+        )
+
+    for x in product(*listdict.values()):
         yield dict(zip(listdict.keys(), x))

This surfaces a clearer, Snakemake-specific error and keeps the hot path import-free.


95-98: Document mixed return type

Checkpoint.get() can now return either a single CheckpointJob or a list of them.
Adding a short note to the method docstring will save users from having to inspect the implementation.


102-108: Prefer canonical Wildcards over raw dict

Downstream internals typically expect a snakemake.io.Wildcards instance. Storing a plain dict may force later conversions or cause subtle mismatches.

+from snakemake.io import Wildcards   # add near other imports
 ...
-        self.wildcards = wildcards
+        # Store as canonical Wildcards for consistency
+        self.wildcards = Wildcards(wildcards or {})

This keeps checkpoint jobs consistent with the rest of the Snakemake API.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 694d047 and 9d41f11.

📒 Files selected for processing (1)
  • src/snakemake/checkpoints.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/checkpoints.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
src/snakemake/checkpoints.py (1)
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
🪛 Pylint (3.3.7)
src/snakemake/checkpoints.py

[refactor] 101-101: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (35)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (8, macos-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, macos-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (6, macos-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, macos-latest, py312)
🔇 Additional comments (1)
src/snakemake/checkpoints.py (1)

80-93: Double-wrapping of checkpoint targets resolved

Replacing checkpoint_target(output[0]) with output[0] eliminates the earlier double-wrapping bug and restores the original semantics (only the constructor wraps). Behaviour now matches the long-standing implementation – looks good.

@johanneskoester
Copy link
Contributor

The problem is that I think this would only solve the issue with calling mutliple instances of the same checkpoint rule. However, I would rather want to solve this for multiple checkpoint rules as well, ideally with one unified syntax. I have no good strategy yet how to do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants