Skip to content

Conversation

@ChristofferCOASD
Copy link
Contributor

@ChristofferCOASD ChristofferCOASD commented Sep 18, 2025

Hi,

I've observed an issue with the rate limiter where the main loop is being rerun by the scheduler before the specified wait time has elapsed.

Initially, I suspected the excessive logging of the message "Job rate limit reached..." was the culprit. However, after further investigation, it appears that the main loop is actually triggered multiple times, not just excessively logged.

Running the following with --max-jobs-per-timespan "1/1s" will produce the excesive logging messages. (They can become mucm much worse than this. As a workaound i removed the logging on our HPC)

nrs = range(20)
rule testRule:
    input:
        expand("OUTPUT/{i}.txt",i=nrs),

rule onefile:
    output:
        "OUTPUT/{number}.txt"
    shell:
        """
        mkdir -p OUTPUT
        echo "TEST" > {output}
        """
image

I believe the issue may stem from the semaphore being released when a job finishes. It's unclear to me why the release occurs at this point, as a completed job doesn't necessarily imply that a new slot in the rate limiter has become available.

In this PR, I've removed the semaphore release to initiate a discussion. I'm happy to explore other solutions or make additional changes — any clarification or feedback would be appreciated.

Thanks,
A happy Snakemake user

Summary by CodeRabbit

  • New Features

    • Scheduler now uses a slot-based model to track job occupancy, explicitly distinguishing local vs non-local jobs.
  • Performance

    • Fewer unnecessary wake-ups for non-local completions, lowering overhead on large workflows.
  • Bug Fixes

    • Corrected resource accounting so slots aren’t decremented incorrectly and updates are consistent across local/non-local completions.
    • Rate limiting now records submitted jobs accurately for local vs remote executions.

✏️ Tip: You can customize this high-level summary in your review settings.

@ChristofferCOASD ChristofferCOASD changed the title Do not release lock when job finished fix: Do not release lock when job finished Sep 18, 2025
@ChristofferCOASD
Copy link
Contributor Author

One idea I had was also to ignore localrules from the rate limit, since in our case the rate limit is mostly intended to limit the rate at which jobs are submitted to SLURM. Would this make sense in general?

@johanneskoester
Copy link
Contributor

One idea I had was also to ignore localrules from the rate limit, since in our case the rate limit is mostly intended to limit the rate at which jobs are submitted to SLURM. Would this make sense in general?

yes, absolutely!

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 23, 2025

📝 Walkthrough

Walkthrough

Replace count-based scheduler bookkeeping (_job_count) with a slot-based model (_job_slot, init sys.maxsize); update scheduling, allocation/deallocation, rate limiting, and semaphore wake logic to use _job_slot; annotate jobs with _job_slot (0 local, 1 non-local) and add _cores: 0 for non-local jobs.

Changes

Cohort / File(s) Summary
Scheduler: slot-based resource & wake logic
src/snakemake/scheduling/job_scheduler.py
Replace _job_count with _job_slot (initialized to sys.maxsize); update allocation/deallocation, knapsack/selection, rate-limited and non-rate-limited flows to use _job_slot; adjust _free_resources/update_available_resources to decrement _job_slot; modify semaphore wake/release behavior and wake-up control to follow slot semantics.
Job metadata: annotate job slot
src/snakemake/jobs.py
_get_scheduler_resources now emits _job_slot (0 for local, 1 for non-local) and adds _cores: 0 for non-local jobs; remove legacy _job_count and preserve filtering of non-TBD resources.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Runner as JobRunner
  participant Scheduler as JobScheduler
  participant RL as JobRateLimiter
  participant Sem as open_jobs (Semaphore)

  rect rgb(245,245,255)
    note left of Scheduler: Submission (rate-limited path)
    Runner->>Scheduler: submit(job with `_job_slot`)
    Scheduler->>RL: record_submission(job, local?/non-local?)
    RL-->>Scheduler: allow / queue
    alt dispatch
      Scheduler->>Sem: acquire()
      Sem-->>Scheduler: granted
      Scheduler->>Runner: dispatch job
    end
  end

  rect rgb(245,255,245)
    note left of Scheduler: Completion (local)
    Runner->>Scheduler: job_finished(local)
    Scheduler->>RL: notify_completion(local)
    Scheduler->>Sem: release()  %% release only for local jobs
    Scheduler->>Scheduler: update_available_resources (decrement `_job_slot`)
  end

  rect rgb(255,245,245)
    note left of Scheduler: Completion (non-local)
    Runner->>Scheduler: job_finished(non-local)
    Scheduler->>RL: notify_completion(non-local)
    Scheduler--xSem: no immediate release
    Scheduler->>Scheduler: update_available_resources (decrement `_job_slot`)
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review focus: job_scheduler.py knapsack/selection and rate-limited paths, semaphore wake/release differences, and JobRateLimiter interaction points.
  • Check jobs.py resource annotation changes for compatibility with schedulers and any callers expecting _job_count.

Possibly related PRs

Suggested reviewers

  • johanneskoester

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title 'fix: Do not release lock when job finished' is misleading; the actual changes involve replacing job_count with job_slot for resource control and adjusting rate limiter logic, not just removing a lock release. Update the title to accurately reflect the primary change, such as 'refactor: Replace job_count with job_slot for rate limiting' or 'fix: Refactor rate limiter to use slot-based resource control'.
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive The PR description provides context about the rate limiter issue but does not complete the required QC checklist sections for test coverage and documentation updates. Complete the QC checklist by confirming whether test cases exist for the job_slot changes and whether documentation updates are needed or already covered.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7500009 and a2023ca.

📒 Files selected for processing (1)
  • src/snakemake/scheduling/job_scheduler.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/snakemake/scheduling/job_scheduler.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (46)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (9, macos-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (3, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: apidocs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ChristofferCOASD
Copy link
Contributor Author

ChristofferCOASD commented Sep 23, 2025

One idea I had was also to ignore localrules from the rate limit, since in our case the rate limit is mostly intended to limit the rate at which jobs are submitted to SLURM. Would this make sense in general?

yes, absolutely!

Great, here is my idea I will work on implementing:

  • I would like to have remote jobs have the "cores" resource be 0 such that the cores resource limit can be kept in place. This will allow the user still to limit resource usage of local jobs. It could be though of as "local cores".
  • Remote jobs should instead of being restricted by job_count be restricted by a new resource limit like job_slots which will represent the number of available remote execution slots either limited by the rate limiter or by the user specified max_jobs.
  • All jobs will still keep the ´job_count=1and thejob_count=sys.maxsize` will still stay to keep a hard limit on number of jobs.
  • The rate limiter will then be changed away from determining job_count to instead determine job_slots

From my understanding, this will result in the desired behaviour:

  • When executing on a cluster, only remote jobs will be rate limited
  • The when specifying jobs: 100 this limits the number of remote jobs, not the local ones
  • Local jobs will respect the "cores" limit even when using rate limiter and on the cluster

Edits/Notes:

  • I considered if _nodes could be used instead of introducing a new _jobs_slot, but they are not interchangeable I don't think since you can have jobs requiring multiple nodes but only acting as a single submission in terms of rate limiting.
  • There could maybe be found an improvement in schedule_reevaluation since local rules would no longer count in the rate limit, there might be several jobs ready even before there is space in the rate limiter.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
src/snakemake/scheduling/job_scheduler.py (2)

585-591: Assertion message nit: say “job slots” instead of “job count”.

Update the message to match the new resource name and avoid confusion.

-            ), f"Job count is {self.resources['_job_slot']}, but should be {sys.maxsize}"
+            ), f"Job slots is {self.resources['_job_slot']}, but should be {sys.maxsize}"

597-601: Register only remote jobs: good; minor cleanups.

  • Typo in comment (“remtoe”).
  • Avoid building a list just to take its length.
-            # update job rate limiter with remtoe jobs only
-            self.job_rate_limiter.register_jobs(len([job for job in selected if not job.is_local]))
+            # update job rate limiter with remote jobs only
+            n_remote = sum(1 for job in selected if not job.is_local)
+            if n_remote:
+                self.job_rate_limiter.register_jobs(n_remote)
src/snakemake/jobs.py (1)

120-121: Drop unused _job_count fieldres_dict["_job_count"] isn’t consumed anywhere in the repo; remove the assignment at src/snakemake/jobs.py:120 and plan full removal (or deprecation warning and removal in the next major release).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff9c68f and 13fb2ae.

📒 Files selected for processing (2)
  • src/snakemake/jobs.py (2 hunks)
  • src/snakemake/scheduling/job_scheduler.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/scheduling/job_scheduler.py
  • src/snakemake/jobs.py
🧬 Code graph analysis (1)
src/snakemake/scheduling/job_scheduler.py (2)
src/snakemake/jobs.py (6)
  • name (1285-1286)
  • name (1671-1675)
  • resources (492-513)
  • resources (1520-1534)
  • is_local (1035-1043)
  • is_local (1682-1683)
src/snakemake/workflow.py (3)
  • name (2326-2331)
  • resources (2242-2247)
  • is_local (704-708)
🪛 GitHub Check: Check for spelling errors
src/snakemake/jobs.py

[failure] 108-108:
ocupy ==> occupy

🪛 GitHub Actions: Codespell
src/snakemake/jobs.py

[error] 108-108: codespell: ocupy ==> occupy (typo found by codespell)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (45)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (6, macos-latest, py313)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (3, macos-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: apidocs
🔇 Additional comments (3)
src/snakemake/scheduling/job_scheduler.py (2)

501-505: Do not free _job_slot: LGTM.

Excluding _job_slot from free logic is correct since slots are governed by the rate limiter’s timespan, not by job completion.


604-607: Skip _job_slot in resource deduction: LGTM.

This preserves the rate limiter as the single source of truth for slot capacity within a timespan.

src/snakemake/jobs.py (1)

119-119: Non-local jobs occupy one job slot: LGTM.

This aligns job-level accounting with the scheduler’s _job_slot capacity.

Comment on lines 517 to 521
else:
# Finished local job could mean free execution slot
if job.is_local:
self._open_jobs.release()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: remote job completions no longer wake the scheduler when not rate-limited.

With this change, a normal (non–rate-limited) run won’t release the semaphore on remote finishes, risking the main loop blocking indefinitely after the first scheduling round.

Apply:

-            else:
-                # Finished local job could mean free execution slot
-                if job.is_local:
-                    self._open_jobs.release()
+            else:
+                # Without a rate limiter, any completion should wake the scheduler.
+                # With a rate limiter, only local completions should wake it; remote
+                # completions are governed by the timespan timer to avoid log spam.
+                if self.job_rate_limiter is None or job.is_local:
+                    self._open_jobs.release()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
else:
# Finished local job could mean free execution slot
if job.is_local:
self._open_jobs.release()
else:
# Without a rate limiter, any completion should wake the scheduler.
# With a rate limiter, only local completions should wake it; remote
# completions are governed by the timespan timer to avoid log spam.
if self.job_rate_limiter is None or job.is_local:
self._open_jobs.release()
🤖 Prompt for AI Agents
In src/snakemake/scheduling/job_scheduler.py around lines 517-521, the code only
releases self._open_jobs for local job completions which causes remote
completions to not wake the scheduler when the run is not rate-limited; change
the condition so that the semaphore is released for any finished job when the
scheduler is not currently rate-limited (e.g., replace the current if
job.is_local: release() with a check like if job.is_local or not
self.<rate_limited_flag_or_method> : self._open_jobs.release()), ensuring you
use the correct existing rate-limit attribute/method on the scheduler.

@ChristofferCOASD
Copy link
Contributor Author

For the sake of completeness i made the following to enable testing things related to remote execusion:

Snakefile:

rule all:
    input:
        "output.txt"

rule LOCAL:
    localrule: True
    output:
        "LOCAL/{i}.txt"
    shell:
        """
        echo "==== LOCAL ====="
        touch {output}
        """

rule REMOTE:
    output:
        "REMOTE/{i}.txt"
    shell:
        """
        sleep 10
        touch {output}
        """

rule C:
    input:
        expand("LOCAL/{i}.txt", i=range(1, 4)) +
        expand("REMOTE/{i}.txt", i=range(1, 4))
    output:
        "output.txt"
    shell:
        """
        cat {input} > {output}
        """

yaml profile:

executor: cluster-generic
jobs: 4
rerun-incomplete: True
default_resources: []
quiet: ["rules"]
max-jobs-per-timespan: "1/1s"
cluster-generic-submit-cmd: "ts -L {rule}-{jobid}"
cluster-generic-status-cmd: "./ts_status.sh"
cluster-generic-cancel-cmd: "ts -r {jobid}"

taskspooler status wrapper:

#!/usr/bin/env bash
jobid="$1"
state=$(ts -s "$jobid" 2>/dev/null || echo "unknown")

case "$state" in
    running|queued)
        echo "running"
        ;;
    finished)
        echo "success"
        ;;
    error|failed)
        echo "failed"
        ;;
    *)
        # if the job no longer exists, assume finished successfully
        echo "success"
        ;;
esac

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/snakemake/scheduling/job_scheduler.py (1)

517-518: Critical: unconditional semaphore release defeats rate limiting.

The unconditional release() causes the scheduler to wake on every remote job completion, even when rate-limited. This produces the excessive "Job rate limit reached..." log spam described in the PR objectives. When rate-limited, only local job completions should wake the scheduler—remote completions are governed by the timer at line 373.

Apply this diff to condition the wake-up:

             else:
-                self._open_jobs.release()
+                # Without a rate limiter, any completion should wake the scheduler.
+                # With a rate limiter, only local completions should wake it; remote
+                # completions are governed by the timespan timer to avoid log spam.
+                if self.job_rate_limiter is None or job.is_local:
+                    self._open_jobs.release()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99fae19 and bfa2971.

📒 Files selected for processing (1)
  • src/snakemake/scheduling/job_scheduler.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/scheduling/job_scheduler.py
🧬 Code graph analysis (1)
src/snakemake/scheduling/job_scheduler.py (1)
src/snakemake/jobs.py (2)
  • is_local (1035-1043)
  • is_local (1682-1683)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (46)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (8, macos-latest, py313)
  • GitHub Check: tests (5, macos-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (4, macos-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (6, macos-latest, py313)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: formatting
🔇 Additional comments (6)
src/snakemake/scheduling/job_scheduler.py (6)

103-103: Good: slot-based resource model introduced.

The rename from _job_count to _job_slot aligns with the PR's objective to separate local and remote job tracking. Based on past reviews, ensure that _job_count is fully replaced throughout the codebase, particularly in src/snakemake/jobs.py around line 120.


503-504: LGTM: correct resource freeing behavior.

Excluding _job_slot from resource freeing is correct because slot availability is managed by the JobRateLimiter timer, not by individual job completions.


585-588: LGTM: assertion updated correctly.

The assertion now correctly verifies that _job_slot remains unrestricted when not rate-limited, consistent with the slot-based model.


595-595: LGTM: slot resource updated before scheduling.

Correctly sets _job_slot to available capacity before job selection when rate-limited.


598-598: LGTM: local rules correctly excluded from rate limiting.

Filtering to register only remote jobs with the rate limiter implements the PR objective and discussion agreement to exclude localrules from rate limiting (which is primarily intended for SLURM/cluster submissions).


603-607: LGTM: slot resource excluded from job-level updates.

Correctly excludes _job_slot from per-job resource accounting, as it's managed by JobRateLimiter.register_jobs() at line 598.

ChristofferRS and others added 2 commits October 4, 2025 14:46
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@johanneskoester
Copy link
Contributor

Is this ready for a merge?

@ChristofferCOASD
Copy link
Contributor Author

@johanneskoester let me do another test today on our cluster just to make sure.

@ChristofferCOASD
Copy link
Contributor Author

ChristofferCOASD commented Dec 2, 2025

From my testing i still need to do a bit of work here. The local rules are properly excluded from the rate limit, but still there is an issue with how often the rate limitter is being "hit" in such a way it prints "Job rate limit reached, waiting for free slots.

From what i can tell #3421 also might have som overlap with any changes i might make in what resources are assigned to local rules to allow propper job selection. I will wait until it is merged to look at this any further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants