fix: Do not release lock when job finished #3744

ChristofferCOASD · 2025-09-18T08:35:16Z

Hi,

I've observed an issue with the rate limiter where the main loop is being rerun by the scheduler before the specified wait time has elapsed.

Initially, I suspected the excessive logging of the message "Job rate limit reached..." was the culprit. However, after further investigation, it appears that the main loop is actually triggered multiple times, not just excessively logged.

Running the following with --max-jobs-per-timespan "1/1s" will produce the excesive logging messages. (They can become mucm much worse than this. As a workaound i removed the logging on our HPC)

nrs = range(20)
rule testRule:
    input:
        expand("OUTPUT/{i}.txt",i=nrs),

rule onefile:
    output:
        "OUTPUT/{number}.txt"
    shell:
        """
        mkdir -p OUTPUT
        echo "TEST" > {output}
        """

I believe the issue may stem from the semaphore being released when a job finishes. It's unclear to me why the release occurs at this point, as a completed job doesn't necessarily imply that a new slot in the rate limiter has become available.

In this PR, I've removed the semaphore release to initiate a discussion. I'm happy to explore other solutions or make additional changes — any clarification or feedback would be appreciated.

Thanks,
A happy Snakemake user

Summary by CodeRabbit

New Features
- Scheduler now uses a slot-based model to track job occupancy, explicitly distinguishing local vs non-local jobs.
Performance
- Fewer unnecessary wake-ups for non-local completions, lowering overhead on large workflows.
Bug Fixes
- Corrected resource accounting so slots aren’t decremented incorrectly and updates are consistent across local/non-local completions.
- Rate limiting now records submitted jobs accurately for local vs remote executions.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

ChristofferCOASD · 2025-09-18T11:10:28Z

One idea I had was also to ignore localrules from the rate limit, since in our case the rate limit is mostly intended to limit the rate at which jobs are submitted to SLURM. Would this make sense in general?

johanneskoester · 2025-09-23T10:32:20Z

One idea I had was also to ignore localrules from the rate limit, since in our case the rate limit is mostly intended to limit the rate at which jobs are submitted to SLURM. Would this make sense in general?

yes, absolutely!

coderabbitai · 2025-09-23T10:32:37Z

📝 Walkthrough

Walkthrough

Replace count-based scheduler bookkeeping (_job_count) with a slot-based model (_job_slot, init sys.maxsize); update scheduling, allocation/deallocation, rate limiting, and semaphore wake logic to use _job_slot; annotate jobs with _job_slot (0 local, 1 non-local) and add _cores: 0 for non-local jobs.

Changes

Cohort / File(s)	Summary
Scheduler: slot-based resource & wake logic `src/snakemake/scheduling/job_scheduler.py`	Replace `_job_count` with `_job_slot` (initialized to `sys.maxsize`); update allocation/deallocation, knapsack/selection, rate-limited and non-rate-limited flows to use `_job_slot`; adjust `_free_resources`/`update_available_resources` to decrement `_job_slot`; modify semaphore wake/release behavior and wake-up control to follow slot semantics.
Job metadata: annotate job slot `src/snakemake/jobs.py`	`_get_scheduler_resources` now emits `_job_slot` (0 for local, 1 for non-local) and adds `_cores: 0` for non-local jobs; remove legacy `_job_count` and preserve filtering of non-TBD resources.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Runner as JobRunner
  participant Scheduler as JobScheduler
  participant RL as JobRateLimiter
  participant Sem as open_jobs (Semaphore)

  rect rgb(245,245,255)
    note left of Scheduler: Submission (rate-limited path)
    Runner->>Scheduler: submit(job with `_job_slot`)
    Scheduler->>RL: record_submission(job, local?/non-local?)
    RL-->>Scheduler: allow / queue
    alt dispatch
      Scheduler->>Sem: acquire()
      Sem-->>Scheduler: granted
      Scheduler->>Runner: dispatch job
    end
  end

  rect rgb(245,255,245)
    note left of Scheduler: Completion (local)
    Runner->>Scheduler: job_finished(local)
    Scheduler->>RL: notify_completion(local)
    Scheduler->>Sem: release()  %% release only for local jobs
    Scheduler->>Scheduler: update_available_resources (decrement `_job_slot`)
  end

  rect rgb(255,245,245)
    note left of Scheduler: Completion (non-local)
    Runner->>Scheduler: job_finished(non-local)
    Scheduler->>RL: notify_completion(non-local)
    Scheduler--xSem: no immediate release
    Scheduler->>Scheduler: update_available_resources (decrement `_job_slot`)
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Review focus: job_scheduler.py knapsack/selection and rate-limited paths, semaphore wake/release differences, and JobRateLimiter interaction points.
Check jobs.py resource annotation changes for compatibility with schedulers and any callers expecting _job_count.

Possibly related PRs

chore: assert that jobcount is unrestricted if no max-jobs-per-timespan is set #3066: Prior edits around asserting/unrestricting _job_count and related scheduler bookkeeping that overlap with this refactor.

Suggested reviewers

johanneskoester

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title 'fix: Do not release lock when job finished' is misleading; the actual changes involve replacing job_count with job_slot for resource control and adjusting rate limiter logic, not just removing a lock release.	Update the title to accurately reflect the primary change, such as 'refactor: Replace job_count with job_slot for rate limiting' or 'fix: Refactor rate limiter to use slot-based resource control'.
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Description check	❓ Inconclusive	The PR description provides context about the rate limiter issue but does not complete the required QC checklist sections for test coverage and documentation updates.	Complete the QC checklist by confirming whether test cases exist for the job_slot changes and whether documentation updates are needed or already covered.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7500009 and a2023ca.

📒 Files selected for processing (1)

src/snakemake/scheduling/job_scheduler.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/snakemake/scheduling/job_scheduler.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (46)

GitHub Check: tests (10, ubuntu-latest, py313)
GitHub Check: tests (10, windows-2022, py313)
GitHub Check: tests (8, ubuntu-latest, py312)
GitHub Check: tests (9, ubuntu-latest, py312)
GitHub Check: tests (9, ubuntu-latest, py311)
GitHub Check: tests (9, macos-latest, py313)
GitHub Check: tests (10, ubuntu-latest, py312)
GitHub Check: tests (10, ubuntu-latest, py311)
GitHub Check: tests (9, windows-2022, py313)
GitHub Check: tests (9, ubuntu-latest, py313)
GitHub Check: tests (8, ubuntu-latest, py311)
GitHub Check: tests (8, windows-2022, py313)
GitHub Check: tests (8, ubuntu-latest, py313)
GitHub Check: tests (6, windows-2022, py313)
GitHub Check: tests (7, ubuntu-latest, py313)
GitHub Check: tests (7, windows-2022, py313)
GitHub Check: tests (7, ubuntu-latest, py311)
GitHub Check: tests (5, ubuntu-latest, py313)
GitHub Check: tests (5, windows-2022, py313)
GitHub Check: tests (7, ubuntu-latest, py312)
GitHub Check: tests (6, ubuntu-latest, py313)
GitHub Check: tests (6, ubuntu-latest, py311)
GitHub Check: tests (6, ubuntu-latest, py312)
GitHub Check: tests (3, ubuntu-latest, py312)
GitHub Check: tests (4, windows-2022, py313)
GitHub Check: tests (3, macos-latest, py313)
GitHub Check: tests (4, ubuntu-latest, py312)
GitHub Check: tests (4, macos-latest, py313)
GitHub Check: tests (3, ubuntu-latest, py313)
GitHub Check: tests (4, ubuntu-latest, py313)
GitHub Check: tests (2, ubuntu-latest, py311)
GitHub Check: tests (5, ubuntu-latest, py312)
GitHub Check: tests (5, ubuntu-latest, py311)
GitHub Check: tests (1, ubuntu-latest, py312)
GitHub Check: tests (2, windows-2022, py313)
GitHub Check: tests (4, ubuntu-latest, py311)
GitHub Check: tests (3, ubuntu-latest, py311)
GitHub Check: tests (3, windows-2022, py313)
GitHub Check: tests (2, ubuntu-latest, py313)
GitHub Check: tests (2, macos-latest, py313)
GitHub Check: tests (2, ubuntu-latest, py312)
GitHub Check: tests (1, macos-latest, py313)
GitHub Check: tests (1, ubuntu-latest, py311)
GitHub Check: tests (1, windows-2022, py313)
GitHub Check: tests (1, ubuntu-latest, py313)
GitHub Check: apidocs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ChristofferCOASD · 2025-09-23T13:45:26Z

One idea I had was also to ignore localrules from the rate limit, since in our case the rate limit is mostly intended to limit the rate at which jobs are submitted to SLURM. Would this make sense in general?

yes, absolutely!

Great, here is my idea I will work on implementing:

I would like to have remote jobs have the "cores" resource be 0 such that the cores resource limit can be kept in place. This will allow the user still to limit resource usage of local jobs. It could be though of as "local cores".
Remote jobs should instead of being restricted by job_count be restricted by a new resource limit like job_slots which will represent the number of available remote execution slots either limited by the rate limiter or by the user specified max_jobs.
All jobs will still keep the ´job_count=1and thejob_count=sys.maxsize` will still stay to keep a hard limit on number of jobs.
The rate limiter will then be changed away from determining job_count to instead determine job_slots

From my understanding, this will result in the desired behaviour:

When executing on a cluster, only remote jobs will be rate limited
The when specifying jobs: 100 this limits the number of remote jobs, not the local ones
Local jobs will respect the "cores" limit even when using rate limiter and on the cluster

Edits/Notes:

I considered if _nodes could be used instead of introducing a new _jobs_slot, but they are not interchangeable I don't think since you can have jobs requiring multiple nodes but only acting as a single submission in terms of rate limiting.
There could maybe be found an improvement in schedule_reevaluation since local rules would no longer count in the rate limit, there might be several jobs ready even before there is space in the rate limiter.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

src/snakemake/scheduling/job_scheduler.py (2)
585-591: Assertion message nit: say “job slots” instead of “job count”.

Update the message to match the new resource name and avoid confusion.
-            ), f"Job count is {self.resources['_job_slot']}, but should be {sys.maxsize}"
+            ), f"Job slots is {self.resources['_job_slot']}, but should be {sys.maxsize}"
597-601: Register only remote jobs: good; minor cleanups.

Typo in comment (“remtoe”).

Avoid building a list just to take its length.
-            # update job rate limiter with remtoe jobs only
-            self.job_rate_limiter.register_jobs(len([job for job in selected if not job.is_local]))
+            # update job rate limiter with remote jobs only
+            n_remote = sum(1 for job in selected if not job.is_local)
+            if n_remote:
+                self.job_rate_limiter.register_jobs(n_remote)
src/snakemake/jobs.py (1)

120-121: Drop unused _job_count field – res_dict["_job_count"] isn’t consumed anywhere in the repo; remove the assignment at src/snakemake/jobs.py:120 and plan full removal (or deprecation warning and removal in the next major release).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff9c68f and 13fb2ae.

📒 Files selected for processing (2)

src/snakemake/jobs.py (2 hunks)
src/snakemake/scheduling/job_scheduler.py (4 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

src/snakemake/scheduling/job_scheduler.py
src/snakemake/jobs.py

🧬 Code graph analysis (1)

src/snakemake/scheduling/job_scheduler.py (2)

src/snakemake/jobs.py (6)

name (1285-1286)

name (1671-1675)

resources (492-513)

resources (1520-1534)

is_local (1035-1043)

is_local (1682-1683)

src/snakemake/workflow.py (3)

name (2326-2331)

resources (2242-2247)

is_local (704-708)

🪛 GitHub Check: Check for spelling errors

src/snakemake/jobs.py

[failure] 108-108:
ocupy ==> occupy

🪛 GitHub Actions: Codespell

src/snakemake/jobs.py

[error] 108-108: codespell: ocupy ==> occupy (typo found by codespell)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (45)

GitHub Check: tests (10, ubuntu-latest, py313)
GitHub Check: tests (10, ubuntu-latest, py312)
GitHub Check: tests (10, windows-2022, py313)
GitHub Check: tests (9, windows-2022, py313)
GitHub Check: tests (9, ubuntu-latest, py313)
GitHub Check: tests (10, ubuntu-latest, py311)
GitHub Check: tests (8, ubuntu-latest, py313)
GitHub Check: tests (9, ubuntu-latest, py312)
GitHub Check: tests (7, ubuntu-latest, py313)
GitHub Check: tests (9, ubuntu-latest, py311)
GitHub Check: tests (7, windows-2022, py313)
GitHub Check: tests (8, windows-2022, py313)
GitHub Check: tests (6, windows-2022, py313)
GitHub Check: tests (6, ubuntu-latest, py312)
GitHub Check: tests (8, ubuntu-latest, py312)
GitHub Check: tests (8, ubuntu-latest, py311)
GitHub Check: tests (6, ubuntu-latest, py313)
GitHub Check: tests (7, ubuntu-latest, py311)
GitHub Check: tests (7, ubuntu-latest, py312)
GitHub Check: tests (6, ubuntu-latest, py311)
GitHub Check: tests (6, macos-latest, py313)
GitHub Check: tests (5, ubuntu-latest, py313)
GitHub Check: tests (5, windows-2022, py313)
GitHub Check: tests (3, windows-2022, py313)
GitHub Check: tests (4, macos-latest, py313)
GitHub Check: tests (4, windows-2022, py313)
GitHub Check: tests (4, ubuntu-latest, py312)
GitHub Check: tests (5, ubuntu-latest, py311)
GitHub Check: tests (4, ubuntu-latest, py311)
GitHub Check: tests (5, ubuntu-latest, py312)
GitHub Check: tests (4, ubuntu-latest, py313)
GitHub Check: tests (3, ubuntu-latest, py313)
GitHub Check: tests (3, macos-latest, py313)
GitHub Check: tests (2, ubuntu-latest, py313)
GitHub Check: tests (2, macos-latest, py313)
GitHub Check: tests (3, ubuntu-latest, py312)
GitHub Check: tests (2, ubuntu-latest, py312)
GitHub Check: tests (1, ubuntu-latest, py311)
GitHub Check: tests (3, ubuntu-latest, py311)
GitHub Check: tests (1, windows-2022, py313)
GitHub Check: tests (1, macos-latest, py313)
GitHub Check: tests (2, ubuntu-latest, py311)
GitHub Check: tests (1, ubuntu-latest, py312)
GitHub Check: tests (1, ubuntu-latest, py313)
GitHub Check: apidocs

🔇 Additional comments (3)

src/snakemake/scheduling/job_scheduler.py (2)

501-505: Do not free _job_slot: LGTM.

Excluding _job_slot from free logic is correct since slots are governed by the rate limiter’s timespan, not by job completion.

604-607: Skip _job_slot in resource deduction: LGTM.

This preserves the rate limiter as the single source of truth for slot capacity within a timespan.

src/snakemake/jobs.py (1)

119-119: Non-local jobs occupy one job slot: LGTM.

This aligns job-level accounting with the scheduler’s _job_slot capacity.

src/snakemake/jobs.py

src/snakemake/scheduling/job_scheduler.py

coderabbitai · 2025-09-29T10:44:55Z

src/snakemake/scheduling/job_scheduler.py

+                else:
+                    # Finished local job could mean free execution slot
+                    if job.is_local:
+                        self._open_jobs.release()



⚠️ Potential issue | 🔴 Critical

Critical: remote job completions no longer wake the scheduler when not rate-limited.

With this change, a normal (non–rate-limited) run won’t release the semaphore on remote finishes, risking the main loop blocking indefinitely after the first scheduling round.

Apply:

- else: - # Finished local job could mean free execution slot - if job.is_local: - self._open_jobs.release() + else: + # Without a rate limiter, any completion should wake the scheduler. + # With a rate limiter, only local completions should wake it; remote + # completions are governed by the timespan timer to avoid log spam. + if self.job_rate_limiter is None or job.is_local: + self._open_jobs.release()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

else:

# Finished local job could mean free execution slot

if job.is_local:

self._open_jobs.release()

else:

# Without a rate limiter, any completion should wake the scheduler.

# With a rate limiter, only local completions should wake it; remote

# completions are governed by the timespan timer to avoid log spam.

if self.job_rate_limiter is None or job.is_local:

self._open_jobs.release()

🤖 Prompt for AI Agents

In src/snakemake/scheduling/job_scheduler.py around lines 517-521, the code only releases self._open_jobs for local job completions which causes remote completions to not wake the scheduler when the run is not rate-limited; change the condition so that the semaphore is released for any finished job when the scheduler is not currently rate-limited (e.g., replace the current if job.is_local: release() with a check like if job.is_local or not self.<rate_limited_flag_or_method> : self._open_jobs.release()), ensuring you use the correct existing rate-limit attribute/method on the scheduler.

ChristofferCOASD · 2025-10-04T11:57:53Z

For the sake of completeness i made the following to enable testing things related to remote execusion:

Snakefile:

rule all:
    input:
        "output.txt"

rule LOCAL:
    localrule: True
    output:
        "LOCAL/{i}.txt"
    shell:
        """
        echo "==== LOCAL ====="
        touch {output}
        """

rule REMOTE:
    output:
        "REMOTE/{i}.txt"
    shell:
        """
        sleep 10
        touch {output}
        """

rule C:
    input:
        expand("LOCAL/{i}.txt", i=range(1, 4)) +
        expand("REMOTE/{i}.txt", i=range(1, 4))
    output:
        "output.txt"
    shell:
        """
        cat {input} > {output}
        """

yaml profile:

executor: cluster-generic
jobs: 4
rerun-incomplete: True
default_resources: []
quiet: ["rules"]
max-jobs-per-timespan: "1/1s"
cluster-generic-submit-cmd: "ts -L {rule}-{jobid}"
cluster-generic-status-cmd: "./ts_status.sh"
cluster-generic-cancel-cmd: "ts -r {jobid}"

taskspooler status wrapper:

#!/usr/bin/env bash
jobid="$1"
state=$(ts -s "$jobid" 2>/dev/null || echo "unknown")

case "$state" in
    running|queued)
        echo "running"
        ;;
    finished)
        echo "success"
        ;;
    error|failed)
        echo "failed"
        ;;
    *)
        # if the job no longer exists, assume finished successfully
        echo "success"
        ;;
esac

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/snakemake/scheduling/job_scheduler.py (1)
517-518: Critical: unconditional semaphore release defeats rate limiting.

The unconditional release() causes the scheduler to wake on every remote job completion, even when rate-limited. This produces the excessive "Job rate limit reached..." log spam described in the PR objectives. When rate-limited, only local job completions should wake the scheduler—remote completions are governed by the timer at line 373.

Apply this diff to condition the wake-up:
             else:
-                self._open_jobs.release()
+                # Without a rate limiter, any completion should wake the scheduler.
+                # With a rate limiter, only local completions should wake it; remote
+                # completions are governed by the timespan timer to avoid log spam.
+                if self.job_rate_limiter is None or job.is_local:
+                    self._open_jobs.release()

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99fae19 and bfa2971.

📒 Files selected for processing (1)

src/snakemake/scheduling/job_scheduler.py (3 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

src/snakemake/scheduling/job_scheduler.py

🧬 Code graph analysis (1)

src/snakemake/scheduling/job_scheduler.py (1)

src/snakemake/jobs.py (2)

is_local (1035-1043)

is_local (1682-1683)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (46)

GitHub Check: tests (10, ubuntu-latest, py313)
GitHub Check: tests (10, ubuntu-latest, py311)
GitHub Check: tests (10, ubuntu-latest, py312)
GitHub Check: tests (7, ubuntu-latest, py311)
GitHub Check: tests (10, windows-2022, py313)
GitHub Check: tests (9, windows-2022, py313)
GitHub Check: tests (5, ubuntu-latest, py311)
GitHub Check: tests (9, ubuntu-latest, py313)
GitHub Check: tests (8, ubuntu-latest, py311)
GitHub Check: tests (9, ubuntu-latest, py312)
GitHub Check: tests (8, macos-latest, py313)
GitHub Check: tests (5, macos-latest, py313)
GitHub Check: tests (8, ubuntu-latest, py312)
GitHub Check: tests (8, windows-2022, py313)
GitHub Check: tests (7, ubuntu-latest, py313)
GitHub Check: tests (8, ubuntu-latest, py313)
GitHub Check: tests (7, ubuntu-latest, py312)
GitHub Check: tests (6, ubuntu-latest, py312)
GitHub Check: tests (5, windows-2022, py313)
GitHub Check: tests (6, ubuntu-latest, py313)
GitHub Check: tests (6, windows-2022, py313)
GitHub Check: tests (4, macos-latest, py313)
GitHub Check: tests (9, ubuntu-latest, py311)
GitHub Check: tests (7, windows-2022, py313)
GitHub Check: tests (6, macos-latest, py313)
GitHub Check: tests (2, windows-2022, py313)
GitHub Check: tests (5, ubuntu-latest, py312)
GitHub Check: tests (4, windows-2022, py313)
GitHub Check: tests (6, ubuntu-latest, py311)
GitHub Check: tests (5, ubuntu-latest, py313)
GitHub Check: tests (3, ubuntu-latest, py311)
GitHub Check: tests (4, ubuntu-latest, py313)
GitHub Check: tests (2, ubuntu-latest, py313)
GitHub Check: tests (3, windows-2022, py313)
GitHub Check: tests (4, ubuntu-latest, py311)
GitHub Check: tests (4, ubuntu-latest, py312)
GitHub Check: tests (3, ubuntu-latest, py312)
GitHub Check: tests (3, ubuntu-latest, py313)
GitHub Check: tests (1, macos-latest, py313)
GitHub Check: tests (1, ubuntu-latest, py313)
GitHub Check: tests (2, ubuntu-latest, py311)
GitHub Check: tests (2, ubuntu-latest, py312)
GitHub Check: tests (1, ubuntu-latest, py311)
GitHub Check: tests (1, windows-2022, py313)
GitHub Check: tests (1, ubuntu-latest, py312)
GitHub Check: formatting

🔇 Additional comments (6)

src/snakemake/scheduling/job_scheduler.py (6)

103-103: Good: slot-based resource model introduced.

The rename from _job_count to _job_slot aligns with the PR's objective to separate local and remote job tracking. Based on past reviews, ensure that _job_count is fully replaced throughout the codebase, particularly in src/snakemake/jobs.py around line 120.

503-504: LGTM: correct resource freeing behavior.

Excluding _job_slot from resource freeing is correct because slot availability is managed by the JobRateLimiter timer, not by individual job completions.

585-588: LGTM: assertion updated correctly.

The assertion now correctly verifies that _job_slot remains unrestricted when not rate-limited, consistent with the slot-based model.

595-595: LGTM: slot resource updated before scheduling.

Correctly sets _job_slot to available capacity before job selection when rate-limited.

598-598: LGTM: local rules correctly excluded from rate limiting.

Filtering to register only remote jobs with the rate limiter implements the PR objective and discussion agreement to exclude localrules from rate limiting (which is primarily intended for SLURM/cluster submissions).

603-607: LGTM: slot resource excluded from job-level updates.

Correctly excludes _job_slot from per-job resource accounting, as it's managed by JobRateLimiter.register_jobs() at line 598.

src/snakemake/scheduling/job_scheduler.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

src/snakemake/scheduling/job_scheduler.py

johanneskoester · 2025-11-27T15:33:17Z

Is this ready for a merge?

ChristofferCOASD · 2025-11-28T06:58:36Z

@johanneskoester let me do another test today on our cluster just to make sure.

ChristofferCOASD · 2025-12-02T14:26:30Z

From my testing i still need to do a bit of work here. The local rules are properly excluded from the rate limit, but still there is an issue with how often the rate limitter is being "hit" in such a way it prints "Job rate limit reached, waiting for free slots.

From what i can tell #3421 also might have som overlap with any changes i might make in what resources are assigned to local rules to allow propper job selection. I will wait until it is merged to look at this any further.

ChristofferCOASD requested a review from johanneskoester as a code owner September 18, 2025 08:35

ChristofferCOASD changed the title ~~Do not release lock when job finished~~ fix: Do not release lock when job finished Sep 18, 2025

Do not release when finished

6d5267c

ChristofferCOASD force-pushed the excesive-job-rate-limit branch from cc8023f to 6d5267c Compare September 18, 2025 08:46

ChristofferCOASD added 2 commits September 19, 2025 13:31

Merge branch 'main' into excesive-job-rate-limit

a972531

Merge branch 'main' into excesive-job-rate-limit

79f174a

Merge branch 'main' into excesive-job-rate-limit

ff9c68f

ChristofferCOASD added 2 commits September 24, 2025 09:30

Initial scheduler job_slot rename

a3cfeba

_cores still relevant for local jobs now

13fb2ae

coderabbitai bot reviewed Sep 29, 2025

View reviewed changes

ChristofferCOASD and others added 3 commits October 3, 2025 19:42

Merge branch 'snakemake:main' into excesive-job-rate-limit

e59ce52

Spelling

d74db79

No cores used for remote jobs

99fae19

when job done release lock

bfa2971

coderabbitai bot reviewed Oct 4, 2025

View reviewed changes

src/snakemake/scheduling/job_scheduler.py Outdated Show resolved Hide resolved

ChristofferRS and others added 2 commits October 4, 2025 14:46

formatting

7500009

Apply suggestion from @coderabbitai[bot]

dfb6774

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

johanneskoester reviewed Nov 27, 2025

View reviewed changes

src/snakemake/scheduling/job_scheduler.py Outdated Show resolved Hide resolved

Apply suggestion from @johanneskoester

c3cbc21

johanneskoester approved these changes Nov 27, 2025

View reviewed changes

no remote slots could still mean local slots

a2023ca

fix: Do not release lock when job finished #3744

Are you sure you want to change the base?

fix: Do not release lock when job finished #3744

Conversation

ChristofferCOASD commented Sep 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

ChristofferCOASD commented Sep 18, 2025

Uh oh!

johanneskoester commented Sep 23, 2025

Uh oh!

coderabbitai bot commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

ChristofferCOASD commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

ChristofferCOASD commented Oct 4, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

johanneskoester commented Nov 27, 2025

Uh oh!

ChristofferCOASD commented Nov 28, 2025

Uh oh!

ChristofferCOASD commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ChristofferCOASD commented Sep 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 23, 2025 •

edited

Loading

ChristofferCOASD commented Sep 23, 2025 •

edited

Loading

ChristofferCOASD commented Dec 2, 2025 •

edited

Loading