
runner-cleanup: event-driven _diag/pages wipe via systemd hooks#925

Merged
igorpecovnik merged 2 commits into main from runner-clean-pages-hooks on May 18, 2026

@igorpecovnik (Member)

The bug, again

Even with hourly `runner-cleanup` (PRs #919/#920), `actions-runner-01` keeps hitting:

```
Error: The file '/home/actions-runner-01/_diag/pages/.log' already exists.
```

The hourly timer eventually clears it, but a runner that restarts before the next timer run still collides on a stale page UUID. We need event-driven cleanup.

What this adds

Two tiny pieces in `tools/modules/system/runner-cleanup/`:

| File | Role |
|---|---|
| `runner-clean-pages` | One-liner: wipes `$HOME/_diag/pages/` for whatever runner user systemd invoked it as. Logs to journald. |
| `install-runner-hooks` | One-shot installer: finds every `actions.runner.*.service` and drops in a `10-clean-pages.conf` override adding `ExecStartPre=-` + `ExecStopPost=-`. |
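For illustration, here is a minimal sketch of what `runner-clean-pages` could look like. It is not the merged script; it assumes only that systemd sets `$HOME` from the runner unit's `User=` and that `systemd-cat` is available for journald logging. The function name `clean_pages` and its optional directory argument are hypothetical, added so the logic can be exercised outside systemd.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of runner-clean-pages; NOT the merged script.
# Empties the invoking user's _diag/pages/ directory and logs the result
# to journald via systemd-cat, falling back to stdout.
set -u

clean_pages() {
  # Default: the page directory of whichever user systemd runs us as
  # (systemd sets $HOME from the unit's User=). The optional path
  # argument is a hypothetical addition so the logic can be tested.
  local dir="${1:-${HOME:-}/_diag/pages}"
  local msg

  # Nothing to wipe: succeed, so repeated calls stay idempotent.
  if [ ! -d "$dir" ]; then
    return 0
  fi

  # Remove everything inside the directory, dotfiles and
  # subdirectories included, but keep the directory itself.
  find "$dir" -mindepth 1 -delete || return 1

  msg="wiped $dir"
  # Prefer journald (tagged runner-clean-pages); print to stdout
  # instead if systemd-cat is absent or cannot reach the journal.
  if command -v systemd-cat >/dev/null 2>&1 &&
    printf '%s\n' "$msg" | systemd-cat -t runner-clean-pages 2>/dev/null; then
    return 0
  fi
  printf 'runner-clean-pages: %s\n' "$msg"
}
```

Called with no arguments, as a systemd hook would call it, it acts on `$HOME/_diag/pages`; as a standalone script the file would simply end with a `clean_pages` call. Keeping the directory and deleting only its contents avoids racing the runner over who recreates `pages/`.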

The `-` prefix means cleanup failures never block the runner unit itself.
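The installer's job can be sketched as a loop over the runner units. This is a hypothetical reconstruction, not the merged `install-runner-hooks`: it assumes runner units live directly in `/etc/systemd/system` and that the helper sits at `/usr/local/bin/runner-clean-pages` (the path used in the Deployment steps). The function name `install_hooks` and its two parameters are illustrative, added so the loop can be pointed at a scratch directory. Because systemd only reads new drop-ins after a reload, the sketch runs `systemctl daemon-reload` when it writes to the real unit directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of install-runner-hooks; NOT the merged script.
set -u

install_hooks() {
  local unit_dir="${1:-/etc/systemd/system}"
  local helper="${2:-/usr/local/bin/runner-clean-pages}"
  local unit count=0

  for unit in "$unit_dir"/actions.runner.*.service; do
    # With no matching unit the glob stays literal; skip it.
    [ -e "$unit" ] || continue
    mkdir -p "$unit.d" || return 1
    # The leading '-' tells systemd to ignore a non-zero exit status,
    # so a failed cleanup never blocks or fails the runner unit.
    # A fixed file name means re-running overwrites, never duplicates.
    cat >"$unit.d/10-clean-pages.conf" <<EOF
[Service]
ExecStartPre=-$helper
ExecStopPost=-$helper
EOF
    count=$((count + 1))
  done

  # New drop-ins take effect only after systemd reloads its config.
  if [ "$count" -gt 0 ] && [ "$unit_dir" = /etc/systemd/system ] &&
    command -v systemctl >/dev/null 2>&1; then
    systemctl daemon-reload
  fi
  echo "install-runner-hooks: $count unit(s) updated"
}
```

On a host this would run as root with no arguments (`install_hooks`); the drop-in lands next to each unit as `actions.runner.<name>.service.d/10-clean-pages.conf`, where `<name>` is a placeholder.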

Why systemd hooks, not a daemon

`ExecStopPost` fires on every stop reason: graceful exit, SIGTERM, SIGKILL, OOM kill, or an orderly machine reboot. `ExecStartPre` covers the case where systemd was bypassed entirely (a hard power cut). Together they form an event-driven cleanup chain with zero long-lived daemon process; systemd's own supervision IS the daemon.

The hourly `runner-cleanup` timer continues to run as the catch-all fallback for whatever the hooks miss.

Deployment

On each runner host, after pulling the updated configng:

```bash
sudo install -m 0755 tools/modules/system/runner-cleanup/runner-clean-pages /usr/local/bin/
sudo bash tools/modules/system/runner-cleanup/install-runner-hooks
```

(or just copy both files to /usr/local/bin and run the installer)

Test plan

  • After install, `systemctl cat actions.runner.` shows the new `ExecStartPre` / `ExecStopPost` lines from the drop-in
  • `systemctl stop actions.runner.` then `ls /home/actions-runner-NN/_diag/pages/` is empty
  • `systemctl start actions.runner.` runs ExecStartPre cleanly (journalctl shows `runner-clean-pages: wiped`)
  • `actions-runner-01` no longer hits "Initialize containers" UUID collisions on quick-restart workloads
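The first test-plan item can be checked mechanically. The hypothetical `has_clean_hooks` function below reads the text that `systemctl cat <unit>` prints (each file preceded by a `# /path` comment) and succeeds only if both hooks are present with the `-` prefix. Matching any path ending in `/runner-clean-pages` is an assumption, so the check works whether the helper was installed to `/usr/local/bin` or `/usr/local/sbin`.

```shell
#!/usr/bin/env bash
# Hypothetical check: does a unit's config contain both cleanup hooks?
# Reads the output of `systemctl cat <unit>` on stdin.
set -u

has_clean_hooks() {
  local text
  text=$(cat)
  # Both hooks must exist, each with the '-' (ignore failure) prefix.
  grep -Eq '^ExecStartPre=-[^[:space:]]*/runner-clean-pages$' <<<"$text" &&
    grep -Eq '^ExecStopPost=-[^[:space:]]*/runner-clean-pages$' <<<"$text"
}
```

Usage, with `<name>` as a placeholder: `systemctl cat "actions.runner.<name>.service" | has_clean_hooks && echo "hooks present"`. Note that `systemctl cat` shows the files on disk; the hooks only become active after a `daemon-reload` and take effect on the unit's next start or stop.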

The hourly runner-cleanup timer eventually clears stale page files,
but a runner that restarts within that hour still hits:

  The file '/home/actions-runner-NN/_diag/pages/<uuid>.log' already exists.

Add two small pieces that turn this into an event-driven cleanup:

  - runner-clean-pages: tiny helper, wipes $HOME/_diag/pages/ for
    whatever user systemd invoked it as. Logs to journald via
    systemd-cat. Idempotent.

  - install-runner-hooks: one-shot installer. Finds every
    /etc/systemd/system/actions.runner.*.service unit on the host
    and drops a 10-clean-pages.conf override that adds:
        ExecStartPre=-<helper>
        ExecStopPost=-<helper>
    The '-' prefix means cleanup failures don't block the runner.

ExecStopPost fires on every stop reason — clean exit, SIGTERM,
SIGKILL, OOM, machine reboot — so the page file is gone before the
next job lands. ExecStartPre covers the case where systemd was
bypassed (hard power-cut leaving stale pages on disk).

No long-lived daemon: systemd's own service supervision IS the
event hook. Hourly runner-cleanup keeps running as the catch-all
fallback.

Files changed (3):
  • tools/modules/system/module_armbian_runners.sh
  • tools/modules/system/runner-cleanup/install-runner-hooks
  • tools/modules/system/runner-cleanup/runner-clean-pages

@github-actions github-actions Bot added 05 Milestone: Second quarter release size/medium PR with more then 50 and less then 250 lines labels May 18, 2026
Extension of the event-driven cleanup added two commits ago. The
helper + installer were shipped but had to be deployed by hand.
Wire them into the runner-cleanup install path so a single
`armbian-config module_armbian_runners install` now puts everything
on the host in one shot:

  - /usr/local/sbin/runner-cleanup            (existing, hourly)
  - /etc/systemd/system/runner-cleanup.timer  (existing)
  - /usr/local/sbin/runner-clean-pages        (new, event-driven helper)
  - 10-clean-pages.conf drop-ins for every actions.runner.*.service
    (new, written by install-runner-hooks invocation)
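The inventory above lends itself to a quick post-install audit. The hypothetical `audit_install` function below checks each listed path and reports anything missing; the optional root prefix is an addition for testing (leave it empty on a real host). It assumes, as systemd does for unit overrides, that each drop-in lives at `/etc/systemd/system/<unit>.d/10-clean-pages.conf`, and that runner units are in `/etc/systemd/system`.

```shell
#!/usr/bin/env bash
# Hypothetical post-install audit for the runner-cleanup pieces.
# Prints one line per missing artifact; returns non-zero if any.
set -u

audit_install() {
  local root="${1:-}" missing=0 path unit
  local files=(
    /usr/local/sbin/runner-cleanup
    /etc/systemd/system/runner-cleanup.timer
    /usr/local/sbin/runner-clean-pages
  )

  for path in "${files[@]}"; do
    if [ ! -e "$root$path" ]; then
      echo "missing: $path"
      missing=$((missing + 1))
    fi
  done

  # Every runner unit should carry its 10-clean-pages.conf drop-in.
  for unit in "$root"/etc/systemd/system/actions.runner.*.service; do
    [ -e "$unit" ] || continue
    if [ ! -f "$unit.d/10-clean-pages.conf" ]; then
      echo "missing drop-in: ${unit#"$root"}.d/10-clean-pages.conf"
      missing=$((missing + 1))
    fi
  done

  [ "$missing" -eq 0 ]
}
```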

The install-runner-hooks invocation runs AFTER daemon-reload so the
new drop-ins take effect on the next runner restart. Existing runner
units that started moments earlier in the same install pass don't
have the hooks active yet; they pick them up on their next restart
(operator or systemd-initiated). Hourly runner-cleanup keeps
covering the gap.

Non-fatal: install-runner-hooks failure prints a warning and the
runner-cleanup timer setup proceeds.
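The ordering and the non-fatal handling can be sketched as plain control flow. The step functions here are hypothetical stand-ins: the real steps live in `module_armbian_runners.sh`, which is not shown, and the installer path and `systemctl enable --now` call in the defaults are assumptions. Only the sequence is the point: reload, then run the hook installer, and downgrade its failure to a warning so the timer setup still runs.

```shell
#!/usr/bin/env bash
# Hypothetical control-flow sketch; the step functions are stand-ins.
set -u

# Default stand-ins (assumed commands; the install path is hypothetical).
reload_units() { systemctl daemon-reload; }
run_hook_installer() { /usr/local/sbin/install-runner-hooks; }
enable_timer() { systemctl enable --now runner-cleanup.timer; }

install_runner_cleanup() {
  reload_units
  # Non-fatal: a failed hook install only prints a warning, because the
  # hourly runner-cleanup timer still covers stale page files.
  if ! run_hook_installer; then
    echo "warning: install-runner-hooks failed; hourly runner-cleanup still active" >&2
  fi
  enable_timer
}
```

Overriding the three stand-ins (as a test would) shows that the timer step runs even when the hook installer fails.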
@igorpecovnik merged commit 9c8b30b into main on May 18, 2026
12 checks passed
@igorpecovnik deleted the runner-clean-pages-hooks branch on May 18, 2026 at 15:28