feat: add --sleep-on-failure to pause a failed spec before teardown #1676

Merged
onsi merged 1 commit into onsi:master from qinqon:sleep
Jun 18, 2026
@qinqon qinqon commented Jun 17, 2026

Contributor

Closes #1671

What

Adds a --sleep-on-failure=<duration> flag (default 0 = off). The moment a spec fails, Ginkgo pauses for the configured duration before any teardown (AfterEach/JustAfterEach/DeferCleanup) runs, so you can inspect the live system before its state is torn down.

Why

When a spec fails against a real environment (cluster, DB, service), teardown destroys the very state you need to debug the failure. The existing building block (time.Sleep in a top-level JustAfterEach) works but isn't runtime-configurable. This packages that need as a first-class, well-behaved flag.

How

Per the review feedback, this hooks directly into the suite's existing failure-handling path rather than injecting any nodes:

  • In runNode, when a node finishes with a failure, Ginkgo emits the failure as it already does and then — if --sleep-on-failure is set — pauses right there, before returning to the spec loop (and therefore before any teardown runs). It emits a progress report with the helper text telling you the suite is paused and what to do.
  • Only failures in setup and subject nodes (It, Before*, BeforeSuite...) pause. Failures in teardown/cleanup or reporting nodes are skipped, since the system is already being torn down at that point. This is implemented by skipping NodeTypesAllowedDuringCleanupInterrupt | NodeTypesAllowedDuringReportInterrupt.
  • The pause selects on a timer and interruptHandler.Status().Channel, so the pause is interruptible: pressing ^C ends it early and the suite proceeds to run cleanup as usual rather than skipping it.

This addresses the three points raised in #1671:

  1. Clear messaging — the failure is emitted, followed by a progress report telling you the suite is paused and to press ^C to proceed to cleanup.
  2. ^C proceeds to cleanup — the pause ends on interrupt and teardown then runs (covered by tests).
  3. Parallel — since Ginkgo's parallelism is multi-process, a single failing spec can't meaningfully freeze the whole system. Rather than pretend otherwise, combining --sleep-on-failure with -p/--procs is rejected with a configuration error. This is intended as a serial debugging aid; run the failing spec serially (e.g. with --focus).

Also updates godoc and docs/index.md (under the interrupt/grace-period material).

Test plan

  • go vet ./... passes
  • New internal/internal_integration tests: pause happens at the point of failure before teardown, setup-node (BeforeEach) failures pause, teardown-node failures do not pause, ^C ends the pause and cleanup still runs, passing specs are never paused, and the flag is a no-op when 0
  • New integration (CLI subprocess) tests: serial run waits ~the configured duration with ordering failing body → pause notice → teardown; parallel run exits with the serial-only error and does not hang
  • Full suite passes (go test ./...)

onsi commented Jun 17, 2026

Owner

Hey there, can you ask Opus to try again with some different guidance? I'd prefer a different implementation that hooks into the suite's existing codepaths for handling failures. If a failure is identified, the suite should - at that moment - pause. It should emit the full failure message, and then emit the helper text asking you to interact with the code.

Injecting a JustAfterEach makes some assumptions about test ordering that may not hold. It also adds more complexity than necessary; I think the aforementioned approach would be better.

Thanks!

qinqon commented Jun 18, 2026

Contributor Author

> Hey there, can you ask Opus to try again with some different guidance? I'd prefer a different implementation that hooks into the suite's existing codepaths for handling failures. If a failure is identified, the suite should - at that moment - pause. It should emit the full failure message, and then emit the helper text asking you to interact with the code.

Sure, let me refactor this along those lines.

> Injecting a JustAfterEach makes some assumptions about test ordering that may not hold. It also adds more complexity than necessary; I think the aforementioned approach would be better.

In the end, I was essentially productizing the way I currently hack around this.

> Thanks!

Happy to contribute. I have been using Ginkgo for a long, long time now.

When a spec fails against a live environment, its teardown
(AfterEach/JustAfterEach/DeferCleanup) tears down the very state needed
to debug the failure. --sleep-on-failure=<duration> pauses the suite the
moment a failure is identified - before any teardown runs - so the live
system can be inspected.

The pause hooks directly into the suite's existing failure-handling path
(runNode): when a node finishes with a failure, Ginkgo emits the failure
and then, if the flag is set, pauses right there. Only failures in setup
and subject nodes (It, Before*, BeforeSuite...) pause; failures in
teardown/cleanup or reporting nodes do not, since the system is already
being torn down at that point.

The pause is interruptible: pressing ^C (or any interrupt) ends the pause
early and the suite proceeds to run cleanup as usual rather than skipping
it.

This is a debugging aid for interactive, serial runs. Because Ginkgo's
parallelism is multi-process, a single failing spec cannot meaningfully
freeze the whole system, so combining --sleep-on-failure with -p/--procs
is rejected with a configuration error.

Adds unit (internal_integration) and CLI (integration) tests and docs.

Assisted-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
@onsi onsi merged commit 76a2074 into onsi:master Jun 18, 2026
6 checks passed
onsi commented Jun 18, 2026

Owner

LGTM - thanks!

onsi commented Jun 18, 2026

Owner

I'll cut a release later today or tomorrow.

Development

Successfully merging this pull request may close these issues.

Feature request: --sleep-on-failure to pause a failed spec before teardown

2 participants