feat: add --sleep-on-failure to pause a failed spec before teardown #1676
|
hey there, can you ask opus to try again with some different guidance? I'd prefer a different implementation that hooks into the suite's existing codepaths for handling failures. If a failure is identified the suite should - at that moment - pause. It should emit the full failure message, and then emit the helper text asking you to interact with the code. Injecting a Thanks! |
Sure, let me refactor this with that.
At the end I was kind of productifying how I hack my way into this.
Happy to contribute, I have been using ginkgo for a long, long time now. |
When a spec fails against a live environment, its teardown (AfterEach/JustAfterEach/DeferCleanup) tears down the very state needed to debug the failure. --sleep-on-failure=<duration> pauses the suite the moment a failure is identified - before any teardown runs - so the live system can be inspected.

The pause hooks directly into the suite's existing failure-handling path (runNode): when a node finishes with a failure, Ginkgo emits the failure and then, if the flag is set, pauses right there. Only failures in setup and subject nodes (It, Before*, BeforeSuite...) pause; failures in teardown/cleanup or reporting nodes do not, since the system is already being torn down at that point.

The pause is interruptible: pressing ^C (or any interrupt) ends the pause early and the suite proceeds to run cleanup as usual rather than skipping it.

This is a debugging aid for interactive, serial runs. Because Ginkgo's parallelism is multi-process, a single failing spec cannot meaningfully freeze the whole system, so combining --sleep-on-failure with -p/--procs is rejected with a configuration error.

Adds unit (internal_integration) and CLI (integration) tests and docs.

Assisted-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
|
LGTM - thanks! |
|
i'll cut a release later today/tomorrow |
Closes #1671
What
Adds a `--sleep-on-failure=<duration>` flag (default `0` = off). The moment a spec fails, Ginkgo pauses for the configured duration before any teardown (AfterEach/JustAfterEach/DeferCleanup) runs, so you can inspect the live system before its state is torn down.

Why
When a spec fails against a real environment (cluster, DB, service), teardown destroys the very state you need to debug the failure. The existing building block (`time.Sleep` in a top-level `JustAfterEach`) works but isn't runtime-configurable. This packages that need as a first-class, well-behaved flag.

How
Per the review feedback, this hooks directly into the suite's existing failure-handling path rather than injecting any nodes:

- In `runNode`, when a node finishes with a failure, Ginkgo emits the failure as it already does and then, if `--sleep-on-failure` is set, pauses right there, before returning to the spec loop (and therefore before any teardown runs). It emits a progress report with the helper text telling you the suite is paused and what to do.
- Only failures in setup and subject nodes (`It`, `Before*`, `BeforeSuite`...) pause. Failures in teardown/cleanup or reporting nodes are skipped, since the system is already being torn down at that point. This is implemented by skipping `NodeTypesAllowedDuringCleanupInterrupt | NodeTypesAllowedDuringReportInterrupt`.
- The pause `select`s on a timer and `interruptHandler.Status().Channel`, so the pause is interruptible: pressing `^C` ends it early and the suite proceeds to run cleanup as usual rather than skipping it.

This addresses the three points raised in #1671:
- [...] `^C` to proceed to cleanup.
- `^C` proceeds to cleanup: the pause ends on interrupt and teardown then runs (covered by tests).
- `--sleep-on-failure` with `-p`/`--procs` is rejected with a configuration error. This is intended as a serial debugging aid; run the failing spec serially (e.g. with `--focus`).

Also updates godoc and `docs/index.md` (under the interrupt/grace-period material).

Test plan
- `go vet ./...` passes
- `internal/internal_integration` tests: pause happens at the point of failure before teardown, setup-node (`BeforeEach`) failures pause, teardown-node failures do not pause, `^C` ends the pause and cleanup still runs, passing specs are never paused, and the flag is a no-op when `0`
- `integration` (CLI subprocess) tests: serial run waits ~the configured duration with ordering `failing body → pause notice → teardown`; parallel run exits with the serial-only error and does not hang
- [...] (`go test ./...`)