preStop sleep continues after PID 1 exits #134338

@wedge-jarrad

Description

What happened?

I've configured the podSpec of a Job to include a preStop sleep lifecycle hook. If the Pod is asked to terminate before it completes, the preStop sleep begins and runs for its full duration, even if the Pod finishes its processing and PID 1 exits before the sleep is up. The Pod remains in the "Terminating" state and the Job remains "Running" until the preStop sleep finishes, which delays the completion of the Job.

What did you expect to happen?

I expect the preStop sleep to end when PID 1 in the Pod exits. The Job should be marked as complete at that point and the Pod should finish terminating. This is the behavior when a preStop exec hook is used to run /bin/sleep inside the container instead.

How can we reproduce it (as minimally and precisely as possible)?

Define a Job with a preStop sleep lifecycle hook:

apiVersion: batch/v1
kind: Job
metadata:
  name: sleep-job
spec:
  podReplacementPolicy: Failed
  template:
    spec:
      containers:
        - command:
          - /bin/sleep
          - "60"
          image: busybox:latest
          name: sleep-container
          lifecycle:
            preStop:
              sleep:
                seconds: 120
      terminationGracePeriodSeconds: 120
      restartPolicy: Never

Apply the manifest and then delete the resulting Pod. Observe that the Pod is now in "Terminating" state. The preStop sleep will have begun.

wedge@pop-os:~/job-sleep$ kubectl apply -f sleep-job.yaml 
job.batch/sleep-job created
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           6s         6s

NAME                  READY   STATUS    RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Running   0          6s
wedge@pop-os:~/job-sleep$ kubectl delete pods sleep-job-qh2qk
pod "sleep-job-qh2qk" deleted
^Cwedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           24s        24s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          24s

After 60 seconds the Pod should complete, but instead it remains in the Terminating state and the Job shows 0 completions. Only after 2 minutes have elapsed (from the time the delete command was issued) does the Pod finish terminating and the Job show a completion.

wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           59s        59s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          59s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           69s        69s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          69s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           2m19s      2m19s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          2m19s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS     COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Complete   1/1           2m21s      2m22s
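
To pin down exactly when the Pod leaves Terminating, a timestamped watch helps (a minimal sketch assuming a POSIX shell; any equivalent watch would do):

# Prefix each watch event with the current time:
kubectl get pods -w | while read -r line; do echo "$(date +%T) $line"; done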

Anything else we need to know?

This kind of configuration is useful to ensure that batch jobs run to completion even if they are asked to terminate for whatever reason (maintenance, preemption, autoscaler consolidation, etc.). The fact that the Job takes longer to complete when the Pod is asked to terminate is a problem wherever timely completion matters. Imagine the Job takes an hour instead of a minute, so the preStop sleep is set to 3600 seconds, and the Pod is asked to terminate at minute 59: the duration of the Job becomes 119 minutes instead of 60, as sketched below.
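
A rough timeline under those assumptions (60-minute workload, 3600-second preStop sleep, delete issued at minute 59):

t=0m     Job starts
t=59m    Pod deleted; preStop sleep begins and runs its full 3600s
t=60m    PID 1 exits; the work is done, but the Pod stays Terminating
t=119m   preStop sleep elapses; Pod terminates and the Job completes

59 + 60 = 119 minutes end to end, versus 60 minutes uninterrupted.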

Using a preStop exec to run /bin/sleep produces the expected behavior: the Pod finishes terminating and the Job shows as complete when PID 1 in the container exits, presumably because the exec'd sleep runs inside the container and dies with it, while the sleep action counts down in the kubelet. Of course the exec approach won't work in a container image that doesn't ship a sleep binary, which AFAIK is the reason the preStop sleep action was introduced.

apiVersion: batch/v1
kind: Job
metadata:
  name: exec-job
spec:
  podReplacementPolicy: Failed
  template:
    spec:
      containers:
        - command:
          - /bin/sleep
          - "60"
          image: busybox:latest
          name: exec-container
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sleep", "120"]
      terminationGracePeriodSeconds: 120
      restartPolicy: Never
wedge@pop-os:~/job-sleep$ kubectl apply -f exec-job.yaml 
job.batch/exec-job created
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           4s         4s

NAME                 READY   STATUS    RESTARTS   AGE
pod/exec-job-k62r4   1/1     Running   0          4s
wedge@pop-os:~/job-sleep$ kubectl delete pods exec-job-k62r4
pod "exec-job-k62r4" deleted
^Cwedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           23s        23s

NAME                 READY   STATUS        RESTARTS   AGE
pod/exec-job-k62r4   1/1     Terminating   0          23s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           55s        55s

NAME                 READY   STATUS        RESTARTS   AGE
pod/exec-job-k62r4   1/1     Terminating   0          55s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           61s        61s

NAME                 READY   STATUS        RESTARTS   AGE
pod/exec-job-k62r4   1/1     Terminating   0          61s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS     COMPLETIONS   DURATION   AGE
job.batch/exec-job   Complete   1/1           63s        64s
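
For either Job, the actual wall-clock duration can be read back from the standard batch/v1 status fields:

# Print the start and completion timestamps recorded in the Job status:
kubectl get job exec-job -o jsonpath='{.status.startTime}{"\n"}{.status.completionTime}{"\n"}'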

Kubernetes version

Reproduced on KIND, but first observed on EKS 1.32.
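
For reference, a default kind cluster is enough to reproduce; the node image tag below is illustrative, chosen to match the server version shown next:

kind create cluster --name prestop-repro --image kindest/node:v1.33.1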

$ kubectl version
Client Version: v1.33.2
Kustomize Version: v5.6.0
Server Version: v1.33.1

Cloud provider

AWS

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Labels: kind/bug, priority/important-soon, sig/node, triage/accepted
