preStop sleep continues after PID 1 exits #134338

@wedge-jarrad

Description

What happened?

I've configured the podSpec of a Job to include a preStop sleep lifecycle hook. If the Pod is asked to terminate before it completes, the preStop sleep begins and runs for its full duration, even if the Pod finishes its processing and PID 1 exits before the sleep is up. The Pod remains in the "Terminating" state and the Job remains "Running" until the preStop sleep finishes, which delays the completion of the Job.

What did you expect to happen?

I expect the preStop sleep to end when PID 1 in the Pod exits. The Job should be marked as complete at that point and the Pod should finish terminating. This is the behavior when a preStop exec hook is used to run /bin/sleep inside the container instead.

How can we reproduce it (as minimally and precisely as possible)?

Define a Job with a preStop sleep lifecycle hook:

apiVersion: batch/v1
kind: Job
metadata:
  name: sleep-job
spec:
  podReplacementPolicy: Failed
  template:
    spec:
      containers:
        - command:
          - /bin/sleep
          - "60"
          image: busybox:latest
          name: sleep-container
          lifecycle:
            preStop:
              sleep:
                seconds: 120
      terminationGracePeriodSeconds: 120
      restartPolicy: Never

Apply the manifest and then delete the resulting Pod. Observe that the Pod is now in "Terminating" state. The preStop sleep will have begun.

wedge@pop-os:~/job-sleep$ kubectl apply -f sleep-job.yaml 
job.batch/sleep-job created
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           6s         6s

NAME                  READY   STATUS    RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Running   0          6s
wedge@pop-os:~/job-sleep$ kubectl delete pods sleep-job-qh2qk
pod "sleep-job-qh2qk" deleted
^Cwedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           24s        24s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          24s

After 60 seconds the Pod should complete, but instead it remains in the Terminating state and the Job shows 0 completions. Only after 2 minutes have elapsed (from the time the delete command was issued) does the Pod finish terminating and the Job show a completion.

wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           59s        59s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          59s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           69s        69s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          69s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           2m19s      2m19s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          2m19s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS     COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Complete   1/1           2m21s      2m22s
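
To pin down exactly when the Pod leaves Terminating, a timestamped watch helps (a minimal sketch assuming a POSIX shell; any equivalent watch would do):

# Prefix each watch event with the current time:
kubectl get pods -w | while read -r line; do echo "$(date +%T) $line"; done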

Anything else we need to know?

This kind of configuration is useful to ensure that batch jobs run to completion even if they are asked to terminate for whatever reason (maintenance, preemption, autoscaler consolidation, etc.). The fact that the Job takes longer to complete when the Pod is asked to terminate is a problem wherever timely completion matters. Imagine the Job takes an hour instead of a minute, so the preStop sleep is set to 3600 seconds, and the Pod is asked to terminate at minute 59: the duration of the Job becomes 119 minutes instead of 60, as sketched below.
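
A rough timeline under those assumptions (60-minute workload, 3600-second preStop sleep, delete issued at minute 59):

t=0m     Job starts
t=59m    Pod deleted; preStop sleep begins and runs its full 3600s
t=60m    PID 1 exits; the work is done, but the Pod stays Terminating
t=119m   preStop sleep elapses; Pod terminates and the Job completes

59 + 60 = 119 minutes end to end, versus 60 minutes uninterrupted.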

Using a preStop exec to run /bin/sleep produces the expected behavior: the Pod finishes terminating and the Job shows as complete when PID 1 in the container exits, presumably because the exec'd sleep runs inside the container and dies with it, while the sleep action counts down in the kubelet. Of course the exec approach won't work in a container image that doesn't ship a sleep binary, which AFAIK is the reason the preStop sleep action was introduced.

apiVersion: batch/v1
kind: Job
metadata:
  name: exec-job
spec:
  podReplacementPolicy: Failed
  template:
    spec:
      containers:
        - command:
          - /bin/sleep
          - "60"
          image: busybox:latest
          name: exec-container
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sleep", "120"]
      terminationGracePeriodSeconds: 120
      restartPolicy: Never
wedge@pop-os:~/job-sleep$ kubectl apply -f exec-job.yaml 
job.batch/exec-job created
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           4s         4s

NAME                 READY   STATUS    RESTARTS   AGE
pod/exec-job-k62r4   1/1     Running   0          4s
wedge@pop-os:~/job-sleep$ kubectl delete pods exec-job-k62r4
pod "exec-job-k62r4" deleted
^Cwedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           23s        23s

NAME                 READY   STATUS        RESTARTS   AGE
pod/exec-job-k62r4   1/1     Terminating   0          23s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           55s        55s

NAME                 READY   STATUS        RESTARTS   AGE
pod/exec-job-k62r4   1/1     Terminating   0          55s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           61s        61s

NAME                 READY   STATUS        RESTARTS   AGE
pod/exec-job-k62r4   1/1     Terminating   0          61s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS     COMPLETIONS   DURATION   AGE
job.batch/exec-job   Complete   1/1           63s        64s
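
For either Job, the actual wall-clock duration can be read back from the standard batch/v1 status fields:

# Print the start and completion timestamps recorded in the Job status:
kubectl get job exec-job -o jsonpath='{.status.startTime}{"\n"}{.status.completionTime}{"\n"}'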

Kubernetes version

Reproduced on KIND, but first observed on EKS 1.32.
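
For reference, a default kind cluster is enough to reproduce; the node image tag below is illustrative, chosen to match the server version shown next:

kind create cluster --name prestop-repro --image kindest/node:v1.33.1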

$ kubectl version
Client Version: v1.33.2
Kustomize Version: v5.6.0
Server Version: v1.33.1

Cloud provider

AWS

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Labels: kind/bug, priority/important-soon, sig/node, triage/accepted
