What happened?
I've configured the podSpec of a Job to include a preStop sleep lifecycle hook. If the Pod is asked to terminate before it completes, the preStop sleep begins and runs for its full duration - even if the Pod completes its processing and PID 1 exits before the preStop sleep is complete. The Pod remains in "Terminating" and the Job remains in "Running" state until the preStop sleep finishes, which delays the completion of the Job.
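For reference, the relevant fragment of the container spec looks like this (the full manifest is in the reproduction steps below):

lifecycle:
  preStop:
    sleep:
      seconds: 120
terminationGracePeriodSeconds: 120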
What did you expect to happen?
I expect the preStop sleep to end as soon as PID 1 in the Pod exits. The Job should be marked complete at that point and the Pod should finish terminating. This is the behavior when a preStop exec hook is used to run the sleep command inside the container.
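For comparison, this is the shape of the exec variant that behaves as expected (the full manifest is under "Anything else we need to know?" below):

lifecycle:
  preStop:
    exec:
      command: ["/bin/sleep", "120"]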
How can we reproduce it (as minimally and precisely as possible)?
Define a Job with a preStop sleep lifecycle hook:
apiVersion: batch/v1
kind: Job
metadata:
  name: sleep-job
spec:
  podReplacementPolicy: Failed
  template:
    spec:
      containers:
      - command:
        - /bin/sleep
        - "60"
        image: busybox:latest
        name: sleep-container
        lifecycle:
          preStop:
            sleep:
              seconds: 120
      terminationGracePeriodSeconds: 120
      restartPolicy: Never

Apply the manifest and then delete the resulting Pod. Observe that the Pod is now in "Terminating" state. The preStop sleep will have begun.
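The captures below were taken with repeated kubectl get calls; if you prefer, a plain watch on the Pods also shows the transitions (standard kubectl, shown only as a suggestion):

# in a second terminal: watch the Pod go from Running to Terminating and finally disappear
kubectl get pods --watch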
wedge@pop-os:~/job-sleep$ kubectl apply -f sleep-job.yaml
job.batch/sleep-job created
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           6s         6s

NAME                  READY   STATUS    RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Running   0          6s
wedge@pop-os:~/job-sleep$ kubectl delete pods sleep-job-qh2qk
pod "sleep-job-qh2qk" deleted
^Cwedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           24s        24s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          24s

After 60 seconds the Pod should complete, but instead it remains in the Terminating state and the Job shows 0 completions. Only after 2 minutes have elapsed (from the time the delete command was issued) does the Pod finish terminating and the Job show a completion.
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           59s        59s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          59s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           69s        69s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          69s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS    COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Running   0/1           2m19s      2m19s

NAME                  READY   STATUS        RESTARTS   AGE
pod/sleep-job-qh2qk   1/1     Terminating   0          2m19s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                  STATUS     COMPLETIONS   DURATION   AGE
job.batch/sleep-job   Complete   1/1           2m21s      2m22s

Anything else we need to know?
This kind of configuration is useful for ensuring that batch Jobs run to completion even when they are asked to terminate for whatever reason (maintenance, preemption, autoscaler consolidation, etc.). The fact that the Job takes longer to complete when the Pod is asked to terminate is problematic when timely completion matters. Imagine the Job takes an hour instead of a minute, so the preStop sleep is set to 3600 seconds, and the Pod is asked to terminate at minute 59: the Job now takes 119 minutes instead of 60.
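As a sketch of that scenario (hypothetical values, same shape as the manifest above):

# hypothetical hour-long job: preStop sleep sized to cover the full runtime
lifecycle:
  preStop:
    sleep:
      seconds: 3600    # today this runs to completion even if PID 1 exits earlier
terminationGracePeriodSeconds: 3600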
Using a preStop exec to run /bin/sleep produces the expected behavior: the Pod finishes terminating and the Job shows as complete as soon as PID 1 in the container exits. Of course, this won't work in a container image that lacks a sleep binary (or shell), which AFAIK is the reason the preStop sleep hook was introduced.
apiVersion: batch/v1
kind: Job
metadata:
  name: exec-job
spec:
  podReplacementPolicy: Failed
  template:
    spec:
      containers:
      - command:
        - /bin/sleep
        - "60"
        image: busybox:latest
        name: exec-container
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sleep", "120"]
      terminationGracePeriodSeconds: 120
      restartPolicy: Never

wedge@pop-os:~/job-sleep$ kubectl apply -f exec-job.yaml
job.batch/exec-job created
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           4s         4s

NAME                 READY   STATUS    RESTARTS   AGE
pod/exec-job-k62r4   1/1     Running   0          4s
wedge@pop-os:~/job-sleep$ kubectl delete pods exec-job-k62r4
pod "exec-job-k62r4" deleted
^Cwedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           23s        23s

NAME                 READY   STATUS        RESTARTS   AGE
pod/exec-job-k62r4   1/1     Terminating   0          23s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           55s        55s

NAME                 READY   STATUS        RESTARTS   AGE
pod/exec-job-k62r4   1/1     Terminating   0          55s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS    COMPLETIONS   DURATION   AGE
job.batch/exec-job   Running   0/1           61s        61s

NAME                 READY   STATUS        RESTARTS   AGE
pod/exec-job-k62r4   1/1     Terminating   0          61s
wedge@pop-os:~/job-sleep$ kubectl get jobs,pods
NAME                 STATUS     COMPLETIONS   DURATION   AGE
job.batch/exec-job   Complete   1/1           63s        64s

Kubernetes version
Reproduced on KIND, but first observed on EKS 1.32.
$ kubectl version
Client Version: v1.33.2
Kustomize Version: v5.6.0
Server Version: v1.33.1

Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)