What happened?
After enabling In-Place Pod Vertical Scaling, if a pod is deployed without setting a memory (or cpu) request, kubelet will fail on its second restart.
What did you expect to happen?
kubelet restarts successfully
How can we reproduce it (as minimally and precisely as possible)?
- start kubelet with In-Place Pod Vertical Scaling enabled
- deploy a pod without setting a memory request, e.g.:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - image: nginx:1.24.0
    imagePullPolicy: IfNotPresent
    name: nginx
    resources:
      requests:
        cpu: 100m
```
- restart kubelet for the first time
- restart kubelet for the second time
- kubelet will fail with the error:

```
panic: could not restore state from checkpoint: checkpoint is corrupted, please drain this node and delete pod allocation checkpoint file "/var/lib/kubelet/pod_status_manager_state" before restarting Kubelet
```
Anything else we need to know?
After deploying the pod, the following information is saved in the file pod_status_manager_state:

```json
"nginx": {
  "cpu": "100m"
}
```

After restarting kubelet for the first time, the relevant content of the file becomes:

```json
"nginx": {
  "cpu": "100m",
  "memory": "0"
}
```
When the pod is first deployed, there is no record for it in the checkpoint (pod_status_manager_state), so kubelet takes the resource config directly from pod.Spec.Containers[i].Resources and saves it to the checkpoint (without a memory request):
kubernetes/pkg/kubelet/kubelet.go, lines 2504 to 2521 in 3125009:

```go
	if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) {
		// To handle kubelet restarts, test pod admissibility using AllocatedResources values
		// (for cpu & memory) from checkpoint store. If found, that is the source of truth.
		podCopy := pod.DeepCopy()
		for _, c := range podCopy.Spec.Containers {
			allocatedResources, found := kl.statusManager.GetContainerResourceAllocation(string(pod.UID), c.Name)
			if c.Resources.Requests != nil && found {
				c.Resources.Requests[v1.ResourceCPU] = allocatedResources[v1.ResourceCPU]
				c.Resources.Requests[v1.ResourceMemory] = allocatedResources[v1.ResourceMemory]
			}
		}
		// Check if we can admit the pod; if not, reject it.
		if ok, reason, message := kl.canAdmitPod(activePods, podCopy); !ok {
			kl.rejectPod(pod, reason, message)
			continue
		}
		// For new pod, checkpoint the resource values at which the Pod has been admitted
		if err := kl.statusManager.SetPodAllocation(podCopy); err != nil {
```
When kubelet is restarted for the first time, it restores the previously saved data from the checkpoint. Since the memory request was not previously saved, indexing the allocated resources map returns an empty (zero-value) Quantity, which is then re-saved to the checkpoint:
kubernetes/pkg/kubelet/kubelet.go, lines 2508 to 2514 in 3125009:

```go
		for _, c := range podCopy.Spec.Containers {
			allocatedResources, found := kl.statusManager.GetContainerResourceAllocation(string(pod.UID), c.Name)
			if c.Resources.Requests != nil && found {
				c.Resources.Requests[v1.ResourceCPU] = allocatedResources[v1.ResourceCPU]
				c.Resources.Requests[v1.ResourceMemory] = allocatedResources[v1.ResourceMemory]
			}
		}
```
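For illustration, here is a minimal, self-contained sketch of this mechanism (not kubelet code; it assumes only the k8s.io/api and k8s.io/apimachinery modules): indexing a ResourceList that has no memory key returns a zero-value resource.Quantity, which then marshals as "0" rather than an absent key when the checkpoint is re-saved:

```go
package main

import (
	"encoding/json"
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Allocated resources restored from the checkpoint: only cpu was saved.
	allocated := v1.ResourceList{
		v1.ResourceCPU: resource.MustParse("100m"),
	}

	// Mirrors the loop above: indexing a map with a missing key returns
	// the zero value, here an empty resource.Quantity.
	requests := v1.ResourceList{}
	requests[v1.ResourceCPU] = allocated[v1.ResourceCPU]
	requests[v1.ResourceMemory] = allocated[v1.ResourceMemory] // zero-value Quantity

	// When the checkpoint is re-saved, the zero-value Quantity
	// marshals as the string "0", not as an absent key.
	b, _ := json.Marshal(requests)
	fmt.Println(string(b)) // {"cpu":"100m","memory":"0"}
}
```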
When kubelet is restarted for the second time, it parses memory: "0" from the checkpoint as an actual value of 0 (not as an empty value). The restored object therefore differs from the one the checksum was originally computed over, so the checksum calculated during data recovery no longer matches the stored one, and kubelet panics.
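The root of the mismatch can be seen with a minimal sketch using only k8s.io/apimachinery: a zero-value Quantity (what was in memory when the checkpoint was saved) and a Quantity parsed back from "0" (what is in memory on restore) render the same string but are not equal structs, so any checksum computed over the in-memory object changes between save and restore:

```go
package main

import (
	"fmt"
	"reflect"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// What existed on save: a zero-value Quantity
	// (memory request missing, the map index returned the zero value).
	var saved resource.Quantity

	// What exists on restore: a Quantity unmarshaled from "0",
	// which carries a non-empty Format (DecimalSI).
	restored := resource.MustParse("0")

	// Both render as "0" in the checkpoint file ...
	fmt.Println(saved.String(), restored.String()) // 0 0

	// ... but the structs differ internally, so a checksum taken
	// over the in-memory state differs between save and restore.
	fmt.Println(reflect.DeepEqual(saved, restored)) // false
}
```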
Kubernetes version

```
$ kubectl version
# paste output here
```

Cloud provider

OS version

```
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)