Closed
Labels
kind/bug, lifecycle/stale, needs-triage, sig/node, triage/needs-information
Description
What happened?
In a cluster with the InPlacePodVerticalScaling feature gate enabled, when I only resize resources, I watch `.status.resize` and `.status.containerStatuses[x].resources` to track whether the resize is progressing.
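A minimal sketch of the kind of check a watcher might run against these two fields. It assumes the Pod has been decoded from JSON into plain dicts; the function name `resize_settled` is hypothetical and not part of any Kubernetes client. Quantities are compared as plain strings, which is a simplification.

```python
from typing import Any, Dict


def resize_settled(pod: Dict[str, Any], container: str) -> bool:
    """Return True when one container looks fully resized.

    Assumption: `pod` is a Pod object decoded from JSON into dicts.
    Quantities are compared as strings, so "100m" and "0.1" would not match.
    """
    status = pod.get("status", {})
    # A missing or empty .status.resize means no resize is pending.
    if status.get("resize"):
        return False
    desired = next(
        (c.get("resources", {}) for c in pod["spec"]["containers"]
         if c["name"] == container),
        None,
    )
    actual = next(
        (cs.get("resources", {}) for cs in status.get("containerStatuses", [])
         if cs["name"] == container),
        None,
    )
    if desired is None or actual is None:
        return False
    # Settled only when the reported resources equal the requested ones.
    return (desired.get("requests") == actual.get("requests")
            and desired.get("limits") == actual.get("limits"))
```

A single observation that passes this check is not necessarily final when the status flaps, so a caller may want to require several consecutive passing observations.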
I have encountered some corner cases that are difficult to consistently reproduce:
1. User changes the CPU request from 200m to 100m.
2. Kubelet sets `.status.resize` to `InProgress`.
3. Kubelet sets `.status.resize` to nil and sets `.status.containerStatuses[x].resources` to 100m.
4. Kubelet sets `.status.resize` to `InProgress` and sets `.status.containerStatuses[x].resources` to 200m.
5. Finally, kubelet sets `.status.resize` to nil and sets `.status.containerStatuses[x].resources` to 100m again.
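The sequence above can be expressed as a small check over observed status snapshots. This is only an illustrative sketch: `Observation` and `find_regressions` are hypothetical names, and the snapshots are transcribed from steps 2 to 5 rather than read from a live cluster.

```python
from typing import List, NamedTuple, Optional


class Observation(NamedTuple):
    resize: Optional[str]  # value of .status.resize, None when unset
    cpu_request: str       # .status.containerStatuses[x].resources.requests.cpu


def find_regressions(observations: List[Observation], desired_cpu: str) -> List[int]:
    """Return indexes where the status leaves an already settled state."""
    regressions = []
    settled = False
    for i, obs in enumerate(observations):
        is_settled = obs.resize is None and obs.cpu_request == desired_cpu
        if settled and not is_settled:
            regressions.append(i)
        settled = is_settled
    return regressions


# Steps 2 to 5 as seen by a watcher (index 0 corresponds to step 2).
steps = [
    Observation("InProgress", "200m"),
    Observation(None, "100m"),
    Observation("InProgress", "200m"),
    Observation(None, "100m"),
]
print(find_regressions(steps, desired_cpu="100m"))  # prints [2]
```

Index 2 corresponds to step 4, where the status reverts to `InProgress` with the old 200m request after it had already settled.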
What did you expect to happen?
Under normal circumstances, steps 4 and 5 should not take place.
I have discovered some relevant information in the Kubelet logs.
// step 2
Nov 12 10:15:53 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:53.062599 12114 status_manager.go:874] "Patch status for pod" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" podUID="1cb463b9-8fbe-4d1f-a8ac-277d981684cd" patch="{\"metadata\":{\"uid\":\"1cb463b9-8fbe-4d1f-a8ac-277d981684cd\"},\"status\":{\"containerStatuses\":[{\"allocatedResources\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"},\"containerID\":\"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb\",\"image\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine\",\"imageID\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128\",\"lastState\":{},\"name\":\"nginx\",\"ready\":true,\"resources\":{\"limits\":{\"cpu\":\"1\",\"memory\":\"1Gi\"},\"requests\":{\"cpu\":\"200m\",\"memory\":\"200Mi\"}},\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2024-11-12T02:15:50Z\"}}}],\"resize\":\"InProgress\"}}"
Nov 12 10:15:53 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:53.062635 12114 status_manager.go:883] "Status for pod updated successfully" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" statusVersion=3 status={"phase":"Running","conditions":[{"type":"KruisePodReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"InPlaceUpdateReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"}],"hostIP":"172.21.154.57","hostIPs":[{"ip":"172.21.154.57"}],"podIP":"172.21.154.96","podIPs":[{"ip":"172.21.154.96"}],"startTime":"2024-11-12T02:15:50Z","containerStatuses":[{"name":"nginx","state":{"running":{"startedAt":"2024-11-12T02:15:50Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine","imageID":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128","containerID":"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb","started":true,"allocatedResources":{"cpu":"100m","memory":"100Mi"},"resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"200m","memory":"200Mi"}}}],"qosClass":"Burstable","resize":"InProgress"}
// step 3
Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.056885 12114 kuberuntime_manager.go:1051] "computePodActions got for pod" podActions="KillPod: false, CreateSandbox: false, UpdatePodResources: false, Attempt: 0, InitContainersToStart: [], ContainersToStart: [], EphemeralContainersToStart: [],ContainersToUpdate: map[], ContainersToKill: map[]" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn"
Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.067151 12114 status_manager.go:874] "Patch status for pod" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" podUID="1cb463b9-8fbe-4d1f-a8ac-277d981684cd" patch="{\"metadata\":{\"uid\":\"1cb463b9-8fbe-4d1f-a8ac-277d981684cd\"},\"status\":{\"containerStatuses\":[{\"allocatedResources\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"},\"containerID\":\"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb\",\"image\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine\",\"imageID\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128\",\"lastState\":{},\"name\":\"nginx\",\"ready\":true,\"resources\":{\"limits\":{\"cpu\":\"800m\",\"memory\":\"800Mi\"},\"requests\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"}},\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2024-11-12T02:15:50Z\"}}}],\"resize\":null}}"
Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.067191 12114 status_manager.go:883] "Status for pod updated successfully" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" statusVersion=4 status={"phase":"Running","conditions":[{"type":"KruisePodReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"InPlaceUpdateReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"}],"hostIP":"172.21.154.57","hostIPs":[{"ip":"172.21.154.57"}],"podIP":"172.21.154.96","podIPs":[{"ip":"172.21.154.96"}],"startTime":"2024-11-12T02:15:50Z","containerStatuses":[{"name":"nginx","state":{"running":{"startedAt":"2024-11-12T02:15:50Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine","imageID":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128","containerID":"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb","started":true,"allocatedResources":{"cpu":"100m","memory":"100Mi"},"resources":{"limits":{"cpu":"800m","memory":"800Mi"},"requests":{"cpu":"100m","memory":"100Mi"}}}],"qosClass":"Burstable"}
// step 4
Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.081966 12114 status_manager.go:874] "Patch status for pod" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" podUID="1cb463b9-8fbe-4d1f-a8ac-277d981684cd" patch="{\"metadata\":{\"uid\":\"1cb463b9-8fbe-4d1f-a8ac-277d981684cd\"},\"status\":{\"containerStatuses\":[{\"allocatedResources\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"},\"containerID\":\"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb\",\"image\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine\",\"imageID\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128\",\"lastState\":{},\"name\":\"nginx\",\"ready\":true,\"resources\":{\"limits\":{\"cpu\":\"1\",\"memory\":\"1Gi\"},\"requests\":{\"cpu\":\"200m\",\"memory\":\"200Mi\"}},\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2024-11-12T02:15:50Z\"}}}],\"resize\":\"InProgress\"}}"
Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.081998 12114 status_manager.go:883] "Status for pod updated successfully" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" statusVersion=5 status={"phase":"Running","conditions":[{"type":"KruisePodReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"InPlaceUpdateReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"}],"hostIP":"172.21.154.57","hostIPs":[{"ip":"172.21.154.57"}],"podIP":"172.21.154.96","podIPs":[{"ip":"172.21.154.96"}],"startTime":"2024-11-12T02:15:50Z","containerStatuses":[{"name":"nginx","state":{"running":{"startedAt":"2024-11-12T02:15:50Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine","imageID":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128","containerID":"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb","started":true,"allocatedResources":{"cpu":"100m","memory":"100Mi"},"resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"200m","memory":"200Mi"}}}],"qosClass":"Burstable","resize":"InProgress"}
// step 5
Nov 12 10:15:55 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:55.052256 12114 kuberuntime_manager.go:1051] "computePodActions got for pod" podActions="KillPod: false, CreateSandbox: false, UpdatePodResources: false, Attempt: 0, InitContainersToStart: [], ContainersToStart: [], EphemeralContainersToStart: [],ContainersToUpdate: map[], ContainersToKill: map[]" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn"
Nov 12 10:15:55 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:55.067675 12114 status_manager.go:874] "Patch status for pod" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" podUID="1cb463b9-8fbe-4d1f-a8ac-277d981684cd" patch="{\"metadata\":{\"uid\":\"1cb463b9-8fbe-4d1f-a8ac-277d981684cd\"},\"status\":{\"containerStatuses\":[{\"allocatedResources\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"},\"containerID\":\"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb\",\"image\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine\",\"imageID\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128\",\"lastState\":{},\"name\":\"nginx\",\"ready\":true,\"resources\":{\"limits\":{\"cpu\":\"800m\",\"memory\":\"800Mi\"},\"requests\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"}},\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2024-11-12T02:15:50Z\"}}}],\"resize\":null}}"
Nov 12 10:15:55 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:55.067746 12114 status_manager.go:883] "Status for pod updated successfully" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" statusVersion=6 status={"phase":"Running","conditions":[{"type":"KruisePodReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"InPlaceUpdateReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"}],"hostIP":"172.21.154.57","hostIPs":[{"ip":"172.21.154.57"}],"podIP":"172.21.154.96","podIPs":[{"ip":"172.21.154.96"}],"startTime":"2024-11-12T02:15:50Z","containerStatuses":[{"name":"nginx","state":{"running":{"startedAt":"2024-11-12T02:15:50Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine","imageID":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128","containerID":"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb","started":true,"allocatedResources":{"cpu":"100m","memory":"100Mi"},"resources":{"limits":{"cpu":"800m","memory":"800Mi"},"requests":{"cpu":"100m","memory":"100Mi"}}}],"qosClass":"Burstable"}
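For anyone comparing the patches in these logs, here is a rough sketch that pulls `.status.resize` and the per-container requests out of a "Patch status for pod" line. The helper `parse_patch_line` is hypothetical, the sample line is a heavily trimmed version of the step 3 patch, and the parsing assumes the quoted patch value only uses escapes that are also valid in a JSON string (as in the lines above).

```python
import json
import re
from typing import Dict, Optional, Tuple

# Matches the quoted value of patch="...", including escaped quotes inside it.
PATCH_RE = re.compile(r'patch="((?:[^"\\]|\\.)*)"')


def parse_patch_line(line: str) -> Optional[Tuple[Optional[str], Dict[str, dict]]]:
    """Return (resize, {container name: requests}) or None if no patch is found."""
    m = PATCH_RE.search(line)
    if m is None:
        return None
    # Undo the quoting by reading the value as a JSON string literal,
    # then parse the resulting JSON document.
    patch = json.loads(json.loads('"' + m.group(1) + '"'))
    status = patch.get("status", {})
    requests = {
        cs["name"]: cs.get("resources", {}).get("requests", {})
        for cs in status.get("containerStatuses", [])
    }
    return status.get("resize"), requests


# Trimmed sample based on the step 3 patch; most fields are omitted.
sample = (
    r'I1112 10:15:54.067151 12114 status_manager.go:874] "Patch status for pod" '
    r'patch="{\"status\":{\"containerStatuses\":[{\"name\":\"nginx\",'
    r'\"resources\":{\"requests\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"}}}],'
    r'\"resize\":null}}"'
)
print(parse_patch_line(sample))
```

Running this over the four patch lines in order would show the resize value and requests alternating between the old and new values, which matches steps 2 to 5.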
How can we reproduce it (as minimally and precisely as possible)?
I am unable to identify a consistent method to reproduce this issue.
This is an intermittent case.
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
# paste output here
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)