
[InPlacePodVerticalScaling] kubelet sometimes sets .status.resize incorrectly #128993

@ABNER-1


What happened?

In a cluster with InPlacePodVerticalScaling enabled, when I resize only a pod's resources, I watch .status.resize and .status.containerStatuses[x].resources to track the progress of the resize.

I have encountered some corner cases that are difficult to reproduce consistently:

  1. The user changes the CPU request from 200m to 100m.
  2. Kubelet sets .status.resize to InProgress.
  3. Kubelet sets .status.resize to nil and sets .status.containerStatuses[x].resources to 100m.
  4. Kubelet sets .status.resize back to InProgress and sets .status.containerStatuses[x].resources back to 200m.
  5. Finally, kubelet sets .status.resize to nil and sets .status.containerStatuses[x].resources to 100m again.
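The anomaly in steps 1 to 5 can be stated as a check over successive status updates. The Go sketch below is hypothetical: `Snapshot` and `findRegressions` are illustrative names, not kubelet or client-go APIs, and only the CPU request is modeled. It treats a resize as settled once .status.resize is empty and the reported request equals the desired value, and flags any later update that leaves that settled state.

```go
package main

import "fmt"

// Snapshot is a hypothetical, simplified view of one pod status update:
// Resize holds .status.resize ("" when the field is null/absent) and
// CPURequest holds the CPU request from .status.containerStatuses[x].resources.
type Snapshot struct {
	Resize     string // e.g. "" or "InProgress"
	CPURequest string // e.g. "200m"
}

// findRegressions returns the indices of updates that occur after the resize
// had already settled (resize cleared and desired request reported) but that
// no longer match the settled state.
func findRegressions(desired string, updates []Snapshot) []int {
	var regressions []int
	settled := false
	for i, s := range updates {
		done := s.Resize == "" && s.CPURequest == desired
		if settled && !done {
			regressions = append(regressions, i)
		}
		if done {
			settled = true
		}
	}
	return regressions
}

func main() {
	// Sequence observed in this report; the desired CPU request is 100m.
	updates := []Snapshot{
		{Resize: "", CPURequest: "200m"},           // status before the spec change (step 1)
		{Resize: "InProgress", CPURequest: "200m"}, // step 2
		{Resize: "", CPURequest: "100m"},           // step 3: resize appears complete
		{Resize: "InProgress", CPURequest: "200m"}, // step 4: unexpected regression
		{Resize: "", CPURequest: "100m"},           // step 5
	}
	fmt.Println(findRegressions("100m", updates)) // prints [3]
}
```

In a real watcher the snapshots would come from pod watch events; replaying the sequence above flags index 3, which is the step 4 regression.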

What did you expect to happen?

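Under normal circumstances, steps 4 and 5 should not happen: once kubelet has cleared .status.resize and reported the new resources in step 3, the status should not revert to InProgress with the old values.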

I found the following relevant entries in the kubelet logs.

// step 2

Nov 12 10:15:53 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:53.062599   12114 status_manager.go:874] "Patch status for pod" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" podUID="1cb463b9-8fbe-4d1f-a8ac-277d981684cd" patch="{\"metadata\":{\"uid\":\"1cb463b9-8fbe-4d1f-a8ac-277d981684cd\"},\"status\":{\"containerStatuses\":[{\"allocatedResources\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"},\"containerID\":\"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb\",\"image\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine\",\"imageID\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128\",\"lastState\":{},\"name\":\"nginx\",\"ready\":true,\"resources\":{\"limits\":{\"cpu\":\"1\",\"memory\":\"1Gi\"},\"requests\":{\"cpu\":\"200m\",\"memory\":\"200Mi\"}},\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2024-11-12T02:15:50Z\"}}}],\"resize\":\"InProgress\"}}"
Nov 12 10:15:53 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:53.062635   12114 status_manager.go:883] "Status for pod updated successfully" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" statusVersion=3 status={"phase":"Running","conditions":[{"type":"KruisePodReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"InPlaceUpdateReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"}],"hostIP":"172.21.154.57","hostIPs":[{"ip":"172.21.154.57"}],"podIP":"172.21.154.96","podIPs":[{"ip":"172.21.154.96"}],"startTime":"2024-11-12T02:15:50Z","containerStatuses":[{"name":"nginx","state":{"running":{"startedAt":"2024-11-12T02:15:50Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine","imageID":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128","containerID":"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb","started":true,"allocatedResources":{"cpu":"100m","memory":"100Mi"},"resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"200m","memory":"200Mi"}}}],"qosClass":"Burstable","resize":"InProgress"}

// step 3

Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.056885   12114 kuberuntime_manager.go:1051] "computePodActions got for pod" podActions="KillPod: false, CreateSandbox: false, UpdatePodResources: false, Attempt: 0, InitContainersToStart: [], ContainersToStart: [], EphemeralContainersToStart: [],ContainersToUpdate: map[], ContainersToKill: map[]" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn"
Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.067151   12114 status_manager.go:874] "Patch status for pod" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" podUID="1cb463b9-8fbe-4d1f-a8ac-277d981684cd" patch="{\"metadata\":{\"uid\":\"1cb463b9-8fbe-4d1f-a8ac-277d981684cd\"},\"status\":{\"containerStatuses\":[{\"allocatedResources\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"},\"containerID\":\"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb\",\"image\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine\",\"imageID\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128\",\"lastState\":{},\"name\":\"nginx\",\"ready\":true,\"resources\":{\"limits\":{\"cpu\":\"800m\",\"memory\":\"800Mi\"},\"requests\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"}},\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2024-11-12T02:15:50Z\"}}}],\"resize\":null}}"
Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.067191   12114 status_manager.go:883] "Status for pod updated successfully" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" statusVersion=4 status={"phase":"Running","conditions":[{"type":"KruisePodReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"InPlaceUpdateReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"}],"hostIP":"172.21.154.57","hostIPs":[{"ip":"172.21.154.57"}],"podIP":"172.21.154.96","podIPs":[{"ip":"172.21.154.96"}],"startTime":"2024-11-12T02:15:50Z","containerStatuses":[{"name":"nginx","state":{"running":{"startedAt":"2024-11-12T02:15:50Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine","imageID":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128","containerID":"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb","started":true,"allocatedResources":{"cpu":"100m","memory":"100Mi"},"resources":{"limits":{"cpu":"800m","memory":"800Mi"},"requests":{"cpu":"100m","memory":"100Mi"}}}],"qosClass":"Burstable"}

// step 4

Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.081966   12114 status_manager.go:874] "Patch status for pod" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" podUID="1cb463b9-8fbe-4d1f-a8ac-277d981684cd" patch="{\"metadata\":{\"uid\":\"1cb463b9-8fbe-4d1f-a8ac-277d981684cd\"},\"status\":{\"containerStatuses\":[{\"allocatedResources\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"},\"containerID\":\"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb\",\"image\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine\",\"imageID\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128\",\"lastState\":{},\"name\":\"nginx\",\"ready\":true,\"resources\":{\"limits\":{\"cpu\":\"1\",\"memory\":\"1Gi\"},\"requests\":{\"cpu\":\"200m\",\"memory\":\"200Mi\"}},\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2024-11-12T02:15:50Z\"}}}],\"resize\":\"InProgress\"}}"
Nov 12 10:15:54 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:54.081998   12114 status_manager.go:883] "Status for pod updated successfully" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" statusVersion=5 status={"phase":"Running","conditions":[{"type":"KruisePodReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"InPlaceUpdateReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"}],"hostIP":"172.21.154.57","hostIPs":[{"ip":"172.21.154.57"}],"podIP":"172.21.154.96","podIPs":[{"ip":"172.21.154.96"}],"startTime":"2024-11-12T02:15:50Z","containerStatuses":[{"name":"nginx","state":{"running":{"startedAt":"2024-11-12T02:15:50Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine","imageID":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128","containerID":"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb","started":true,"allocatedResources":{"cpu":"100m","memory":"100Mi"},"resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"200m","memory":"200Mi"}}}],"qosClass":"Burstable","resize":"InProgress"}

// step 5

Nov 12 10:15:55 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:55.052256   12114 kuberuntime_manager.go:1051] "computePodActions got for pod" podActions="KillPod: false, CreateSandbox: false, UpdatePodResources: false, Attempt: 0, InitContainersToStart: [], ContainersToStart: [], EphemeralContainersToStart: [],ContainersToUpdate: map[], ContainersToKill: map[]" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn"
Nov 12 10:15:55 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:55.067675   12114 status_manager.go:874] "Patch status for pod" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" podUID="1cb463b9-8fbe-4d1f-a8ac-277d981684cd" patch="{\"metadata\":{\"uid\":\"1cb463b9-8fbe-4d1f-a8ac-277d981684cd\"},\"status\":{\"containerStatuses\":[{\"allocatedResources\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"},\"containerID\":\"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb\",\"image\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine\",\"imageID\":\"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128\",\"lastState\":{},\"name\":\"nginx\",\"ready\":true,\"resources\":{\"limits\":{\"cpu\":\"800m\",\"memory\":\"800Mi\"},\"requests\":{\"cpu\":\"100m\",\"memory\":\"100Mi\"}},\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2024-11-12T02:15:50Z\"}}}],\"resize\":null}}"
Nov 12 10:15:55 iZbp12y2uyns2wwzue952tZ kubelet[12114]: I1112 10:15:55.067746   12114 status_manager.go:883] "Status for pod updated successfully" pod="e2e-tests-inplace-vpa-dn82w/clone-plf57qc7jx-csxgn" statusVersion=6 status={"phase":"Running","conditions":[{"type":"KruisePodReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"InPlaceUpdateReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"PodReadyToStartContainers","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:51Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-11-12T02:15:50Z"}],"hostIP":"172.21.154.57","hostIPs":[{"ip":"172.21.154.57"}],"podIP":"172.21.154.96","podIPs":[{"ip":"172.21.154.96"}],"startTime":"2024-11-12T02:15:50Z","containerStatuses":[{"name":"nginx","state":{"running":{"startedAt":"2024-11-12T02:15:50Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx:alpine","imageID":"registry.cn-hangzhou.aliyuncs.com/abner1/nginx@sha256:e3c23d48c0a8ae0021a66c65fd9218608572e2746e6b923b9ddbcb89f29ef128","containerID":"containerd://f674d36be2e4a13a47a66a1763e1d5bf315e08dba3c7d36e63169b9e1fefe8cb","started":true,"allocatedResources":{"cpu":"100m","memory":"100Mi"},"resources":{"limits":{"cpu":"800m","memory":"800Mi"},"requests":{"cpu":"100m","memory":"100Mi"}}}],"qosClass":"Burstable"}

How can we reproduce it (as minimally and precisely as possible)?

I have not found a reliable way to reproduce this issue; it occurs intermittently.

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
# paste output here

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Metadata

Assignees

No one assigned

    Labels

    kind/bug, lifecycle/stale, needs-triage, sig/node, triage/needs-information

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests