[FG:InPlacePodVerticalScaling] PLEG doesn't work well with alpha feature InPlacePodVerticalScaling #123940

@Jeffwan

Description

What happened?

Background

This is a follow up issue from https://github.com/kubernetes/kubernetes/pull/120432/files#r1489932247

Originally, we fixed the InPlacePodVerticalScaling performance issue by fetching the runtime status in a single sync loop, which is not elegant. Later, I followed @smarterclayton's suggestion to leverage PLEG to emit events instead.

if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) && isPodResizeInProgress(pod, &apiPodStatus) {
    // While resize is in progress, periodically call PLEG to update pod cache
    runningPod := kubecontainer.ConvertPodStatusToRunningPod(kl.getRuntime().Type(), podStatus)
    if err, _ := kl.pleg.UpdateCache(&runningPod, pod.UID); err != nil {
        klog.ErrorS(err, "Failed to update pod cache", "pod", klog.KObj(pod))
        return false, err
    }
}

(InPlacePodVerticalScaling puts the resizing pod into the PLEG cache for further reconciliation.)

I noticed two problems.

1. There is no way to generate a PodLifecycleEvent for a resized pod.

a. oldPod and newPod are exactly the same in in-place update scenarios, so a resized pod cannot be distinguished from an unchanged one. Since they are identical, no events are generated.

https://github.com/kubernetes/kubernetes/blob/89f03e3988a4e7fed90ffce22f355ff248520ad2/pkg/kubelet/pleg/generic.go#L252-L257C14

b. The running pod PLEG cache is not used at all.
pleg.UpdateCache() calls runtime.GetPodStatus underneath. In the in-place resize case, the latest CRI container status is fetched, which means the cache stores a new container status. However, that cache entry is not used in the Relist flow. I think even if we used it in pleg.Relist(), there is no way to distinguish the resized pod based on existing fields. Please check the attached code snippets of their data structures.

2. plegContainerUnknown (ContainerChanged) is not handled correctly for a resized container

plegContainerUnknown (ContainerChanged) seems like the best-fitting state for a resized container. However, it is not handled correctly: the event is dropped and never passed to the event channel.

if events[i].Type == ContainerChanged {
    // ContainerChanged events are silently dropped here
    continue
}

Proposal

I suggest removing the kl.pleg.UpdateCache(&runningPod, pod.UID) logic from the kubelet syncPod loop, since the pod is already fetched in pleg.Relist(), and instead handling the ContainerChanged event properly.

I opened PR #123941 to fix this issue; please check whether it makes sense.

What did you expect to happen?

I expect the resized container to be picked up by the PLEG Relist logic (1 s interval) and trigger a status update, without waiting for the kubelet's next reconcile loop (1 minute interval).

How can we reproduce it (as minimally and precisely as possible)?

  1. Enable the InPlacePodVerticalScaling feature gate and start the cluster.
  2. Create a pod with a 1-CPU request.
  3. Bump it to 2 CPUs.
  4. Check the pod container status. AllocatedResources is updated, but the status is not updated until the kubelet's next reconcile loop, which normally means waiting ~1 minute.

Anything else we need to know?

Relist pod status. The status below was captured after the pod CPU was updated from 1 to 2.

old pod

{
  "ID": "4e2b3d81-129f-40e4-a579-631397aa718c",
  "Name": "tomcat",
  "Namespace": "default",
  "CreatedAt": 1710219686362299600,
  "Containers": [
    {
      "ID": "containerd://a06de85dbb87c2c0632df34b60ff9323ea29b9be362080ae75b7fdf0526e9c17",
      "Name": "tomcat",
      "Image": "sha256:ef6a7c98d192507d6066dcf24e44bec66d07ec9cf7c55d8d3d1ea0a24660bdef",
      "ImageID": "sha256:ef6a7c98d192507d6066dcf24e44bec66d07ec9cf7c55d8d3d1ea0a24660bdef",
      "ImageRef": "sha256:ef6a7c98d192507d6066dcf24e44bec66d07ec9cf7c55d8d3d1ea0a24660bdef",
      "ImageRuntimeHandler": "",
      "Hash": 2397588892,
      "HashWithoutResources": 3106650780,
      "State": "running"
    }
  ],
  "Sandboxes": [
    {
      "ID": "containerd://6bda38b639c84b369a169b94f8bf820bda49ab8f98a2ab365acc52883cebb25a",
      "Name": "",
      "Image": "",
      "ImageID": "",
      "ImageRef": "",
      "ImageRuntimeHandler": "",
      "Hash": 0,
      "HashWithoutResources": 0,
      "State": "running"
    }
  ]
}

new pod

{
  "ID": "4e2b3d81-129f-40e4-a579-631397aa718c",
  "Name": "tomcat",
  "Namespace": "default",
  "CreatedAt": 1710219686362299600,
  "Containers": [
    {
      "ID": "containerd://a06de85dbb87c2c0632df34b60ff9323ea29b9be362080ae75b7fdf0526e9c17",
      "Name": "tomcat",
      "Image": "sha256:ef6a7c98d192507d6066dcf24e44bec66d07ec9cf7c55d8d3d1ea0a24660bdef",
      "ImageID": "sha256:ef6a7c98d192507d6066dcf24e44bec66d07ec9cf7c55d8d3d1ea0a24660bdef",
      "ImageRef": "sha256:ef6a7c98d192507d6066dcf24e44bec66d07ec9cf7c55d8d3d1ea0a24660bdef",
      "ImageRuntimeHandler": "",
      "Hash": 2397588892,
      "HashWithoutResources": 3106650780,
      "State": "running"
    }
  ],
  "Sandboxes": [
    {
      "ID": "containerd://6bda38b639c84b369a169b94f8bf820bda49ab8f98a2ab365acc52883cebb25a",
      "Name": "",
      "Image": "",
      "ImageID": "",
      "ImageRef": "",
      "ImageRuntimeHandler": "",
      "Hash": 0,
      "HashWithoutResources": 0,
      "State": "running"
    }
  ]
}

PLEG cached pod status, populated by:

if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) && isPodResizeInProgress(pod, &apiPodStatus) {
    // While resize is in progress, periodically call PLEG to update pod cache
    runningPod := kubecontainer.ConvertPodStatusToRunningPod(kl.getRuntime().Type(), podStatus)
    if err, _ := kl.pleg.UpdateCache(&runningPod, pod.UID); err != nil {
        klog.ErrorS(err, "Failed to update pod cache", "pod", klog.KObj(pod))
        return false, err
    }
}

{
  "ID": "4e2b3d81-129f-40e4-a579-631397aa718c",
  "Name": "tomcat",
  "Namespace": "default",
  "IPs": [
    "10.88.0.63",
    "2001:db8:4860::3f"
  ],
  "ContainerStatuses": [
    {
      "ID": "containerd://a06de85dbb87c2c0632df34b60ff9323ea29b9be362080ae75b7fdf0526e9c17",
      "Name": "tomcat",
      "State": "running",
      "CreatedAt": "2024-03-12T05:01:27.032584855Z",
      "StartedAt": "2024-03-12T05:01:27.083899049Z",
      "FinishedAt": "0001-01-01T00:00:00Z",
      "ExitCode": 0,
      "Image": "docker.io/library/tomcat:8.0",
      "ImageID": "docker.io/library/tomcat@sha256:8ecb10948deb32c34aeadf7bf95d12a93fbd3527911fa629c1a3e7823b89ce6f",
      "ImageRef": "docker.io/library/tomcat@sha256:8ecb10948deb32c34aeadf7bf95d12a93fbd3527911fa629c1a3e7823b89ce6f",
      "ImageRuntimeHandler": "",
      "Hash": 2397588892,
      "HashWithoutResources": 3106650780,
      "RestartCount": 0,
      "Reason": "",
      "Message": "",
      "Resources": {
        "CPURequest": "2",
        "CPULimit": "2",
        "MemoryRequest": null,
        "MemoryLimit": null
      }
    }
  ],
  "SandboxStatuses": [
    {
      "id": "6bda38b639c84b369a169b94f8bf820bda49ab8f98a2ab365acc52883cebb25a",
      "metadata": {
        "name": "tomcat",
        "uid": "4e2b3d81-129f-40e4-a579-631397aa718c",
        "namespace": "default"
      },
      "created_at": 1710219686362299600,
      "network": {
        "ip": "10.88.0.63",
        "additional_ips": [
          {
            "ip": "2001:db8:4860::3f"
          }
        ]
      },
      "linux": {
        "namespaces": {
          "options": {
            "pid": 1
          }
        }
      },
      "labels": {
        "io.kubernetes.pod.name": "tomcat",
        "io.kubernetes.pod.namespace": "default",
        "io.kubernetes.pod.uid": "4e2b3d81-129f-40e4-a579-631397aa718c"
      },
      "annotations": {
        "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"metadata\":{\"annotations\":{},\"name\":\"tomcat\",\"namespace\":\"default\"},\"spec\":{\"containers\":[{\"image\":\"tomcat:8.0\",\"imagePullPolicy\":\"Always\",\"name\":\"tomcat\",\"ports\":[{\"containerPort\":7500}],\"resizePolicy\":[{\"resourceName\":\"cpu\",\"restartPolicy\":\"NotRequired\"}],\"resources\":{\"limits\":{\"cpu\":1},\"requests\":{\"cpu\":1}}}]}}\n",
        "kubernetes.io/config.seen": "2024-03-12T05:01:26.046824676Z",
        "kubernetes.io/config.source": "api"
      }
    }
  ],
  "TimeStamp": "0001-01-01T00:00:00Z"
}

Kubernetes version

master version

Cloud provider

Common problem, so it applies to any cloud provider.


Labels

kind/bug, priority/important-longterm, sig/node, triage/accepted
