
[FG:InPlacePodVerticalScaling] Resizing pod gets stuck if limit is not configured #126388

@hshiina

What happened?

Resizing CPU requests gets stuck in InProgress if the CPU limit is not configured:

$ kubectl patch pod resize-pod --patch '{"spec":{"containers":[{"name":"resize-container", "resources":{"requests":{"cpu":"300m"}}}]}}'
pod/resize-pod patched
$ sleep 300
$ kubectl get pod resize-pod -o jsonpath='spec: {.spec.containers[0].resources}{"\nallocatedResources: "}{.status.containerStatuses[0].allocatedResources}{"\nstatus: "}{.status.containerStatuses[0].resources}{"\nresize: "}{.status.resize}{"\n"}'
spec: {"requests":{"cpu":"300m","memory":"200Mi"}}
allocatedResources: {"cpu":"300m","memory":"200Mi"}
status: {"requests":{"cpu":"200m","memory":"200Mi"}}
resize: InProgress
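
In other words, the kubelet has allocated the new 300m CPU request (allocatedResources), but the resources actually applied to the running container (status...resources) still show the old 200m request, and the resize status never leaves InProgress.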

The internal behavior in kubelet varies depending on how the other resource limits are configured.

  • If no limit is configured for the other resource either, no action is computed in computePodResizeAction():
    Jul 26 13:43:09 kind-control-plane kubelet[708]: I0726 13:43:09.494698     708 kuberuntime_manager.go:1051] "computePodActions got for pod" podActions="KillPod: false, CreateSandbox: false, UpdatePodResources: false, Attempt: 0, InitContainersToStart: [], ContainersToStart: [], EphemeralContainersToStart: [],ContainersToUpdate: map[], ContainersToKill: map[]" pod="default/resize-pod"
    
    This appears to be caused by container.Resources.Limits being nil here:
    if container.Resources.Limits == nil || len(pod.Status.ContainerStatuses) == 0 {
            return true
    }
  • If a limit is configured for another resource, doPodResizeAction() fails:
    Jul 26 14:04:23 kind-control-plane kubelet[708]: E0726 14:04:23.063108     708 kuberuntime_manager.go:750] "podResources.CPUQuota or podResources.CPUShares is nil" pod="resize-pod"
    
    This message is logged here:
    if podResources.CPUQuota == nil || podResources.CPUShares == nil {
            klog.ErrorS(nil, "podResources.CPUQuota or podResources.CPUShares is nil", "pod", pod.Name)
            result.Fail(fmt.Errorf("podResources.CPUQuota or podResources.CPUShares is nil for pod %s", pod.Name))
            return
    }

    This happens because CPUQuota is not set when the CPU limit is not configured (a sketch of one possible fallback follows this list):
    if cpuLimitsDeclared {
            result.CPUQuota = &cpuQuota
            result.CPUPeriod = &cpuPeriod
    }

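For illustration only, one direction the snippet above could take is to populate CPUQuota with the cgroup "unlimited" value when no CPU limit is declared, so that the nil check in doPodResizeAction() passes. The following is only a sketch of that option, not a proposed patch; whether such a resize should be actuated this way or rejected instead is part of the open question below.

    // Sketch only: treat a missing CPU limit as "no CFS quota" (-1 is the
    // cgroup v1 convention for unlimited), so podResources.CPUQuota is never nil.
    if cpuLimitsDeclared {
            result.CPUQuota = &cpuQuota
            result.CPUPeriod = &cpuPeriod
    } else {
            unlimited := int64(-1)
            result.CPUQuota = &unlimited
    }
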
There is a similar problem in resizing memory limits. See the comment below: #126388 (comment)

What did you expect to happen?

This resize should not get stuck in InProgress, though I'm not sure yet whether it should be actuated or rejected.

How can we reproduce it (as minimally and precisely as possible)?

  1. Enable InPlacePodVerticalScaling.

  2. Create a pod where the CPU request is configured but the CPU limit is not (either of the following manifests reproduces the issue):

    pod with no resource limits:

    apiVersion: v1
    kind: Pod
    metadata:
      creationTimestamp: null
      labels:
        run: resize-pod
      name: resize-pod
    spec:
      containers:
      - image: busybox
        name: resize-container
        command:
          - sh
          - -c
          - trap "exit 0" SIGTERM; while true; do sleep 1; done
        resources:
          requests:
            cpu: 200m
            memory: 200Mi
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
      restartPolicy: Always
    

    pod with memory limit:

    apiVersion: v1
    kind: Pod
    metadata:
      creationTimestamp: null
      labels:
        run: resize-pod
      name: resize-pod
    spec:
      containers:
      - image: busybox
        name: resize-container
        command:
          - sh
          - -c
          - trap "exit 0" SIGTERM; while true; do sleep 1; done
        resources:
          requests:
            cpu: 200m
            memory: 200Mi
          limits:
            memory: 200Mi
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
      restartPolicy: Always
    
  3. Patch the pod to update its CPU request:

    $ kubectl patch pod resize-pod --patch '{"spec":{"containers":[{"name":"resize-container", "resources":{"requests":{"cpu":"300m"}}}]}}'
    
  4. Wait for more than two minutes to confirm that this is not the known issue.

  5. See the pod resize status:

    $ kubectl get pod resize-pod -o jsonpath='spec: {.spec.containers[0].resources}{"\nallocatedResources: "}{.status.containerStatuses[0].allocatedResources}{"\nstatus: "}{.status.containerStatuses[0].resources}{"\nresize: "}{.status.resize}{"\n"}'
    spec: {"requests":{"cpu":"300m","memory":"200Mi"}}
    allocatedResources: {"cpu":"300m","memory":"200Mi"}
    status: {"requests":{"cpu":"200m","memory":"200Mi"}}
    resize: InProgress
    
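Optionally, you can confirm on the node that the resize was never actuated at the runtime level. This is only a sketch, assuming a Linux node with crictl available and cgroup v1 (cgroup v2 uses cpu.weight instead of shares); <container-id> is a placeholder:

    # On the node: the container's CPU shares should still reflect the old 200m
    # request (roughly 204 shares); after a successful resize to 300m they would
    # be roughly 307.
    $ crictl ps --name resize-container -q
    <container-id>
    $ crictl inspect <container-id> | grep -i shares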

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0

Cloud provider

N/A

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Metadata

Labels

kind/bug: Categorizes issue or PR as related to a bug.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
