Conversation

@iholder101 (Contributor) commented Apr 8, 2024

What type of PR is this?

/kind feature
/sig node

What this PR does / why we need it:

The in-place pod resize feature adds a `Resources` field to the `ContainerStatuses` struct, which is used to report the CPU and memory requests and limits configured for the containers.

After this PR, extended resources are also reflected as part of `ContainerStatuses[i].Resources`.

Sample output:

Add a dummy extended resource to the node:

> k describe node | grep Allocatable -w -A5
Allocatable:
  cpu:                40
  dummy.com/dummy:    4  # <---- Extended resource
  ephemeral-storage:  1293695812Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0

Add a pod using this resource:

apiVersion: v1
kind: Pod
metadata:
  name: dummy-pod
spec:
  containers:
  - name: dummy-container
    image: k8s.gcr.io/pause:2.0
    resources:
      requests:
        cpu: 100m
        dummy.com/dummy: 1
      limits:
        dummy.com/dummy: 1

Create the pod and check its ContainerStatuses[i].Resources:

> k get pod -o yaml | grep containerstatus -i -A40
    containerStatuses:
    - allocatedResources:
        cpu: 100m
        dummy.com/dummy: "1"
      # ....
      resources:
        limits:
          dummy.com/dummy: "1"
        requests:
          cpu: 100m
          dummy.com/dummy: "1"
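
As a rough illustration of how a consumer could read the new field via client-go (an illustrative sketch, not part of this PR; it assumes an already-configured Clientset, and the namespace and pod name are just the ones from the example above):

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// printContainerResources lists the requests reported in status.containerStatuses[].resources.
func printContainerResources(ctx context.Context, client kubernetes.Interface) error {
	pod, err := client.CoreV1().Pods("default").Get(ctx, "dummy-pod", metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.Resources == nil {
			// The field is only populated when InPlacePodVerticalScaling is enabled.
			continue
		}
		for name, qty := range cs.Resources.Requests {
			fmt.Printf("container %s requests %s=%s\n", cs.Name, name, qty.String())
		}
	}
	return nil
}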

Which issue(s) this PR fixes:

Fixes #114159

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added status for extended Pod resources within the `status.containerStatuses[].resources` field.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Apr 8, 2024
@k8s-ci-robot (Contributor) commented:

Please note that we're already in Test Freeze for the release-1.30 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.30.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Mon Apr 8 01:42:04 UTC 2024.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet labels Apr 8, 2024
@k8s-ci-robot k8s-ci-robot requested review from dims and tallclair April 8, 2024 07:12
@iholder101 (Contributor Author) commented:

/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2

@iholder101 (Contributor Author) commented:

/cc @vinaykul

@k8s-ci-robot k8s-ci-robot requested a review from vinaykul April 8, 2024 07:13
@iholder101 (Contributor Author) commented Apr 8, 2024

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 8, 2024
@iholder101 (Contributor Author) commented:

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 8, 2024
@sftim (Contributor) commented Apr 8, 2024

Changelog suggestion

-Extended resources to be reflected as part of the pod.Status.ContainerStatuses[i].Resources struct.
+Added status for extended Pod resources within the `status.containerStatuses[].resources` field.

@iholder101 (Contributor Author) commented:

> Changelog suggestion
>
> -Extended resources to be reflected as part of the pod.Status.ContainerStatuses[i].Resources struct.
> +Added status for extended Pod resources within the `status.containerStatuses[].resources` field.

Done!
Thanks for your feedback 👍

@vinaykul (Member) commented:

@iholder101 Thanks for the PR! 😄

If it is not too much to ask, can you please extend the E2E tests test/e2e/node/pod_resize.go:doPodResizeTests() by adding a test case that requests and verifies extended resources?

Eyeballing the code, verifyPodAllocations() and verifyPodStatusResources() should already check for it, but please confirm that they indeed verify extended resources as well. I think the nodes need to be patched for extended resources; a possible example for reference:

// Update each node to advertise 3 available extended resources
nodeCopy := node.DeepCopy()
nodeCopy.Status.Capacity[testExtendedResource] = resource.MustParse("5")
nodeCopy.Status.Allocatable[testExtendedResource] = resource.MustParse("5")
err := patchNode(ctx, cs, &node, nodeCopy)
framework.ExpectNoError(err)
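
And, as a rough sketch of the kind of resource spec such a test case could request (the resource name "example.com/dummy" and the helper name are illustrative, not the actual test code; it assumes the usual k8s.io/api/core/v1 and k8s.io/apimachinery/pkg/api/resource imports):

// extendedTestResources returns a hypothetical request/limit spec for a test
// container; "example.com/dummy" is an arbitrary extended resource name that
// would have to match the one patched onto the node above.
func extendedTestResources() v1.ResourceRequirements {
	testExtendedResource := v1.ResourceName("example.com/dummy")
	return v1.ResourceRequirements{
		Requests: v1.ResourceList{
			v1.ResourceCPU:       resource.MustParse("100m"),
			v1.ResourceMemory:    resource.MustParse("100Mi"),
			testExtendedResource: resource.MustParse("1"),
		},
		Limits: v1.ResourceList{
			v1.ResourceCPU:       resource.MustParse("100m"),
			v1.ResourceMemory:    resource.MustParse("100Mi"),
			testExtendedResource: resource.MustParse("1"),
		},
	}
}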

@vinaykul (Member) commented:

Also adding a few more sharp eyes for this PR.
/cc @mrunalp @bobbypage @derekwaynecarr @dchen1107

@vinaykul (Member) left a review:

A few minor comments.

}

convertCustomResources := func(inResources, outResources v1.ResourceList) {
for extendedResourceName, extendedResourceQuantity := range inResources {
Member review comment: `resourceName, resourceQuantity` sounds more appropriate than `extendedResourceName, extendedResourceQuantity`.

extendedResourceName == v1.ResourceStorage || extendedResourceName == v1.ResourceEphemeralStorage {
continue
}

Member review comment: nit: remove this blank line please.

},
},
},
} {
Member review comment: Perhaps a test case with only ExtendedResources?

@iholder101 (Contributor Author) replied: Isn't "BestEffort QoSPod with extended resources" covering that already? Can you please elaborate?

@tallclair (Member) commented:

Who is the intended consumer for these resources? I think the status is supposed to reflect the actual state of resources in-use, so it seems like if we're including extended resources there should be some sort of confirmation from device drivers that the resources are in fact in use.

Also, AFAICT this isn't including extended resources in the allocated resources. Should it?

@vinaykul (Member) commented May 26, 2024

> Who is the intended consumer for these resources? I think the status is supposed to reflect the actual state of resources in-use, so it seems like if we're including extended resources there should be some sort of confirmation from device drivers that the resources are in fact in use.
>
> Also, AFAICT this isn't including extended resources in the allocated resources. Should it?

@tallclair AFAIK extended resources are not managed by the CRI. They are used for scheduling, and the kubelet is the admission gatekeeper. If a pod requesting extended resources is admitted and successfully started, it can be assumed that the pod's containers got the extended resources they requested. And since they are immutable (at this time), reflecting them back in the status, just like ephemeral-storage, would be accurate.

The AllocatedResources part is a bit non-obvious. It gets checkpointed upon admission and resize:

if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) {
	// To handle kubelet restarts, test pod admissibility using AllocatedResources values
	// (for cpu & memory) from checkpoint store. If found, that is the source of truth.
	podCopy := pod.DeepCopy()
	kl.updateContainerResourceAllocation(podCopy)
	// Check if we can admit the pod; if not, reject it.
	if ok, reason, message := kl.canAdmitPod(activePods, podCopy); !ok {
		kl.rejectPod(pod, reason, message)
		continue
	}
	// For new pod, checkpoint the resource values at which the Pod has been admitted
	if err := kl.statusManager.SetPodAllocation(podCopy); err != nil {
		//TODO(vinaykul,InPlacePodVerticalScaling): Can we recover from this in some way? Investigate
		klog.ErrorS(err, "SetPodAllocation failed", "pod", klog.KObj(pod))
	}
} else {

And it is retrieved from checkpoint during APIStatus generation:

// AllocatedResources values come from checkpoint. It is the source-of-truth.
found := false
status.AllocatedResources, found = kl.statusManager.GetContainerResourceAllocation(string(pod.UID), cName)
if !(container.Resources.Requests == nil && container.Resources.Limits == nil) && !found {
	// Log error and fallback to AllocatedResources in oldStatus if it exists
	klog.ErrorS(nil, "resource allocation not found in checkpoint store", "pod", pod.Name, "container", cName)
	if oldStatusFound {
		status.AllocatedResources = oldStatus.AllocatedResources
	}
}

I suppose you can see the fragility of this code! Some added context: node-local checkpointing was a much-debated topic during the design discussions. Dawn and I eventually capitulated in order to break the impasse and make progress. But at that time we didn't have KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/2527-clarify-status-observations-vs-rbac or the precedent PR #119665. I feel we can drop all of the checkpointing code and rely on pod.Status, but the APF aspect needs a closer look.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2024
@iholder101 iholder101 force-pushed the in-pod-vertical-scaling/extended-resources branch from d8383be to d8950a7 Compare August 28, 2024 12:29
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 28, 2024
Signed-off-by: Itamar Holder <iholder@redhat.com>
@iholder101 iholder101 force-pushed the in-pod-vertical-scaling/extended-resources branch from d8950a7 to 85068bc Compare August 28, 2024 13:30
@iholder101 (Contributor Author) commented:

@tallclair @vinaykul @SergeyKanzhelev

Sorry for the delay. I was out of office for a while.
Would you mind having another look?

@tallclair (Member) commented:

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2024
@k8s-ci-robot (Contributor) commented:

LGTM label has been added.

Git tree hash: 61ff3111e0919afd45d849861589f775340c8f6f

@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iholder101, tallclair, vinaykul

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2024
@k8s-ci-robot k8s-ci-robot merged commit d67e654 into kubernetes:master Oct 17, 2024
12 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 17, 2024

Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/kubelet
area/test
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/feature - Categorizes issue or PR as related to a new feature.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
priority/important-longterm - Important over the long term, but may not be staffed and/or may need multiple releases to complete.
release-note - Denotes a PR that will be considered when it comes time to generate release notes.
sig/node - Categorizes an issue or PR as relevant to SIG Node.
sig/testing - Categorizes an issue or PR as relevant to SIG Testing.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
triage/accepted - Indicates an issue or PR is ready to be actively worked on.

Development

Successfully merging this pull request may close these issues.

[FG:InPlacePodVerticalScaling] Verify extended resources are correctly reported in ContainerStatus Resources

6 participants