-
Notifications
You must be signed in to change notification settings - Fork 41.6k
Updated version skew strategy for InPlacePodVerticalScaling #128186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated version skew strategy for InPlacePodVerticalScaling #128186
Conversation
|
@tallclair Please let me know if I’m in the right direction. Some tests are breaking, I’m still working on fixing them. This is how I tested the change:
Previous to these changes, I was able to update Pod's resources. In v1.30, the Pod gets restarted, and in v1.31, the Pod gets updated, but its not restarted so the resource update is not reflected until the next restart. After the changes I made in this PR, Pod resource update is not allowed for Pods created on both the worker nodes with InPlacePodVerticalScaling disabled. The validation checks if the Pod status has resources defined before update and I’ve used the same logic you wrote in the KEP. This is how the error looks like right now when you try to update a Pod resource without enabling the feature gate. I’m also getting the error for how Resources cannot be updated along with this. I’m assuming this is because # pods "nginx-pod" was not valid:
# * spec: Forbidden: Pod running on node without IPPVS enabled may not be updated
# * spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
# core.PodSpec{
# Volumes: {{Name: "kube-api-access-njhml", VolumeSource: {Projected: &{Sources: {{ServiceAccountToken: &{ExpirationSeconds: 3607, Path: "token"}}, {ConfigMap: &{LocalObjectReference: {Name: "kube-root-ca.crt"}, Items: {{Key: "ca.crt", Path: "ca.crt"}}}}, {DownwardAPI: &{Items: {{Path: "namespace", FieldRef: &{APIVersion: "v1", FieldPath: "metadata.namespace"}}}}}}, DefaultMode: &420}}}},
# InitContainers: nil,
# Containers: []core.Container{
# {
# ... // 6 identical fields
# EnvFrom: nil,
# Env: nil,
# Resources: core.ResourceRequirements{
# Limits: nil,
# Requests: core.ResourceList{
# s"cpu": {i: {...}, s: "10m", Format: "DecimalSI"},
# - s"memory": {i: resource.int64Amount{value: 104857600}, s: "100Mi", Format: "BinarySI"},
# + s"memory": {i: resource.int64Amount{value: 105906176}, s: "101Mi", Format: "BinarySI"},
# },
# Claims: nil,
# },
# ResizePolicy: {{ResourceName: s"cpu", RestartPolicy: "NotRequired"}, {ResourceName: s"memory", RestartPolicy: "NotRequired"}},
# RestartPolicy: nil,
# ... // 13 identical fields
# },
# },
# EphemeralContainers: nil,
# RestartPolicy: "Always",
# ... // 28 identical fields
# } |
|
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the right track. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good to go once you rebase it on the /resize subresource PR.
Have you manually tested the changes? If not, I can share some instructions for setting up a version-skewed Kind cluster to test locally.
f2b73d6 to
3d75c80
Compare
|
@tallclair I've rebased my changes and squashed everything into a single commit, PTAL!
I had tested the skew before the rebase as described in #128186 (comment). I will test it once again today. |
3d75c80 to
303041d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the unit test needs to be fixed too.
|
/triage accepted |
aaf1c65 to
385d2b1
Compare
|
@tallclair Since we're removing the feature gate testing from the unit tests, we don't really test for skew anywhere. The tests I had added relied on the feature gates to test the skew error message. Should I add a unit/e2e test for this case without using the feature gate? The test freeze and code freeze deadlines are on the same day this release cycle. |
|
Hey @sreeram-venkitesh @tallclair |
|
@knabben Yep, we're planning to get this merged before code freeze tomorrow. This is the last of the top priority PRs needed for InPlacePodVerticalScaling for v1.32. |
|
Wait, where'd the unit tests go? We don't need to change the feature gate, since this function isn't checking the feature gate, but we still want to manually test the logic. I'd expect to see test cases with & without running containers, and with & without status resources for each. |
|
I'll push them now. |
|
/test pull-e2e-gci-gce-alpha-enabled-default |
|
/lgtm /assign @thockin |
|
LGTM label has been added. Git tree hash: eb8fc6f71fd823c7041cffd6b8f5860092ef2c58
|
|
Test failures look like unrelated flakes. That suite isn't pr blocking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/lgtm
/approve
| allErrs = append(allErrs, field.Invalid(specPath, newPod.Status.QOSClass, "Pod QOS Class may not change as a result of resizing")) | ||
| } | ||
|
|
||
| if !isPodResizeRequestSupported(*oldPod) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndixita you have similar logic (different reason, same result) - PTAL and see if we can join them, whoever merges last :)
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sreeram-venkitesh, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
|
@sreeram-venkitesh: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/milestone v1.32 |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Updates the version skew strategy for InPlacePodVerticalScaling for beta graduation.
Which issue(s) this PR fixes:
Fixes #117767
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: