@sreeram-venkitesh
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

Updates the version skew strategy for InPlacePodVerticalScaling for beta graduation.

Which issue(s) this PR fixes:

Fixes #117767

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Version skew strategy update for InPlacePodVerticalScaling for beta graduation.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/1287

@k8s-ci-robot added the release-note, kind/feature, size/S, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, and needs-priority labels on Oct 18, 2024
@k8s-ci-robot added the sig/apps label and removed the do-not-merge/needs-sig label on Oct 18, 2024
@sreeram-venkitesh
Member Author

@tallclair Please let me know if I'm on the right track. Some tests are failing and I'm still working on fixing them. This is how I tested the change:

  • Create a kind cluster with 3 nodes:
    • control plane with the kindest/node:latest image and InPlacePodVerticalScaling enabled
    • worker #1 with the kindest/node:v1.31.0 image and InPlacePodVerticalScaling disabled
    • worker #2 with the kindest/node:v1.30.0 image and InPlacePodVerticalScaling disabled

Before these changes, I was able to update a Pod's resources. On v1.30, the Pod gets restarted. On v1.31, the Pod gets updated but it's not restarted, so the resource update is not reflected until the next restart. With the changes in this PR, resource updates are rejected for Pods created on either worker node, since both have InPlacePodVerticalScaling disabled. The validation checks whether the Pod status has resources defined before the update, using the same logic you wrote in the KEP.
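The check described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Pod`, `ContainerStatus`, and the other types are minimal hypothetical stand-ins for the real API types, and only the helper name `isPodResizeRequestSupported` is taken from the diff context quoted later in this thread. The sketch assumes that only a kubelet with InPlacePodVerticalScaling enabled reports resources in a running container's status, and that a Pod with no running containers is treated as supported.

```go
package main

import "fmt"

// Minimal stand-ins for the Kubernetes API types; hypothetical and heavily
// simplified for illustration.
type ResourceRequirements struct {
	Requests map[string]string
}

type ContainerState struct {
	Running bool // true while the container is running
}

type ContainerStatus struct {
	Name  string
	State ContainerState
	// Resources is assumed to be populated only by kubelets that have
	// InPlacePodVerticalScaling enabled.
	Resources *ResourceRequirements
}

type Pod struct {
	Name              string
	ContainerStatuses []ContainerStatus
}

// isPodResizeRequestSupported reports whether the Pod looks like it is running
// on a node that supports in-place resize. It inspects the first running
// container and checks whether its status carries resources.
func isPodResizeRequestSupported(pod Pod) bool {
	for _, c := range pod.ContainerStatuses {
		if c.State.Running {
			return c.Resources != nil
		}
	}
	// No running containers: nothing can be inferred about the node, so this
	// sketch assumes the resize is supported.
	return true
}

func main() {
	onNewNode := Pod{Name: "on-new-node", ContainerStatuses: []ContainerStatus{{
		Name:      "nginx",
		State:     ContainerState{Running: true},
		Resources: &ResourceRequirements{Requests: map[string]string{"memory": "100Mi"}},
	}}}
	onOldNode := Pod{Name: "on-old-node", ContainerStatuses: []ContainerStatus{{
		Name:  "nginx",
		State: ContainerState{Running: true},
	}}}
	for _, p := range []Pod{onNewNode, onOldNode} {
		fmt.Printf("%s: resize supported = %v\n", p.Name, isPodResizeRequestSupported(p))
	}
}
```

Keying the decision off the Pod status rather than a feature gate lets the API server infer node support per Pod, which is what a version-skewed cluster needs.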

This is what the error looks like right now when you try to update a Pod's resources without the feature gate enabled. Along with it, I'm also getting the error saying that resources cannot be updated. I'm assuming this is because cmp.Diff(oldPod.Spec, mungedPodSpec) is non-empty; I'm looking into this as well. Also, the string "Pod running on node without IPPVS enabled may not be updated" is a placeholder; I will change it.

# pods "nginx-pod" was not valid:
# * spec: Forbidden: Pod running on node without IPPVS enabled may not be updated
# * spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
#   core.PodSpec{
#       Volumes:        {{Name: "kube-api-access-njhml", VolumeSource: {Projected: &{Sources: {{ServiceAccountToken: &{ExpirationSeconds: 3607, Path: "token"}}, {ConfigMap: &{LocalObjectReference: {Name: "kube-root-ca.crt"}, Items: {{Key: "ca.crt", Path: "ca.crt"}}}}, {DownwardAPI: &{Items: {{Path: "namespace", FieldRef: &{APIVersion: "v1", FieldPath: "metadata.namespace"}}}}}}, DefaultMode: &420}}}},
#       InitContainers: nil,
#       Containers: []core.Container{
#               {
#                       ... // 6 identical fields
#                       EnvFrom: nil,
#                       Env:     nil,
#                       Resources: core.ResourceRequirements{
#                               Limits: nil,
#                               Requests: core.ResourceList{
#                                       s"cpu":    {i: {...}, s: "10m", Format: "DecimalSI"},
# -                                     s"memory": {i: resource.int64Amount{value: 104857600}, s: "100Mi", Format: "BinarySI"},
# +                                     s"memory": {i: resource.int64Amount{value: 105906176}, s: "101Mi", Format: "BinarySI"},
#                               },
#                               Claims: nil,
#                       },
#                       ResizePolicy:  {{ResourceName: s"cpu", RestartPolicy: "NotRequired"}, {ResourceName: s"memory", RestartPolicy: "NotRequired"}},
#                       RestartPolicy: nil,
#                       ... // 13 identical fields
#               },
#       },
#       EphemeralContainers: nil,
#       RestartPolicy:       "Always",
#       ... // 28 identical fields
#   }
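The second error in the output above comes from a "munge, then compare" style of check. A minimal sketch of that pattern is shown here, assuming hypothetical flattened types and using `reflect.DeepEqual` from the standard library in place of `cmp.Diff`. The `resizeAllowed` flag and the helper names are inventions of this sketch, and whether the real validation skips the resource munging in this situation is only a guess that matches the author's suspicion.

```go
package main

import (
	"fmt"
	"reflect"
)

// Container and PodSpec are hypothetical, flattened stand-ins for the real
// API types. Memory stands in for the container's memory request.
type Container struct {
	Name   string
	Image  string
	Memory string
}

type PodSpec struct {
	Containers []Container
}

// mungeMutableFields returns a copy of newSpec in which every field that is
// allowed to change has been overwritten with the value from oldSpec.
// Resources are only treated as mutable when resizeAllowed is true.
func mungeMutableFields(oldSpec, newSpec PodSpec, resizeAllowed bool) PodSpec {
	munged := PodSpec{Containers: make([]Container, len(newSpec.Containers))}
	copy(munged.Containers, newSpec.Containers)
	for i := range munged.Containers {
		if i >= len(oldSpec.Containers) {
			break
		}
		munged.Containers[i].Image = oldSpec.Containers[i].Image
		if resizeAllowed {
			munged.Containers[i].Memory = oldSpec.Containers[i].Memory
		}
	}
	return munged
}

// onlyMutableFieldsChanged reports whether the update touched nothing but
// mutable fields: after munging, the two specs must be identical.
func onlyMutableFieldsChanged(oldSpec, newSpec PodSpec, resizeAllowed bool) bool {
	return reflect.DeepEqual(oldSpec, mungeMutableFields(oldSpec, newSpec, resizeAllowed))
}

func main() {
	oldSpec := PodSpec{Containers: []Container{{Name: "nginx", Image: "nginx", Memory: "100Mi"}}}
	newSpec := PodSpec{Containers: []Container{{Name: "nginx", Image: "nginx", Memory: "101Mi"}}}

	fmt.Println("resize allowed:    ", onlyMutableFieldsChanged(oldSpec, newSpec, true))
	fmt.Println("resize not allowed:", onlyMutableFieldsChanged(oldSpec, newSpec, false))
}
```

Under these assumptions, a memory change from 100Mi to 101Mi survives the munging whenever resources are not treated as mutable, so the generic "may not change fields other than" error fires in addition to the skew-specific one.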

@tallclair
Member

/assign

Member

@tallclair tallclair left a comment

On the right track. Thanks!

Member

@tallclair tallclair left a comment

I think this is good to go once you rebase it on the /resize subresource PR.

Have you manually tested the changes? If not, I can share some instructions for setting up a version-skewed Kind cluster to test locally.

@k8s-ci-robot added the needs-rebase label on Nov 6, 2024
@sreeram-venkitesh force-pushed the 117767-in-place-pod-vertical-scaling-version-skew branch from f2b73d6 to 3d75c80 on November 6, 2024 11:53
@k8s-ci-robot added the size/L label and removed the size/S and needs-rebase labels on Nov 6, 2024
@sreeram-venkitesh
Member Author

@tallclair I've rebased my changes and squashed everything into a single commit, PTAL!

> Have you manually tested the changes?

I tested the skew before the rebase, as described in #128186 (comment). I will test it again today.

@sreeram-venkitesh force-pushed the 117767-in-place-pod-vertical-scaling-version-skew branch from 3d75c80 to 303041d on November 6, 2024 13:43
Member

@tallclair tallclair left a comment

Thanks!

Member

@tallclair tallclair left a comment

Looks like the unit test needs to be fixed too.

@tallclair
Member

/triage accepted
/priority important-soon
/milestone v1.32

@k8s-ci-robot added this to the v1.32 milestone on Nov 6, 2024
@sreeram-venkitesh force-pushed the 117767-in-place-pod-vertical-scaling-version-skew branch from aaf1c65 to 385d2b1 on November 7, 2024 06:05
@k8s-ci-robot added the size/M label and removed the size/L and needs-rebase labels on Nov 7, 2024
@k8s-ci-robot added the size/S label and removed the size/M label on Nov 7, 2024
@sreeram-venkitesh
Member Author

@tallclair Since we're removing the feature gate testing from the unit tests, we don't really test for skew anywhere. The tests I had added relied on the feature gates to test the skew error message. Should I add a unit/e2e test for this case without using the feature gate? The test freeze and code freeze deadlines are on the same day this release cycle.

@knabben
Member

knabben commented Nov 7, 2024

Hey @sreeram-venkitesh @tallclair
⚠️ Do we still intend to merge this for v1.32? Just a reminder that the code freeze starts at 02:00 UTC on Friday, November 8th, 2024 (tomorrow :)). Please make sure the PR has both the lgtm and approved labels before the code freeze. Thanks!

@sreeram-venkitesh
Member Author

sreeram-venkitesh commented Nov 7, 2024

@knabben Yep, we're planning to get this merged before code freeze tomorrow. This is the last of the top priority PRs needed for InPlacePodVerticalScaling for v1.32.

@tallclair
Member

Wait, where'd the unit tests go? We don't need to change the feature gate, since this function isn't checking the feature gate, but we still want to manually test the logic. I'd expect to see test cases with & without running containers, and with & without status resources for each.
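The four combinations asked for here could be laid out as a table-driven check, sketched below. Everything in it is hypothetical: `containerStatus` is a flattened stand-in for the real status type, `resizeSupported` restates the rule as this sketch understands it, and the expected values for the "not running" rows assume that a Pod without running containers is treated as supported.

```go
package main

import "fmt"

// containerStatus is a hypothetical, flattened stand-in for a container status.
type containerStatus struct {
	running      bool // the container is currently running
	hasResources bool // the status reports resources
}

// resizeSupported restates the skew rule for this sketch: the first running
// container decides, based on whether its status reports resources.
func resizeSupported(statuses []containerStatus) bool {
	for _, s := range statuses {
		if s.running {
			return s.hasResources
		}
	}
	// Assumption: with no running containers, the resize is allowed.
	return true
}

type testCase struct {
	name     string
	statuses []containerStatus
	want     bool
}

// cases covers running / not running crossed with status resources present / absent.
func cases() []testCase {
	return []testCase{
		{"running, with status resources", []containerStatus{{true, true}}, true},
		{"running, without status resources", []containerStatus{{true, false}}, false},
		{"not running, with status resources", []containerStatus{{false, true}}, true},
		{"not running, without status resources", []containerStatus{{false, false}}, true},
	}
}

func main() {
	for _, tc := range cases() {
		got := resizeSupported(tc.statuses)
		fmt.Printf("%-40s got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```

None of these cases toggles a feature gate, which matches the point that the function under test does not consult one.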

@sreeram-venkitesh
Member Author

I'll push them now.

@k8s-ci-robot added the size/M label and removed the size/S label on Nov 7, 2024
@tallclair
Member

/test pull-e2e-gci-gce-alpha-enabled-default

@tallclair
Member

/lgtm

/assign @thockin
For validation approval.

@k8s-ci-robot added the lgtm label on Nov 7, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: eb8fc6f71fd823c7041cffd6b8f5860092ef2c58

@tallclair
Member

Test failures look like unrelated flakes. That suite isn't PR-blocking.

Member

@thockin thockin left a comment

Thanks!

/lgtm
/approve

allErrs = append(allErrs, field.Invalid(specPath, newPod.Status.QOSClass, "Pod QOS Class may not change as a result of resizing"))
}

if !isPodResizeRequestSupported(*oldPod) {
Member

@ndixita you have similar logic (different reason, same result) - PTAL and see if we can join them, whoever merges last :)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sreeram-venkitesh, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Nov 7, 2024
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot
Contributor

@sreeram-venkitesh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-e2e-gci-gce-alpha-enabled-default | 851dbf2 | link | false | `/test pull-e2e-gci-gce-alpha-enabled-default` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@pacoxu
Member

pacoxu commented Nov 8, 2024

/milestone v1.32
approved before code freeze
/skip

@k8s-ci-robot merged commit 46b3d9b into kubernetes:master on Nov 8, 2024
15 checks passed

Labels

approved, cncf-cla: yes, kind/feature, lgtm, priority/important-soon, release-note, sig/apps, size/M, triage/accepted

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[FG:InPlacePodVerticalScaling] Implement version skew handling for in-place pod resize

7 participants