Skip to content

Conversation

@natasha41575
Copy link
Contributor

@natasha41575 natasha41575 commented May 5, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This moves the in-place pod resize allocation logic out of the sync loop. This PR is organized into the following 5 commits:

  1. Untangle the HandlePodResourcesResize unit tests and move them into the allocation package
  2. Add helper methods IsPodResizeInfeasible and IsPodResizeDeferred to the status_manager.
  3. Update the allocation_manager methods to hold all the control logic required for handling pod resize allocation + update the kubelet to no longer attempt to allocate pod resizes in the sync loop (and update unit tests accordingly).
  4. Update the allocation manager's unit tests to cover PushPendingResizes and RetryPendingResizes
  5. Skip pending resize evaluation if sources aren't ready (per discussion on the previous PR)

The intention of this PR is to reattempt pending resizes:

  • whenever HandlePodAdditions or HandlePodUpdates receives a resize request that it didn't already have,
  • upon deletion of another pod,
  • upon the successful actuation of another resize,
  • or periodically. This PR sets a timer for every 3 minutes, but we should probably think about if that is the right amount of time.

Special notes for your reviewer

Intended follow-ups:

  1. This PR is required for but does not include implementation of prioritized resizes. That is because the PR was already getting a bit too large to review, and because design for prioritized resizes is still pending (KEP-1287: Priority of Resize Requests enhancements#5266). This is also useful as its own standalone change without having prioritized resizes yet, but I left a TODO for that.
  2. Some cleanup (such as moving some unit tests around, unexporting functions that no longer need to be exported, removing some code that's not needed anymore etc), I left some of these things out of this PR to keep the size down

Which issue(s) this PR fixes:

Does not yet fix it, but this is part of #116971.

Does this PR introduce a user-facing change?

NONE

/sig node
/priority important-soon
/triage accepted
/cc @tallclair

TODO:

  • retry deferred resizes in HandlePodCleanups
    • I don't think anything in HandlePodCleanups affects the admission decision (but I could be wrong)? It looks like the admission decision depends on the pod manager as the source of truth (through kl.podManager.GetPods), and the pod manager is not updated in HandlePodCleanups, so I don't think retrying the pending resizes here is necessary
  • double check the logic in HandlePodAdditions and HandlePodUpdates is correct (maybe add unit tests covering resize cases)
  • allocation manager unit tests
  • need to fix an issue where even when the resize is deferred and not allocated or actuated, the pod status is showing updated allocated and actual resources
  • need to fix an issue where a pending resize that gets reverted does not have its pending condition cleared quickly enough
  • sanity check with running this e2e locally
  • there seems to be more latency than should be necessary in accepting a pending resize after another pod is scaled down to make room, want to investigate this (but this doesn't necessarily have to be blocking)
  • skip retry of pending resizes if sources aren't ready (!kl.sourcesReady.AllReady())
  • rebase on move pod admission and resize logic into the allocation manager #131801

@k8s-ci-robot k8s-ci-robot requested a review from tallclair May 5, 2025 16:33
@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/node Categorizes an issue or PR as relevant to SIG Node. labels May 5, 2025
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 5, 2025
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 5, 2025
@natasha41575 natasha41575 changed the title Move resize logic [FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop May 5, 2025
@natasha41575 natasha41575 moved this from Triage to Work in progress in SIG Node: code and documentation PRs May 5, 2025
@natasha41575 natasha41575 moved this from Triage to Archive-it in SIG Node CI/Test Board May 5, 2025
@natasha41575 natasha41575 force-pushed the move-resize-logic branch 3 times, most recently from 433aa61 to d52210d Compare May 6, 2025 21:19
@natasha41575 natasha41575 force-pushed the move-resize-logic branch 2 times, most recently from 9b16320 to 42ececf Compare May 7, 2025 20:33
@natasha41575 natasha41575 marked this pull request as ready for review May 7, 2025 21:30
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2025
@natasha41575 natasha41575 force-pushed the move-resize-logic branch 3 times, most recently from 43b8d9d to 667681a Compare July 8, 2025 21:34
Copy link
Member

@tallclair tallclair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, but I want to trace through the resize flows again before approving. Will do tomorrow.

// - Non-resizable containers: non-restartable init containers, ephemeral containers
// - Non-resizable resources: only CPU & memory are resizable
// - Non-running containers: they will be sized correctly when (re)started
func (m *manager) CheckPodResizeInProgress(allocatedPod *v1.Pod, podStatus *kubecontainer.PodStatus) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave this here for now, but maybe we should move all the actuated resources & in-progress logic out of allocation manager in a follow-up PR. It can almost all go to the runtime. That would also break the circular dependency with the kuberuntime. WDYT?

I need to think about this more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent far too long thinking about this, but I like the idea of making actuated resources an implementation detail of kuberuntime. Once this PR merges, I can take this as a follow-up task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have anything against it, and it makes sense as a mental model, but you'd have to either plumb the status manager down into kuberuntime or bubble up the result somehow? I have a vague recollection of trying something like this elsewhere and running into depedency issues but I'm sure you'll have a better solution than me, good luck :)


// If the pod resize status has changed, we need to update the pod status.
newResizeStatus := m.statusManager.GetPodResizeConditions(uid)
if !apiequality.Semantic.DeepEqual(oldResizeStatus, newResizeStatus) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that handlePodResourcesResize doesn't set the InProgress condition, if the resize was accepted on the first try (no pending condition), wouldn't old & new be the same (empty), and skip the sync?

Maybe just make handlePodResourcesResize return whether it did a resize.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it looks like this method could use a unit test. Can you add a test, and make sure it would catch this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 sorry, thanks for the catch

The tests in allocation_manager_test.go TestHandlePodResourcesResize and TestHandlePodResourcesResizeWithSwap cover handlePodResourcesResize indirectly by calling RetryPendingResizes. I updated these tests to also ensure triggerPodSync is called as needed (and checked that it catches this case)

// Clear any errors that may have been surfaced from a previous resize. The condition will be
// added back as needed in the defer block, but this prevents old errors from being preserved.
// added back in the sync loop but this prevents old errors from being preserved.
m.statusManager.ClearPodResizeInProgressCondition(pod.UID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the pod is already syncing (doesn't have the updated allocation), but hasn't sent the status yet? It will end up clearing the InProgress condition, even though the resize is InProgress.

The window to hit this race condition is small, and the consequence doesn't seem that bad (it should resync quickly and re-add the condition if needed), and this seems like it would be annoying to fix. If you don't see a simple solution, I think it would be OK to file an issue for it, and leave a TODO linking to that issue. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the ways I can think of to resolve this are a bit controversial, so I filed #132851 and will think harder about it after this merges

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated this to just clear the errors instead of clearing the condition - this should be equivalent to clearing the condition and unconditionally adding it back

@natasha41575
Copy link
Contributor Author

/retest

@tallclair
Copy link
Member

/lgtm
/approve

woohoo!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 9, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: f43f8f79cd3dcc82115f73c2e2d1d13b64ce1a11

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: natasha41575, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2025
@k8s-ci-robot
Copy link
Contributor

@natasha41575: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-unit-windows-master 381b3f3 link false /test pull-kubernetes-unit-windows-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit 13dcec3 into kubernetes:master Jul 9, 2025
13 of 14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jul 9, 2025
@github-project-automation github-project-automation bot moved this from Archive-it to Done in SIG Node CI/Test Board Jul 9, 2025
@github-project-automation github-project-automation bot moved this from Needs Reviewer to Done in SIG Node: code and documentation PRs Jul 9, 2025
@natasha41575 natasha41575 deleted the move-resize-logic branch July 10, 2025 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Development

Successfully merging this pull request may close these issues.

3 participants