[FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop #131612
Conversation
I think this looks good, but I want to trace through the resize flows again before approving. Will do tomorrow.
```go
// - Non-resizable containers: non-restartable init containers, ephemeral containers
// - Non-resizable resources: only CPU & memory are resizable
// - Non-running containers: they will be sized correctly when (re)started
func (m *manager) CheckPodResizeInProgress(allocatedPod *v1.Pod, podStatus *kubecontainer.PodStatus) {
```
Let's leave this here for now, but maybe we should move all the actuated resources & in-progress logic out of allocation manager in a follow-up PR. It can almost all go to the runtime. That would also break the circular dependency with the kuberuntime. WDYT?
I need to think about this more.
I spent far too long thinking about this, but I like the idea of making actuated resources an implementation detail of kuberuntime. Once this PR merges, I can take this as a follow-up task.
I don't have anything against it, and it makes sense as a mental model, but you'd have to either plumb the status manager down into kuberuntime or bubble up the result somehow? I have a vague recollection of trying something like this elsewhere and running into dependency issues, but I'm sure you'll have a better solution than I did, good luck :)
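One way to sidestep the import cycle mentioned above is to have the runtime depend on a narrow interface rather than the whole status manager. This is only a sketch of that idea; `ResizeStatusRecorder`, `runtimeManager`, and the other names below are hypothetical stand-ins, not the kubelet's actual types:

```go
package main

import "fmt"

// ResizeStatusRecorder is a hypothetical narrow interface: instead of
// plumbing the entire status manager into the runtime package, the runtime
// only depends on the single method it needs, avoiding a circular import.
type ResizeStatusRecorder interface {
	SetResizeInProgress(podUID string)
}

// fakeRecorder stands in for the status manager in this sketch.
type fakeRecorder struct{ last string }

func (f *fakeRecorder) SetResizeInProgress(podUID string) { f.last = podUID }

// runtimeManager models kuberuntime owning the actuated-resources logic;
// it bubbles the result up through the injected recorder.
type runtimeManager struct{ recorder ResizeStatusRecorder }

func (r *runtimeManager) doResize(podUID string) {
	// ... actuate the resize against the container runtime, then report it.
	r.recorder.SetResizeInProgress(podUID)
}

func main() {
	rec := &fakeRecorder{}
	rt := &runtimeManager{recorder: rec}
	rt.doResize("pod-1")
	fmt.Println(rec.last) // pod-1
}
```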
```go
// If the pod resize status has changed, we need to update the pod status.
newResizeStatus := m.statusManager.GetPodResizeConditions(uid)
if !apiequality.Semantic.DeepEqual(oldResizeStatus, newResizeStatus) {
```
now that handlePodResourcesResize doesn't set the InProgress condition, if the resize was accepted on the first try (no pending condition), wouldn't old & new be the same (empty), and skip the sync?
Maybe just make handlePodResourcesResize return whether it did a resize.
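The skipped-sync case can be illustrated with a toy model. Everything below (`ResizeCondition`, `handleResize`, the condition map) is a hypothetical simplification for illustration, not the kubelet's real status-manager code:

```go
package main

import (
	"fmt"
	"reflect"
)

// ResizeCondition is a simplified stand-in for a pod resize condition.
type ResizeCondition struct {
	Type   string // e.g. "PodResizePending"
	Reason string // e.g. "Deferred" or "Infeasible"
}

// handleResize models handlePodResourcesResize: an accepted resize leaves
// no pending condition behind. Returning whether a resize was allocated is
// the fix suggested above.
func handleResize(conds map[string][]ResizeCondition, uid string, accepted bool) bool {
	if accepted {
		delete(conds, uid) // accepted on the first try: no pending condition
		return true
	}
	conds[uid] = []ResizeCondition{{Type: "PodResizePending", Reason: "Deferred"}}
	return false
}

func main() {
	conds := map[string][]ResizeCondition{}
	uid := "pod-1"

	old := conds[uid]
	resized := handleResize(conds, uid, true) // accepted immediately
	updated := conds[uid]

	// Comparing only the conditions misses the accepted-on-first-try case:
	// old and updated are both empty, so a DeepEqual check alone skips the sync.
	condsChanged := !reflect.DeepEqual(old, updated)
	fmt.Println("conditions changed:", condsChanged) // false
	fmt.Println("should sync:", condsChanged || resized) // true
}
```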
Also it looks like this method could use a unit test. Can you add a test, and make sure it would catch this case?
🤦 sorry, thanks for the catch
The tests in allocation_manager_test.go (TestHandlePodResourcesResize and TestHandlePodResourcesResizeWithSwap) cover handlePodResourcesResize indirectly by calling RetryPendingResizes. I updated these tests to also ensure triggerPodSync is called as needed (and verified that they catch this case).
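The shape of the check being tested can be sketched like this; `retryPendingResizes`, `allocate`, and `triggerPodSync` here are hypothetical simplifications of the real allocation-manager code:

```go
package main

import "fmt"

// retryPendingResizes models RetryPendingResizes: it fires the injected
// sync trigger only for resizes that were actually allocated, which is the
// behavior the updated tests assert on.
func retryPendingResizes(pending []string, allocate func(string) bool, triggerPodSync func(string)) {
	for _, uid := range pending {
		if allocate(uid) {
			triggerPodSync(uid)
		}
	}
}

func main() {
	var synced []string
	allocate := func(uid string) bool { return uid != "infeasible-pod" }
	triggerPodSync := func(uid string) { synced = append(synced, uid) }

	retryPendingResizes([]string{"pod-a", "infeasible-pod", "pod-b"}, allocate, triggerPodSync)

	// triggerPodSync fired exactly for the pods whose resize was allocated.
	fmt.Println(synced) // [pod-a pod-b]
}
```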
```diff
  // Clear any errors that may have been surfaced from a previous resize. The condition will be
- // added back as needed in the defer block, but this prevents old errors from being preserved.
+ // added back in the sync loop but this prevents old errors from being preserved.
  m.statusManager.ClearPodResizeInProgressCondition(pod.UID)
```
What happens if the pod is already syncing (doesn't have the updated allocation), but hasn't sent the status yet? It will end up clearing the InProgress condition, even though the resize is InProgress.
The window to hit this race condition is small, and the consequence doesn't seem that bad (it should resync quickly and re-add the condition if needed), and this seems like it would be annoying to fix. If you don't see a simple solution, I think it would be OK to file an issue for it, and leave a TODO linking to that issue. WDYT?
all the ways I can think of to resolve this are a bit controversial, so I filed #132851 and will think harder about it after this merges
updated this to just clear the errors instead of clearing the condition - this should be equivalent to clearing the condition and unconditionally adding it back
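The difference between the two approaches can be shown with a toy condition type (hypothetical; the real PodResizeInProgress condition carries more fields): clearing only the error keeps the condition present across the race window described above, while still dropping stale errors.

```go
package main

import "fmt"

// InProgressCondition is a simplified stand-in for the PodResizeInProgress
// condition.
type InProgressCondition struct {
	Present bool
	Error   string
}

// clearErrors keeps the condition in place (so a concurrently syncing pod
// never sees it vanish) but drops any stale error from a previous attempt.
func clearErrors(c *InProgressCondition) {
	if c.Present {
		c.Error = ""
	}
}

func main() {
	c := InProgressCondition{Present: true, Error: "previous resize failed"}
	clearErrors(&c)
	fmt.Printf("present=%v error=%q\n", c.Present, c.Error) // present=true error=""
}
```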
/retest
/lgtm woohoo!

LGTM label has been added. Git tree hash: f43f8f79cd3dcc82115f73c2e2d1d13b64ce1a11
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: natasha41575, tallclair
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This moves the in-place pod resize allocation logic out of the sync loop. This PR is organized into the following 5 commits:
- … `HandlePodResourcesResize` unit tests and move them into the `allocation` package
- … `IsPodResizeInfeasible` and `IsPodResizeDeferred` to the status_manager

The intention of this PR is to reattempt pending resizes:
Special notes for your reviewer
Intended follow-ups:
Which issue(s) this PR fixes:
Does not yet fix it, but this is part of #116971.
Does this PR introduce a user-facing change?
/sig node
/priority important-soon
/triage accepted
/cc @tallclair
TODO:
- retry deferred resizes in `HandlePodCleanups` (… `kl.podManager.GetPods`), and the pod manager is not updated in `HandlePodCleanups`, so I don't think retrying the pending resizes here is necessary! (… `kl.sourcesReady.AllReady()`)