Conversation

@natasha41575
Contributor

@natasha41575 natasha41575 commented May 15, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Move pod admission and resize logic into the allocation manager. This is broken out of this discussion: #131612 (comment). My goal in this one was to change as little business logic as possible, just trying to untangle some dependencies in preparation for #131612.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 15, 2025
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 15, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 15, 2025
@natasha41575 natasha41575 force-pushed the move-handle-pod-additions branch 4 times, most recently from e4f98db to 2ddac8f Compare May 15, 2025 23:23
@natasha41575
Contributor Author

/assign @tallclair
/sig node
/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 15, 2025
@natasha41575 natasha41575 moved this from Triage to PRs - Needs Reviewer in SIG Node CI/Test Board May 15, 2025
Member

@tallclair tallclair left a comment

Thanks for splitting this out. I didn't quite get through everything, will take another pass tomorrow.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 16, 2025
@natasha41575 natasha41575 force-pushed the move-handle-pod-additions branch from 8bd6cd3 to e59802a Compare May 19, 2025 19:38
@natasha41575
Contributor Author

/retest

@natasha41575 natasha41575 force-pushed the move-handle-pod-additions branch from e59802a to c4421fc Compare May 20, 2025 18:15
@natasha41575
Contributor Author

/retest

@tallclair
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 30, 2025
Member

@tallclair tallclair left a comment

lgtm, just a nit & question. Sorry for the review delay!

- kubelet.admitHandlers.AddPodAdmitHandler(lifecycle.NewPredicateAdmitHandler(kubelet.getNodeAnyWay, lifecycle.NewAdmissionFailureHandlerStub(), kubelet.containerManager.UpdatePluginResources))
+ handlers = append(handlers, lifecycle.NewPredicateAdmitHandler(kubelet.getNodeAnyWay, lifecycle.NewAdmissionFailureHandlerStub(), kubelet.containerManager.UpdatePluginResources))

if !excludeAdmitHandlers {
Member

why is this needed?

Contributor Author

@natasha41575 natasha41575 Jun 2, 2025

In its current form, TestHandlePluginResources clears all the existing admit handlers and adds a custom one:

kl.admitHandlers = lifecycle.PodAdmitHandlers{}
kl.admitHandlers.AddPodAdmitHandler(lifecycle.NewPredicateAdmitHandler(kl.getNodeAnyWay, lifecycle.NewAdmissionFailureHandlerStub(), updatePluginResourcesFunc))

To preserve the original intent of the test, the test needs a mechanism to do the same thing. In this case, instead of clearing the existing admit handlers, I gave it a way to create a kubelet that doesn't have any to begin with. Without this, some of the handlers added by default end up changing the admission results of this test.
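
For readers without the diff at hand, here is a minimal sketch of the idea, not the actual test helper. The names `admitHandler`, `testKubelet`, `newTestKubelet`, and `admit` are hypothetical stand-ins; only the `excludeAdmitHandlers` flag name comes from the diff above.

```go
package main

import "fmt"

// admitHandler is a hypothetical stand-in for a pod admit handler:
// it returns true when the pod is admitted.
type admitHandler func(pod string) bool

// testKubelet is a hypothetical, heavily reduced stand-in for the kubelet
// built by the test helper.
type testKubelet struct {
	handlers []admitHandler
}

// newTestKubelet registers the default handlers unless excludeAdmitHandlers
// is set, in which case the test starts with none and installs only its own.
func newTestKubelet(excludeAdmitHandlers bool) *testKubelet {
	var handlers []admitHandler
	if !excludeAdmitHandlers {
		// Illustrative default handler that rejects every pod, standing in
		// for defaults that would change the test's admission results.
		handlers = append(handlers, func(string) bool { return false })
	}
	return &testKubelet{handlers: handlers}
}

// admit reports whether every registered handler admits the pod.
func (k *testKubelet) admit(pod string) bool {
	for _, h := range k.handlers {
		if !h(pod) {
			return false
		}
	}
	return true
}

func main() {
	// The test opts out of the defaults, then adds its custom handler.
	kl := newTestKubelet(true)
	kl.handlers = append(kl.handlers, func(string) bool { return true })
	fmt.Println(kl.admit("pod-a"))

	// With the defaults in place, the same pod is rejected.
	fmt.Println(newTestKubelet(false).admit("pod-a"))
}
```

The point of the flag is that the test never has to reach into the kubelet to clear state after construction.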

if !kl.podWorkers.IsPodTerminationRequested(pod.UID) && !podutil.IsPodPhaseTerminal(pod.Status.Phase) {
// We failed pods that we rejected, so allocatedPods include all admitted
// pods that are alive.
allocatedPods := kl.getAllocatedPods()
Member

I just realized: I think there's a race condition here if a resize happens between this point and the call to allocationManager.AddPod.

Allocation manager is already the source of truth for the allocations though, so I think the fix should be to simply pass the active pods to allocationManager.AddPod, and handle the conversion to the allocated resources within the allocation manager while holding the lock.
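
A rough sketch of that suggestion, under stated assumptions: the `pod` and `allocationManager` types, the CPU-only accounting, and the `Resize` method below are hypothetical simplifications, not the kubelet's real API. The only point illustrated is that the conversion from active pods to allocated resources happens inside `AddPod` while the lock is held.

```go
package main

import (
	"fmt"
	"sync"
)

// pod is a hypothetical, reduced pod: a UID plus a requested CPU amount.
type pod struct {
	UID      string
	MilliCPU int64
}

// allocationManager is a hypothetical sketch, not the kubelet's real type.
type allocationManager struct {
	mu        sync.Mutex
	allocated map[string]int64 // pod UID -> allocated milliCPU
	capacity  int64
}

// AddPod takes the active pods as-is and looks up their allocations while
// holding the lock, so a concurrent Resize cannot land between the lookup
// and the admission decision.
func (m *allocationManager) AddPod(activePods []pod, p pod) bool {
	m.mu.Lock()
	defer m.mu.Unlock()

	var used int64
	for _, ap := range activePods {
		// Use the recorded allocation, not the value in the pod spec.
		used += m.allocated[ap.UID]
	}
	if used+p.MilliCPU > m.capacity {
		return false
	}
	m.allocated[p.UID] = p.MilliCPU
	return true
}

// Resize updates an existing allocation under the same lock.
func (m *allocationManager) Resize(uid string, milliCPU int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.allocated[uid] = milliCPU
}

func main() {
	m := &allocationManager{allocated: map[string]int64{}, capacity: 1000}
	a := pod{UID: "a", MilliCPU: 400}
	fmt.Println(m.AddPod(nil, a))

	// After the resize, admission sees the new allocation (800), not the
	// stale 400 still carried by the caller's copy of the pod.
	m.Resize("a", 800)
	fmt.Println(m.AddPod([]pod{a}, pod{UID: "b", MilliCPU: 400}))
}
```

Had the caller computed the allocated pods before calling `AddPod`, the second admission could have used the stale 400 and wrongly admitted pod "b".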

Contributor Author

fixed, thanks for the catch

@natasha41575 natasha41575 requested a review from tallclair June 2, 2025 17:58
@k8s-ci-robot
Contributor

@natasha41575: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-unit-windows-master | 9b8d7ce | link | false | /test pull-kubernetes-unit-windows-master |
| pull-kubernetes-e2e-capz-windows-master | 9b8d7ce | link | false | /test pull-kubernetes-e2e-capz-windows-master |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tallclair
Member

/lgtm
/approve

The mutex changes in the add pod flow make me a little nervous, but I don't see any other issues. Let's proceed, so we can unblock the follow-up PRs.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 00d31506ef8c0d5a411f1b713541a6c0c58d3205

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: natasha41575, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2025
@k8s-ci-robot k8s-ci-robot merged commit 901d124 into kubernetes:master Jun 2, 2025
15 of 17 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 2, 2025
@github-project-automation github-project-automation bot moved this from PRs - Needs Reviewer to Done in SIG Node CI/Test Board Jun 2, 2025
@natasha41575 natasha41575 deleted the move-handle-pod-additions branch June 2, 2025 22:02

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

4 participants