Skip to content

Conversation

@shiya0705
Copy link
Contributor

@shiya0705 shiya0705 commented Feb 24, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:
Add events when pod resize complete

Which issue(s) this PR fixes:
Fixes #127172

Special notes for your reviewer:
N/A

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
N/A

Does this PR introduce a user-facing change?
Event is a user-facing change

Added detailed event for in-place pod vertical scaling completed, improving cluster management and debugging

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 24, 2025
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Feb 24, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 24, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @shiya0705. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 24, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 24, 2025
@googs1025
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 24, 2025


func resizeMsgGenerate(allocatedPod *v1.Pod, podStatus *kubecontainer.PodStatus) string {
var msg string = "}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we just use "}" ? This might make the return value a bit weird. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed code, var msg string = "}" is compared with msg = fmt.Sprintf("Pod resize complete, {") + msg

Copy link
Member

@googs1025 googs1025 Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is not a correct way to initialize. Can we use like [] string to get msg and Join operation at the end instead of this string concatenation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments, I change the string definition.

@shiya0705 shiya0705 force-pushed the Pod_Resize_complete_event branch from 8159cfe to 1435a12 Compare February 24, 2025 08:34
@shiya0705 shiya0705 closed this Feb 24, 2025
@shiya0705 shiya0705 deleted the Pod_Resize_complete_event branch February 24, 2025 08:54
@shiya0705 shiya0705 changed the title first modified version Pod resize complete event Feb 24, 2025
@shiya0705 shiya0705 changed the title Pod resize complete event Add Pod resize complete event Feb 24, 2025
@shiya0705
Copy link
Contributor Author

/retest

@shiya0705 shiya0705 restored the Pod_Resize_complete_event branch February 24, 2025 09:18
@shiya0705 shiya0705 force-pushed the Pod_Resize_complete_event branch from 313ae74 to 745bc7d Compare July 17, 2025 14:25
Copy link
Contributor

@natasha41575 natasha41575 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this mostly LGTM. Thank you :)

/assign @tallclair

} else {
// Verify pod resize Inprogress condition is cleared
gotResizeConditions := am.(*manager).statusManager.GetPodResizeConditions(pod.UID)
for _, c := range gotResizeConditions {
Copy link
Contributor

@natasha41575 natasha41575 Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is it sufficient to check if require.Len(t, gotResizeConditions, 0)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change require.Nil(t, c.Type, "PodResizeInProgress condition type should be nil") to require.Len(t, gotResizeConditions, 0) is better,applied to the new code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied

Copy link
Member

@tallclair tallclair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just nits

Comment on lines 242 to 239
podutil.VisitContainers(&resizedPod.Spec, podutil.InitContainers|podutil.Containers,
func(allocatedContainer *v1.Container, containerType podutil.ContainerType) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) podutil.ContainerIter would be a little cleaner here

Suggested change
podutil.VisitContainers(&resizedPod.Spec, podutil.InitContainers|podutil.Containers,
func(allocatedContainer *v1.Container, containerType podutil.ContainerType) bool {
for allocatedContainer, containerType := range podutil.ContainerIter(&resizedPod.Spec, podutil.InitContainers|podutil.Containers) {

Copy link
Contributor Author

@shiya0705 shiya0705 Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To less the modification, I Keep podutil.VisitContainers here~


podutil.VisitContainers(&resizedPod.Spec, podutil.InitContainers|podutil.Containers,
func(allocatedContainer *v1.Container, containerType podutil.ContainerType) bool {
allocatedResources, exists := m.GetContainerResourceAllocation(resizedPod.UID, allocatedContainer.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pod is already using the allocated values here, so you can just use the container resources directly (allocatedResources := container.Resources). In fact, this is better, since the resize we're processing here is using a snapshot of the allocated resources, but it's possible that a new resize was admitted in parallel.

oldPodResizeInProgressCondition: false,
expectHasResize: true,
}, {
name: "several containers",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this case, omit some resources in each container, e.g.

  • one just has CPU requests & limits
  • one just has memory requests & limits
  • one just has CPU & memory requests

it would also be good to add a test with init containers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, consider adding a separate test for emitPodResizeCompletionEvent that has the more complete coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to simplify modifications, I just changed "several container" case to cover these three cases, both have init and normal container. I think it could cover the test case for emitPodResizeCompletionEvent. If there is other request
, we can add a separate func to test it.
- one just has CPU requests & limits (sidecar init container)
- one just has memory requests & limits (container)
- one just has CPU & memory requests (container)

@shiya0705 shiya0705 force-pushed the Pod_Resize_complete_event branch 5 times, most recently from 99ef8d1 to 89cd35b Compare July 18, 2025 12:21
if exists {
allocation := containerAllocation{
Name: allocatedContainer.Name,
Resources: allocatedResources,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per Tim's suggestion (#130387 (comment)):

Suggested change
Resources: allocatedResources,
Resources: allocatedContainer.Resources,

You already have the allocated container, so no need to call m.GetContainerResourceAllocation to fetch allocatedResouces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I I misunderstood his meaning before, will change it.

podResizeMsgDetailsJSON, err := json.Marshal(podResizeSource)
if err != nil {
klog.ErrorS(err, "Failed to serialize resource summary", "pod", format.Pod(allocatedPod))
return "Pod resize completed: "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "Pod resize completed: "
return "Pod resize completed"

why the :? Are we expecting something to be appended to this in the error case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are correct, I deleted it

@shiya0705
Copy link
Contributor Author

/test pull-kubernetes-unit-windows-master

@natasha41575
Copy link
Contributor

Don't worry about the failing windows test, it's failing on one of my PRs too so I think it is unrelated, and it won't block the PR from merging.

Ping me when you push your changes, will do a final pass

@shiya0705 shiya0705 force-pushed the Pod_Resize_complete_event branch from a541808 to 3a2ea0e Compare July 18, 2025 15:42
@shiya0705 shiya0705 force-pushed the Pod_Resize_complete_event branch from 3a2ea0e to 2256f57 Compare July 18, 2025 15:44
@shiya0705
Copy link
Contributor Author

@natasha41575 Thank you so much, I have changed and pushed the code, please help to check

Copy link
Contributor

@natasha41575 natasha41575 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

ping @tallclair for approval

Thanks so much for being patient and working with us :)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 18, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: cc823c76e3632ab9e2fa0dcae676666135165dd1

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jul 18, 2025

@shiya0705: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-kind-ipv6 780529e link true /test pull-kubernetes-e2e-kind-ipv6
pull-kubernetes-unit-windows-master 2256f57 link false /test pull-kubernetes-unit-windows-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tallclair
Copy link
Member

/approve

+1 for what Natasha said, thanks for being patient and sticking with this!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shiya0705, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 18, 2025
@tallclair
Copy link
Member

/retest-required

@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Needs Approver in SIG Node: code and documentation PRs Jul 18, 2025
@k8s-ci-robot k8s-ci-robot merged commit 5c219a7 into kubernetes:master Jul 18, 2025
13 of 14 checks passed
@github-project-automation github-project-automation bot moved this from Needs Approver to Done in SIG Node: code and documentation PRs Jul 18, 2025
@github-project-automation github-project-automation bot moved this from Tracked to Done in [sig-release] Bug Triage Jul 18, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SIG Apps Jul 18, 2025
@shiya0705
Copy link
Contributor Author

shiya0705 commented Jul 20, 2025

@tallclair @natasha41575 Thank you so much for your patient review and feedback. I've learned a lot from all of you. Hope to make more contribution to this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Projects

Archived in project
Archived in project
Archived in project
Archived in project

Development

Successfully merging this pull request may close these issues.

[FG:InPlacePodVerticalScaling] Emit a events when resize status changes