Skip to content

Conversation

@vinaykul
Copy link
Member

What type of PR is this?

/kind bug

What this PR does / why we need it: Overwriting pod object with with updated pod object can cause potential data race conditions if the original object is being accessed elsewhere. This change fixes the issue.

Which issue(s) this PR fixes:

Fixes #116694

Special notes for your reviewer:

Does this PR introduce a user-facing change? no


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/assign @smarterclayton

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 17, 2023
@vinaykul
Copy link
Member Author

/sig node
/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. area/kubelet and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 17, 2023
@vinaykul
Copy link
Member Author

/assign @SergeyKanzhelev

@vinaykul vinaykul force-pushed the restart-free-pod-vertical-scaling-podmutation-fix branch from 8eae126 to 6591d15 Compare March 17, 2023 04:18
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 17, 2023
@vinaykul vinaykul force-pushed the restart-free-pod-vertical-scaling-podmutation-fix branch from 6591d15 to d49b936 Compare March 17, 2023 08:49
@vinaykul vinaykul force-pushed the restart-free-pod-vertical-scaling-podmutation-fix branch from d49b936 to f66e884 Compare March 17, 2023 08:51
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 17, 2023
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 17, 2023
@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 17, 2023
@vinaykul
Copy link
Member Author

/assign @liggitt

@vinaykul
Copy link
Member Author

/test pull-kubernetes-e2e-gce-cos-alpha-features
/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 18, 2023
@vinaykul
Copy link
Member Author

/test pull-kubernetes-e2e-gce-cos-alpha-features
/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2

kl.podManager.UpdatePod(pod)
kl.statusManager.SetPodStatus(pod, pod.Status)
kl.podManager.UpdatePod(updatedPod)
kl.statusManager.SetPodStatus(updatedPod, updatedPod.Status)
Copy link
Contributor

@smarterclayton smarterclayton Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you expecting the SyncPod loop to continue with the updated pod values? If so you need to return updatedPod here and ensure this is called before other pod actions.

Also note - calling UpdatePod like this isn't safe - if another actor updates the pod in between this call and the time you resync, the value of pod in pod manager will be overwritten by that value (may be older) - or worse, you might overwrite a newer value of pod spec that was delivered. That might cause (for instance) a pod that was previously completed to be seen as newly created, and the kubelet to run it twice.

That is the root of my concern here is that you can't use podManager as a store of temporary state (the kubelet uses it as a cache of upstream sources of truth). If you need to create a store of truth for "changes that have been requested AND accepted by the kubelet" we need to create a new map to hold them, which probably involves designing this into the state model of the kubelet as I mentioned.

The approach here will be enough for alpha but is probably relying on the update from the API actually propagating quickly to be correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified the code to return updated pod, it looks like a fairly safe change for now. PTAL.
The implementation does not rely on quick propagation, but it is desired.

My initial approach in the KEP was to not do this in SyncPod at all and do this in HandlePodUpdates coupled with event-based retries for Deferred resizes (e.g. revisit Deferred resizes upon deletion. IMHO it makes sense here because there is fair chance a resize try will be successful upon after other resource-using pods departed the node vs periodic retries). IIRC there were concerns about the reliability and it was conventional/accepted for K8s to retry even when it may not change the result. We also discussed this point during early reviews but opted to have a beta TODO to focus on this change, tracking issue #109547

What are your thoughts on my initial suggested approach (HandlePodUpdates + event based retries)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #116826 for addressing in the next version

Copy link
Member Author

@vinaykul vinaykul Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the root of my concern here is that you can't use podManager as a store of temporary state (the kubelet uses it as a cache of upstream sources of truth). If you need to create a store of truth for "changes that have been requested AND accepted by the kubelet" we need to create a new map to hold them, which probably involves designing this into the state model of the kubelet as I mentioned.

Yes, I plan to revisit computePodResizeAction for this purpose. Another possibility is to borrow from the original implementation of the initially approved KEP and patch the PodStatus and proceed only upon successful patch but that has the downside of an additional call to the API server, which could get rate-limited. This approach intends to leverage pod status as persistent, reliable source of truth ( xref: KEP 2527 )

@liggitt liggitt added kind/regression Categorizes issue or PR as related to a regression from a prior release. release-blocker labels Mar 21, 2023
@cici37
Copy link
Contributor

cici37 commented Mar 21, 2023

This Issue/PR has been marked as release blocker. Please have the PR merged by eod today otherwise the feature has to be reverted. For the Issue/PR owner: please have a revert/feature disable PR open today in case the fix failed to get in. Thank you! See the announcement here

-- Kubernetes 1.27 release manager

@liggitt liggitt removed their assignment Mar 21, 2023
Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 22, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 26c4109017e3d06128992ca37a5c83f2beb22208

@dims
Copy link
Member

dims commented Mar 22, 2023

/approve

Looks like we have enough for alpha and there is a feature gate to protect things. thanks @SergeyKanzhelev for opening up a follow up issue.

@dims
Copy link
Member

dims commented Mar 22, 2023

/retest

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, SergeyKanzhelev, spatecon, vinaykul

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 22, 2023
@k8s-ci-robot k8s-ci-robot merged commit c7cc788 into kubernetes:master Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-blocker release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Mutation of a cached object in kubelet through in place resizing of pods

8 participants