Fix pod object update that may cause data race #116702
/sig node

/assign @SergeyKanzhelev
Force-pushed from 8eae126 to 6591d15.
Force-pushed from 6591d15 to d49b936.
Force-pushed from d49b936 to f66e884.
/assign @liggitt

/test pull-kubernetes-e2e-gce-cos-alpha-features

/test pull-kubernetes-e2e-gce-cos-alpha-features
```diff
- kl.podManager.UpdatePod(pod)
- kl.statusManager.SetPodStatus(pod, pod.Status)
+ kl.podManager.UpdatePod(updatedPod)
+ kl.statusManager.SetPodStatus(updatedPod, updatedPod.Status)
```
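For context, the diff swaps the shared `pod` for a deep-copied `updatedPod`. Below is a minimal, self-contained sketch of this copy-then-return pattern; `applyResize` is a hypothetical stand-in for the real resize path, not kubelet code:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// applyResize never writes through the shared *v1.Pod it receives: it
// deep-copies the pod, mutates the copy, and returns it so the caller
// continues with the updated values (mirroring the pod -> updatedPod
// change in the diff above).
func applyResize(pod *v1.Pod, cpu resource.Quantity) *v1.Pod {
	updatedPod := pod.DeepCopy() // concurrent readers of pod never see a partial write
	updatedPod.Spec.Containers[0].Resources.Requests[v1.ResourceCPU] = cpu
	return updatedPod
}

func main() {
	pod := &v1.Pod{
		Spec: v1.PodSpec{Containers: []v1.Container{{
			Name: "app",
			Resources: v1.ResourceRequirements{
				Requests: v1.ResourceList{v1.ResourceCPU: resource.MustParse("100m")},
			},
		}}},
	}
	updated := applyResize(pod, resource.MustParse("200m"))
	// The original object is untouched; only the returned copy changed.
	fmt.Println(pod.Spec.Containers[0].Resources.Requests.Cpu())     // 100m
	fmt.Println(updated.Spec.Containers[0].Resources.Requests.Cpu()) // 200m
}
```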
Are you expecting the SyncPod loop to continue with the updated pod values? If so, you need to return updatedPod here and ensure this is called before other pod actions.
Also note - calling UpdatePod like this isn't safe - if another actor updates the pod in between this call and the time you resync, the value of pod in pod manager will be overwritten by that value (may be older) - or worse, you might overwrite a newer value of pod spec that was delivered. That might cause (for instance) a pod that was previously completed to be seen as newly created, and the kubelet to run it twice.
That is the root of my concern here: you can't use podManager as a store of temporary state (the kubelet uses it as a cache of upstream sources of truth). If you need to create a store of truth for "changes that have been requested AND accepted by the kubelet", we need to create a new map to hold them, which probably involves designing this into the state model of the kubelet as I mentioned.
The approach here will be enough for alpha but is probably relying on the update from the API actually propagating quickly to be correct.
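To make the stale-overwrite hazard concrete, here is a self-contained toy cache that rejects updates older than what it already holds. This is purely illustrative: the kubelet's pod manager has no such guard, which is exactly the concern raised above.

```go
package main

import (
	"fmt"
	"sync"
)

// entry pairs a value with a revision so the cache can detect staleness.
type entry struct {
	revision int64
	value    string
}

type cache struct {
	mu    sync.Mutex
	items map[string]entry
}

// update refuses to overwrite newer state with older state.
func (c *cache) update(key string, e entry) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cur, ok := c.items[key]; ok && cur.revision > e.revision {
		return false
	}
	c.items[key] = e
	return true
}

func main() {
	c := &cache{items: map[string]entry{}}
	c.update("pod-a", entry{revision: 7, value: "newer spec from the API server"})
	ok := c.update("pod-a", entry{revision: 3, value: "stale local rewrite"})
	fmt.Println(ok, c.items["pod-a"].value) // false newer spec from the API server
}
```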
I have modified the code to return the updated pod; it looks like a fairly safe change for now. PTAL.
The implementation does not rely on quick propagation, but quick propagation is desired.
My initial approach in the KEP was to not do this in SyncPod at all, and instead do it in HandlePodUpdates coupled with event-based retries for Deferred resizes (e.g. revisit Deferred resizes upon pod deletion; IMHO that makes sense because there is a fair chance a resize attempt will succeed after other resource-consuming pods have departed the node, versus periodic retries). IIRC there were concerns about reliability, and it is conventional/accepted for K8s to retry even when the retry may not change the result. We also discussed this point during early reviews but opted for a beta TODO so we could focus on this change; tracking issue #109547.
What are your thoughts on my initial suggested approach (HandlePodUpdates + event-based retries)?
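For illustration, a minimal sketch of the event-based retry idea described here: remember pods whose resize was marked Deferred and revisit them when another pod leaves the node, instead of on a periodic timer. All names below are hypothetical, not kubelet APIs.

```go
package resize

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// resizeRetrier tracks pods whose resize could not be satisfied yet.
type resizeRetrier struct {
	mu       sync.Mutex
	deferred map[types.UID]struct{}
}

func newResizeRetrier() *resizeRetrier {
	return &resizeRetrier{deferred: map[types.UID]struct{}{}}
}

// markDeferred records a pod whose resize was marked Deferred.
func (r *resizeRetrier) markDeferred(uid types.UID) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.deferred[uid] = struct{}{}
}

// onPodDeleted re-enqueues every deferred resize for another attempt,
// since the departing pod may have freed enough CPU or memory.
func (r *resizeRetrier) onPodDeleted(enqueue func(types.UID)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for uid := range r.deferred {
		enqueue(uid)
	}
}
```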
Filed #116826 to address this in the next version.
> That is the root of my concern here: you can't use podManager as a store of temporary state (the kubelet uses it as a cache of upstream sources of truth). If you need to create a store of truth for "changes that have been requested AND accepted by the kubelet", we need to create a new map to hold them, which probably involves designing this into the state model of the kubelet as I mentioned.
Yes, I plan to revisit computePodResizeAction for this purpose. Another possibility is to borrow from the original implementation of the initially approved KEP: patch the PodStatus and proceed only upon a successful patch. The downside is an additional call to the API server, which could get rate-limited. That approach leverages pod status as a persistent, reliable source of truth (xref: KEP 2527).
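For concreteness, a minimal sketch of that patch-first alternative using client-go. The function name and the patched status field are assumptions for illustration, not the KEP 2527 implementation:

```go
package podresize

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// patchThenProceed persists the accepted resize into pod status first and
// only reports success if the API server accepted the patch.
func patchThenProceed(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	patch := []byte(`{"status":{"resize":"InProgress"}}`) // assumed status field
	_, err := cs.CoreV1().Pods(ns).Patch(ctx, name, types.StrategicMergePatchType,
		patch, metav1.PatchOptions{}, "status")
	if err != nil {
		// Patch failed: do not apply the resize locally; retry later.
		return fmt.Errorf("resize not persisted, skipping local apply: %w", err)
	}
	// Patch succeeded: the API server now holds the accepted resize, so it
	// survives a kubelet restart and can be safely acted on locally.
	return nil
}
```

The trade-off noted above applies: this makes the API server the source of truth at the cost of an extra call per resize, which could get rate-limited.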
This Issue/PR has been marked as a release blocker. Please have the PR merged by EOD today; otherwise the feature has to be reverted. For the Issue/PR owner: please have a revert/feature-disable PR open today in case the fix fails to get in. Thank you! See the announcement here -- Kubernetes 1.27 release manager
/lgtm
With the caveat discussed: https://kubernetes.slack.com/archives/C0BP8PW9G/p1679443165418009
LGTM label has been added. Git tree hash: 26c4109017e3d06128992ca37a5c83f2beb22208
/approve

Looks like we have enough for alpha and there is a feature gate to protect things. Thanks @SergeyKanzhelev for opening up a follow-up issue.

/retest
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, SergeyKanzhelev, spatecon, vinaykul

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it: Overwriting the pod object with the updated pod object can cause a data race if the original object is being accessed elsewhere. This change fixes the issue. (A toy reproduction of the race pattern is sketched at the end.)
Which issue(s) this PR fixes:
Fixes #116694
Special notes for your reviewer:
Does this PR introduce a user-facing change? No
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/assign @smarterclayton
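As referenced in the description above, a toy reproduction of the race pattern, with a simplified struct standing in for *v1.Pod; running it with `go run -race` makes the detector fire. The fix avoids this by updating a deep copy (updatedPod) instead of the shared object:

```go
package main

import (
	"fmt"
	"sync"
)

// podLike is a simplified stand-in for *v1.Pod.
type podLike struct {
	CPURequest string
}

func main() {
	pod := &podLike{CPURequest: "100m"}
	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: mutates the shared object in place, as the pre-fix code did.
	go func() {
		defer wg.Done()
		pod.CPURequest = "200m"
	}()

	// Reader: another component inspecting the same object concurrently.
	go func() {
		defer wg.Done()
		fmt.Println("observed:", pod.CPURequest)
	}()

	wg.Wait()
}
```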