Fix pod object update that may cause data race #116702
/sig node

/assign @SergeyKanzhelev
Force-pushed from 8eae126 to 6591d15.
Force-pushed from 6591d15 to d49b936.
Force-pushed from d49b936 to f66e884.
/assign @liggitt

/test pull-kubernetes-e2e-gce-cos-alpha-features

/test pull-kubernetes-e2e-gce-cos-alpha-features
```diff
- kl.podManager.UpdatePod(pod)
- kl.statusManager.SetPodStatus(pod, pod.Status)
+ kl.podManager.UpdatePod(updatedPod)
+ kl.statusManager.SetPodStatus(updatedPod, updatedPod.Status)
```
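For context, the diff swaps the shared `pod` for a deep-copied `updatedPod`. Below is a minimal, self-contained sketch of this copy-then-return pattern; `applyResize` is a hypothetical stand-in for the real resize path, not kubelet code:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// applyResize never writes through the shared *v1.Pod it receives: it
// deep-copies the pod, mutates the copy, and returns it so the caller
// continues with the updated values (mirroring the pod -> updatedPod
// change in the diff above).
func applyResize(pod *v1.Pod, cpu resource.Quantity) *v1.Pod {
	updatedPod := pod.DeepCopy() // concurrent readers of pod never see a partial write
	updatedPod.Spec.Containers[0].Resources.Requests[v1.ResourceCPU] = cpu
	return updatedPod
}

func main() {
	pod := &v1.Pod{
		Spec: v1.PodSpec{Containers: []v1.Container{{
			Name: "app",
			Resources: v1.ResourceRequirements{
				Requests: v1.ResourceList{v1.ResourceCPU: resource.MustParse("100m")},
			},
		}}},
	}
	updated := applyResize(pod, resource.MustParse("200m"))
	// The original object is untouched; only the returned copy changed.
	fmt.Println(pod.Spec.Containers[0].Resources.Requests.Cpu())     // 100m
	fmt.Println(updated.Spec.Containers[0].Resources.Requests.Cpu()) // 200m
}
```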
Are you expecting the SyncPod loop to continue with the updated pod values? If so, you need to return updatedPod here and ensure this is called before other pod actions.
Also note - calling UpdatePod like this isn't safe - if another actor updates the pod in between this call and the time you resync, the value of pod in pod manager will be overwritten by that value (may be older) - or worse, you might overwrite a newer value of pod spec that was delivered. That might cause (for instance) a pod that was previously completed to be seen as newly created, and the kubelet to run it twice.
That is the root of my concern here: you can't use podManager as a store of temporary state (the kubelet uses it as a cache of upstream sources of truth). If you need to create a store of truth for "changes that have been requested AND accepted by the kubelet", we need to create a new map to hold them, which probably involves designing this into the state model of the kubelet as I mentioned.
The approach here will be enough for alpha but is probably relying on the update from the API actually propagating quickly to be correct.
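To make the stale-overwrite hazard concrete, here is a self-contained toy cache that rejects updates older than what it already holds. This is purely illustrative: the kubelet's pod manager has no such guard, which is exactly the concern raised above.

```go
package main

import (
	"fmt"
	"sync"
)

// entry pairs a value with a revision so the cache can detect staleness.
type entry struct {
	revision int64
	value    string
}

type cache struct {
	mu    sync.Mutex
	items map[string]entry
}

// update refuses to overwrite newer state with older state.
func (c *cache) update(key string, e entry) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cur, ok := c.items[key]; ok && cur.revision > e.revision {
		return false
	}
	c.items[key] = e
	return true
}

func main() {
	c := &cache{items: map[string]entry{}}
	c.update("pod-a", entry{revision: 7, value: "newer spec from the API server"})
	ok := c.update("pod-a", entry{revision: 3, value: "stale local rewrite"})
	fmt.Println(ok, c.items["pod-a"].value) // false newer spec from the API server
}
```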
I have modified the code to return the updated pod; it looks like a fairly safe change for now. PTAL.
The implementation does not rely on quick propagation, but quick propagation is desired.
My initial approach in the KEP was to not do this in SyncPod at all, and instead do it in HandlePodUpdates coupled with event-based retries for Deferred resizes (e.g. revisit Deferred resizes upon pod deletion; IMHO that makes sense because there is a fair chance a resize attempt will succeed after other resource-consuming pods have departed the node, versus periodic retries). IIRC there were concerns about reliability, and it is conventional/accepted for K8s to retry even when the retry may not change the result. We also discussed this point during early reviews but opted for a beta TODO so we could focus on this change; tracking issue #109547.
What are your thoughts on my initial suggested approach (HandlePodUpdates + event-based retries)?
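For illustration, a minimal sketch of the event-based retry idea described here: remember pods whose resize was marked Deferred and revisit them when another pod leaves the node, instead of on a periodic timer. All names below are hypothetical, not kubelet APIs.

```go
package resize

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// resizeRetrier tracks pods whose resize could not be satisfied yet.
type resizeRetrier struct {
	mu       sync.Mutex
	deferred map[types.UID]struct{}
}

func newResizeRetrier() *resizeRetrier {
	return &resizeRetrier{deferred: map[types.UID]struct{}{}}
}

// markDeferred records a pod whose resize was marked Deferred.
func (r *resizeRetrier) markDeferred(uid types.UID) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.deferred[uid] = struct{}{}
}

// onPodDeleted re-enqueues every deferred resize for another attempt,
// since the departing pod may have freed enough CPU or memory.
func (r *resizeRetrier) onPodDeleted(enqueue func(types.UID)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for uid := range r.deferred {
		enqueue(uid)
	}
}
```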
Filed #116826 to address this in the next version.
> That is the root of my concern here: you can't use podManager as a store of temporary state (the kubelet uses it as a cache of upstream sources of truth). If you need to create a store of truth for "changes that have been requested AND accepted by the kubelet", we need to create a new map to hold them, which probably involves designing this into the state model of the kubelet as I mentioned.
Yes, I plan to revisit computePodResizeAction for this purpose. Another possibility is to borrow from the original implementation of the initially approved KEP: patch the PodStatus and proceed only upon a successful patch. The downside is an additional call to the API server, which could get rate-limited. That approach leverages pod status as a persistent, reliable source of truth (xref: KEP 2527).
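For concreteness, a minimal sketch of that patch-first alternative using client-go. The function name and the patched status field are assumptions for illustration, not the KEP 2527 implementation:

```go
package podresize

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// patchThenProceed persists the accepted resize into pod status first and
// only reports success if the API server accepted the patch.
func patchThenProceed(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	patch := []byte(`{"status":{"resize":"InProgress"}}`) // assumed status field
	_, err := cs.CoreV1().Pods(ns).Patch(ctx, name, types.StrategicMergePatchType,
		patch, metav1.PatchOptions{}, "status")
	if err != nil {
		// Patch failed: do not apply the resize locally; retry later.
		return fmt.Errorf("resize not persisted, skipping local apply: %w", err)
	}
	// Patch succeeded: the API server now holds the accepted resize, so it
	// survives a kubelet restart and can be safely acted on locally.
	return nil
}
```

The trade-off noted above applies: this makes the API server the source of truth at the cost of an extra call per resize, which could get rate-limited.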
This Issue/PR has been marked as a release blocker. Please have the PR merged by EOD today; otherwise the feature has to be reverted. For the Issue/PR owner: please have a revert/feature-disable PR open today in case the fix fails to get in. Thank you! See the announcement here -- Kubernetes 1.27 release manager
/lgtm
With the caveat discussed: https://kubernetes.slack.com/archives/C0BP8PW9G/p1679443165418009
LGTM label has been added. Git tree hash: 26c4109017e3d06128992ca37a5c83f2beb22208
/approve

Looks like we have enough for alpha and there is a feature gate to protect things. Thanks @SergeyKanzhelev for opening up a follow-up issue.

/retest
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, SergeyKanzhelev, spatecon, vinaykul

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it: Overwriting the pod object with the updated pod object can cause a data race if the original object is being accessed elsewhere. This change fixes the issue. (A toy reproduction of the race pattern is sketched at the end.)
Which issue(s) this PR fixes:
Fixes #116694
Special notes for your reviewer:
Does this PR introduce a user-facing change? No
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/assign @smarterclayton
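As referenced in the description above, a toy reproduction of the race pattern, with a simplified struct standing in for *v1.Pod; running it with `go run -race` makes the detector fire. The fix avoids this by updating a deep copy (updatedPod) instead of the shared object:

```go
package main

import (
	"fmt"
	"sync"
)

// podLike is a simplified stand-in for *v1.Pod.
type podLike struct {
	CPURequest string
}

func main() {
	pod := &podLike{CPURequest: "100m"}
	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: mutates the shared object in place, as the pre-fix code did.
	go func() {
		defer wg.Done()
		pod.CPURequest = "200m"
	}()

	// Reader: another component inspecting the same object concurrently.
	go func() {
		defer wg.Done()
		fmt.Println("observed:", pod.CPURequest)
	}()

	wg.Wait()
}
```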