Fix pod object update that may cause data race #116702
Merged: k8s-ci-robot merged 4 commits into kubernetes:master from vinaykul:restart-free-pod-vertical-scaling-podmutation-fix on Mar 22, 2023.
Commits (4):
- f66e884 (vinaykul): Fix pod object update that may cause data race
- 358474b (vinaykul): Explicitly return from checkpoint update failures. SyncPod will retry
- d753893 (vinaykul): Do not modify original pod object when processing pod resource resize
- f41702b (vinaykul): Return updatedPod if resize upon successful checkpointing of allocate…
Review comment:
Are you expecting the SyncPod loop to continue with the updated pod values? If so, you need to return updatedPod here and ensure this is called before other pod actions.
Also note that calling UpdatePod like this isn't safe: if another actor updates the pod between this call and the time you resync, the value of the pod in pod manager will be overwritten by that (possibly older) value, or worse, you might overwrite a newer pod spec that was delivered. That might cause, for instance, a pod that was previously completed to be seen as newly created, and the kubelet to run it twice.
That is the root of my concern here: you can't use podManager as a store of temporary state (the kubelet uses it as a cache of upstream sources of truth). If you need a store of truth for "changes that have been requested AND accepted by the kubelet", we need to create a new map to hold them, which probably involves designing this into the state model of the kubelet, as I mentioned.
The approach here will be enough for alpha, but it probably relies on the update from the API actually propagating quickly in order to be correct.
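To make the pattern concrete, here is a minimal Go sketch of the approach the commits describe: apply the accepted resize to a deep copy of the pod and return that copy, so the object cached in podManager is never mutated in place. The helper name applyResizeAllocation and the setup in main are assumptions for illustration, not the actual kubelet code.

```go
// Minimal sketch (not the actual kubelet code): never mutate the pod object
// that podManager may also hold; work on a deep copy and return it so the
// caller (e.g. SyncPod) can continue with the updated values.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// applyResizeAllocation is a hypothetical helper: it copies the pod before
// writing the allocated resources, leaving the original object untouched.
func applyResizeAllocation(pod *v1.Pod, allocated map[string]v1.ResourceRequirements) *v1.Pod {
	updatedPod := pod.DeepCopy()
	for i := range updatedPod.Spec.Containers {
		c := &updatedPod.Spec.Containers[i]
		if alloc, ok := allocated[c.Name]; ok {
			c.Resources = alloc
		}
	}
	return updatedPod
}

func main() {
	pod := &v1.Pod{
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name: "app",
				Resources: v1.ResourceRequirements{
					Requests: v1.ResourceList{v1.ResourceCPU: resource.MustParse("500m")},
				},
			}},
		},
	}

	updated := applyResizeAllocation(pod, map[string]v1.ResourceRequirements{
		"app": {Requests: v1.ResourceList{v1.ResourceCPU: resource.MustParse("1")}},
	})

	origCPU := pod.Spec.Containers[0].Resources.Requests[v1.ResourceCPU]
	newCPU := updated.Spec.Containers[0].Resources.Requests[v1.ResourceCPU]
	fmt.Println("original:", origCPU.String()) // 500m: the shared object is untouched
	fmt.Println("updated: ", newCPU.String())  // 1: the returned copy carries the resize
}
```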
Reply:
I have modified the code to return the updated pod; it looks like a fairly safe change for now. PTAL.
The implementation does not rely on quick propagation, but quick propagation is desirable.
My initial approach in the KEP was to not do this in SyncPod at all, but in HandlePodUpdates, coupled with event-based retries for Deferred resizes (e.g. revisiting Deferred resizes upon pod deletion; IMHO that makes sense here because a resize attempt has a fair chance of succeeding after other resource-consuming pods have departed the node, compared with periodic retries). IIRC there were concerns about reliability, and it is conventional/accepted for K8s to retry even when retrying may not change the result. We also discussed this point during early reviews, but opted for a beta TODO so we could focus on this change; tracking issue #109547.
What are your thoughts on my initial suggested approach (HandlePodUpdates + event-based retries)?
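For context on the event-based alternative mentioned above, here is a rough, hypothetical sketch: when a pod departs the node, select the remaining pods whose resize is marked Deferred so they can be re-synced immediately instead of waiting for a periodic retry. The helper deferredResizesToRetry is invented for this example, and it assumes the PodStatus.Resize field and PodResizeStatus constants added by the in-place resize feature.

```go
// Hypothetical sketch of event-based retries for Deferred resizes: when a pod
// is removed and frees resources, pick the still-active pods whose resize was
// deferred so a HandlePodRemoves-style path could re-sync them right away.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// deferredResizesToRetry is an illustrative helper, not kubelet code.
func deferredResizesToRetry(activePods []*v1.Pod) []*v1.Pod {
	var retry []*v1.Pod
	for _, p := range activePods {
		if p.Status.Resize == v1.PodResizeStatusDeferred {
			retry = append(retry, p)
		}
	}
	return retry
}

func main() {
	pods := []*v1.Pod{
		{Status: v1.PodStatus{Resize: v1.PodResizeStatusDeferred}},
		{Status: v1.PodStatus{Resize: v1.PodResizeStatusInProgress}},
	}
	// After another pod departs, only deferred resizes are worth retrying immediately.
	fmt.Println("pods to retry:", len(deferredResizesToRetry(pods)))
}
```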
Follow-up comment: Filed #116826 for addressing this in the next version.
Reply:
Yes, I plan to revisit computePodResizeAction for this purpose. Another possibility is to borrow from the original implementation of the initially approved KEP: patch the PodStatus and proceed only upon a successful patch. That has the downside of an additional call to the API server, which could get rate-limited. This approach intends to leverage pod status as a persistent, reliable source of truth (xref: KEP 2527).
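As a rough illustration of that alternative (not what this PR implements), the kubelet could record the accepted resize in the pod's status subresource and proceed only if the patch succeeds, accepting the extra API round-trip noted above. The helper persistResizeInStatus and the patch shape are assumptions made for the sketch; only the client-go Patch call itself is a real API.

```go
// Hypothetical sketch of the "patch PodStatus first, proceed only on success"
// alternative. The extra API call is the drawback discussed above.
package example

import (
	"context"
	"encoding/json"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// persistResizeInStatus is illustrative only: it records an accepted resize in
// pod status via a strategic-merge patch to the status subresource, and the
// caller proceeds with the local resize only if the patch succeeded.
func persistResizeInStatus(ctx context.Context, client kubernetes.Interface, pod *v1.Pod, resize v1.PodResizeStatus) (*v1.Pod, error) {
	patch, err := json.Marshal(map[string]interface{}{
		"status": map[string]interface{}{
			"resize": resize,
		},
	})
	if err != nil {
		return nil, err
	}
	updated, err := client.CoreV1().Pods(pod.Namespace).Patch(
		ctx, pod.Name, types.StrategicMergePatchType, patch, metav1.PatchOptions{}, "status")
	if err != nil {
		// Do not apply the resize locally; SyncPod can retry later.
		return nil, fmt.Errorf("failed to persist resize for %s/%s: %w", pod.Namespace, pod.Name, err)
	}
	return updated, nil
}
```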