OCPBUGS-55485: Fix deadlock when stopping uninterruptible container #9256
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9256      +/-   ##
==========================================
- Coverage   67.05%   66.91%   -0.14%
==========================================
  Files         198      198
  Lines       27176    27189      +13
==========================================
- Hits        18222    18193      -29
- Misses       7449     7495      +46
+ Partials     1505     1501       -4
internal/oci/container.go
Outdated
// because it is not controlled by the timeout anymore.
stopTimeoutChan chan int64
stopWatchers []chan struct{}
stopKillLoop bool
nit: maybe stopKillLoopBegun or something? making clear it's part of the 'stop' operation and not an instruction whether to stop the kill loop
Thanks! done
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
@bitoku: This pull request references Jira Issue OCPBUGS-55485, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@bitoku: This pull request references Jira Issue OCPBUGS-55485, which is invalid:
/jira refresh
@bitoku: This pull request references Jira Issue OCPBUGS-55485, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact:
@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: lyman9966. Note that only cri-o members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/override ci/prow/ci-e2e-evented-pleg
LGTM, @cri-o/cri-o-maintainers PTAL
@haircommander: Overrode contexts on behalf of haircommander: ci/prow/ci-e2e-evented-pleg
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bitoku, haircommander, sohankunkerkar
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
/retest
@bitoku: Jira Issue OCPBUGS-55485: All pull requests linked via external trackers have merged:
Jira Issue OCPBUGS-55485 has been moved to the MODIFIED state.
/cherry-pick release-1.33
@bitoku: new pull request created: #9320
/cherry-pick release-1.32
/cherry-pick release-1.31
@bitoku: #9256 failed to apply on top of branch "release-1.32":
/cherry-pick release-1.30
@bitoku: #9256 failed to apply on top of branch "release-1.31":
@bitoku: #9256 failed to apply on top of branch "release-1.30":
What type of PR is this?
/kind bug
What this PR does / why we need it:
Because c.stopTimeoutChan is consumed only until SIGKILL begins, the following sequence can deadlock. Each group is labeled with the function it runs in (SetAsStopping, StopLoopForContainer, killContainer).

SetAsStopping:
c.stopLock.Lock()
c.stopTimeoutChan <- timeout
(len(c.stopTimeoutChan) == 1)
c.stopLock.Unlock()

SetAsStopping:
c.stopLock.Lock()
c.stopTimeoutChan <- timeout
(len(c.stopTimeoutChan) == n)
c.stopLock.Unlock()

SetAsStopping:
c.stopLock.Lock()
c.stopTimeoutChan <- timeout
(BLOCKED!!)

StopLoopForContainer:
done

killContainer:
KillExecPIDs
c.stopLock.Lock()
(DEADLOCK!!)
Which issue(s) this PR fixes:
https://issues.redhat.com/browse/OCPBUGS-55485
Special notes for your reviewer:
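The sequence described under "What this PR does / why we need it" can be reproduced in isolation. The following is a minimal sketch, not the code from this PR: the `container` type, `requestStopUnsafe`, `requestStopGuarded`, `lockWithin`, `killPathCanLock`, and the `killLoopBegun` flag are all hypothetical names chosen for illustration, and the guarded variant (a flag check plus a non-blocking send) is only one assumed way to avoid blocking while holding the lock.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// container is a hypothetical, stripped-down stand-in for the real type.
type container struct {
	stopLock        sync.Mutex
	stopTimeoutChan chan int64
	killLoopBegun   bool // assumption: true once nothing drains stopTimeoutChan anymore
}

// requestStopUnsafe shows the problematic pattern: a blocking send
// on the channel while stopLock is held.
func (c *container) requestStopUnsafe(timeout int64) {
	c.stopLock.Lock()
	defer c.stopLock.Unlock()
	c.stopTimeoutChan <- timeout // blocks forever if the buffer is full and undrained
}

// requestStopGuarded is one assumed mitigation: skip the send once the
// kill loop has begun, and never block while holding the lock.
func (c *container) requestStopGuarded(timeout int64) bool {
	c.stopLock.Lock()
	defer c.stopLock.Unlock()
	if c.killLoopBegun {
		return false
	}
	select {
	case c.stopTimeoutChan <- timeout:
		return true
	default:
		return false
	}
}

// lockWithin reports whether stopLock can be acquired within d.
// It stands in for the kill path, which also needs stopLock.
func (c *container) lockWithin(d time.Duration) bool {
	acquired := make(chan struct{})
	go func() {
		c.stopLock.Lock()
		c.stopLock.Unlock()
		close(acquired)
	}()
	select {
	case <-acquired:
		return true
	case <-time.After(d):
		return false
	}
}

// killPathCanLock sends two stop requests into a one-slot channel that
// nobody drains, then checks whether the kill path can still take the lock.
// In the unguarded case the stuck goroutines are leaked on purpose.
func killPathCanLock(guarded bool) bool {
	c := &container{stopTimeoutChan: make(chan int64, 1), killLoopBegun: true}
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		for i := 0; i < 2; i++ {
			if guarded {
				c.requestStopGuarded(10)
			} else {
				c.requestStopUnsafe(10)
			}
		}
	}()
	// Give the requests time to either finish or get stuck.
	select {
	case <-finished:
	case <-time.After(100 * time.Millisecond):
	}
	return c.lockWithin(100 * time.Millisecond)
}

func main() {
	fmt.Println("unguarded: kill path can take stopLock:", killPathCanLock(false))
	fmt.Println("guarded:   kill path can take stopLock:", killPathCanLock(true))
}
```

In the unguarded run the second request blocks on the full channel while still holding the lock, so the simulated kill path cannot acquire it. In the guarded run both requests return immediately and the lock stays available.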
Does this PR introduce a user-facing change?