-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[OCPBUGS-58229]: server: Fix network cleanup failures when NetNS path is empty #9410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OCPBUGS-58229]: server: Fix network cleanup failures when NetNS path is empty #9410
Conversation
e9cd871
to
a662b7c
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #9410 +/- ##
==========================================
- Coverage 67.24% 67.06% -0.18%
==========================================
Files 202 202
Lines 28025 28043 +18
==========================================
- Hits 18845 18807 -38
- Misses 7602 7662 +60
+ Partials 1578 1574 -4 🚀 New features to boost your workflow:
|
a662b7c
to
082abb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, do we need an additional integration test for that?
/hold
082abb6
to
2d476be
Compare
4ef1089
to
0a6a3d4
Compare
ba9c580
to
1b98d6f
Compare
I validated that upgrading CRI-O to the latest build resolves the issue with the help of this PR: Steps Performed:
Output: The pod was successfully created in the test-ai namespace after reboot. |
/lgtm |
@asahay19: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/hold cancel |
CI is broken due to the known issues with ansible snippet and the container-selinux change. |
/approve LGTM, @cri-o/cri-o-maintainers PTAL |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: haircommander, saschagrunert, sohankunkerkar The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes an issue where empty network namespace paths cause CNI teardown failures, preventing pod deletion and creating stuck pods. This happens when infra containers die, leaving NetNsPath() returning empty strings that CNI plugins cannot handle. The failure cascade: dead infra container → empty NetNS → CNI failure → StopPodSandbox failure → stuck pod → systemd cgroup conflicts → new pod creation failures. Observed in MicroShift environments. Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
1b98d6f
to
02cd675
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes network cleanup failures that occur when the network namespace path is empty, which prevents pod deletion and can create stuck pods in production environments. The issue arises when infra containers die, leaving NetNsPath() returning empty strings that CNI plugins cannot handle properly.
- Added early detection and handling for empty network namespace paths during network teardown
- Modified test to expect network as already stopped when sandbox is stopped
- Added comprehensive integration test for pod deletion with missing NetNS scenarios
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
server/sandbox_network.go | Adds logic to handle empty NetNS paths gracefully during network cleanup |
server/sandbox_stop_test.go | Updates test expectations and removes unused import |
test/network.bats | Adds integration test for pod deletion when NetNS path is missing |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
. "github.com/onsi/ginkgo/v2" | ||
. "github.com/onsi/gomega" | ||
"go.uber.org/mock/gomock" | ||
types "k8s.io/cri-api/pkg/apis/runtime/v1" | ||
) |
Copilot
AI
Sep 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of the gomock
import appears incomplete. The removed test case that used gomock.InOrder()
and gomock.Any()
suggests this import was necessary for proper mock testing. Consider whether this test removal reduces coverage for error handling scenarios.
Copilot uses AI. Check for mistakes.
log.Debugf(ctx, "CNI teardown failed due to missing/invalid network namespace for pod sandbox %s(%s): %v", sb.Name(), sb.ID(), err) | ||
|
||
// Clean up CNI result files even when NetNS is invalid. | ||
s.cleanupCNIResultFiles(ctx, sb.ID()) |
Copilot
AI
Sep 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cleanupCNIResultFiles
method is called but there's no documentation or context about what this method does or whether it can fail. Consider adding a comment explaining its purpose and whether its failure should be handled.
Copilot uses AI. Check for mistakes.
/retest |
/retest-required |
/lgtm |
/hold cancel |
/cherry-pick release-1.34 |
/cherry-pick release-1.33 |
@sohankunkerkar: new pull request created: #9471 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@sohankunkerkar: new pull request created: #9472 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Fixes an issue where empty network namespace paths cause CNI teardown failures, preventing pod deletion and creating stuck pods. This happens when infra containers die, leaving NetNsPath() returning empty strings that CNI plugins cannot handle.
The failure cascade: dead infra container → empty NetNS → CNI failure → StopPodSandbox failure → stuck pod → systemd cgroup conflicts → new pod creation failures. Observed in MicroShift production environments.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for your reviewer:
The network cleanup failure is blocking pod deletion, which is causing the systemd transaction conflicts.
Does this PR introduce a user-facing change?