cri-o passes (possibly) reused namespaces on network teardown #2849

@squeed

Description

CRI-O tears down sandbox networking after a reboot using the pid recorded before the reboot, which risks hitting a reused pid.

Steps to reproduce the issue:
Sadly, reproducing this is probabilistic. It should still be easy to fix, though.

  1. Reboot the node, then create some containers so that they get low pid numbers
  2. Reboot the node again
  3. Kubelet starts tearing down sandboxes that were killed because of the reboot
  4. cri-o issues a CNI delete with /proc/$pid/ns/net, even though $pid is meaningless since the reboot.

Even if you don't get a pid collision, the logs clearly show a CNI DEL being issued for a stale pid. For example, from the crio logs at level Info:

About to add CNI network lo (type=loopback)
Got pod network &{Name:alertmanager-main-1 Namespace:openshift-monitoring ID:... NetNS:/proc/8036/ns/net Networks:[] RuntimeConfig:map[]}

-- reboot --

About to del CNI network lo (type=loopback)
Error deleting network: failed to Statfs "/proc/8036/ns/net": no such file or directory

This clearly shows that it is looking for /proc/8036..., and in this case that pid happens not to belong to any process. However, reboot enough times and you will eventually lose: the path will point to a running pid (but not the one started by cri-o). We typically see this in about 1 in 10 reboots.

Describe the results you received:
We got a CNI Delete with the netns of /proc/<pid>/ns/net, which would be correct, except that the node was rebooted in the meantime, and /proc/<pid>/ns/net pointed to the root netns.

Describe the results you expected:
The CNI delete should be issued with an empty netns parameter, which signifies to the plugins that the namespace is gone and only bookkeeping operations (e.g. IPAM cleanup) are to be done. CRI-O should only pass the netns parameter if it points to a known-good crio-created process that is still running.

Output of crio --version:

crio version 1.14.10-0.19.dev.rhaos4.2.gita86dae7.el8
