VF device not returned to host or any visible network namespace after workload deletion #349

@Mattavamsi

What happened?

In our SR-IOV CNI environment (using Multus + SR-IOV CNI), we observed a case where, after a VM (or pod) using an SR-IOV VF is terminated, the VF is no longer visible:
- It is not present in the host network namespace
- It is not seen in any active container or VM network namespace
- Yet the PCI device still exists and is bound to the iavf driver

This results in the VF becoming orphaned and unavailable for reuse by future workloads, unless it is manually unbound and rebound.
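The symptom described above (driver bound, no netdev visible) can be checked from sysfs. The following is a minimal sketch, not part of SR-IOV CNI. The function names and the `sysfs_root` parameter are hypothetical and exist only for illustration and testing. A VF that is legitimately in use inside a pod namespace looks the same from the host, because sysfs only lists netdevs belonging to the caller's network namespace, so this is a first filter rather than proof of an orphan.

```python
import os
from pathlib import Path


def _device_dir(pci_addr, sysfs_root="/sys"):
    # sysfs_root is a hypothetical parameter so the check can run against a fake tree.
    return Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr


def vf_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the bound kernel driver, or None if unbound/absent."""
    link = _device_dir(pci_addr, sysfs_root) / "driver"
    if not link.is_symlink():
        return None
    return Path(os.readlink(link)).name


def vf_netdevs(pci_addr, sysfs_root="/sys"):
    """Return netdev names visible for this PCI device in the current netns."""
    net_dir = _device_dir(pci_addr, sysfs_root) / "net"
    if not net_dir.is_dir():
        return []
    return sorted(p.name for p in net_dir.iterdir())


def bound_without_netdev(pci_addr, sysfs_root="/sys"):
    """True when a driver is bound but no netdev is visible from this namespace."""
    return vf_driver(pci_addr, sysfs_root) is not None and not vf_netdevs(pci_addr, sysfs_root)
```

For the device in this report, the call would be `bound_without_netdev("0000:ec:11.1")`, run from the host network namespace.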

What did you expect to happen?

When a workload using a VF is terminated, the CNI DEL operation (or equivalent cleanup) should return the VF to the host namespace, or otherwise make it available for reuse.

What are the minimal steps needed to reproduce the bug?

Anything else we need to know?

  1. "ip link show" does not show the VF interface
  2. lspci -nn shows the VF is bound to iavf
  3. Rebinding the device to iavf manually does not create a network interface
  4. Re-creating VFs via PF (sriov_numvfs) resolves the issue temporarily
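The temporary workaround in item 4 can be sketched as below. This is an illustration under assumptions, not a recommended fix: `reset_vfs` and its `sysfs_root` parameter are hypothetical names, and the script must run as root. Writing 0 to `sriov_numvfs` destroys every VF on the PF, so any workload still using another VF on the same PF is disrupted.

```python
from pathlib import Path


def reset_vfs(pf_name, num_vfs=None, sysfs_root="/sys"):
    """Destroy and re-create all VFs on a PF via sriov_numvfs.

    If num_vfs is None, the current VF count is read and restored.
    Returns the VF count that was written back.
    """
    # sysfs_root is a hypothetical parameter so the logic can be tried on a fake tree.
    knob = Path(sysfs_root) / "class" / "net" / pf_name / "device" / "sriov_numvfs"
    if num_vfs is None:
        num_vfs = int(knob.read_text().strip())
    # The kernel expects the count to go through 0 before a new non-zero value.
    knob.write_text("0\n")
    knob.write_text(f"{num_vfs}\n")
    return num_vfs
```

With the PF name that appears in the logs of this report, the call would look like `reset_vfs("x557_1")`.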

Component Versions

Please fill in the below table with the version numbers of applicable components used.

| Component | Version |
| --- | --- |
| SR-IOV CNI Plugin | 2.8.1 |
| Multus | 3.7.0 |
| SR-IOV Network Device Plugin | v3.4.0 |
| Kubernetes | v1.30.4-k3s |
| OS | 4.18.0-553.50 |
| IAVF | v4.3.19 |
| ICE | v1.7.16-1 |

Troubleshooting Performed:

  • Verified VF PCI device is still present and bound to correct driver (iavf)
  • Searched all active network namespaces (via /proc/*/ns/net and nsenter)
  • Tried manual unbind/rebind of the VF to iavf; no network interface was created
  • Reinitialising VFs from the PF (echo 0 > sriov_numvfs, then echo N > sriov_numvfs) makes the VF usable again
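The namespace search via /proc/*/ns/net can be sketched as follows. This is a minimal sketch with hypothetical names (`unique_netns`, `proc_root`), not the exact procedure used in this report. One caveat worth stating: a scan of /proc only finds namespaces that have at least one live process. A namespace kept alive only by a bind mount (for example under /var/run/netns) or by an open file descriptor will not appear, so those locations need to be checked separately.

```python
import os
import re
from pathlib import Path

_NETNS_LINK = re.compile(r"net:\[(\d+)\]")


def unique_netns(proc_root="/proc"):
    """Map each network-namespace inode to the lowest PID that lives in it."""
    # proc_root is a hypothetical parameter so the scan can be tried on a fake tree.
    pids = sorted(int(p.name) for p in Path(proc_root).iterdir() if p.name.isdigit())
    seen = {}
    for pid in pids:
        try:
            target = os.readlink(Path(proc_root) / str(pid) / "ns" / "net")
        except OSError:
            continue  # process exited meanwhile, or permission denied
        match = _NETNS_LINK.fullmatch(target)
        if match:
            seen.setdefault(int(match.group(1)), pid)
    return seen
```

Each returned PID can then be passed to `nsenter --net=/proc/<pid>/ns/net ip link show` to list the interfaces in that namespace once, instead of once per process.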

SRIOV-CNI Logs:

time="2025-06-19T17:06:58.915446581Z" level="debug" msg="In Setupmanager" cniName="sriov-cni"
time="2025-06-19T17:07:00.541066161Z" level="debug" msg="1. Set link down" cniName="sriov-cni" func="SetupVF" linkObj="&{LinkAttrs:{Index:269 MTU:1500 TxQLen:1000 Name:eno1v1 HardwareAddr:14:a9:d0:5d:3e:0c Flags:up|broadcast|multicast RawFlags:4099 ParentIndex:0 MasterIndex:0 Namespace:<nil> Alias: AltNames:[enp236s0f4v1] Statistics:0xc00022e300 Promisc:0 Allmulti:0 Multi:1 Xdp:0xc000216240 EncapType:ether Protinfo:<nil> OperState:down PhysSwitchID:0 NetNsID:-1 NumTxQueues:16 NumRxQueues:16 TSOMaxSegs:0 TSOMaxSize:0 GSOMaxSegs:65535 GSOMaxSize:65536 GROMaxSize:0 GSOIPv4MaxSize:0 GROIPv4MaxSize:0 Vfs:[] Group:0 PermHWAddr:14:a9:d0:5d:3e:0c Slave:<nil>}}"
time="2025-06-19T17:07:00.860378327Z" level="debug" msg="2. Set temp name" cniName="sriov-cni" func="SetupVF" linkObj="&{LinkAttrs:{Index:269 MTU:1500 TxQLen:1000 Name:eno1v1 HardwareAddr:14:a9:d0:5d:3e:0c Flags:up|broadcast|multicast RawFlags:4099 ParentIndex:0 MasterIndex:0 Namespace:<nil> Alias: AltNames:[enp236s0f4v1] Statistics:0xc00022e300 Promisc:0 Allmulti:0 Multi:1 Xdp:0xc000216240 EncapType:ether Protinfo:<nil> OperState:down PhysSwitchID:0 NetNsID:-1 NumTxQueues:16 NumRxQueues:16 TSOMaxSegs:0 TSOMaxSize:0 GSOMaxSegs:65535 GSOMaxSize:65536 GROMaxSize:0 GSOIPv4MaxSize:0 GROIPv4MaxSize:0 Vfs:[] Group:0 PermHWAddr:14:a9:d0:5d:3e:0c Slave:<nil>}}" tempName="temp_269"
time="2025-06-19T17:07:01.170086373Z" level="debug" msg="3. Remove interface original name from alt names" cniName="sriov-cni" func="SetupVF" linkObj="&{LinkAttrs:{Index:269 MTU:1500 TxQLen:1000 Name:eno1v1 HardwareAddr:14:a9:d0:5d:3e:0c Flags:up|broadcast|multicast RawFlags:4099 ParentIndex:0 MasterIndex:0 Namespace:<nil> Alias: AltNames:[enp236s0f4v1] Statistics:0xc00022e300 Promisc:0 Allmulti:0 Multi:1 Xdp:0xc000216240 EncapType:ether Protinfo:<nil> OperState:down PhysSwitchID:0 NetNsID:-1 NumTxQueues:16 NumRxQueues:16 TSOMaxSegs:0 TSOMaxSize:0 GSOMaxSegs:65535 GSOMaxSize:65536 GROMaxSize:0 GSOIPv4MaxSize:0 GROIPv4MaxSize:0 Vfs:[] Group:0 PermHWAddr:14:a9:d0:5d:3e:0c Slave:<nil>}}" OriginalLinkName="eno1v1" tempName="temp_269"
time="2025-06-19T17:07:01.470508897Z" level="debug" msg="4. Change netns" cniName="sriov-cni" func="SetupVF" linkObj="&{LinkAttrs:{Index:269 MTU:1500 TxQLen:1000 Name:temp_269 HardwareAddr:14:a9:d0:5d:3e:0c Flags:broadcast|multicast RawFlags:4098 ParentIndex:0 MasterIndex:0 Namespace:<nil> Alias: AltNames:[enp236s0f4v1] Statistics:0xc00022e480 Promisc:0 Allmulti:0 Multi:1 Xdp:0xc000216360 EncapType:ether Protinfo:<nil> OperState:down PhysSwitchID:0 NetNsID:-1 NumTxQueues:16 NumRxQueues:16 TSOMaxSegs:0 TSOMaxSize:0 GSOMaxSegs:65535 GSOMaxSize:65536 GROMaxSize:0 GSOIPv4MaxSize:0 GROIPv4MaxSize:0 Vfs:[] Group:0 PermHWAddr:14:a9:d0:5d:3e:0c Slave:<nil>}}" netns.Fd()="5"
time="2025-06-19T17:07:01.491333248Z" level="debug" msg="5. Set Pod IF name" cniName="sriov-cni" func="SetupVF" linkObj="&{LinkAttrs:{Index:269 MTU:1500 TxQLen:1000 Name:temp_269 HardwareAddr:14:a9:d0:5d:3e:0c Flags:broadcast|multicast RawFlags:4098 ParentIndex:0 MasterIndex:0 Namespace:<nil> Alias: AltNames:[enp236s0f4v1] Statistics:0xc00022e480 Promisc:0 Allmulti:0 Multi:1 Xdp:0xc000216360 EncapType:ether Protinfo:<nil> OperState:down PhysSwitchID:0 NetNsID:-1 NumTxQueues:16 NumRxQueues:16 TSOMaxSegs:0 TSOMaxSize:0 GSOMaxSegs:65535 GSOMaxSize:65536 GROMaxSize:0 GSOIPv4MaxSize:0 GROIPv4MaxSize:0 Vfs:[] Group:0 PermHWAddr:14:a9:d0:5d:3e:0c Slave:<nil>}}" podifName="pod7aefa17bbb4"
time="2025-06-19T17:07:01.808096114Z" level="debug" msg="6. Enable IPv4 ARP notify and IPv6 Network Discovery notify" cniName="sriov-cni" func="SetupVF" podifName="pod7aefa17bbb4"
time="2025-06-19T17:07:01.808369819Z" level="debug" msg="7. Set MAC address" cniName="sriov-cni" func="SetupVF" s.nLink="&{NetlinkManager:<nil>}" podifName="pod7aefa17bbb4" conf.MAC="14:a9:d0:5d:3e:0c"
time="2025-06-19T17:07:02.40943442Z" level="debug" msg="8. Enable Optimistic DAD for IPv6 addresses" cniName="sriov-cni" func="SetupVF" linkObj="&{LinkAttrs:{Index:269 MTU:1500 TxQLen:1000 Name:temp_269 HardwareAddr:14:a9:d0:5d:3e:0c Flags:broadcast|multicast RawFlags:4098 ParentIndex:0 MasterIndex:0 Namespace:<nil> Alias: AltNames:[enp236s0f4v1] Statistics:0xc00022e480 Promisc:0 Allmulti:0 Multi:1 Xdp:0xc000216360 EncapType:ether Protinfo:<nil> OperState:down PhysSwitchID:0 NetNsID:-1 NumTxQueues:16 NumRxQueues:16 TSOMaxSegs:0 TSOMaxSize:0 GSOMaxSegs:65535 GSOMaxSize:65536 GROMaxSize:0 GSOIPv4MaxSize:0 GROIPv4MaxSize:0 Vfs:[] Group:0 PermHWAddr:14:a9:d0:5d:3e:0c Slave:<nil>}}"
time="2025-06-19T17:07:02.409640607Z" level="debug" msg="9. Bring IF up in Pod netns" cniName="sriov-cni" func="SetupVF" linkObj="&{LinkAttrs:{Index:269 MTU:1500 TxQLen:1000 Name:temp_269 HardwareAddr:14:a9:d0:5d:3e:0c Flags:broadcast|multicast RawFlags:4098 ParentIndex:0 MasterIndex:0 Namespace:<nil> Alias: AltNames:[enp236s0f4v1] Statistics:0xc00022e480 Promisc:0 Allmulti:0 Multi:1 Xdp:0xc000216360 EncapType:ether Protinfo:<nil> OperState:down PhysSwitchID:0 NetNsID:-1 NumTxQueues:16 NumRxQueues:16 TSOMaxSegs:0 TSOMaxSize:0 GSOMaxSegs:65535 GSOMaxSize:65536 GROMaxSize:0 GSOIPv4MaxSize:0 GROIPv4MaxSize:0 Vfs:[] Group:0 PermHWAddr:14:a9:d0:5d:3e:0c Slave:<nil>}}"
time="2025-06-19T17:07:02.411704618Z" level="debug" msg="Cache NetConf for CmdDel" cniName="sriov-cni" func="cmdAdd" config.DefaultCNIDir="/var/lib/cni/sriov" netConf="&{NetConf:{CNIVersion:0.3.1 Name:sriov-net0-tenant1 Type:sriov Capabilities:map[] IPAM:{Type:} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[] PrevResult:<nil>} SriovNetConf:{OrigVfState:{HostIFName:eno1v1 SpoofChk:true Trust:false AdminMAC:42:d7:d6:a6:cf:01 EffectiveMAC:14:a9:d0:5d:3e:0c Vlan:0 VlanQoS:0 VlanProto:33024 MinTxRate:0 MaxTxRate:0 LinkState:0} DPDKMode:false Master:x557_1 MAC:14:a9:d0:5d:3e:0c MTU:0xc0001199c0 Vlan:<nil> VlanQoS:<nil> VlanProto:<nil> DeviceID:0000:ec:11.1 VFID:1 MinTxRate:<nil> MaxTxRate:<nil> SpoofChk:off Trust:on LinkState: RuntimeConfig:{Mac:} LogLevel: LogFile:}}"
time="2025-06-19T17:07:02.412147326Z" level="debug" msg="Mark the PCI address as in use" cniName="sriov-cni" func="cmdAdd" config.DefaultCNIDir="/var/lib/cni/sriov" netConf.DeviceID="0000:ec:11.1"
{
    "cniVersion": "0.3.1",
    "interfaces": [
        {
            "name": "pod7aefa17bbb4",
            "mac": "14:a9:d0:5d:3e:0c",
            "sandbox": "/var/run/netns/cni-4ed007b1-9575-9088-22d4-ae6affc9949c"
        }
    ],
    "dns": {}
}
time="2025-06-19T17:07:02.415987311Z" level="debug" msg="In Setupmanager" cniName="sriov-cni"
time="2025-06-19T17:07:02.803729441Z" level="debug" msg="Get VF device" cniName="sriov-cni" func="ReleaseVF" podifName="pod7aefa17bbb4"
time="2025-06-19T17:07:02.804170128Z" level="debug" msg="Shutdown VF device" cniName="sriov-cni" func="ReleaseVF" linkObj="&{LinkAttrs:{Index:269 MTU:1500 TxQLen:1000 Name:pod7aefa17bbb4 HardwareAddr:42:d7:d6:a6:cf:01 Flags:up|broadcast|multicast RawFlags:4099 ParentIndex:0 MasterIndex:0 Namespace:<nil> Alias: AltNames:[enp236s0f4v1] Statistics:0xc000148180 Promisc:0 Allmulti:0 Multi:1 Xdp:0xc000130258 EncapType:ether Protinfo:<nil> OperState:down PhysSwitchID:0 NetNsID:-1 NumTxQueues:16 NumRxQueues:16 TSOMaxSegs:0 TSOMaxSize:0 GSOMaxSegs:65535 GSOMaxSize:65536 GROMaxSize:0 GSOIPv4MaxSize:0 GROIPv4MaxSize:0 Vfs:[] Group:0 PermHWAddr:42:d7:d6:a6:cf:01 Slave:<nil>}}"
time="2025-06-19T17:07:03.123094754Z" level="debug" msg="Rename VF device" cniName="sriov-cni" func="ReleaseVF" linkObj="&{LinkAttrs:{Index:269 MTU:1500 TxQLen:1000 Name:pod7aefa17bbb4 HardwareAddr:42:d7:d6:a6:cf:01 Flags:up|broadcast|multicast RawFlags:4099 ParentIndex:0 MasterIndex:0 Namespace:<nil> Alias: AltNames:[enp236s0f4v1] Statistics:0xc000148180 Promisc:0 Allmulti:0 Multi:1 Xdp:0xc000130258 EncapType:ether Protinfo:<nil> OperState:down PhysSwitchID:0 NetNsID:-1 NumTxQueues:16 NumRxQueues:16 TSOMaxSegs:0 TSOMaxSize:0 GSOMaxSegs:65535 GSOMaxSize:65536 GROMaxSize:0 GSOIPv4MaxSize:0 GROIPv4MaxSize:0 Vfs:[] Group:0 PermHWAddr:42:d7:d6:a6:cf:01 Slave:<nil>}}" conf.OrigVfState.HostIFName="eno1v1"
time="2025-06-19T17:07:03.436913246Z" level="debug" msg="Reset effective MAC address" cniName="sriov-cni" func="ReleaseVF" s.nLink="&{NetlinkManager:<nil>}" conf.OrigVfState.HostIFName="eno1v1" conf.OrigVfState.EffectiveMAC="14:a9:d0:5d:3e:0c"
{
    "code": 999,
    "msg": "failed to restore original effective netlink MAC address 14:a9:d0:5d:3e:0c: resource temporarily unavailable"
}
time="2025-06-19T17:08:05.898651114Z" level="debug" msg="In Setupmanager" cniName="sriov-cni"
{
    "code": 999,
    "msg": "SRIOV-CNI failed to configure VF \"failed to set MAC address to 14:a9:d0:5d:3e:0c: invalid argument\""
}
time="2025-06-19T17:08:19.883128186Z" level="debug" msg="In Setupmanager" cniName="sriov-cni"
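The debug lines above are long, so a small parser helps when comparing the MAC address reported at each step. The sketch below assumes the key="value" layout seen in these logs and that values contain no embedded double quotes; `parse_line` and `mac_by_step` are hypothetical helper names, not part of SR-IOV CNI.

```python
import re

# key="value" pairs as they appear in the debug lines above
_PAIR = re.compile(r'([\w.()]+)="([^"]*)"')
# HardwareAddr field inside the printed linkObj struct
_HWADDR = re.compile(r"\bHardwareAddr:([0-9a-fA-F:]{17})")


def parse_line(line):
    """Split one debug line into a dict of its key="value" fields."""
    return dict(_PAIR.findall(line))


def mac_by_step(lines):
    """Return (func, msg, HardwareAddr) for every line that carries a linkObj."""
    steps = []
    for line in lines:
        fields = parse_line(line)
        match = _HWADDR.search(fields.get("linkObj", ""))
        if match:
            steps.append((fields.get("func", ""), fields.get("msg", ""), match.group(1).lower()))
    return steps
```

In the log above, the SetupVF lines report HardwareAddr 14:a9:d0:5d:3e:0c, while the ReleaseVF lines report 42:d7:d6:a6:cf:01. The release then fails while restoring the effective MAC 14:a9:d0:5d:3e:0c with "resource temporarily unavailable".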
