SRIOV-CNI fails configuring VF VLAN 0 with proto 802.1q on mlx5 (ConnectX-6 Lx) although kernel accepts the same configuration manually #392

@ashokpariya0

Description

What happened?

Pod creation fails when attaching an SR-IOV network on a Mellanox ConnectX-6 Lx NIC.
The failure occurs during VF VLAN configuration inside SRIOV-CNI with the following error:

SRIOV-CNI failed to configure VF
"failed to set vf vlan configuration - id 0, qos 0 and proto 802.1q: protocol not supported"

The pod remains in ContainerCreating state and the sandbox setup fails.
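
For comparison, programming what should be the equivalent settings manually on the PF is accepted by the kernel on this host (PF name and VF index 2 taken from this report; output not shown):

# ip link set ens2048f0np0 vf 2 vlan 0 qos 0 proto 802.1q
# ip link show ens2048f0np0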

# oc describe pod -n test testpod-no-labels
Name:             testpod-no-labels
Namespace:        test
Priority:         0
Service Account:  default
Node:             b314lp37.lnxero1.boe/172.23.228.37
Start Time:       Tue, 16 Dec 2025 10:26:04 +0100
Labels:           <none>
Annotations:      k8s.v1.cni.cncf.io/networks: test-apivolnetwork
Status:           Pending
IP:               
IPs:              <none>
Containers:
  test:
    Container ID:  
    Image:         quay.io/openshift-kni/cnf-tests:4.7
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      sleep INF
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      openshift.io/testresource:  1
    Requests:
      openshift.io/testresource:  1
    Environment:                  <none>
    Mounts:
      /etc/podnetinfo from podnetinfo (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-x8xxh (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  kube-api-access-x8xxh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    Optional:                false
    DownwardAPI:             true
  podnetinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/hostname=b314lp37.lnxero1.boe
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               8s    default-scheduler  Successfully assigned test/testpod-no-labels to b314lp37.lnxero1.boe
  Normal   AddedInterface          8s    multus             Add eth0 [10.244.0.47/24] from cbr0
  Warning  FailedCreatePodSandBox  8s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f182b903ddbe9825c84445541aadb79597fe4336fe8fe0ad21222ee155088ffa": plugin type="multus" name="multus-cni-network" failed (add): [test/testpod-no-labels/bc038bd4-bc8f-4a60-ad25-870fcb5037c1:test-apivolnetwork]: error adding container to network "test-apivolnetwork": SRIOV-CNI failed to configure VF "failed to set vf 2 vlan configuration - id 0, qos 0 and proto 802.1q: protocol not supported"

The PF eSwitch mode remains legacy.
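
The mode can be confirmed with devlink, using the PF PCI address from the node state below:

# devlink dev eswitch show pci/0101:00:00.0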

What did you expect to happen?

SRIOV-CNI should successfully configure the VF VLAN when vlan: 0 and vlanQoS: 0 are specified.

Or, if the NIC/driver does not support proto 802.1q for VF VLAN programming in legacy eSwitch mode, SRIOV-CNI should:

- Detect the capability upfront, and
- Gracefully skip applying unsupported VLAN protocol settings instead of failing pod creation (see the sketch below).
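
In shell terms, the requested fallback amounts to something like the following. This is only an illustration of the desired behavior, not existing SRIOV-CNI code; when proto is omitted, the kernel applies its default VLAN protocol (802.1Q):

# ip link set ens2048f0np0 vf 2 vlan 0 qos 0 proto 802.1q || ip link set ens2048f0np0 vf 2 vlan 0 qos 0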

What are the minimal steps needed to reproduce the bug?

  1. Create SriovNetworkNodePolicy
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: test-policy
  namespace: sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
    - ens2048f0np0
  nodeSelector:
    kubernetes.io/hostname: <node-name>
  numVfs: 5
  resourceName: testresource
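
Once the policy has synced (syncStatus: Succeeded in the node state further below), the VFs it creates can be verified directly on the node; with numVfs: 5 the first command should report 5:

# cat /sys/class/net/ens2048f0np0/device/sriov_numvfs
# ip link show ens2048f0np0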

  2. Create SriovNetwork
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: test-apivolnetwork
  namespace: sriov-network-operator
spec:
  networkNamespace: test
  resourceName: testresource
  ipam: |
    {
      "type": "host-local",
      "subnet": "10.10.10.0/24",
      "rangeStart": "10.10.10.171",
      "rangeEnd": "10.10.10.181"
    }
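
The operator should render this SriovNetwork into a NetworkAttachmentDefinition in the test namespace; inspecting it shows the exact CNI config (including any vlan/vlanQoS defaults) handed to SRIOV-CNI:

# kubectl get network-attachment-definitions.k8s.cni.cncf.io -n test test-apivolnetwork -o yaml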

  3. Create pod
apiVersion: v1
kind: Pod
metadata:
  name: testpod-no-labels
  namespace: test
  annotations:
    k8s.v1.cni.cncf.io/networks: test-apivolnetwork
spec:
  containers:
  - name: test
    image: quay.io/openshift-kni/cnf-tests:4.7
    command: ["/bin/bash", "-c", "sleep INF"]
    resources:
      requests:
        openshift.io/testresource: 1
      limits:
        openshift.io/testresource: 1
  nodeSelector:
    kubernetes.io/hostname: <node-name>

Anything else we need to know?

This is a Kubernetes cluster running on the s390x architecture with a Mellanox NIC.
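
Driver and firmware details for the PF are not included above; they may help determine whether this is a driver or firmware limitation and can be collected with:

# ethtool -i ens2048f0np0
# uname -r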

# kubectl get sriovnetworknodestates.sriovnetwork.openshift.io -n sriov-network-operator b314lp37.lnxero1.boe -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  annotations:
    sriovnetwork.openshift.io/current-state: Idle
    sriovnetwork.openshift.io/desired-state: Idle
  creationTimestamp: "2025-12-16T05:01:21Z"
  generation: 17
  name: b314lp37.lnxero1.boe
  namespace: sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovOperatorConfig
    name: default
    uid: 6b0252dd-a698-4abd-b553-af01fc5c26e3
  resourceVersion: "546133"
  uid: 3de85b7e-a072-40ef-85b7-99246cc2cbe2
spec:
  bridges: {}
  interfaces:
  - name: ens2048f0np0
    numVfs: 5
    pciAddress: "0101:00:00.0"
    vfGroups:
    - deviceType: netdevice
      policyName: test-policy
      resourceName: testresource
      vfRange: 0-4
  system: {}
status:
  interfaces:
  - Vfs:
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: e2:60:85:11:d6:9a
      mtu: 1500
      name: ens2048f0v0
      pciAddress: "0101:00:00.2"
      vendor: 15b3
      vfID: 0
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: 12:75:3d:93:4c:a0
      mtu: 1500
      name: ens2048f0v1
      pciAddress: "0101:00:00.3"
      vendor: 15b3
      vfID: 1
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: e2:95:01:fc:5a:26
      mtu: 1500
      name: ens2048f0v2
      pciAddress: "0101:00:00.4"
      vendor: 15b3
      vfID: 2
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: 1e:af:61:2c:7f:fd
      mtu: 1500
      name: ens2048f0v3
      pciAddress: "0101:00:00.5"
      vendor: 15b3
      vfID: 3
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: 8e:e5:c4:e3:e6:d6
      mtu: 1500
      name: ens2048f0v4
      pciAddress: "0101:00:00.6"
      vendor: 15b3
      vfID: 4
    deviceID: 101f
    driver: mlx5_core
    eSwitchMode: legacy
    linkAdminState: up
    linkSpeed: 10000 Mb/s
    linkType: ETH
    mac: 9c:63:c0:53:06:82
    mtu: 1500
    name: ens2048f0np0
    numVfs: 5
    pciAddress: "0101:00:00.0"
    totalvfs: 127
    vendor: 15b3
  - deviceID: 101f
    driver: mlx5_core
    eSwitchMode: legacy
    linkAdminState: up
    linkSpeed: 10000 Mb/s
    linkType: ETH
    mac: 9c:63:c0:53:06:83
    mtu: 1500
    name: ens2176f1np1
    pciAddress: "0102:00:00.1"
    totalvfs: 127
    vendor: 15b3
  syncStatus: Succeeded
  system:
    rdmaMode: exclusive

[root@b3 sriov-network-operator]#

[root@b3 sriov-network-operator]# lspci
00f5:00:00.0 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
00f6:00:00.0 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
0101:00:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0102:00:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]

Component Versions

Please fill in the below table with the version numbers of applicable components used.

Component Version
SR-IOV CNI Plugin
Multus
SR-IOV Network Device Plugin
Kubernetes
OS

Config Files

Config file locations may be config dependent.

CNI config (Try '/etc/cni/net.d/')
Device pool config file location (Try '/etc/pcidp/config.json')
Multus config (Try '/etc/cni/multus/net.d')
Kubernetes deployment type ( Bare Metal, Kubeadm etc.)
Kubeconfig file
SR-IOV Network Custom Resource Definition

Logs

SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
Multus logs (If enabled. Try '/var/log/multus.log' )
Kubelet logs (journalctl -u kubelet)
