Description
What happened?
Pod creation fails when attaching an SR-IOV network on a Mellanox ConnectX-6 Lx NIC.
The failure occurs during VF VLAN configuration inside SRIOV-CNI with the following error:
SRIOV-CNI failed to configure VF
"failed to set vf vlan configuration - id 0, qos 0 and proto 802.1q: protocol not supported"
The pod remains in ContainerCreating state and the sandbox setup fails.
# oc describe pod -n test testpod-no-labels
Name: testpod-no-labels
Namespace: test
Priority: 0
Service Account: default
Node: b314lp37.lnxero1.boe/172.23.228.37
Start Time: Tue, 16 Dec 2025 10:26:04 +0100
Labels: <none>
Annotations: k8s.v1.cni.cncf.io/networks: test-apivolnetwork
Status: Pending
IP:
IPs: <none>
Containers:
test:
Container ID:
Image: quay.io/openshift-kni/cnf-tests:4.7
Image ID:
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
sleep INF
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
openshift.io/testresource: 1
Requests:
openshift.io/testresource: 1
Environment: <none>
Mounts:
/etc/podnetinfo from podnetinfo (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-x8xxh (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-x8xxh:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
Optional: false
DownwardAPI: true
podnetinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
QoS Class: BestEffort
Node-Selectors: kubernetes.io/hostname=b314lp37.lnxero1.boe
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8s default-scheduler Successfully assigned test/testpod-no-labels to b314lp37.lnxero1.boe
Normal AddedInterface 8s multus Add eth0 [10.244.0.47/24] from cbr0
Warning FailedCreatePodSandBox 8s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f182b903ddbe9825c84445541aadb79597fe4336fe8fe0ad21222ee155088ffa": plugin type="multus" name="multus-cni-network" failed (add): [test/testpod-no-labels/bc038bd4-bc8f-4a60-ad25-870fcb5037c1:test-apivolnetwork]: error adding container to network "test-apivolnetwork": SRIOV-CNI failed to configure VF "failed to set vf 2 vlan configuration - id 0, qos 0 and proto 802.1q: protocol not supported"
The PF eSwitch mode remains legacy.
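For triage, the fields embedded in the kubelet event above can be extracted with a small parser. The following is a minimal Python sketch: the function name `parse_vf_vlan_error` and the regular expression are illustrative assumptions derived only from the message format shown in this report, not from any SRIOV-CNI API.

```python
import re

# Pattern modelled on the error text quoted in this report (an assumption,
# not a format guaranteed by SRIOV-CNI). The VF index is optional because
# the short form of the message omits it.
VF_VLAN_ERROR = re.compile(
    r"failed to set vf (?:(?P<vf>\d+) )?vlan configuration - "
    r"id (?P<vlan>\d+), qos (?P<qos>\d+) and proto (?P<proto>[\w.]+): "
    r"(?P<reason>[^\"]+)"
)


def parse_vf_vlan_error(message):
    """Return the VF VLAN fields found in an event message, or None."""
    match = VF_VLAN_ERROR.search(message)
    if match is None:
        return None
    vf = match["vf"]
    return {
        "vf": int(vf) if vf is not None else None,
        "vlan": int(match["vlan"]),
        "qos": int(match["qos"]),
        "proto": match["proto"],
        "reason": match["reason"].strip(),
    }
```

Applied to the event above, this would report VF 2, VLAN ID 0, QoS 0, and protocol 802.1q, which confirms that the failing request is the default "no VLAN" configuration rather than an explicit VLAN assignment.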
What did you expect to happen?
SRIOV-CNI should successfully configure the VF VLAN when vlan: 0 and vlanQoS: 0 are specified.
Alternatively, if the NIC or driver does not support proto 802.1q for VF VLAN programming in legacy eSwitch mode, SRIOV-CNI should:
- Detect the capability upfront, and
- Gracefully skip or avoid applying unsupported VLAN protocol settings instead of failing pod creation.
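The requested fallback could look roughly like the sketch below. It is written in Python purely to illustrate the control flow and is not SRIOV-CNI's implementation. `set_vf_vlan` is a hypothetical callable standing in for the netlink call, and the sketch assumes the driver reports the failure as `EPROTONOSUPPORT`, which matches the "protocol not supported" text in the error.

```python
import errno


def apply_vf_vlan(set_vf_vlan, vf_id, vlan=0, qos=0, proto="802.1q"):
    """Try to program the VF VLAN, tolerating a missing proto capability.

    set_vf_vlan is a hypothetical callable (vf_id, vlan, qos, proto) that
    raises OSError on failure; proto=None means "omit the protocol".
    """
    try:
        set_vf_vlan(vf_id, vlan, qos, proto)
        return "applied-with-proto"
    except OSError as exc:
        # Only the default "no VLAN" request is safe to retry without a
        # protocol; an explicit VLAN, QoS, or 802.1ad request must still fail.
        is_default = vlan == 0 and qos == 0 and proto.lower() == "802.1q"
        if exc.errno != errno.EPROTONOSUPPORT or not is_default:
            raise
    # Retry without the protocol attribute, as older drivers expect.
    set_vf_vlan(vf_id, vlan, qos, None)
    return "applied-without-proto"
```

With this shape, a pod that requests no VLAN would still start on hardware that rejects the protocol attribute, while a pod that explicitly asks for a VLAN, a QoS value, or a different protocol would continue to fail loudly.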
What are the minimal steps needed to reproduce the bug?
- Create SriovNetworkNodePolicy
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: test-policy
  namespace: sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
    - ens2048f0np0
  nodeSelector:
    kubernetes.io/hostname: <node-name>
  numVfs: 5
  resourceName: testresource
- Create SriovNetwork
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: test-apivolnetwork
  namespace: sriov-network-operator
spec:
  networkNamespace: test
  resourceName: testresource
  ipam: |
    {
      "type": "host-local",
      "subnet": "10.10.10.0/24",
      "rangeStart": "10.10.10.171",
      "rangeEnd": "10.10.10.181"
    }
- Create pod
apiVersion: v1
kind: Pod
metadata:
  name: testpod-no-labels
  namespace: test
  annotations:
    k8s.v1.cni.cncf.io/networks: test-apivolnetwork
spec:
  containers:
  - name: test
    image: quay.io/openshift-kni/cnf-tests:4.7
    command: ["/bin/bash", "-c", "sleep INF"]
    resources:
      requests:
        openshift.io/testresource: 1
      limits:
        openshift.io/testresource: 1
  nodeSelector:
    kubernetes.io/hostname: <node-name>
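Note that the SriovNetwork in the steps above sets neither `vlan` nor `vlanQoS`, yet the error reports id 0, qos 0, and proto 802.1q. One plausible reading is that omitted fields are filled with defaults before they reach SRIOV-CNI. The Python sketch below illustrates that reading only: the helper name `effective_vlan_settings` is hypothetical, and the default values are assumptions inferred from the error message, not confirmed operator behavior.

```python
def effective_vlan_settings(sriov_network_spec):
    """Return the VLAN settings a spec would yield under assumed defaults.

    Assumption: an omitted vlan or vlanQoS becomes 0 and an omitted
    vlanProto becomes "802.1q", as the error message in this report suggests.
    """
    return {
        "vlan": sriov_network_spec.get("vlan", 0),
        "vlanQoS": sriov_network_spec.get("vlanQoS", 0),
        "vlanProto": sriov_network_spec.get("vlanProto", "802.1q"),
    }


# The spec from the SriovNetwork step, with the ipam block left out for brevity.
spec = {"networkNamespace": "test", "resourceName": "testresource"}
print(effective_vlan_settings(spec))
```

If this reading is correct, every SriovNetwork without an explicit VLAN would trigger the same VF VLAN call on this NIC, so the failure would not be specific to this manifest.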
Anything else we need to know?
This is a Kubernetes cluster running on the s390x architecture with a Mellanox NIC.
# kubectl get sriovnetworknodestates.sriovnetwork.openshift.io -n sriov-network-operator b314lp37.lnxero1.boe -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  annotations:
    sriovnetwork.openshift.io/current-state: Idle
    sriovnetwork.openshift.io/desired-state: Idle
  creationTimestamp: "2025-12-16T05:01:21Z"
  generation: 17
  name: b314lp37.lnxero1.boe
  namespace: sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovOperatorConfig
    name: default
    uid: 6b0252dd-a698-4abd-b553-af01fc5c26e3
  resourceVersion: "546133"
  uid: 3de85b7e-a072-40ef-85b7-99246cc2cbe2
spec:
  bridges: {}
  interfaces:
  - name: ens2048f0np0
    numVfs: 5
    pciAddress: "0101:00:00.0"
    vfGroups:
    - deviceType: netdevice
      policyName: test-policy
      resourceName: testresource
      vfRange: 0-4
  system: {}
status:
  interfaces:
  - Vfs:
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: e2:60:85:11:d6:9a
      mtu: 1500
      name: ens2048f0v0
      pciAddress: "0101:00:00.2"
      vendor: 15b3
      vfID: 0
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: 12:75:3d:93:4c:a0
      mtu: 1500
      name: ens2048f0v1
      pciAddress: "0101:00:00.3"
      vendor: 15b3
      vfID: 1
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: e2:95:01:fc:5a:26
      mtu: 1500
      name: ens2048f0v2
      pciAddress: "0101:00:00.4"
      vendor: 15b3
      vfID: 2
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: 1e:af:61:2c:7f:fd
      mtu: 1500
      name: ens2048f0v3
      pciAddress: "0101:00:00.5"
      vendor: 15b3
      vfID: 3
    - deviceID: 101e
      driver: mlx5_core
      guid: "00:00:00:00:00:00:00:00"
      mac: 8e:e5:c4:e3:e6:d6
      mtu: 1500
      name: ens2048f0v4
      pciAddress: "0101:00:00.6"
      vendor: 15b3
      vfID: 4
    deviceID: 101f
    driver: mlx5_core
    eSwitchMode: legacy
    linkAdminState: up
    linkSpeed: 10000 Mb/s
    linkType: ETH
    mac: 9c:63:c0:53:06:82
    mtu: 1500
    name: ens2048f0np0
    numVfs: 5
    pciAddress: "0101:00:00.0"
    totalvfs: 127
    vendor: 15b3
  - deviceID: 101f
    driver: mlx5_core
    eSwitchMode: legacy
    linkAdminState: up
    linkSpeed: 10000 Mb/s
    linkType: ETH
    mac: 9c:63:c0:53:06:83
    mtu: 1500
    name: ens2176f1np1
    pciAddress: "0102:00:00.1"
    totalvfs: 127
    vendor: 15b3
  syncStatus: Succeeded
  system:
    rdmaMode: exclusive
[root@b3 sriov-network-operator]# lspci
00f5:00:00.0 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
00f6:00:00.0 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
0101:00:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0101:00:00.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0102:00:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
Component Versions
Please fill in the below table with the version numbers of applicable components used.
| Component | Version |
|---|---|
| SR-IOV CNI Plugin | |
| Multus | |
| SR-IOV Network Device Plugin | |
| Kubernetes | |
| OS | |
Config Files
Config file locations may be config dependent.