rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph] | on minikube Mac M2 #13739

@Fenrur

Description

Bug Report

When I use a CephFilesystem on a MacBook M2 with minikube, I can't mount a volume on a pod.

I have no such problem in production on the x86_64 architecture.

How to reproduce it (minimal and precise):

  • Macbook M2
  • minikube v1.32.0
  • Command used: minikube start --disk-size 100g --cpus 2 --extra-disks 1 --memory 5g --nodes 3 --driver qemu --profile athena --network socket_vmnet
  • I used the Helm charts to deploy the operator and the cluster (scripts and values below)

Health status:

Command: kubectl rook-ceph ceph status

Output:

  cluster:
    id:     7018d828-f6f1-4caf-bef2-264e53b1891b
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 24m)
    mgr: b(active, since 23m), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 23m), 3 in (since 23m)
    rgw: 1 daemon active (1 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 426 objects, 65 MiB
    usage:   299 MiB used, 300 GiB / 300 GiB avail
    pgs:     169 active+clean
 
  io:
    client:   1.1 KiB/s rd, 2.1 KiB/s wr, 1 op/s rd, 0 op/s wr
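
The status above was collected with the Rook kubectl plugin. If it is not installed, a minimal sketch of getting it via krew (assuming krew itself is already set up):

# Install the Rook plugin through krew, then re-run the status command.
kubectl krew install rook-ceph
kubectl rook-ceph ceph status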

Operator log excerpt (ceph-filesystem):

2024-02-09 12:35:13.469929 I | cephclient: creating a new crush rule for changed deviceClass on crush rule "ceph-filesystem-data0"
2024-02-09 12:35:13.469944 I | cephclient: updating pool "ceph-filesystem-data0" failure domain from "host" to "host" with new crush rule "ceph-filesystem-data0_host"
2024-02-09 12:35:13.469946 I | cephclient: crush rule "ceph-filesystem-data0" will no longer be used by pool "ceph-filesystem-data0"
2024-02-09 12:35:15.044554 I | cephclient: Successfully updated pool "ceph-filesystem-data0" failure domain to "host"
2024-02-09 12:35:15.044654 I | cephclient: creating filesystem "ceph-filesystem" with metadata pool "ceph-filesystem-metadata" and data pools [ceph-filesystem-data0]
2024-02-09 12:35:16.054778 I | cephclient: reconciling replicated pool ceph-objectstore.rgw.meta succeeded
2024-02-09 12:35:16.898292 I | ceph-file-controller: created filesystem "ceph-filesystem" on 1 data pool(s) and metadata pool "ceph-filesystem-metadata"
2024-02-09 12:35:16.898459 I | cephclient: setting allow_standby_replay to true for filesystem "ceph-filesystem"
2024-02-09 12:35:17.215112 I | cephclient: creating a new crush rule for changed deviceClass on crush rule "ceph-objectstore.rgw.meta"
2024-02-09 12:35:17.215127 I | cephclient: updating pool "ceph-objectstore.rgw.meta" failure domain from "host" to "host" with new crush rule "ceph-objectstore.rgw.meta_host"
2024-02-09 12:35:17.215130 I | cephclient: crush rule "ceph-objectstore.rgw.meta" will no longer be used by pool "ceph-objectstore.rgw.meta"
2024-02-09 12:35:17.833347 I | cephclient: creating cephfs "ceph-filesystem" subvolume group "csi"
2024-02-09 12:35:18.306880 I | cephclient: successfully created cephfs "ceph-filesystem" subvolume group "csi"
2024-02-09 12:35:18.936719 I | clusterdisruption-controller: all "host" failure domains: [athena athena-m02 athena-m03]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+clean Count:12} {StateName:creating+peering Count:8}]"
2024-02-09 12:35:19.066436 I | cephclient: Successfully updated pool "ceph-objectstore.rgw.meta" failure domain to "host"
2024-02-09 12:35:19.066453 I | cephclient: setting pool property "pg_num_min" to "8" on pool "ceph-objectstore.rgw.meta"
2024-02-09 12:35:20.203839 I | ceph-spec: parsing mon endpoints: a=10.104.233.67:6789,b=10.102.212.254:6789,c=10.106.235.6:6789
2024-02-09 12:35:20.203984 I | ceph-fs-subvolumegroup-controller: creating ceph filesystem subvolume group ceph-filesystem-csi in namespace rook-ceph
2024-02-09 12:35:20.204001 I | cephclient: creating cephfs "ceph-filesystem" subvolume group "csi"
2024-02-09 12:35:20.746172 I | cephclient: successfully created cephfs "ceph-filesystem" subvolume group "csi"
2024-02-09 12:35:20.751422 I | cephclient: validating pinning configuration of cephfs subvolume group rook-ceph/csi of filesystem "ceph-filesystem"
2024-02-09 12:35:20.751687 I | cephclient: pinning cephfs subvolume group rook-ceph/csi of filesystem "ceph-filesystem"
2024-02-09 12:35:20.751738 I | cephclient: subvolume group pinning args [fs subvolumegroup pin ceph-filesystem csi distributed 1]

Error on my pods:

MountVolume.MountDevice failed for volume "pvc-892a9b0c-d8c2-4522-8367-ff167b8ea0a8" : rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]
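
For more detail than the kubelet event above, the CephFS nodeplugin logs on the node running the failing pod are worth checking. A minimal sketch, assuming the default Rook CSI label app=csi-cephfsplugin and container name csi-cephfsplugin (adjust if customized):

# Hypothetical node name; use the node where the failing pod is scheduled.
NODE=athena-m02
# Find the CephFS nodeplugin pod on that node and dump its recent logs.
POD=$(kubectl -n rook-ceph get pod -l app=csi-cephfsplugin \
  --field-selector spec.nodeName="$NODE" -o name | head -n 1)
kubectl -n rook-ceph logs "$POD" -c csi-cephfsplugin --tail=100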

Checking the rbd kernel module:

lsmod | grep rbd

The module is loaded on all nodes.
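
Since the failing modprobe call is for the ceph module rather than rbd, it may also be worth checking on each node whether that module can be loaded at all. A minimal sketch, assuming the node names created by the minikube profile above:

# Dry-run modprobe of the ceph module on every node of the "athena" profile.
for node in athena athena-m02 athena-m03; do
  echo "--- $node ---"
  minikube ssh -p athena -n "$node" -- \
    "lsmod | grep -w ceph; sudo modprobe -n ceph && echo 'ceph module available' || echo 'ceph module missing'"
done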

Installation steps

install-operator.sh

#!/bin/bash

echo "------ Installing Rook Ceph Operator ------"
echo ""

helm upgrade rook-ceph rook-ceph \
  --install \
  --namespace rook-ceph \
  --create-namespace \
  --version 1.13.3 \
  -f rook-ceph-operator-values.yaml \
  --repo https://charts.rook.io/release

kubectl wait --namespace rook-ceph --for=condition=ready pod -l app=rook-ceph-operator

echo ""
echo "------ Rook Ceph Operator installed ------"
echo ""

rook-ceph-operator-values.yaml

crds:
  enabled: true

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 10m
    memory: 10Mi

nodeSelector: {}

tolerations: []

unreachableNodeTolerationSeconds: 5

currentNamespaceOnly: false

annotations: {}

logLevel: INFO

rbacEnable: true

rbacAggregate:
  enableOBCs: false

pspEnable: false

priorityClassName:

containerSecurityContext:
  runAsNonRoot: true
  runAsUser: 2016
  runAsGroup: 2016
  capabilities:
    drop: ["ALL"]
allowLoopDevices: false

csi:
  enableRbdDriver: true
  enableCephfsDriver: true
  enableCSIHostNetwork: true
  enableCephfsSnapshotter: true
  enableNFSSnapshotter: true
  enableRBDSnapshotter: true
  enablePluginSelinuxHostMount: false
  enableCSIEncryption: false

  pluginPriorityClassName: system-node-critical

  provisionerPriorityClassName: system-cluster-critical

  rbdFSGroupPolicy: "File"

  cephFSFSGroupPolicy: "File"

  nfsFSGroupPolicy: "File"

  enableOMAPGenerator: false

  cephFSKernelMountOptions:

  enableMetadata: false

  provisionerReplicas: 2

  clusterName:

  logLevel: 0

  sidecarLogLevel:

  rbdPluginUpdateStrategy:

  rbdPluginUpdateStrategyMaxUnavailable:

  cephFSPluginUpdateStrategy:

  cephFSPluginUpdateStrategyMaxUnavailable:

  nfsPluginUpdateStrategy:

  grpcTimeoutInSeconds: 150

  allowUnsupportedVersion: false

  csiRBDPluginVolume:

  csiRBDPluginVolumeMount:

  csiCephFSPluginVolume:

  csiCephFSPluginVolumeMount:

  csiRBDProvisionerResource: |
    - name : csi-provisioner
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 200m
    - name : csi-resizer
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 200m
    - name : csi-attacher
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 200m
    - name : csi-snapshotter
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 200m
    - name : csi-rbdplugin
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 1Gi
          cpu: 500m
    - name : csi-omap-generator
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 1Gi
          cpu: 500m
    - name : liveness-prometheus
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 100m

  csiRBDPluginResource: |
    - name : driver-registrar
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 100m
    - name : csi-rbdplugin
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 1Gi
          cpu: 500m
    - name : liveness-prometheus
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 100m

  csiCephFSProvisionerResource: |
    - name : csi-provisioner
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 200m
    - name : csi-resizer
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 200m
    - name : csi-attacher
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 200m
    - name : csi-snapshotter
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 200m
    - name : csi-cephfsplugin
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 1Gi
          cpu: 500m
    - name : liveness-prometheus
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 100m

  csiCephFSPluginResource: |
    - name : driver-registrar
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 100m
    - name : csi-cephfsplugin
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 1Gi
          cpu: 500m
    - name : liveness-prometheus
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 100m

  csiNFSProvisionerResource: |
    - name : csi-provisioner
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 200m
    - name : csi-nfsplugin
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 1Gi
          cpu: 500m
    - name : csi-attacher
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 1Gi
          cpu: 500m

  csiNFSPluginResource: |
    - name : driver-registrar
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 256Mi
          cpu: 100m
    - name : csi-nfsplugin
      resource:
        requests:
          memory: 10Mi
          cpu: 10m
        limits:
          memory: 1Gi
          cpu: 500m


  provisionerTolerations:

  provisionerNodeAffinity: #key1=value1,value2; key2=value3

  pluginTolerations:

  pluginNodeAffinity: # key1=value1,value2; key2=value3

  enableLiveness: false

  cephfsLivenessMetricsPort:

  csiAddonsPort:

  forceCephFSKernelClient: true

  rbdLivenessMetricsPort:

  serviceMonitor:
    enabled: true
    interval: 5s
    labels: {}
    namespace:

  kubeletDirPath:

  cephcsi:
    image:

  registrar:
    image:

  provisioner:
    image:

  snapshotter:
    image:

  attacher:
    image:

  resizer:
    image:

  imagePullPolicy: IfNotPresent

  cephfsPodLabels: #"key1=value1,key2=value2"

  nfsPodLabels: #"key1=value1,key2=value2"

  rbdPodLabels: #"key1=value1,key2=value2"

  csiAddons:
    enabled: false

  nfs:
    enabled: false

  topology:
    enabled: false
    domainLabels:

  readAffinity:
    enabled: false
    crushLocationLabels:

  cephFSAttachRequired: true
  rbdAttachRequired: true
  nfsAttachRequired: true

enableDiscoveryDaemon: true
discoveryDaemonInterval: 1m

cephCommandsTimeoutSeconds: "15"

useOperatorHostNetwork:

scaleDownOperator: false


discover:
  toleration:
  tolerationKey:
  tolerations:
  nodeAffinity: # key1=value1,value2; key2=value3
  podLabels: # "key1=value1,key2=value2"
  resources:

hostpathRequiresPrivileged: false

disableDeviceHotplug: false

discoverDaemonUdev:

imagePullSecrets:

enableOBCWatchOperatorNamespace: true

monitoring:
  enabled: true

install-cluster.sh

#!/bin/bash

echo "------ Installing Rook Ceph Cluster ------"
echo ""

helm upgrade rook-ceph-cluster rook-ceph-cluster \
  --install \
  --namespace rook-ceph \
  --create-namespace \
  --version 1.13.3 \
  -f rook-ceph-cluster-values.yaml \
  --repo https://charts.rook.io/release

# 'app=rook-ceph-osd,app=rook-ceph-mon,app=rook-ceph-mgr' would require all three labels
# on the same pod and never match; use a set-based selector instead.
kubectl wait --namespace rook-ceph --for=condition=ready pod \
  -l 'app in (rook-ceph-osd,rook-ceph-mon,rook-ceph-mgr)'

echo ""
echo "------ Rook Ceph Operator Cluster ------"
echo ""

rook-ceph-cluster-values.yaml

operatorNamespace: rook-ceph

clusterName:

kubeVersion:

configOverride:

toolbox:
  enabled: true
  image: #quay.io/ceph/ceph:v17.2.6
  tolerations: []
  affinity: {}
  containerSecurityContext:
    runAsNonRoot: true
    runAsUser: 2016
    runAsGroup: 2016
    capabilities:
      drop: ["ALL"]
  resources:
    limits:
      cpu: "500m"
      memory: "1Gi"
    requests:
      memory: 10Mi
      cpu: 10m
  priorityClassName:

monitoring:
  enabled: true
  createPrometheusRules: true
  rulesNamespaceOverride:
  prometheusRule:
    labels: {}
    annotations: {}

pspEnable: false


cephClusterSpec:

  cephVersion:
    allowUnsupported: false

  dataDirHostPath: /var/lib/rook

  skipUpgradeChecks: false

  continueUpgradeAfterChecksEvenIfNotHealthy: false

  waitTimeoutForHealthyOSDInMinutes: 10

  mon:
    count: 3
    allowMultiplePerNode: false

  mgr:
    count: 2
    allowMultiplePerNode: false
    modules:
      - name: pg_autoscaler
        enabled: true

  dashboard:
    enabled: true
    ssl: false

  network:
    connections:
      encryption:
        enabled: false
      compression:
        enabled: false
      requireMsgr2: false

  crashCollector:
    disable: false

  logCollector:
    enabled: true
    periodicity: daily # one of: hourly, daily, weekly, monthly
    maxLogSize: 500M # SUFFIX may be 'M' or 'G'. Must be at least 1M.

  cleanupPolicy:
    confirmation: ""
    sanitizeDisks:
      method: quick
      dataSource: zero
      iteration: 1
    allowUninstallWithVolumes: false




  resources:
    mgr:
      limits:
        cpu: "1000m"
        memory: "1Gi"
      requests:
        memory: 10Mi
        cpu: 10m
    mon:
      limits:
        cpu: "2000m"
        memory: "2Gi"
      requests:
        memory: 10Mi
        cpu: 10m
    osd:
      limits:
        cpu: "2000m"
        memory: "4Gi"
      requests:
        memory: 10Mi
        cpu: 10m
    prepareosd:
      requests:
        memory: 10Mi
        cpu: 10m
    mgr-sidecar:
      limits:
        cpu: "500m"
        memory: "100Mi"
      requests:
        memory: 10Mi
        cpu: 10m
    crashcollector:
      limits:
        cpu: "500m"
        memory: "60Mi"
      requests:
        memory: 10Mi
        cpu: 10m
    logcollector:
      limits:
        cpu: "500m"
        memory: "1Gi"
      requests:
        memory: 10Mi
        cpu: 10m
    cleanup:
      limits:
        cpu: "500m"
        memory: "1Gi"
      requests:
        memory: 10Mi
        cpu: 10m
    exporter:
      limits:
        cpu: "250m"
        memory: "128Mi"
      requests:
        memory: 10Mi
        cpu: 10m

  removeOSDsIfOutAndSafeToRemove: false

  priorityClassNames:
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical

  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: true

  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0

  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false

ingress:
  dashboard:
    {}

cephBlockPools:
  - name: ceph-blockpool
    spec:
      failureDomain: host
      replicated:
        size: 3
    storageClass:
      enabled: true
      name: ceph-block
      isDefault: true
      reclaimPolicy: Delete
      allowVolumeExpansion: true
      volumeBindingMode: "Immediate"
      mountOptions: []
      allowedTopologies: []
      parameters:


        imageFormat: "2"

        imageFeatures: layering

        csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
        csi.storage.k8s.io/node-stage-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/fstype: ext4

cephFileSystems:
  - name: ceph-filesystem
    spec:
      metadataPool:
        replicated:
          size: 3
      dataPools:
        - failureDomain: host
          replicated:
            size: 3
          name: data0
      metadataServer:
        activeCount: 1
        activeStandby: true
        resources:
          limits:
            cpu: "2000m"
            memory: "4Gi"
          requests:
            memory: 10Mi
            cpu: 10m
        priorityClassName: system-cluster-critical
    storageClass:
      enabled: true
      isDefault: false
      name: ceph-filesystem
      pool: data0
      reclaimPolicy: Delete
      allowVolumeExpansion: true
      volumeBindingMode: "Immediate"
      mountOptions: []
      parameters:
        csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
        csi.storage.k8s.io/node-stage-secret-namespace: "{{ .Release.Namespace }}"
        csi.storage.k8s.io/fstype: ext4

cephFileSystemVolumeSnapshotClass:
  enabled: false
  name: ceph-filesystem
  isDefault: true
  deletionPolicy: Delete
  annotations: {}
  labels: {}
  parameters: {}

cephBlockPoolsVolumeSnapshotClass:
  enabled: false
  name: ceph-block
  isDefault: false
  deletionPolicy: Delete
  annotations: {}
  labels: {}
  parameters: {}

cephObjectStores:
  - name: ceph-objectstore
    spec:
      metadataPool:
        failureDomain: host
        replicated:
          size: 3
      dataPool:
        failureDomain: host
        erasureCoded:
          dataChunks: 2
          codingChunks: 1
      preservePoolsOnDelete: true
      gateway:
        port: 80
        resources:
          limits:
            cpu: "2000m"
            memory: "2Gi"
          requests:
            memory: 10Mi
            cpu: 10m
        instances: 1
        priorityClassName: system-cluster-critical
    storageClass:
      enabled: true
      name: ceph-bucket
      reclaimPolicy: Delete
      volumeBindingMode: "Immediate"
      parameters:
        region: us-east-1
    ingress:
      enabled: false
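
Any pod that mounts a PVC backed by the ceph-filesystem StorageClass defined above is enough to hit the failing mount path. A minimal illustrative sketch (the PVC and pod names are hypothetical, not from the actual workload):

# Hypothetical test workload against the ceph-filesystem StorageClass.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-test-pvc
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ceph-filesystem
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-test-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cephfs-test-pvc
EOF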
