Bug Report
When I use CephFilesystem on a MacBook M2 with minikube, I can't mount a volume on a pod.
I have no problem in production on x86_64 architecture.
How to reproduce it (minimal and precise):
- Macbook M2
- minikube v1.32.0
- Command used:
minikube start --disk-size 100g --cpus 2 --extra-disks 1 --memory 5g --nodes 3 --driver qemu --profile athena --network socket_vmnet
- I used the Helm chart to deploy the operator and the cluster (installation steps below); a quick architecture check follows this list.
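To confirm the environment, the node architecture can be checked via the standard kubernetes.io/arch label (just a sanity check; with the qemu driver on Apple Silicon this shows arm64):

# List the minikube nodes together with their architecture label
kubectl get nodes -L kubernetes.io/arch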
Health status:
Command: kubectl rook-ceph ceph status
Output:
cluster:
id: 7018d828-f6f1-4caf-bef2-264e53b1891b
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 24m)
mgr: b(active, since 23m), standbys: a
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 23m), 3 in (since 23m)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 12 pools, 169 pgs
objects: 426 objects, 65 MiB
usage: 299 MiB used, 300 GiB / 300 GiB avail
pgs: 169 active+clean
io:
client: 1.1 KiB/s rd, 2.1 KiB/s wr, 1 op/s rd, 0 op/s wr

Operator log, ceph-filesystem part:
2024-02-09 12:35:13.469929 I | cephclient: creating a new crush rule for changed deviceClass on crush rule "ceph-filesystem-data0"
2024-02-09 12:35:13.469944 I | cephclient: updating pool "ceph-filesystem-data0" failure domain from "host" to "host" with new crush rule "ceph-filesystem-data0_host"
2024-02-09 12:35:13.469946 I | cephclient: crush rule "ceph-filesystem-data0" will no longer be used by pool "ceph-filesystem-data0"
2024-02-09 12:35:15.044554 I | cephclient: Successfully updated pool "ceph-filesystem-data0" failure domain to "host"
2024-02-09 12:35:15.044654 I | cephclient: creating filesystem "ceph-filesystem" with metadata pool "ceph-filesystem-metadata" and data pools [ceph-filesystem-data0]
2024-02-09 12:35:16.054778 I | cephclient: reconciling replicated pool ceph-objectstore.rgw.meta succeeded
2024-02-09 12:35:16.898292 I | ceph-file-controller: created filesystem "ceph-filesystem" on 1 data pool(s) and metadata pool "ceph-filesystem-metadata"
2024-02-09 12:35:16.898459 I | cephclient: setting allow_standby_replay to true for filesystem "ceph-filesystem"
2024-02-09 12:35:17.215112 I | cephclient: creating a new crush rule for changed deviceClass on crush rule "ceph-objectstore.rgw.meta"
2024-02-09 12:35:17.215127 I | cephclient: updating pool "ceph-objectstore.rgw.meta" failure domain from "host" to "host" with new crush rule "ceph-objectstore.rgw.meta_host"
2024-02-09 12:35:17.215130 I | cephclient: crush rule "ceph-objectstore.rgw.meta" will no longer be used by pool "ceph-objectstore.rgw.meta"
2024-02-09 12:35:17.833347 I | cephclient: creating cephfs "ceph-filesystem" subvolume group "csi"
2024-02-09 12:35:18.306880 I | cephclient: successfully created cephfs "ceph-filesystem" subvolume group "csi"
2024-02-09 12:35:18.936719 I | clusterdisruption-controller: all "host" failure domains: [athena athena-m02 athena-m03]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+clean Count:12} {StateName:creating+peering Count:8}]"
2024-02-09 12:35:19.066436 I | cephclient: Successfully updated pool "ceph-objectstore.rgw.meta" failure domain to "host"
2024-02-09 12:35:19.066453 I | cephclient: setting pool property "pg_num_min" to "8" on pool "ceph-objectstore.rgw.meta"
2024-02-09 12:35:20.203839 I | ceph-spec: parsing mon endpoints: a=10.104.233.67:6789,b=10.102.212.254:6789,c=10.106.235.6:6789
2024-02-09 12:35:20.203984 I | ceph-fs-subvolumegroup-controller: creating ceph filesystem subvolume group ceph-filesystem-csi in namespace rook-ceph
2024-02-09 12:35:20.204001 I | cephclient: creating cephfs "ceph-filesystem" subvolume group "csi"
2024-02-09 12:35:20.746172 I | cephclient: successfully created cephfs "ceph-filesystem" subvolume group "csi"
2024-02-09 12:35:20.751422 I | cephclient: validating pinning configuration of cephfs subvolume group rook-ceph/csi of filesystem "ceph-filesystem"
2024-02-09 12:35:20.751687 I | cephclient: pinning cephfs subvolume group rook-ceph/csi of filesystem "ceph-filesystem"
2024-02-09 12:35:20.751738 I | cephclient: subvolume group pinning args [fs subvolumegroup pin ceph-filesystem csi distributed 1]

Error on my pods:
MountVolume.MountDevice failed for volume "pvc-892a9b0c-d8c2-4522-8367-ff167b8ea0a8" : rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]
Check for the rbd module:
lsmod | grep rbd
The module is loaded on all nodes.
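Note that the CSI error above comes from modprobe ceph (the CephFS kernel module), which is separate from rbd. A sketch of how to check it on every node; the node names follow the athena profile, and the -p/-n flags of minikube ssh are assumed from v1.32:

# Try to load the CephFS kernel module on each minikube node
for node in athena athena-m02 athena-m03; do
  echo "--- $node ---"
  minikube ssh -p athena -n "$node" "sudo modprobe ceph && lsmod | grep ceph"
done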
Installation steps
install-operator.sh:
#!/bin/bash
echo "------ Installing Rook Ceph Operator ------"
echo ""
helm upgrade rook-ceph rook-ceph \
--install \
--namespace rook-ceph \
--create-namespace \
--version 1.13.3 \
-f rook-ceph-operator-values.yaml \
--repo https://charts.rook.io/release
kubectl wait --namespace rook-ceph --for=condition=ready pod -l app=rook-ceph-operator
echo ""
echo "------ Rook Ceph Operator installed ------"
echo ""rook-ceph-operator-values.yaml
crds:
enabled: true
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 10m
memory: 10Mi
nodeSelector: {}
tolerations: []
unreachableNodeTolerationSeconds: 5
currentNamespaceOnly: false
annotations: {}
logLevel: INFO
rbacEnable: true
rbacAggregate:
enableOBCs: false
pspEnable: false
priorityClassName:
containerSecurityContext:
runAsNonRoot: true
runAsUser: 2016
runAsGroup: 2016
capabilities:
drop: ["ALL"]
allowLoopDevices: false
csi:
enableRbdDriver: true
enableCephfsDriver: true
enableCSIHostNetwork: true
enableCephfsSnapshotter: true
enableNFSSnapshotter: true
enableRBDSnapshotter: true
enablePluginSelinuxHostMount: false
enableCSIEncryption: false
pluginPriorityClassName: system-node-critical
provisionerPriorityClassName: system-cluster-critical
rbdFSGroupPolicy: "File"
cephFSFSGroupPolicy: "File"
nfsFSGroupPolicy: "File"
enableOMAPGenerator: false
cephFSKernelMountOptions:
enableMetadata: false
provisionerReplicas: 2
clusterName:
logLevel: 0
sidecarLogLevel:
rbdPluginUpdateStrategy:
rbdPluginUpdateStrategyMaxUnavailable:
cephFSPluginUpdateStrategy:
cephFSPluginUpdateStrategyMaxUnavailable:
nfsPluginUpdateStrategy:
grpcTimeoutInSeconds: 150
allowUnsupportedVersion: false
csiRBDPluginVolume:
csiRBDPluginVolumeMount:
csiCephFSPluginVolume:
csiCephFSPluginVolumeMount:
csiRBDProvisionerResource: |
- name : csi-provisioner
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 200m
- name : csi-resizer
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 200m
- name : csi-attacher
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 200m
- name : csi-snapshotter
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 200m
- name : csi-rbdplugin
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 1Gi
cpu: 500m
- name : csi-omap-generator
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 1Gi
cpu: 500m
- name : liveness-prometheus
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 100m
csiRBDPluginResource: |
- name : driver-registrar
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 100m
- name : csi-rbdplugin
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 1Gi
cpu: 500m
- name : liveness-prometheus
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 100m
csiCephFSProvisionerResource: |
- name : csi-provisioner
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 200m
- name : csi-resizer
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 200m
- name : csi-attacher
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 200m
- name : csi-snapshotter
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 200m
- name : csi-cephfsplugin
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 1Gi
cpu: 500m
- name : liveness-prometheus
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 100m
csiCephFSPluginResource: |
- name : driver-registrar
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 100m
- name : csi-cephfsplugin
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 1Gi
cpu: 500m
- name : liveness-prometheus
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 100m
csiNFSProvisionerResource: |
- name : csi-provisioner
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 200m
- name : csi-nfsplugin
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 1Gi
cpu: 500m
- name : csi-attacher
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 1Gi
cpu: 500m
csiNFSPluginResource: |
- name : driver-registrar
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 256Mi
cpu: 100m
- name : csi-nfsplugin
resource:
requests:
memory: 10Mi
cpu: 10m
limits:
memory: 1Gi
cpu: 500m
provisionerTolerations:
provisionerNodeAffinity: #key1=value1,value2; key2=value3
pluginTolerations:
pluginNodeAffinity: # key1=value1,value2; key2=value3
enableLiveness: false
cephfsLivenessMetricsPort:
csiAddonsPort:
forceCephFSKernelClient: true
rbdLivenessMetricsPort:
serviceMonitor:
enabled: true
interval: 5s
labels: {}
namespace:
kubeletDirPath:
cephcsi:
image:
registrar:
image:
provisioner:
image:
snapshotter:
image:
attacher:
image:
resizer:
image:
imagePullPolicy: IfNotPresent
cephfsPodLabels: #"key1=value1,key2=value2"
nfsPodLabels: #"key1=value1,key2=value2"
rbdPodLabels: #"key1=value1,key2=value2"
csiAddons:
enabled: false
nfs:
enabled: false
topology:
enabled: false
domainLabels:
readAffinity:
enabled: false
crushLocationLabels:
cephFSAttachRequired: true
rbdAttachRequired: true
nfsAttachRequired: true
enableDiscoveryDaemon: true
discoveryDaemonInterval: 1m
cephCommandsTimeoutSeconds: "15"
useOperatorHostNetwork:
scaleDownOperator: false
discover:
toleration:
tolerationKey:
tolerations:
nodeAffinity: # key1=value1,value2; key2=value3
podLabels: # "key1=value1,key2=value2"
resources:
hostpathRequiresPrivileged: false
disableDeviceHotplug: false
discoverDaemonUdev:
imagePullSecrets:
enableOBCWatchOperatorNamespace: true
monitoring:
enabled: true

install-cluster.sh:
#!/bin/bash
echo "------ Installing Rook Ceph Cluster ------"
echo ""
helm upgrade rook-ceph-cluster rook-ceph-cluster \
--install \
--namespace rook-ceph \
--create-namespace \
--version 1.13.3 \
-f rook-ceph-cluster-values.yaml \
--repo https://charts.rook.io/release
# Wait for the core daemons; a set-based selector is needed here since a pod's "app" label can only hold one value
kubectl wait --namespace rook-ceph --for=condition=ready pod -l 'app in (rook-ceph-osd,rook-ceph-mon,rook-ceph-mgr)'
echo ""
echo "------ Rook Ceph Operator Cluster ------"
echo ""rook-ceph-cluster-values.yaml
operatorNamespace: rook-ceph
clusterName:
kubeVersion:
configOverride:
toolbox:
enabled: true
image: #quay.io/ceph/ceph:v17.2.6
tolerations: []
affinity: {}
containerSecurityContext:
runAsNonRoot: true
runAsUser: 2016
runAsGroup: 2016
capabilities:
drop: ["ALL"]
resources:
limits:
cpu: "500m"
memory: "1Gi"
requests:
memory: 10Mi
cpu: 10m
priorityClassName:
monitoring:
enabled: true
createPrometheusRules: true
rulesNamespaceOverride:
prometheusRule:
labels: {}
annotations: {}
pspEnable: false
cephClusterSpec:
cephVersion:
allowUnsupported: false
dataDirHostPath: /var/lib/rook
skipUpgradeChecks: false
continueUpgradeAfterChecksEvenIfNotHealthy: false
waitTimeoutForHealthyOSDInMinutes: 10
mon:
count: 3
allowMultiplePerNode: false
mgr:
count: 2
allowMultiplePerNode: false
modules:
- name: pg_autoscaler
enabled: true
dashboard:
enabled: true
ssl: false
network:
connections:
encryption:
enabled: false
compression:
enabled: false
requireMsgr2: false
crashCollector:
disable: false
logCollector:
enabled: true
periodicity: daily # one of: hourly, daily, weekly, monthly
maxLogSize: 500M # SUFFIX may be 'M' or 'G'. Must be at least 1M.
cleanupPolicy:
confirmation: ""
sanitizeDisks:
method: quick
dataSource: zero
iteration: 1
allowUninstallWithVolumes: false
resources:
mgr:
limits:
cpu: "1000m"
memory: "1Gi"
requests:
memory: 10Mi
cpu: 10m
mon:
limits:
cpu: "2000m"
memory: "2Gi"
requests:
memory: 10Mi
cpu: 10m
osd:
limits:
cpu: "2000m"
memory: "4Gi"
requests:
memory: 10Mi
cpu: 10m
prepareosd:
requests:
memory: 10Mi
cpu: 10m
mgr-sidecar:
limits:
cpu: "500m"
memory: "100Mi"
requests:
memory: 10Mi
cpu: 10m
crashcollector:
limits:
cpu: "500m"
memory: "60Mi"
requests:
memory: 10Mi
cpu: 10m
logcollector:
limits:
cpu: "500m"
memory: "1Gi"
requests:
memory: 10Mi
cpu: 10m
cleanup:
limits:
cpu: "500m"
memory: "1Gi"
requests:
memory: 10Mi
cpu: 10m
exporter:
limits:
cpu: "250m"
memory: "128Mi"
requests:
memory: 10Mi
cpu: 10m
removeOSDsIfOutAndSafeToRemove: false
priorityClassNames:
mon: system-node-critical
osd: system-node-critical
mgr: system-cluster-critical
storage: # cluster level storage configuration and selection
useAllNodes: true
useAllDevices: true
disruptionManagement:
managePodBudgets: true
osdMaintenanceTimeout: 30
pgHealthCheckTimeout: 0
healthCheck:
daemonHealth:
mon:
disabled: false
interval: 45s
osd:
disabled: false
interval: 60s
status:
disabled: false
interval: 60s
livenessProbe:
mon:
disabled: false
mgr:
disabled: false
osd:
disabled: false
ingress:
dashboard:
{}
cephBlockPools:
- name: ceph-blockpool
spec:
failureDomain: host
replicated:
size: 3
storageClass:
enabled: true
name: ceph-block
isDefault: true
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: "Immediate"
mountOptions: []
allowedTopologies: []
parameters:
imageFormat: "2"
imageFeatures: layering
csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: "{{ .Release.Namespace }}"
csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/controller-expand-secret-namespace: "{{ .Release.Namespace }}"
csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
csi.storage.k8s.io/node-stage-secret-namespace: "{{ .Release.Namespace }}"
csi.storage.k8s.io/fstype: ext4
cephFileSystems:
- name: ceph-filesystem
spec:
metadataPool:
replicated:
size: 3
dataPools:
- failureDomain: host
replicated:
size: 3
name: data0
metadataServer:
activeCount: 1
activeStandby: true
resources:
limits:
cpu: "2000m"
memory: "4Gi"
requests:
memory: 10Mi
cpu: 10m
priorityClassName: system-cluster-critical
storageClass:
enabled: true
isDefault: false
name: ceph-filesystem
pool: data0
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: "Immediate"
mountOptions: []
parameters:
csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: "{{ .Release.Namespace }}"
csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
csi.storage.k8s.io/controller-expand-secret-namespace: "{{ .Release.Namespace }}"
csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
csi.storage.k8s.io/node-stage-secret-namespace: "{{ .Release.Namespace }}"
csi.storage.k8s.io/fstype: ext4
cephFileSystemVolumeSnapshotClass:
enabled: false
name: ceph-filesystem
isDefault: true
deletionPolicy: Delete
annotations: {}
labels: {}
parameters: {}
cephBlockPoolsVolumeSnapshotClass:
enabled: false
name: ceph-block
isDefault: false
deletionPolicy: Delete
annotations: {}
labels: {}
parameters: {}
cephObjectStores:
- name: ceph-objectstore
spec:
metadataPool:
failureDomain: host
replicated:
size: 3
dataPool:
failureDomain: host
erasureCoded:
dataChunks: 2
codingChunks: 1
preservePoolsOnDelete: true
gateway:
port: 80
resources:
limits:
cpu: "2000m"
memory: "2Gi"
requests:
memory: 10Mi
cpu: 10m
instances: 1
priorityClassName: system-cluster-critical
storageClass:
enabled: true
name: ceph-bucket
reclaimPolicy: Delete
volumeBindingMode: "Immediate"
parameters:
region: us-east-1
ingress:
enabled: false
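For reference, a minimal way to reproduce the mount failure against the ceph-filesystem StorageClass created by the cluster chart above; the PVC and pod names here are illustrative only:

# Create a small CephFS-backed PVC and a pod that mounts it
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ceph-filesystem
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-test-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cephfs-test
EOF

# The MountVolume.MountDevice / modprobe error then shows up in the pod events
kubectl describe pod cephfs-test-pod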