
Rook fails to decode MON DNS name (external mode, msgr2) #17443

@ArjonBu

Description


Is this a bug report or feature request?

  • Bug Report

I am running Rook Ceph in external mode. On the master cluster, Ceph runs with hostNetwork and is exposed through public IPs. external-dns keeps a DNS record updated for the active MON, and another for the MGR so that it can expose metrics to the client clusters.

I want to deploy Rook in external mode on client-1. To do so, I generate the credentials with create-external-cluster-resources.py and then set ROOK_EXTERNAL_CEPH_MON_DATA to my DNS name:

export ROOK_EXTERNAL_CEPH_MON_DATA=a=[v2:ceph-mons.domain.com:3300]

This results in the following ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-mon-endpoints
  namespace: rook-ceph
data:
  data: a=[v2:ceph-mons.domain.com:3300]
  mapping: '{}'
  maxMonId: "2"
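The `data` key holds a `name=endpoint` list, and the operator log further down reports it as `mons=map[a:...]`, so this first step succeeds. A minimal Go sketch of that step follows, assuming entries are comma-separated and that commas inside `[...]` belong to a single msgr address vector. `parseMonData` is a hypothetical helper for illustration, not Rook's actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMonData is a hypothetical helper (not Rook's code). It splits a mon
// endpoint string such as "a=[v2:ceph-mons.domain.com:3300]" into a
// name -> endpoint map. Assumption: entries are separated by top-level
// commas; commas inside [...] stay part of one endpoint.
func parseMonData(data string) (map[string]string, error) {
	mons := map[string]string{}
	var entries []string
	depth, start := 0, 0
	for i, r := range data {
		switch r {
		case '[':
			depth++
		case ']':
			depth--
		case ',':
			if depth == 0 {
				entries = append(entries, data[start:i])
				start = i + 1
			}
		}
	}
	entries = append(entries, data[start:])

	for _, e := range entries {
		// Split on the first '=' only; the endpoint itself has no '='.
		name, endpoint, ok := strings.Cut(e, "=")
		if !ok || name == "" || endpoint == "" {
			return nil, fmt.Errorf("invalid mon entry %q", e)
		}
		mons[name] = endpoint
	}
	return mons, nil
}

func main() {
	mons, err := parseMonData("a=[v2:ceph-mons.domain.com:3300]")
	fmt.Println(mons, err) // map[a:[v2:ceph-mons.domain.com:3300]] <nil>
}
```

The endpoint value is kept verbatim, brackets and `v2:` tag included, which is the string that the later host/port split receives.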

The problem is that the operator fails to parse the DNS name. See the operator logs below:

2026-04-25 19:10:23.047847 I | cluster-controller: [rook-ceph/rook-ceph] reconciling ceph cluster
2026-04-25 19:10:23.076241 I | ceph-spec: parsing mon endpoints: a=[v2:ceph-mons.domain.com:3300]
2026-04-25 19:10:23.076298 I | ceph-spec: [rook-ceph] found the cluster info to connect to the external cluster. will use "client.healthchecker" to check health and monitor status. mons=map[a:0xc00136a420]
2026-04-25 19:10:23.076309 I | ceph-spec: detecting the ceph image version for image quay.io/ceph/ceph:v19.2.3...
2026-04-25 19:10:24.792384 I | ceph-spec: detected ceph image version: "19.2.3-0 squid"
2026-04-25 19:10:24.792399 I | cluster-controller: [rook-ceph] validating ceph version from provided image
2026-04-25 19:10:24.801077 I | ceph-spec: parsing mon endpoints: a=[v2:ceph-mons.domain.com:3300]
2026-04-25 19:10:24.801415 E | op-ceph-util: failed to split ip and port for endpoint "[v2:ceph-mons.domain.com:3300]". address [v2:ceph-mons.domain.com:3300]: missing port in address
2026-04-25 19:10:24.801442 E | op-ceph-util: failed to split host and port for endpoint "[v2:ceph-mons.domain.com:3300]", assuming default Ceph port "". address [v2:ceph-mons.domain.com:3300]: missing port in address
2026-04-25 19:10:24.805853 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2026-04-25 19:10:24.806103 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2026-04-25 19:10:24.928166 E | cluster-controller: failed to reconcile CephCluster "rook-ceph/rook-ceph". failed to reconcile cluster "rook-ceph": failed to configure external ceph cluster: failed to detect and validate ceph version: failed to validate ceph version between external and local: failed to get ceph mon version: failed to run 'ceph version'. server name not found: [v2::3300 (Name or service not known)
. unable to parse addrs in '[v2::3300,v1::0]'
2026-04-25T19:10:17.899+0000 7f9b1e2fd640 -1 monclient: get_monmap_and_config cannot identify monitors to contact
[errno 22] RADOS invalid argument (error connecting to the cluster): exit status 1

Environment:

  • OS (e.g. from /etc/os-release): Talos 1.12
  • Kernel (e.g. uname -a): 6.18.1-talos
  • Cloud provider or hardware configuration: Hetzner bare-metal
  • Rook version (use rook version inside of a Rook Pod): 1.19.4
  • Storage backend version (e.g. for ceph do ceph -v): 20.2.1
  • Kubernetes version (use kubectl version): v1.34.1
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Talos
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): Master cluster healthy
