Conversation

@OdedViner (Contributor) commented Nov 10, 2025

Fixes: #15551
Adds support for Ceph NVMe-oF gateways through a new CRD.

  • Introduces the CephNVMeOFGateway custom resource
  • Implements a controller for managing the NVMe-oF gateway lifecycle
  • Supports configurable instances, placement, resources, and host networking
  • Generates the gateway configuration and manages deployments/services
  • Follows existing patterns from the NFS controller implementation

Test NVMe-oF on minikube, based on: https://github.com/ceph/ceph-csi/tree/devel/e2e/nvmeof

  1. Deploy the Rook operator without the CSI operator, and use a private image for Ceph-CSI:
    ROOK_CSI_CEPH_IMAGE: "quay.io/gdidi/cephcsi:nvmeof"
    Also set the Ceph image to quay.io/ceph/ceph:v20.
kind: ConfigMap
apiVersion: v1
metadata:
  name: rook-ceph-operator-config
  # should be in the namespace of the operator
  namespace: rook-ceph # namespace:operator
data:
  ROOK_USE_CSI_OPERATOR: "false"
  ROOK_CSI_CEPH_IMAGE: "quay.io/gdidi/cephcsi:nvmeof"
$ kubectl get pods -n rook-ceph  csi-rbdplugin-l2rgc -o yaml| grep didi
    image: quay.io/gdidi/cephcsi:nvmeof
    image: quay.io/gdidi/cephcsi:nvmeof

Working with Ceph image quay.io/ceph/ceph:v20 (rather than quay.io/ceph/ceph:v19).

  2. Create the nvmeofpool CephBlockPool and verify it reaches the Ready phase

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: nvmeofpool
  namespace: rook-ceph # namespace:cluster
spec:
  failureDomain: osd
  replicated:
    size: 1
$ kubectl get Cephblockpool -A
NAMESPACE   NAME          PHASE   TYPE         FAILUREDOMAIN   AGE
rook-ceph   nvmeofpool    Ready   Replicated   osd             6m3s

  3. Create the CephNVMeOFGateway CR

apiVersion: ceph.rook.io/v1
kind: CephNVMeOFGateway
metadata:
  name: my-nvmeof
  namespace: rook-ceph # namespace:cluster
spec:
  # Pool name that will be used by the NVMe-oF gateway
  pool: nvmeofpool
  # ANA (Asymmetric Namespace Access) group name
  group: mygroup
  # Number of gateway instances to run
  instances: 1

  4. Verify the ceph-nvmeof-gateway pod is in the Running state

$ kubectl get pods ceph-nvmeof-gateway-765dffc5bc-sfscr -n rook-ceph 
NAME                                   READY   STATUS    RESTARTS   AGE
ceph-nvmeof-gateway-765dffc5bc-sfscr   1/1     Running   0          3m20s

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@OdedViner force-pushed the nmveof_new2 branch 3 times, most recently from 70f289f to b5e8c8e on November 10, 2025 08:47
@OdedViner requested a review from ideepika on November 10, 2025 08:48
@OdedViner force-pushed the nmveof_new2 branch 5 times, most recently from d2344fc to 61c2a4a on November 10, 2025 15:07

metricsBindAddress := k8sutil.GetOperatorSetting("ROOK_OPERATOR_METRICS_BIND_ADDRESS", "0")
skipNameValidation := true
cacheSyncTimeout := 10 * time.Minute
Contributor Author

Adding the NVMeOF controller causes operator startup timeouts.
Increased CacheSyncTimeout to 10 minutes.
Rook operator logs with the private image show errors with the new NVMeoF code.
@travisn Do you know why this is happening?

2025-11-10 14:29:16.627898 D | clusterdisruption-controller: could not match failure domain. defaulting to "host"
2025-11-10 14:29:16.627938 D | exec: Running command: ceph osd metadata --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json
2025-11-10 14:29:16.790834 E | clusterdisruption-controller: failed to get OSD status: failed to get osd metadata: exit status 1
2025-11-10 14:29:16.963003 I | ceph-csi: cluster info for cluster "my-cluster" is not ready yet, will retry in 10s, proceeding with ready clusters
2025-11-10 14:29:16.970832 C | rookcmd: failed to run operator: gave up to run the operator manager: failed to run the controller-runtime manager: failed to wait for ceph-object-realm-controller caches to sync kind source: *v1.CephObjectRealm: timed out waiting for cache to be synced for Kind *v1.CephObjectRealm

Member

Probably check if the CephObjectRealm CRD is present.

Contributor Author

On another test, I got this error

2025-11-11 12:41:59.314806 D | exec: Running command: ceph osd metadata --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json
2025-11-11 12:41:59.506880 E | clusterdisruption-controller: failed to get OSD status: failed to get osd metadata: exit status 1
2025-11-11 12:41:59.509731 C | rookcmd: failed to run operator: gave up to run the operator manager: failed to run the controller-runtime manager: failed to wait for ceph-object-zone-controller caches to sync kind source: *v1.CephObjectZone: timed out waiting for cache to be synced for Kind *v1.CephObjectZone

Member

cc @subhamkrai, who mentioned adding the type registration to the scheme; can you try that?


Contributor

Maybe something like this will resolve it:

err = addonsv1alpha1.AddToScheme(mgr.GetScheme())
if err != nil {
    return err
}
err = csiopv1.AddToScheme(mgr.GetScheme())
if err != nil {
    return err
}

script := fmt.Sprintf(`
set -xEeuo pipefail
cat << EOF > /etc/ceph/ceph.conf

Contributor

That has been manually copied from the NFS-Ganesha deployment that Rook made. I think the script is generated by this:

// GenerateMinimalCephConfInitContainer returns an init container that will generate the most basic
// Ceph config for connecting non-Ceph daemons to a Ceph cluster (e.g., nfs-ganesha). Effectively
// what this means is that it generates '/etc/ceph/ceph.conf' with 'mon_host' populated and a
// keyring path associated with the user given. 'mon_host' is determined by the 'ROOK_CEPH_MON_HOST'
// env var present in other Ceph daemon pods, and the keyring is expected to be mounted into the
// container with a Kubernetes pod volume+mount.
func GenerateMinimalCephConfInitContainer(
    username, keyringPath string,
    containerImage string,
    containerImagePullPolicy v1.PullPolicy,
    volumeMounts []v1.VolumeMount,
    resources v1.ResourceRequirements,
    securityContext *v1.SecurityContext,
) v1.Container {
    cfgPath := client.DefaultConfigFilePath()
    // Note that parameters like $(PARAM) will be replaced by Kubernetes with env var content before
    // container creation.
    confScript := `
set -xEeuo pipefail
cat << EOF > ` + cfgPath + `
[global]
mon_host = $(ROOK_CEPH_MON_HOST)
[` + username + `]
keyring = ` + keyringPath + `
EOF
chmod 444 ` + cfgPath + `
cat ` + cfgPath + `
`
    return v1.Container{
        Name:            "generate-minimal-ceph-conf",
        Command:         []string{"/bin/bash", "-c", confScript},
        Args:            []string{},
        Image:           containerImage,
        ImagePullPolicy: containerImagePullPolicy,
        VolumeMounts:    volumeMounts,
        Env:             opconfig.StoredMonHostEnvVars(),
        Resources:       resources,
        SecurityContext: securityContext,
    }
}

The nvmeof.conf needs to be generated as well, maybe with a similar workflow in its own init container?
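
For illustration, a rough sketch (not the PR's code) of what such an init container helper could look like, modeled on the function quoted above. The function name, container name, config keys, and env var names are assumptions, and the env wiring (downward API for POD_NAME/POD_IP, pool and ANA group from the CR) is omitted:

package nvmeof

import (
    v1 "k8s.io/api/core/v1"
)

// GenerateNVMeOFConfInitContainer (hypothetical) renders a minimal
// /etc/ceph/nvmeof.conf before the gateway container starts. Parameters like
// $(PARAM) would be replaced by Kubernetes with env var content, as in the
// function quoted above.
func GenerateNVMeOFConfInitContainer(
    containerImage string,
    containerImagePullPolicy v1.PullPolicy,
    volumeMounts []v1.VolumeMount,
    resources v1.ResourceRequirements,
    securityContext *v1.SecurityContext,
) v1.Container {
    confScript := `
set -xEeuo pipefail
cat << EOF > /etc/ceph/nvmeof.conf
[gateway]
name = $(POD_NAME)
group = $(ANA_GROUP)
addr = $(POD_IP)
[ceph]
pool = $(NVMEOF_POOL)
config_file = /etc/ceph/ceph.conf
EOF
chmod 444 /etc/ceph/nvmeof.conf
cat /etc/ceph/nvmeof.conf
`
    return v1.Container{
        Name:            "generate-nvmeof-conf",
        Command:         []string{"/bin/bash", "-c", confScript},
        Image:           containerImage,
        ImagePullPolicy: containerImagePullPolicy,
        VolumeMounts:    volumeMounts,
        Resources:       resources,
        SecurityContext: securityContext,
        // Env would supply POD_NAME/POD_IP via the downward API and the
        // pool/ANA group from the CephNVMeOFGateway spec; omitted here.
    }
}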

)

const (
controllerName = "ceph-nvmeof-gateway-controller"
Contributor

This will be a fairly long prefix for log messages. Perhaps the -controller suffix could be omitted from the logs, similar to the ceph-bucket-notification controller?

Contributor Author

done

[ceph]
pool = `)
config.WriteString(poolName)
Contributor

I'm not NACKing this, but I'm concerned this will be brittle. Perhaps it would be better to use a template?
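
For example, a minimal sketch of the template approach using Go's text/template; the type, template, and config keys here are illustrative only, not the PR's:

package main

import (
    "os"
    "text/template"
)

// cephSectionTmpl replaces the WriteString concatenation with a declared
// template, so the shape of the generated section is visible in one place.
const cephSectionTmpl = `[ceph]
pool = {{ .PoolName }}
config_file = /etc/ceph/ceph.conf
`

type cephSection struct {
    PoolName string
}

func main() {
    tmpl := template.Must(template.New("ceph").Parse(cephSectionTmpl))
    // The controller would render into a builder for the ConfigMap data;
    // stdout is used here only to keep the example runnable.
    if err := tmpl.Execute(os.Stdout, cephSection{PoolName: "nvmeofpool"}); err != nil {
        panic(err)
    }
}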

Contributor

I wonder if this is sufficient and works everywhere. With my testing on different cloud provider clusters and within Ceph-CSI e2e, I found that I needed to disable some defaults. But the documentation of the gateway configuration may give better clarity than what I managed to put together at https://github.com/ceph/ceph-csi/blob/devel/e2e/nvmeof/config.yaml

Member @BlaineEXE Nov 20, 2025

I'm also concerned, and I want to make sure we are clear on why we are choosing this pattern.

We are creating a configmap that has domain-specific templating (@@VAR@@) instead of using well-understood templating semantics.

Are we reconciling the configmap and overwriting any user modifications each time, or do we write it once and allow users to use it to modify their config? If we overwrite it each time, we would be better off having an init container that creates this file each time, so that users don't modify it and then complain when it changes later.

Member

Cross reference: #16689 (comment)

func (r *ReconcileCephNVMeOFGateway) generateConfigMap(nvmeof *cephv1.CephNVMeOFGateway) *v1.ConfigMap {
poolName := nvmeof.Spec.Pool
if poolName == "" {
poolName = "nvmeofpool"
Contributor

Is this a pool that Ceph will automatically create when an nvmeof daemon is started, or is it a default being created by Rook? If it's the latter, it might be better to require the end user to explicitly set the pool name.

Member @BlaineEXE Nov 20, 2025

@gadididi could we get a clear response here instead of a thumbs up? I'm not sure how to interpret this

Contributor Author

The pool is not automatically created. The customer must create the CephBlockPool CR first, then create the CephNVMeOFGateway CR.
The default "nvmeofpool" is only used when spec.pool is omitted. The pool must already exist (created via CephBlockPool CR) before the gateway can use it.

https://github.com/rook/rook/pull/16689/files#diff-0e2dc8235e5a70baa91cb4a7aee6db662b882a358e73af0a4a3a8ec255220cc8R15

Member @BlaineEXE Nov 24, 2025

If this isn't a Ceph default, I would suggest not having this default. Instead, we can make the pool name a required input on the custom resource. This will avoid a minor amount of complexity to manage internally (maintenance and test simplification) and could avoid some bugs or user confusion.

Contributor

I agree strongly with @BlaineEXE that it is better to require explicit declaration of the pool rather than magic defaults.

It might be worth considering if new CRDs should require that pools already exist or go a step further and require that a CephBlockPool CR exists (and its phase is Ready). The argument for introspecting on CephBlockPool CRs is that it allows decisions to be made about continuing with a reconcile by probing only k8s state (which may already be cached) instead of having to probe ceph's state.
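
A minimal sketch of that introspection, assuming Rook's existing CephBlockPool API types and the controller-runtime client already available in the reconciler (field names as in Rook's current API; treat this as illustrative, not the PR's implementation):

package nvmeof

import (
    "context"

    "github.com/pkg/errors"
    cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// poolIsReady fails the reconcile (to be retried) until the referenced
// CephBlockPool CR exists and reports a Ready phase, using only Kubernetes
// state rather than querying Ceph directly.
func poolIsReady(ctx context.Context, c client.Client, namespace, poolName string) error {
    pool := &cephv1.CephBlockPool{}
    if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: poolName}, pool); err != nil {
        return errors.Wrapf(err, "failed to get CephBlockPool %q", poolName)
    }
    if pool.Status == nil || pool.Status.Phase != cephv1.ConditionReady {
        return errors.Errorf("CephBlockPool %q is not Ready yet", poolName)
    }
    return nil
}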

Contributor Author @OdedViner Nov 30, 2025

I've made spec.pool required and removed the default nvmeofpool value. If you prefer, I can add logic in the controller to automatically create the CephBlockPool CR when a CephNVMeOFGateway is created.

Will the controller create the block pool via the Ceph CLI, similar to this function?

func createRGWPool(ctx *Context, cluster *cephv1.ClusterSpec, poolSpec cephv1.PoolSpec, pgCount, requestedName string) error {
// create the pool if it doesn't exist yet

@BlaineEXE @jhoblitt

Member

I prefer to do initial major-feature development by stripping away as many automations and minor features as possible to still get to a useful working result for end users.

In this case, automatic pool creation is what I consider premature user-workflow optimization in Rook controllers.

From observing Rook feature development over time, I have seen that trying to automate things like this during the first development phase often results in users not having flexibility in their desired end state, requiring more developer rework over time.


@BlaineEXE Hi, I missed that you tagged me.
Yes, as Oded said, the RBD pool is not created automatically.
You can create it before or after the NVMe-oF GWs are up.
Also FYI, the NVMe-oF GWs can work with multiple pools,
but I don't know the requirements.

sed -e "s/@@POD_NAME@@/${POD_NAME}/g" \
-e "s/@@ANA_GROUP@@/${ANA_GROUP}/g" \
-e "s/@@POD_IP@@/${POD_IP}/g" \
< /config/ceph-nvmeof.conf > /etc/ceph/nvmeof.conf
Contributor

The CM that's been generated is only a template to have interpolation done in the init container? Why not generate the entire config here and reduce the number of moving parts?

Contributor

That is surely an option. However, nvmeof.conf contains a lot of options, and it may be difficult to decide which of them should be configurable in a CRD. Documentation is a little sparse (or I don't know where to find it), but here is one example of the config file: https://github.com/ceph/ceph-nvmeof/blob/devel/ceph-nvmeof.conf

The one included in the ConfigMap is a minimal version that does not depend on specific memory configurations or pre-allocations. It works, but probably can be tuned for better resource usage or performance.

Contributor

At a glance, it looks like that ini format file could be represented in a CR as a map[string]map[string]string.

Member

I feel similarly. We can have users modify the configmap to set advanced configs, but this makes it awkward when (and there will be a "when") we have to make decisions about when it is safe for Rook to update the configmap versus when it can't because users have modified it.

I think it will be simpler for users and for code maintenance to provide some sort of passthru in Rook and avoid the configmap. The pattern of Josh's suggestion is already being used to set arbitrary ceph config options here:

// Ceph Config options
// +optional
// +nullable
CephConfig map[string]map[string]string `json:"cephConfig,omitempty"`
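
A rough sketch of that passthru shape, assuming the gopkg.in/ini.v1 package, which matches the section.Key(...).SetValue(...) calls seen elsewhere in this review; the section and key names below are taken from this discussion and are illustrative only:

package main

import (
    "os"

    "gopkg.in/ini.v1"
)

func main() {
    cfg := ini.Empty()

    // Defaults the controller would set.
    gw, _ := cfg.NewSection("gateway")
    gw.Key("port").SetValue("5500")

    // User passthru from the CR, e.g. a map[string]map[string]string field.
    overrides := map[string]map[string]string{
        "gateway":      {"enable_auth": "True"},
        "gateway-logs": {"log_level": "info"},
    }
    for sectionName, options := range overrides {
        section, err := cfg.NewSection(sectionName)
        if err != nil {
            panic(err)
        }
        for key, value := range options {
            section.Key(key).SetValue(value)
        }
    }

    // The controller would hand the rendered ini to the gateway (ConfigMap or
    // init container); stdout keeps the example self-contained.
    if _, err := cfg.WriteTo(os.Stdout); err != nil {
        panic(err)
    }
}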

var configMapName, configHash string
var err error

if cephNVMeOFGateway.Spec.ConfigMapRef == "" {
Contributor

If the CM is edited by a user, it will never be overwritten; is that an intentional design choice?

Contributor

Ideally the ConfigMap with nvmeof.conf can be edited by the admin so that the configuration of the gateway can be modified according to their hardware/needs/...

Contributor

If that is the intended interface, then argo/flux/fleet/etc. might create the CM before the CephNVMeOFGateway and the controller would need to accommodate that. It would also mean that the "template" in the CM becomes a stable interface and any changes would need to be carefully backwards compatible. Another downside to this approach is that typos/errors in the template string would either have to be discovered by trial and error by the user or a manual validator would need to be written. Whereas field validation comes essentially for free with a CRD.

@OdedViner changed the title from "ceph: add nvme-of gateway crd support" to "WIP: ceph: add nvme-of gateway crd support" on Nov 11, 2025
Contributor @subhamkrai left a comment

@OdedViner I was able to find the issue: RBAC is missing for cephnvmeofgateways. Please add those in common.yaml. I can share the common.yaml with you offline.

@OdedViner force-pushed the nmveof_new2 branch 4 times, most recently from 4e47848 to ff746ea on November 13, 2025 12:54
@@ -0,0 +1,67 @@
/*
Copyright 2018 The Rook Authors. All rights reserved.
Contributor

nit: 2025 ⌚

Contributor Author

done

Contributor Author @OdedViner Nov 30, 2025

When I run make codegen, the year is automatically changed to 2018 (Copyright 2018 The Rook Authors. All rights reserved.) in pkg/client/clientset/versioned/typed/ceph.rook.io/v1/cephnvmeofgateway.go.

Contributor Author

@jhoblitt
Contributor

I think a new nvmeof integration suite should probably be created instead of trying to add tests for this into one of the existing suites.

< /config/ceph-nvmeof.conf > /etc/ceph/nvmeof.conf
ceph nvme-gw create ${POD_NAME} %s ${ANA_GROUP}
ceph nvme-gw show %s ${ANA_GROUP}


I would add parsing of the return value (JSON) from ceph nvme-gw show %s ${ANA_GROUP} and validate that everything is OK.

}

script := fmt.Sprintf(`
set -xEeuo pipefail
Member

I'd suggest that we not inline this script in the Go code. Instead, I've found development and maintenance to be easier with separate scripts that are loaded into strings. We do this with the MDS liveness probe:

https://github.com/rook/rook/blob/master/pkg/operator/ceph/file/mds/livenessprobe.sh

https://github.com/rook/rook/blob/master/pkg/operator/ceph/file/mds/livenessprobe.go
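
A minimal sketch of that pattern using go:embed; the script file name here is hypothetical, and the .sh file would sit alongside the Go source:

package nvmeof

import (
    _ "embed"
    "fmt"
)

// The script lives next to the Go source as its own file, as with the MDS
// liveness probe linked above.
//
//go:embed gateway-setup.sh
var gatewaySetupScript string

// renderGatewaySetupScript fills in the pool name, mirroring the
// fmt.Sprintf call used by the inlined version.
func renderGatewaySetupScript(poolName string) string {
    return fmt.Sprintf(gatewaySetupScript, poolName)
}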


Comment on lines 211 to 306
[global]
mon_host = $(CEPH_MON_HOST)
log_to_stderr = true
keyring = /etc/ceph/keyring
EOF
Member

This was needed in the NFS script because NFS-Ganesha doesn't accept the --mon-host argument. If the commands we are running accept these args, we should use the args instead of the config file

-e "s/@@POD_IP@@/${POD_IP}/g" \
< /config/ceph-nvmeof.conf > /etc/ceph/nvmeof.conf
ceph nvme-gw create ${POD_NAME} %s ${ANA_GROUP}
Member @BlaineEXE Nov 20, 2025

We may need to be careful here when it comes to pod restarts. A k8s deployment will end up having many pod names (e.g., rook-ceph-nvme-gw-abc123) over its lifetime. Will this mean Ceph will register many unique gateway names for each pod restart? And would this mean we are responsible for cleanup of the old pod names/IDs?

If we use a more stable naming here, like deployment name, we will also need to make sure this is idempotent -- or we must make our script idempotent.

nvmeofGatewayPort = 5500
nvmeofMonitorPort = 5499
nvmeofDiscoveryPort = 8009
defaultNVMeOFImage = "quay.io/ceph/nvmeof:1.5"
Member

I'm curious about @travisn 's thoughts, but I would suggest not having a default here. If we keep a default, we will always be obligated to keep it up to date. If we require it as input, we can avoid a minor, repeated (and probably often overlooked) maintenance task

Contributor Author

I've added an optional image parameter to the NVMeOFGatewaySpec. If not specified, it defaults to quay.io/ceph/nvmeof:1.5. This allows users to override the image when needed while maintaining backward compatibility.

Comment on lines 227 to 323
ceph nvme-gw create ${POD_NAME} %s ${ANA_GROUP}
ceph nvme-gw show %s ${ANA_GROUP}
Member @BlaineEXE Nov 20, 2025

There is a neat bash trick we can do for arguments here that I think will help us.

Let's imagine this simplified script:

#!/usr/bin/env bash
set -xEeuo pipefail

# ARGS: all args are assumed to be 'ceph' CLI args

ceph "$@" nvme-gw <subcommand> <args>
ceph "$@" nvme-gw show %s ${ANA_GROUP}

We can then call the script like this:

bash script.sh --connect-timeout=90 --mon-host="${CEPH_MON_HOST}" --etc-etc-etc

When the script runs, it will use the mon args where "$@" exists in the script. This allows easy passthru, and we can use Rook's methods for setting CLI options for Ceph daemons in the init container to make our lives easier and to make sure we don't miss options.


@OdedViner force-pushed the nmveof_new2 branch 5 times, most recently from 271c1fc to 1689667 on November 24, 2025 09:34
@OdedViner (Contributor Author)

I think a new nvmeof integration suite should probably be created instead of trying to add tests for this into one of the existing suites.

@jhoblitt Done https://github.com/rook/rook/pull/16689/files#diff-e92d2b50640744c370e7d59b0510338412d324658ce89df677dbcf1105a4ae57

https://github.com/rook/rook/pull/16689/files#diff-99105bed8e71810279170b92d9b1480d330c081f15de6dd30a2e9dce6a0a027a

adds CephNVMeOFGateway crd for managing ceph nvme-of gateways.
supports configurable instances, placement, resources,
and networking options.

Signed-off-by: Oded Viner <oviner@redhat.com>
@OdedViner force-pushed the nmveof_new2 branch 3 times, most recently from 0fbbafe to e23bf35 on December 14, 2025 12:52
Member @travisn left a comment

My review mainly focused on getting the examples and documentation created to be clear how users can consume the feature.

// For example, quay.io/ceph/nvmeof:1.5
// If not specified, a default image will be used.
// +optional
Image string `json:"image,omitempty"`
Member

The nvmeof gateway image is separate from the ceph/ceph image? Until now, everything in ceph has been a single image, and we could get it from the cephcluster CR.

Contributor Author

This is an NVMe-oF image, not a Ceph image, based on:
https://github.com/ceph/ceph-csi/blob/devel/e2e/nvmeof/deployment.yaml#L32

@@ -0,0 +1,69 @@
# This example is for Ceph v19 and above only
Member

Doesn't this require Ceph v20?

Contributor Author

done

@@ -0,0 +1,69 @@
# This example is for Ceph v19 and above only
# The CephBlockPool must be created before the CephNVMeOFGateway
Member

The order with CRDs should not matter. If the controller does not find a needed pool, it can just fail the reconcile and retry until it finds the pool. So no need to mention this order.

# - ConfigMap (if configMapRef is not specified)
# - Service (one per gateway instance)
# - Deployment (one per gateway instance)
# The ServiceAccount "ceph-nvmeof-gateway" must exist in the namespace.
Member

Let's add the service account and needed rbac to common.yaml

apiVersion: ceph.rook.io/v1
kind: CephNVMeOFGateway
metadata:
name: my-nvmeof
Member

How about just removing the my prefix? In retrospect, we should not have included that in the other example names either.

Suggested change
name: my-nvmeof
name: nvmeof

Contributor Author

done

namespace: rook-ceph # namespace:cluster
spec:
# Pool name that will be used by the NVMe-oF gateway
pool: nvmeofpool
Member

I would suggest no "pool" suffix, since it is already of type pool, the name is redundant.

Suggested change
pool: nvmeofpool
pool: nvmeof

# Pool name that will be used by the NVMe-oF gateway
pool: nvmeofpool
# ANA (Asymmetric Namespace Access) group name
group: mygroup
Member

How about this example name instead of the "my" prefix?

Suggested change
group: mygroup
group: group-a

# This example is for Ceph v19 and above only
# The CephBlockPool must be created before the CephNVMeOFGateway
#
# Note: The operator will automatically create the following resources:
Member

Please create a document that describes how the user can use this feature. Some thoughts:

  • Create the doc: Documentation/Storage-Configuration/Block-Storage-RBD/nvme-of.md
  • The content will be similar as block-storage.md, to describe the scenario, show a simple yaml example, and describe how the user can enable the nvmeof csi driver, and do all the steps to configure the scenario.

With this document, reviewers can even better understand the end-to-end flow to better review the feature, and then the feature will be documented for users to try out.

Member

I would suggest creating two example yamls:

  • nvmeof.yaml: This is the production example. Basically, just rename this file, and change the pool replication to 3.
  • nvmeof-test.yaml: This will be the replica 1 pool example, and the nvmeof CR will be as simple as possible, no need to have all the comments with all possible settings.

Member

We also need an example storage class.

Member

Since this involves CSI as well, I liked Travis's comment from yesterday to add a deploy/examples/csi/nvmeof directory with examples similar to what we see in the other CSI dirs. We can focus on a single storageclass, pvc, and pod for now. More examples might just make things complicated at this early stage.

Member @BlaineEXE left a comment

The most common feedback item here (of which there are more than I flagged) is that in Rook it is almost always undesirable to use logger.Error() in a reconciliation loop. Only in rare cases do we actually do this in Rook. ReportReconcileResult() handles logging errors that are "bubbled up" via error return wrapping for nearly all cases.

In other projects that use a different logging framework, logger.Error() can be more common, but not in Rook. Please remove all instances unless you believe that it is imperative to log the error. If you do decide that it's imperative, please tag me and Travis on a comment around it, and we will most likely find a way to help avoid logging the error.

For most errors, I see errors.Wrapf() used, which is generally what I am looking for when reviewing error handling in Rook.

metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata"`
Spec NVMeOFGatewaySpec `json:"spec"`
// +kubebuilder:pruning:PreserveUnknownFields
Member

This is generally only needed to keep fields around when dealing with legacy types. I'd suggest getting rid of this here because it's a whole new resource.

Comment on lines +2792 to +2793
// +optional
Image string `json:"image,omitempty"`
Member

I would suggest making this a required input for users and setting // +kubebuilder:validation:MinLength=1

Comment on lines +2795 to +2796
// The number of active gateway instances
Instances int `json:"instances"`
Member

Current K8s API conventions suggest omitempty for all fields. Please add here and below.

They also recommend setting either // +required or // +optional for all fields. Please add one of those here and for fields below.

Optional fields are recommended to be pointers, except for maps and slices.
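
For reference, a sketch of those markers applied to a couple of fields; which fields end up required versus optional is the author's call, and hostNetwork is used here purely as an illustrative optional field:

package v1

// NVMeOFGatewaySpec sketch showing only the markers under discussion.
type NVMeOFGatewaySpec struct {
    // The number of active gateway instances
    // +required
    Instances int `json:"instances,omitempty"`

    // Optional fields are pointers per the conventions (maps and slices excepted).
    // +optional
    HostNetwork *bool `json:"hostNetwork,omitempty"`
}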

// NVMeOFConfig is a map of section names to key-value pairs for nvmeof.conf configuration
// This allows users to override or add configuration options without needing to manage a ConfigMap
// +optional
// +nullable
Member

// +nullable is actively discouraged by modern API conventions. Please remove any additions of nullable

const (
packageName = "ceph-nvmeof-gateway"
controllerName = packageName + "-controller"
CephNVMeOFGatewayNameLabelKey = "ceph_nvmeof_gateway"
Member

Is it possible to avoid adding a new label and instead rely on ceph_daemon_id added by CephDaemonAppLabels()?


configContent, err := getNVMeOFGatewayConfig(poolName, podName, podIP, anaGroup, nvmeof.Spec.NVMeOFConfig)
if err != nil {
logger.Errorf("failed to generate config: %v", err)
Member

No need to log an error when returning the error.

Comment on lines +495 to +497
for key, value := range options {
section.Key(key).SetValue(value)
}
Member

I would recommend also using section.Key(key).GetValue() here to determine if a value has already been specified. If a value is set already, let's log something like below, which can help us and users debug the code used to set defaults and apply user overrides:

logger.NamespacedDebugf("overriding section %q config %q value %q with user-specified value %q", section, key, currentValue, newValue)
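
A sketch of that check, assuming the gopkg.in/ini.v1 section type used in this file and Rook's package-level capnslog logger; the NamespacedDebugf call from the suggestion is paraphrased here with plain Debugf:

package nvmeof

import (
    "github.com/coreos/pkg/capnslog"
    "gopkg.in/ini.v1"
)

var logger = capnslog.NewPackageLogger("github.com/rook/rook", "ceph-nvmeof-gateway")

// applyOverrides sets user-supplied keys and logs whenever a user value
// replaces one the controller already set, to aid debugging of defaults.
func applyOverrides(section *ini.Section, options map[string]string) {
    for key, value := range options {
        k := section.Key(key)
        if current := k.Value(); current != "" && current != value {
            logger.Debugf("overriding section %q config %q value %q with user-specified value %q",
                section.Name(), key, current, value)
        }
        k.SetValue(value)
    }
}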

Comment on lines +4 to +9
cat << EOF > /etc/ceph/ceph.conf
[global]
mon_host = ${ROOK_CEPH_MON_HOST}
log_to_stderr = true
keyring = /etc/ceph/keyring
EOF
Member

Is this needed for the nvme-gw itself, or is this used for the ceph commands below?

If it's needed for the nvme-gw itself, please provide a comment that indicates why CLI args are insufficient.

If it's for the ceph commands below, I'm quite sure this can be omitted.

Comment on lines +419 to +430
gatewaySection.Key("port").SetValue("5500")
gatewaySection.Key("enable_auth").SetValue("False")
gatewaySection.Key("state_update_notify").SetValue("True")
gatewaySection.Key("state_update_timeout_in_msec").SetValue("2000")
gatewaySection.Key("state_update_interval_sec").SetValue("5")
gatewaySection.Key("enable_spdk_discovery_controller").SetValue("False")
gatewaySection.Key("encryption_key").SetValue("/etc/ceph/encryption.key")
gatewaySection.Key("rebalance_period_sec").SetValue("7")
gatewaySection.Key("max_gws_in_grp").SetValue("16")
gatewaySection.Key("max_ns_to_change_lb_grp").SetValue("8")
gatewaySection.Key("verify_listener_ip").SetValue("False")
gatewaySection.Key("enable_monitor_client").SetValue("True")
Member

I'm personally fine with establishing defaults here without using package consts to define them. However, I do think we should make comments that indicate why we believe Rook's defaults to be needed instead of relying on nvme-gw's own internal defaults.

For example, if enable_auth is False by default, I would prefer to either remove Rook's default, or add a comment that explains why Rook believes it's necessary to establish default control for the value.

For things like IP and port, I think it is obvious enough that we don't need comments to doc why Rook is modifying the value. For anything beyond this, I'd prefer to see a comment to help with code maintenance and future decision making around changes to these values.

In general, in Rook we tend to prefer to allow components to do their own defaulting, which trusts the component to change configurations when necessary for user benefit.

Hypothetically, if rebalance_period_sec = 7 is nvme-gw's default and nvme-gw devs determine that 7 seconds isn't enough to account for 99% tail latencies in most deployments, Rook will benefit when nvme-gw changes to rebalance_period_sec = 9, but Rook would lose this benefit if Rook establishes its own control.

In the inverse hypothetical, if rebalance_period_sec = 5 is nvme-gw's default and this isn't enough to account for 99% tail latencies in most Rook deployments where the k8s Pod network is adding some latency, it is totally appropriate for Rook to establish a more preferred default of rebalance_period_sec = 7. In this case, we should document in a comment that nvme-gw's default was too low for Rook users and why we chose 7 specifically as the new value for Rook users.

if err != nil {
return "", errors.Wrap(err, "failed to create gateway-logs section")
}
gatewayLogsSection.Key("log_level").SetValue("debug")
Member

Should we consider changing the default based on ROOK_LOG_LEVEL? Thoughts @travisn ?

Member

Let's revert it. @OdedViner while testing the feature, you can just set the log level to debug in the operator.yaml, no need to set it in code.
