Is this a bug report or feature request?
- Feature Request
I consider this a feature request more than a bug report. It may also reasonably be a large cleanup-related task.
What is the use case behind this feature:
There are several minor problems that Rook users face that could be improved if Rook had one operator per CephCluster. The problems are outlined below, with additional notes about investigations related to fixing each issue individually:
The OSD wait loop is long and can starve other CephCluster reconciles
Captured in: #11330
Using a per-CephCluster operator would mean that the OSD wait loops we have written -- that help avoid unnecessary k8s and ceph API calls -- don't need to be refactored.
Logging framework does not enforce use of namespaced names in logs
Our logging framework does not automatically capture (or force use of) namespaced names when logging. Many contributing devs forget to include the namespace of a resource in logs, resulting in unclear logs in multi-CephCluster environments.
Doing the work to separate the k8s-cluster-scope functions from the ceph-cluster-scope functions may actually be less effort than replacing the logging framework with a clearer one.
In addition, users have asked in the past whether it's possible to easily see logs for only a single one of their multiple clusters. This is not easy today, because all cluster logs are interleaved into a single operator log.
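For illustration only, a minimal sketch of the kind of wrapper that would otherwise be needed to enforce namespaced names (the `clusterLogger` helper below is hypothetical and is not Rook's actual logging setup; it uses Go's standard `log/slog`):

```go
package main

import (
	"log/slog"
	"os"
)

// clusterLogger is a hypothetical helper: it returns a logger that always
// carries the CephCluster's namespaced name, so every log line is
// attributable to one cluster even when many clusters share one operator.
func clusterLogger(base *slog.Logger, namespace, name string) *slog.Logger {
	return base.With("cephcluster", namespace+"/"+name)
}

func main() {
	base := slog.New(slog.NewTextHandler(os.Stderr, nil))

	logA := clusterLogger(base, "rook-ceph", "my-cluster")
	logB := clusterLogger(base, "rook-ceph-secondary", "my-cluster")

	// Without a wrapper like this (or a per-cluster operator whose logs are
	// naturally separated), these two lines would be indistinguishable in a
	// shared operator log.
	logA.Info("reconciling OSDs")
	logB.Info("reconciling OSDs")
}
```

With a per-CephCluster operator, this enforcement becomes unnecessary: each sub-operator's log stream belongs to exactly one cluster.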
Version mismatch errors during usage of radosgw-admin
When the version of radosgw-admin in the rook-operator image is lower than the version present in the Ceph image in the CephCluster, there is a risk that radosgw-admin commands fail (with random, unexpected failure messages and error codes) due to a non-forward-compatible radosgw-admin API.
Historical issue: #13241
Abandoned to resolve the issue: #13430
The abandoned attempt to resolve the issue is workable, but it retains the hackiness of requiring a mgr sidecar. This would be totally unnecessary with a per-CephCluster operator.
In non-default network configs, rook isn't connected to the same network as the Ceph cluster
This isn't a bug, but it does require maintenance of hacky workarounds, especially in multus-enabled clusters. For example:
- needing to use a multus network CIDR detection canary job (hacky, could consume all IP allocations in DHCP environments if there is an operator CLBO bug)
- needing to exec commands in a mgr sidecar pod in multus-enabled clusters (extra cpu/mem resources needed, complicated scenario for krew plugin)
It also means that Rook lacks the flexibility in nonstandard network environments that could help implement new network features or more advanced modifications to existing features.
If a per-CephCluster operator were run in the same network space as the CephCluster, these workarounds would be unnecessary and several complicated mechanisms could be cleaned up. Admittedly, the extra CPU/mem resources would move from the mgr sidecar to the per-CephCluster operator -- a net zero change.
Ceph version-detection jobs are hacky
Rook uses Ceph version detection jobs to detect the Ceph version present in a given Ceph container image.
This could be avoided if a per-cephcluster operator were to always run on top of the same Ceph image defined in the CephCluster.
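If the sub-operator runs on the cluster's own Ceph image, version detection reduces to asking the local binary. A minimal sketch (the function name and parsing regex are illustrative, not existing Rook code), using `ceph --version`, which reports the local binary's version without needing a mon connection:

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
)

// Example `ceph --version` output: "ceph version 18.2.1 (<hash>) reef (stable)"
var versionRe = regexp.MustCompile(`ceph version (\d+)\.(\d+)\.(\d+)`)

// localCephVersion runs the ceph binary shipped in the same image the
// sub-operator runs on, avoiding a separate version-detection Job.
func localCephVersion() (string, error) {
	out, err := exec.Command("ceph", "--version").CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("failed to run ceph --version: %w", err)
	}
	m := versionRe.FindStringSubmatch(string(out))
	if m == nil {
		return "", fmt.Errorf("unexpected ceph --version output: %q", out)
	}
	return fmt.Sprintf("%s.%s.%s", m[1], m[2], m[3]), nil
}

func main() {
	v, err := localCephVersion()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("local ceph version:", v)
}
```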
What should the feature do:
The overall proposal to help resolve these issues is to have a global Rook operator handle these tasks:
- run CSI driver
- run COSI driver
- spawn a sub-operator for each CephCluster
- update the sub-operator when `spec.version` changes
- update the sub-operator when `spec.network` changes
The CephCluster sub-operator would continue to handle the same functions as the `rook-ceph-operator` handles today, with the exception of the changes handled by the global operator above. For what it's worth, this more closely resembles the suggestion that controller-runtime makes to use jobs for long-running reconcile steps: kubernetes-sigs/controller-runtime#1763 (comment)
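A rough sketch of what the global operator's CephCluster reconcile could look like, assuming controller-runtime and the existing `cephv1` types. The `ceph-operator` Deployment shape, labels, and container args below are illustrative placeholders, not a committed design:

```go
package controller

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
)

// GlobalReconciler spawns one ceph-operator Deployment per CephCluster and
// updates it when spec.cephVersion or spec.network change.
type GlobalReconciler struct {
	client.Client
}

func (r *GlobalReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	cluster := &cephv1.CephCluster{}
	if err := r.Get(ctx, req.NamespacedName, cluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	dep := &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{
		Name:      "ceph-operator", // proposed sub-operator name
		Namespace: cluster.Namespace,
	}}

	// CreateOrUpdate keeps the sub-operator in sync with spec.cephVersion
	// (image) and, per the "second modification", spec.network.
	_, err := controllerutil.CreateOrUpdate(ctx, r.Client, dep, func() error {
		labels := map[string]string{"app": "ceph-operator"}
		dep.Spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
		dep.Spec.Template.Labels = labels
		dep.Spec.Template.Spec.Containers = []corev1.Container{{
			Name:  "ceph-operator",
			Image: cluster.Spec.CephVersion.Image, // run on the cluster's own Ceph image
			Args:  []string{"ceph", "operator"},   // placeholder; actual entrypoint TBD
		}}
		// Network attachment (e.g., multus annotations derived from
		// cluster.Spec.Network) would also be applied here when needed.
		return controllerutil.SetControllerReference(cluster, dep, r.Scheme())
	})
	return ctrl.Result{}, err
}
```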
Implementation suggestion:
- First modification
  - Sub-operator should watch for all changes to CephClusters, but ignore changes to `spec.version` and `spec.network` -- the global operator will instead update the "self" sub-operator pod spec when these changes occur (see the predicate sketch after this list)
  - Sub-operator should take resource requests/limits and placement from rook-ceph-operator as a starting point
  - Top-level operator should keep the same name (`rook-ceph-operator`) for easy migration, and the sub-operator will be named `ceph-operator`
- Second modification [per Travis' suggestion]
  - Sub-operator should run in the same Ceph image as defined in `spec.version`
  - Sub-operator should be configured to use the same `spec.network` config as Ceph pods when necessary -- e.g., must be attached to multus net(s) (possibly host network?)
- Follow-up modifications:
  - Refactor ceph version detection jobs to just call `ceph version` locally
  - Refactor multus network canary job to check for network CIDRs locally
  - Remove usage of mgr sidecar when multus is enabled
  - Refactor krew plugin for new architecture
  - Refactor OBC controller to ensure it (or they) operate as needed
  - Update troubleshooting docs to ensure the correct operator is referenced
  - (if needed) Create a new serviceaccount for the global operator, and refactor RBAC to clean up
  - Provide CephCluster options for `ceph-operator` placement and resource requests/limits to be specified optionally
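For the "ignore changes to `spec.version` and `spec.network`" behavior, a controller-runtime update predicate is one possible shape. This is a sketch only, assuming the existing `cephv1.ClusterSpec` fields (`CephVersion`, `Network`):

```go
package controller

import (
	"reflect"

	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"

	cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
)

// ignoreVersionAndNetworkChanges drops CephCluster updates whose only spec
// differences are cephVersion and/or network, since those fields would be
// reconciled by the global operator (which updates the sub-operator's own
// pod spec), not by the sub-operator itself.
func ignoreVersionAndNetworkChanges() predicate.Predicate {
	return predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			oldC, ok1 := e.ObjectOld.(*cephv1.CephCluster)
			newC, ok2 := e.ObjectNew.(*cephv1.CephCluster)
			if !ok1 || !ok2 {
				return true // not a CephCluster; let other filters decide
			}

			// Normalize the fields the sub-operator should not react to,
			// then compare the remaining spec.
			oldSpec := oldC.Spec.DeepCopy()
			newSpec := newC.Spec.DeepCopy()
			oldSpec.CephVersion = newSpec.CephVersion
			oldSpec.Network = newSpec.Network

			// Reconcile only if something other than version/network changed.
			return !reflect.DeepEqual(oldSpec, newSpec)
		},
	}
}
```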
Open questions:
- Which operator manages OBCs? Global or per-CephCluster? I propose the per-CephCluster operator, and we should take extra care to ensure that each sub-operator will only respond to OBC changes related to its namespace.
- Should we put the sub-operator on the host network if `hostNetwork: true`? I think not, since it isn't necessary and not doing so is more secure, and we can always change this in the future if needed.
Environment
All, and especially:
- multi-CephCluster environments
- Multus environments
- environments with CephObjectStores