
Conversation

dshnayder
Contributor

@dshnayder dshnayder commented Jun 2, 2025

Change Description

Summary

This change allows Kubernetes Config Connector (KCC) controller managers to be deployed in dedicated, tenant-specific namespaces instead of the default cnrm-system namespace. This significantly improves the security of multi-tenant environments by providing better namespace-level isolation and reducing the blast radius of potential security incidents.

Motivation

Currently, all KCC controller managers operate within the shared cnrm-system namespace, irrespective of the specific ConfigConnectorContext or tenant they serve. This centralized deployment model presents a security concern: it increases the potential blast radius in the event of a security compromise or critical failure affecting one controller manager. A vulnerability exploited within cnrm-system could potentially impact all tenants managed by that KCC installation due to the shared execution environment.

The goal of this refactoring is to mitigate this risk by allowing controller managers to be deployed into distinct, dedicated namespaces. This architectural change facilitates stronger isolation between tenants or configurations, thereby reducing the scope of impact from potential security incidents or operational issues within a single manager instance. This approach aligns with best practices for secure multi-tenant system design in Kubernetes.

Implementation Details

The new functionality is gated by the command-line argument “--manager-namespace-isolation”, which accepts “shared” or “dedicated” (the default is “shared”). The argument can be passed to the “manager” container (/configconnector-operator/manager):

   spec:
      containers:
      - args:
        - --local-repo=/configconnector-operator/autopilot-channels
        - --manager-namespace-isolation=dedicated
        command:
        - /configconnector-operator/manager

When this flag is enabled, KCC will check for the optional managerNamespace field in the ConfigConnectorContext resource. If this field is present, KCC will deploy the corresponding controller manager and its associated ServiceAccount into the specified namespace, rather than the default cnrm-system.

  • Spec field: managerNamespace
    When this field is present and non-empty on a ConfigConnectorContext resource specification, the system does not deploy the corresponding controller manager components into the default cnrm-system namespace; instead, the namespace specified in managerNamespace is used, and it will be created if it doesn’t exist. The selection of this specification field as the configuration mechanism adheres to standard Kubernetes practices for declarative configuration, allowing behavior modification without altering the core ConfigConnector API specification.

Example:
Consider a ConfigConnectorContext resource, located in the namespace t1234-tenant0-provider:

apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnectorContext
metadata:
  name: configconnectorcontext.core.cnrm.cloud.google.com
  namespace: t1234-tenant0-provider
  labels:
    tenancy.gke.io/access-level: supervisor
    tenancy.gke.io/project: t1234
    tenancy.gke.io/tenant: t1234-tenant0
spec:
  googleServiceAccount: "gcp@tpname.iam.gserviceaccount.com"
  stateIntoSpec: Absent
  managerNamespace: t1234-tenant0-supervisor

If the ConfigConnectorContext resource includes the specification field managerNamespace: t1234-tenant0-supervisor, then the controller manager StatefulSet and its ServiceAccount will be created within the t1234-tenant0-supervisor namespace.

If the new functionality is not activated, the specification field “managerNamespace” must not be set on any ConfigConnectorContext. If “managerNamespace” is specified while the feature is globally disabled, ConfigConnectorContext reconciliation fails with an error indicating that “managerNamespace” is not supported.

If the new functionality is activated, the specification field “managerNamespace” must be set on every ConfigConnectorContext. If “managerNamespace” is not specified while the feature is globally enabled, ConfigConnectorContext reconciliation fails with an error indicating that “managerNamespace” must be populated.
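The mode/field compatibility rules above can be sketched as a single resolution function in Go (hypothetical names and signature; the PR's actual reconciler code differs):

```go
package main

import (
	"errors"
	"fmt"
)

const defaultManagerNamespace = "cnrm-system"

// resolveManagerNamespace sketches the reconciliation rules described above.
// mode is the value of --manager-namespace-isolation; specNS is the
// ConfigConnectorContext's spec.managerNamespace. Illustrative only.
func resolveManagerNamespace(mode, specNS string) (string, error) {
	switch mode {
	case "shared":
		if specNS != "" {
			return "", errors.New("spec.managerNamespace is not supported when --manager-namespace-isolation=shared")
		}
		return defaultManagerNamespace, nil
	case "dedicated":
		if specNS == "" {
			return "", errors.New("spec.managerNamespace must be populated when --manager-namespace-isolation=dedicated")
		}
		return specNS, nil
	default:
		return "", fmt.Errorf("unknown isolation mode %q", mode)
	}
}

func main() {
	ns, _ := resolveManagerNamespace("dedicated", "t1234-tenant0-supervisor")
	fmt.Println(ns) // t1234-tenant0-supervisor
	_, err := resolveManagerNamespace("shared", "t1234-tenant0-supervisor")
	fmt.Println(err != nil) // true
}
```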

Summary of configuration

| Parameter | Type | Description |
| --- | --- | --- |
| `--manager-namespace-isolation` | Command-line flag | Top-level enabler for the new functionality. Must be set to `dedicated` for the `managerNamespace` field to have any effect. |
| `managerNamespace` | `ConfigConnectorContext` spec field | Configures the target namespace for the controller manager when the feature is enabled. |

Scope and Impact

This enhancement is required for multi-tenant GKE deployments, where strict tenant isolation is often a requirement.

Positive Impacts:

  • Significantly reduces the blast radius by isolating controller managers per tenant or configuration context.
  • Aligns with best practices for secure multi-tenant system design in Kubernetes.

Negative Impacts:

  • When the feature is enabled via the gating flag --manager-namespace-isolation=dedicated and a new Custom Resource is created, the unmanaged detector issues one additional request to the API server to fetch the spec of the ConfigConnectorContext resource.

Tests

  • Created GKE MT cluster.
  • Installed KCC in namespaced mode.
  • Created a tenant.
  • Created a ConfigConnectorContext in the provider namespace. Verified that StatefulSets/Pod cnrm-controller-manager-XXX are created in the supervisor namespace.
  • Created PubSubTopic in the provider namespace. Verified that corresponding object is created in GCP.
  • Removed the ConfigConnectorContext. Verified that StatefulSets/Pod cnrm-controller-manager-XXX are deleted
  • Recreated ConfigConnectorContext and PubSubTopic, then deleted tenant. Verified that GCP object, ConfigConnectorContext, namespaces and tenant were deleted successfully.
  • Deleted ConfigConnector
  • Verified that all StatefulSets/Pods are removed
  • Created ConfigConnector in cluster mode
  • Verified that StatefulSets/Pod cnrm-controller-manager are created in the cnrm-system namespace
  • Run make ready-pr to ensure this PR is ready for review.
  • Perform necessary E2E testing for changed resources.

Does this PR add something which needs to be 'release noted'?

# New Fields:
- ConfigConnectorContext
  - Added `spec.managerNamespace`
  • Reviewer reviewed release note.

Additional documentation e.g., references, usage docs, etc.:

This change allows Kubernetes Config Connector (KCC) controller managers to be deployed in dedicated,
tenant-specific namespaces instead of the default cnrm-system namespace. This significantly improves
the security of multi-tenant environments by providing better namespace-level isolation and reducing the
blast radius of potential security incidents.

The new functionality is gated by the command-line argument `--manager-namespace-isolation`, which accepts `shared` or `dedicated` (the default is `shared`). The argument can be passed to the “manager” container (`/configconnector-operator/manager`):

   spec:
      containers:
      - args:
        - --local-repo=/configconnector-operator/autopilot-channels
        - --manager-namespace-isolation=dedicated
        command:
        - /configconnector-operator/manager

When this flag is enabled, KCC will check for the optional `managerNamespace` field in the
`ConfigConnectorContext` resource. If this field is present, KCC will deploy the corresponding controller
manager and its associated ServiceAccount into the specified namespace, rather than the default
`cnrm-system`.

- Spec field: `managerNamespace`
When this field is present on a `ConfigConnectorContext` resource specification and it is not empty, the
system is instructed not to deploy the corresponding controller manager components into the default
`cnrm-system` namespace. Instead, the namespace specified in `managerNamespace` is used. The
`managerNamespace` will be created if it doesn’t exist. The selection of this specification field as the
configuration mechanism adheres to standard Kubernetes practices for declarative configuration,
allowing behavior modification without altering the core ConfigConnector API specification.

**Example**:
Consider a ConfigConnectorContext resource, located in the namespace t1234-tenant0-provider:

apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnectorContext
metadata:
  name: configconnectorcontext.core.cnrm.cloud.google.com
  namespace: t1234-tenant0-provider
  labels:
    tenancy.gke.io/access-level: supervisor
    tenancy.gke.io/project: t1234
    tenancy.gke.io/tenant: t1234-tenant0
spec:
  googleServiceAccount: "gcp@tpname.iam.gserviceaccount.com"
  stateIntoSpec: Absent
  managerNamespace: t1234-tenant0-supervisor

If the ConfigConnectorContext resource includes the specification field
`managerNamespace: t1234-tenant0-supervisor`, then the controller manager StatefulSet
and its ServiceAccount will be created within the `t1234-tenant0-supervisor` namespace.

If the new functionality is not activated, the specification field “managerNamespace” must not be set
on any ConfigConnectorContext. If “managerNamespace” is specified while the feature is globally
disabled, ConfigConnectorContext reconciliation fails with an error indicating that
“managerNamespace” is not supported.

If the new functionality is activated, the specification field “managerNamespace” must be set on
every ConfigConnectorContext. If “managerNamespace” is not specified while the feature is globally
enabled, ConfigConnectorContext reconciliation fails with an error indicating that
“managerNamespace” must be populated.

@yuwenma
Collaborator

yuwenma commented Jun 4, 2025

Thank you so much for the detailed descriptions. I saw this PR also contains a lot of changes to unmanaged-detector. Could you elaborate a little bit more about those changes?

Comment on lines 49 to 51
tenancy.gke.io/access-level: supervisor
tenancy.gke.io/project: no-project
tenancy.gke.io/tenant: no-tenant
Collaborator

Can I ask more about these 3 labels? I want to know the life cycle of them, e.g. how is it used/modified, etc.

Contributor Author

Answered in corporate chat

Comment on lines 86 to 91
var b []byte
if managerNamespaceSuffix == "" {
b = []byte(defaultConfigConnector)
} else {
b = []byte(fmt.Sprintf(defaultConfigConnectorNamespaceSeparated, managerNamespaceSuffix))
}
Collaborator

Suggest making this an Unstructured-level operation, especially considering that string formatting is used.

u.SetLabels(
map[string]string{
  ...
}
)

Contributor Author

Done
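For illustration, the structured-mutation approach amounts to something like this sketch, where a plain nested map stands in for an `unstructured.Unstructured` object (the real code uses k8s.io/apimachinery helpers; the label key shown is taken from the PR's earlier revision):

```go
package main

import "fmt"

// setLabels mimics unstructured.Unstructured.SetLabels on a plain
// map-based object: it ensures metadata exists and replaces its labels.
// This is a stand-in sketch, not the PR's actual code.
func setLabels(obj map[string]interface{}, labels map[string]string) {
	meta, ok := obj["metadata"].(map[string]interface{})
	if !ok {
		meta = map[string]interface{}{}
		obj["metadata"] = meta
	}
	l := map[string]interface{}{}
	for k, v := range labels {
		l[k] = v
	}
	meta["labels"] = l
}

func main() {
	obj := map[string]interface{}{
		"apiVersion": "core.cnrm.cloud.google.com/v1beta1",
		"kind":       "ConfigConnector",
	}
	setLabels(obj, map[string]string{
		"cnrm.cloud.google.com/manager-namespace-suffix": "tenant0",
	})
	fmt.Println(obj["metadata"].(map[string]interface{})["labels"])
}
```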

NamespaceEnvVar = "NAMESPACE"
ImmediateReconcileRequestsBufferSize = 10000
MaxNumResourceWatcherRoutines = 10000
ManagerNamespaceSuffixLabel = "cnrm.cloud.google.com/manager-namespace-suffix"
Collaborator

Suggest using a new field in the CCC spec instead of labels. Spec is specifically for user configuration, and it is more manageable and traceable compared to labels.

Contributor Author

Are you suggesting spec on CCC (ConfigConnectorContext)? There should be a global flag that indicates:

  1. If managers need to run in separate namespaces
  2. How to name the namespaces (what should be the suffix of the manager's namespace)

This suggests that CC (ConfigConnector) should have the ManagerNamespaceSuffix.

Contributor Author

Done

update Unstructured instead of string substitution
@dshnayder dshnayder requested a review from yuwenma June 5, 2025 13:29
@jingyih
Collaborator

jingyih commented Jun 12, 2025

/assign @justinsb @yuwenma

@yuwenma
Collaborator

yuwenma commented Jun 17, 2025

Update: This PR operates in the context where a GKE MT cluster creates the custom namespace (as opposed to KCC creating its system namespace) and adds the MT-specific labels. Per offline discussion, we expect the MT team to share the guide for this additional setup so we can verify the PR in a GKE Autopilot cluster.

flag.CommandLine.AddGoFlagSet(goflag.CommandLine)
flag.BoolVar(&enablePprof, "enable-pprof", false, "Enable the pprof server.")
flag.IntVar(&pprofPort, "pprof-port", 6060, "The port that the pprof server binds to if enabled.")
flag.BoolVar(&unmanageddetector.EnableManagerNamespace, "enable-manager-namespace", false, "Enable running controller managers in separate namespaces.")
Collaborator

Not blocking
I dislike bools in APIs. It always seems like something which is 2 state eventually becomes 3+ state. Right now the manager "run-mode" is either "cluster-only" or "namespace-only". I can imagine wanting something in the future like "cluster-namespace-hybrid" or even "project". Switching from a bool to an enum seems like future proofing.

Contributor Author

Changed from boolean to string everywhere in the code.
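The enum approach suggested here might look like the following sketch (type and constant names are hypothetical, not the PR's actual identifiers):

```go
package main

import "fmt"

// NamespaceIsolationMode models the manager "run-mode" as a string-backed
// enum rather than a bool, so new modes can be added later without
// changing the flag's type. Names are illustrative.
type NamespaceIsolationMode string

const (
	IsolationShared    NamespaceIsolationMode = "shared"
	IsolationDedicated NamespaceIsolationMode = "dedicated"
	// Future modes, e.g. "cluster-namespace-hybrid", slot in here.
)

// Valid reports whether m is a recognized isolation mode.
func (m NamespaceIsolationMode) Valid() bool {
	switch m {
	case IsolationShared, IsolationDedicated:
		return true
	}
	return false
}

func main() {
	fmt.Println(NamespaceIsolationMode("dedicated").Valid()) // true
	fmt.Println(NamespaceIsolationMode("hybrid").Valid())    // false
}
```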


imagePrefix := os.Getenv("IMAGE_PREFIX")
flag.StringVar(&imagePrefix, "image-prefix", imagePrefix, "Remap container images to pull from the specified registry or mirror.")
flag.BoolVar(&enableManagerNamespace, "enable-manager-namespace", false, "Enable running controller managers in separate namespaces.")
Collaborator

Not blocking
I dislike bools in APIs. It always seems like something which is 2 state eventually becomes 3+ state. Right now the manager "run-mode" is either "cluster-only" or "namespace-only". I can imagine wanting something in the future like "cluster-namespace-hybrid" or even "project". Switching from a bool to an enum seems like future proofing.

Contributor Author

Changed from boolean to string everywhere in the code.

if err := cluster.DeleteNamespaceID(ctx, k8s.OperatorNamespaceIDConfigMapNN, r.client, ccc.Namespace); err != nil {
return err
}
if r.enableManagerNamespace && ccc.Spec.ManagerNamespace != "" {
Collaborator

@cheftako cheftako Aug 12, 2025

This seems like it should not happen. Can we add a warning log and/or maybe a status message in the CCC? I understand why you're doing it, but we certainly should not be taking such a drastic step quietly.

Contributor Author

Updated code to fail reconciliation of CCC if global mode and spec.ManagerNamespace are not compatible.

@cheftako
Collaborator

Can we add a check for not enableManagerNamespace that either prevents the managerNamespace field or reports a status warning?

@dshnayder
Contributor Author

Can we add a check for not enableManagerNamespace that either prevents the managerNamespace field or reports a status warning?

Updated code to fail reconciliation of CCC if global mode and spec.ManagerNamespace are not compatible.

@cheftako
Collaborator

/assign @cheftako

@cheftako cheftako added this to the 1.134 milestone Aug 14, 2025
@cheftako
Collaborator

@dshnayder The PR template should have had a Release Notes and Docs section. Can you put those back in your top notes?

@cheftako
Collaborator

/lgtm
/approve

@google-oss-prow google-oss-prow bot added the lgtm label Aug 14, 2025
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheftako

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 827c965 into GoogleCloudPlatform:master Aug 14, 2025
116 of 117 checks passed
@dshnayder
Contributor Author

@dshnayder The PR template should have had a Release Notes and Docs section. Can you put those back in your top notes?

Updated pull request description with release notes and documentation.
