Skip to content

PoC cluster updates based on CS sha calculation and management cluster state #5593

Open
miguelsorianod wants to merge 5 commits into
mainfrom
msoriano-poc-cluster-updates-1
Open

PoC cluster updates based on CS sha calculation and management cluster state #5593
miguelsorianod wants to merge 5 commits into
mainfrom
msoriano-poc-cluster-updates-1

Conversation

@miguelsorianod

Copy link
Copy Markdown
Collaborator

What

Why

Testing

Special notes for your reviewer

PR Checklist

  • PR is scoped to a single task (no mixed concerns)
  • Title follows Conventional Commits format
  • Summary explains the "Why" behind the change
  • Linked to relevant ticket/issue
  • Screenshots included (if graph/UI/metrics changes)
  • Self-reviewed the diff
  • CI/CD checks are passing (ignore Tide)
  • Draft PR used for WIP (if applicable)
  • Commit history is clean (rebased/squashed)
  • Tricky code blocks are commented
  • Specific reviewers tagged
  • All comment threads resolved before merge

Copilot AI review requested due to automatic review settings June 10, 2026 17:35
@openshift-ci openshift-ci Bot requested a review from deads2k June 10, 2026 17:35
@miguelsorianod

Copy link
Copy Markdown
Collaborator Author

/hold

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves cluster updates toward an asynchronous, backend-driven flow by introducing a canonical “updatable config” hash (SHA-256 over canonical JSON) and a backend dispatch controller that PATCHes Cluster Service only when the desired config hash changes. It also adds update-operation state calculation based on observed HyperShift HostedCluster spec.

Changes:

  • Refactors BuildCSCluster to apply updatable fields via a shared clusterUpdatableConfig, and adds hashing/canonicalization helpers + tests.
  • Adds a backend “Cluster Service update dispatch” controller that compares/stamps the updatable-config hash and issues CS update calls when needed.
  • Updates the cluster update operation controller to include HyperShift state checks and wires required listers through app/controller construction.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
internal/ocm/convert.go Refactors cluster build flow to use the new updatable-config application helpers.
internal/ocm/cluster_updatable_config.go Introduces canonical updatable-config struct, hashing, canonical JSON generation, and CS application helpers.
internal/ocm/cluster_updatable_config_test.go Adds tests for hash stability/differences and canonical JSON behavior.
internal/api/v20251223preview/conversion_fuzz_test.go Updates fuzz roundtrip normalizations for new internal-only update flag.
internal/api/v20240610preview/conversion_fuzz_test.go Same as above for the older preview API version.
internal/api/types_serviceprovider_cluster.go Adds ServiceProviderCluster status field to store the updatable-config hash used by dispatch.
internal/api/types_cluster.go Adds internal-only UsesNewClusterUpdateApproach flag for rollout tracking.
frontend/pkg/frontend/cluster.go Stops doing synchronous CS updates in frontend; instead marks cluster to use the new update approach.
backend/pkg/controllers/operationcontrollers/operation_cluster_update.go Adds HyperShift-based state calculation to operation state determination.
backend/pkg/controllers/operationcontrollers/operation_cluster_update_test.go Updates controller wiring/tests for the new read-desire lister dependency.
backend/pkg/controllers/operationcontrollers/operation_cluster_update_state_calculation.go New logic comparing desired Cosmos config to observed HostedCluster spec for update completion.
backend/pkg/controllers/operationcontrollers/operation_cluster_update_state_calculation_test.go Unit tests for the new HyperShift state-calculation logic.
backend/pkg/controllers/clusterupdate/cluster_cluster_service_update_dispatch_controller.go New controller that dispatches CS updates based on the stored vs desired updatable-config hash.
backend/pkg/controllers/clusterupdate/cluster_cluster_service_update_dispatch_controller_test.go Unit tests for dispatch gating, hashing behavior, and CS call expectations.
backend/pkg/app/backend.go Wires new controller + lister into backend runtime/controller startup.

Comment on lines +17 to +26
import (
"slices"
"testing"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

"github.com/Azure/ARO-HCP/internal/api"
)

Comment on lines +62 to +69
func ClusterUpdatableConfigFromProperties(properties api.HCPOpenShiftClusterCustomerProperties) *clusterUpdatableConfig {
return &clusterUpdatableConfig{
NodeDrainTimeoutMinutes: properties.NodeDrainTimeoutMinutes,
AuthorizedCIDRs: properties.API.AuthorizedCIDRs,
ImageDigestMirrors: properties.ImageDigestMirrors,
Autoscaling: properties.Autoscaling,
}
}
// when serialized as a json map. This hash is used by the cluster update dispatch controller to determine if a CS PATCH call is needed.
// Note: this hash does not necessarily include all the fields that can be updated via the CS API, just the ones
// that are to be considered by the cluster update dispatch controller to determine if a CS PATCH call is needed.
ClusterServiceUpdatableConfigHashForUpdateDispatch string `json:"clusterServiceUpdatableConfigHashForUpdateDispatch"`
Comment on lines +163 to +172
existingCluster, err := c.resourcesDBClient.HCPClusters(operation.ExternalID.SubscriptionID, operation.ExternalID.ResourceGroupName).Get(ctx, operation.ExternalID.Name)
if err != nil {
errs = append(errs, utils.TrackError(fmt.Errorf("failed to get cluster for hypershift state check: %w", err)))
} else {
if operationState, hsErr := c.hypershiftClusterOperationState(ctx, existingCluster); hsErr != nil {
errs = append(errs, utils.TrackError(hsErr))
} else {
operationStates = append(operationStates, operationState)
}
}
@miguelsorianod miguelsorianod force-pushed the msoriano-poc-cluster-updates-1 branch from 4706563 to d4fc1c7 Compare June 12, 2026 18:14
@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: miguelsorianod
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI review requested due to automatic review settings June 12, 2026 18:56
@miguelsorianod miguelsorianod force-pushed the msoriano-poc-cluster-updates-1 branch from d4fc1c7 to ed4a4d3 Compare June 12, 2026 18:56

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Comment on lines +253 to +254
case "syncClusterUpdateDispatch":
return newSyncClusterUpdateDispatchStep(stepID, stepDir)
assert.Equal(t, hash1, hash2)
}

func TestClusterUpdatableConfigJSONForHashIsCanonical(t *testing.T) {
Comment on lines +163 to +166
existingCluster, err := c.resourcesDBClient.HCPClusters(operation.ExternalID.SubscriptionID, operation.ExternalID.ResourceGroupName).Get(ctx, operation.ExternalID.Name)
if err != nil {
errs = append(errs, utils.TrackError(fmt.Errorf("failed to get cluster for hypershift state check: %w", err)))
} else {
@miguelsorianod miguelsorianod force-pushed the msoriano-poc-cluster-updates-1 branch from ed4a4d3 to 0605a95 Compare June 15, 2026 11:10
Copilot AI review requested due to automatic review settings June 15, 2026 16:06
@miguelsorianod miguelsorianod force-pushed the msoriano-poc-cluster-updates-1 branch from 0605a95 to 7bca90c Compare June 15, 2026 16:06

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 5 comments.

Comment on lines +152 to +156
logger.Info("update dispatch config differs between RP and CS",
"clusterServiceID", clusterCSID.String(),
"desiredConfig", desiredConfigJSON,
"actualConfig", actualConfigJSON,
)
Comment on lines +168 to +171
logger.Info("dispatching cluster autoscaler update to Cluster Service",
"clusterServiceID", clusterCSID.String(),
"clusterServiceClusterAutoscalerPayload", clusterAutoscalerPayload,
)
Comment on lines +192 to +195
logger.Info("dispatching cluster update to Cluster Service",
"clusterServiceID", clusterCSID.String(),
"clusterServiceClusterPayload", clusterPayload,
)
Comment on lines +135 to +136
// TODO Revisit once we know the internal CIDR blocks once we move the nodes egress lb logic to the RP
// (or can read the internally set allow-list from Cluster Service) so stale customer entries can be rejected.
Comment on lines +190 to +199
func clusterUpdateDispatchConfigAuthorizedCIDRsFromCS(in *arohcpv1alpha1.ClusterAPI) []string {
cidrAccess := in.CIDRBlockAccess()
if cidrAccess.Empty() {
return nil
}
if cidr := cidrAccess.Allow(); cidr != nil {
return cidr.Values()
}
return nil
}
@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown

@miguelsorianod: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-parallel 0605a95 link true /test e2e-parallel
ci/prow/lint 7bca90c link true /test lint

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants