PoC cluster updates based on CS sha calculation and management cluster state #5593
PoC cluster updates based on CS sha calculation and management cluster state #5593miguelsorianod wants to merge 5 commits into
Conversation
|
/hold |
There was a problem hiding this comment.
Pull request overview
This PR moves cluster updates toward an asynchronous, backend-driven flow by introducing a canonical “updatable config” hash (SHA-256 over canonical JSON) and a backend dispatch controller that PATCHes Cluster Service only when the desired config hash changes. It also adds update-operation state calculation based on observed HyperShift HostedCluster spec.
Changes:
- Refactors
BuildCSClusterto apply updatable fields via a sharedclusterUpdatableConfig, and adds hashing/canonicalization helpers + tests. - Adds a backend “Cluster Service update dispatch” controller that compares/stamps the updatable-config hash and issues CS update calls when needed.
- Updates the cluster update operation controller to include HyperShift state checks and wires required listers through app/controller construction.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/ocm/convert.go | Refactors cluster build flow to use the new updatable-config application helpers. |
| internal/ocm/cluster_updatable_config.go | Introduces canonical updatable-config struct, hashing, canonical JSON generation, and CS application helpers. |
| internal/ocm/cluster_updatable_config_test.go | Adds tests for hash stability/differences and canonical JSON behavior. |
| internal/api/v20251223preview/conversion_fuzz_test.go | Updates fuzz roundtrip normalizations for new internal-only update flag. |
| internal/api/v20240610preview/conversion_fuzz_test.go | Same as above for the older preview API version. |
| internal/api/types_serviceprovider_cluster.go | Adds ServiceProviderCluster status field to store the updatable-config hash used by dispatch. |
| internal/api/types_cluster.go | Adds internal-only UsesNewClusterUpdateApproach flag for rollout tracking. |
| frontend/pkg/frontend/cluster.go | Stops doing synchronous CS updates in frontend; instead marks cluster to use the new update approach. |
| backend/pkg/controllers/operationcontrollers/operation_cluster_update.go | Adds HyperShift-based state calculation to operation state determination. |
| backend/pkg/controllers/operationcontrollers/operation_cluster_update_test.go | Updates controller wiring/tests for the new read-desire lister dependency. |
| backend/pkg/controllers/operationcontrollers/operation_cluster_update_state_calculation.go | New logic comparing desired Cosmos config to observed HostedCluster spec for update completion. |
| backend/pkg/controllers/operationcontrollers/operation_cluster_update_state_calculation_test.go | Unit tests for the new HyperShift state-calculation logic. |
| backend/pkg/controllers/clusterupdate/cluster_cluster_service_update_dispatch_controller.go | New controller that dispatches CS updates based on the stored vs desired updatable-config hash. |
| backend/pkg/controllers/clusterupdate/cluster_cluster_service_update_dispatch_controller_test.go | Unit tests for dispatch gating, hashing behavior, and CS call expectations. |
| backend/pkg/app/backend.go | Wires new controller + lister into backend runtime/controller startup. |
| import ( | ||
| "slices" | ||
| "testing" | ||
|
|
||
| "github.com/stretchr/testify/assert" | ||
| "github.com/stretchr/testify/require" | ||
|
|
||
| "github.com/Azure/ARO-HCP/internal/api" | ||
| ) | ||
|
|
| func ClusterUpdatableConfigFromProperties(properties api.HCPOpenShiftClusterCustomerProperties) *clusterUpdatableConfig { | ||
| return &clusterUpdatableConfig{ | ||
| NodeDrainTimeoutMinutes: properties.NodeDrainTimeoutMinutes, | ||
| AuthorizedCIDRs: properties.API.AuthorizedCIDRs, | ||
| ImageDigestMirrors: properties.ImageDigestMirrors, | ||
| Autoscaling: properties.Autoscaling, | ||
| } | ||
| } |
| // when serialized as a json map. This hash is used by the cluster update dispatch controller to determine if a CS PATCH call is needed. | ||
| // Note: this hash does not necessarily include all the fields that can be updated via the CS API, just the ones | ||
| // that are to be considered by the cluster update dispatch controller to determine if a CS PATCH call is needed. | ||
| ClusterServiceUpdatableConfigHashForUpdateDispatch string `json:"clusterServiceUpdatableConfigHashForUpdateDispatch"` |
| existingCluster, err := c.resourcesDBClient.HCPClusters(operation.ExternalID.SubscriptionID, operation.ExternalID.ResourceGroupName).Get(ctx, operation.ExternalID.Name) | ||
| if err != nil { | ||
| errs = append(errs, utils.TrackError(fmt.Errorf("failed to get cluster for hypershift state check: %w", err))) | ||
| } else { | ||
| if operationState, hsErr := c.hypershiftClusterOperationState(ctx, existingCluster); hsErr != nil { | ||
| errs = append(errs, utils.TrackError(hsErr)) | ||
| } else { | ||
| operationStates = append(operationStates, operationState) | ||
| } | ||
| } |
4706563 to
d4fc1c7
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: miguelsorianod The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
d4fc1c7 to
ed4a4d3
Compare
| case "syncClusterUpdateDispatch": | ||
| return newSyncClusterUpdateDispatchStep(stepID, stepDir) |
| assert.Equal(t, hash1, hash2) | ||
| } | ||
|
|
||
| func TestClusterUpdatableConfigJSONForHashIsCanonical(t *testing.T) { |
| existingCluster, err := c.resourcesDBClient.HCPClusters(operation.ExternalID.SubscriptionID, operation.ExternalID.ResourceGroupName).Get(ctx, operation.ExternalID.Name) | ||
| if err != nil { | ||
| errs = append(errs, utils.TrackError(fmt.Errorf("failed to get cluster for hypershift state check: %w", err))) | ||
| } else { |
ed4a4d3 to
0605a95
Compare
0605a95 to
7bca90c
Compare
| logger.Info("update dispatch config differs between RP and CS", | ||
| "clusterServiceID", clusterCSID.String(), | ||
| "desiredConfig", desiredConfigJSON, | ||
| "actualConfig", actualConfigJSON, | ||
| ) |
| logger.Info("dispatching cluster autoscaler update to Cluster Service", | ||
| "clusterServiceID", clusterCSID.String(), | ||
| "clusterServiceClusterAutoscalerPayload", clusterAutoscalerPayload, | ||
| ) |
| logger.Info("dispatching cluster update to Cluster Service", | ||
| "clusterServiceID", clusterCSID.String(), | ||
| "clusterServiceClusterPayload", clusterPayload, | ||
| ) |
| // TODO Revisit once we know the internal CIDR blocks once we move the nodes egress lb logic to the RP | ||
| // (or can read the internally set allow-list from Cluster Service) so stale customer entries can be rejected. |
| func clusterUpdateDispatchConfigAuthorizedCIDRsFromCS(in *arohcpv1alpha1.ClusterAPI) []string { | ||
| cidrAccess := in.CIDRBlockAccess() | ||
| if cidrAccess.Empty() { | ||
| return nil | ||
| } | ||
| if cidr := cidrAccess.Allow(); cidr != nil { | ||
| return cidr.Values() | ||
| } | ||
| return nil | ||
| } |
|
@miguelsorianod: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What
Why
Testing
Special notes for your reviewer
PR Checklist