Skip to content

feat: node pool upgrades frontend handler#4224

Merged
openshift-merge-bot[bot] merged 1 commit into
mainfrom
ahitacat/frontend-np-upgrade
Mar 18, 2026
Merged

feat: node pool upgrades frontend handler#4224
openshift-merge-bot[bot] merged 1 commit into
mainfrom
ahitacat/frontend-np-upgrade

Conversation

@ahitacat

Copy link
Copy Markdown
Collaborator

ARO-23988

What

Implement frontend handling for node pool upgrades. Initial draft, to start discussion about the design and implementation.

Why

As a part of the node pool upgrades, we need to implement the handling from the customer petitions.

Special notes for your reviewer

For easy review I am adding commits. Once it is ready to merge I will squash all of them.

@openshift-ci

openshift-ci Bot commented Feb 26, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ahitacat ahitacat force-pushed the ahitacat/frontend-np-upgrade branch from dcfe2e6 to 8d6c54b Compare March 10, 2026 10:35
@ahitacat ahitacat changed the title feat: node pool upgrades frontend handle feat: node pool upgrades frontend handler Mar 10, 2026
Comment thread frontend/pkg/frontend/node_pool.go
@ahitacat ahitacat force-pushed the ahitacat/frontend-np-upgrade branch 2 times, most recently from e3701e1 to 5767690 Compare March 10, 2026 12:07
Comment thread internal/admission/admit_nodepool.go Outdated
Comment on lines +60 to +61
// Include all standard node pool admission checks
errs = append(errs, AdmitNodePool(newNodePool, cluster)...)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updates should always be separate from create. Create only looks at the new object. Often a previously valid nodepool can become invalid without updating the nodepool. When this happens, we need to ensure all fields not being modified are allowed to remain invalid. A good example is version, where the control plane version will be changed without touching the API. The forced y-stream upgrade possibility remains, and thus it can happen here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I thought this AdmitNodePool were checks for both creation and update as this was called in the updateNodePool
https://github.com/Azure/ARO-HCP/pull/4224/changes#diff-30bc17cd7777df0f7bfd82d36a2ccaa4d5e9af7a620b82a115f95ed6ee5913aaL549

Should I create smaller validations one for channelGroup and another one for subnets and a new AdmitNodePoolCreate that will also call them?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow the cluster.version.channel is changed (doesn't matter how or why). Someone updates a nodepool AutoScaling section and nothing else. The update should be allowed because the autoscaling section change is allowed. This call causes a failure, correct? How would you resolve this problem in the code so that doesn't happen?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been addressed

Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
}

// No major version change
if newVersion.Major != oldVersion.Major {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

followup to make this possible via the experimental features. v5 is now available and ready to install.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be also needed for clusters, and probably some modifications in CS as the imagerelease has changed for v5. I will create a jira issue so we can work on this after upgrades are functional.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be also needed for clusters, and probably some modifications in CS as the imagerelease has changed for v5. I will create a jira issue so we can work on this after upgrades are functional.

I'm fine with also getting a jira card for v5, but for this exact one, if you get an IOU in this PR, I don't want to wait for someone to work on a jira card, I want this line gated.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://redhat.atlassian.net/browse/ARO-25229 the Jira card and the followup PR #4479

Comment thread internal/admission/admit_nodepool.go Outdated
@ahitacat ahitacat force-pushed the ahitacat/frontend-np-upgrade branch from 5767690 to 486ab9e Compare March 10, 2026 15:56
Comment thread frontend/pkg/frontend/node_pool.go
Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
@ahitacat ahitacat force-pushed the ahitacat/frontend-np-upgrade branch from 486ab9e to 6a89d15 Compare March 11, 2026 10:09
@ahitacat ahitacat marked this pull request as ready for review March 11, 2026 10:14
@openshift-ci openshift-ci Bot requested review from deads2k and geoberle March 11, 2026 10:14
@raelga

raelga commented Mar 11, 2026

Copy link
Copy Markdown
Collaborator

/retest

@ahitacat ahitacat force-pushed the ahitacat/frontend-np-upgrade branch from 6a89d15 to 280ccd1 Compare March 11, 2026 12:44
@ahitacat ahitacat added the ai-assisted AI/LLM tool was used to help create this MR label Mar 11, 2026
@ahitacat

Copy link
Copy Markdown
Collaborator Author

@deads2k to use well the service_providers I needed to refactor them and move it to the internal module.

@ahitacat ahitacat force-pushed the ahitacat/frontend-np-upgrade branch from 2ea2c76 to 5ddc3d7 Compare March 11, 2026 14:44
// TODO: We may relax this constraint in the future
clusterActiveVersions := spCluster.Status.ControlPlaneVersion.ActiveVersions
if len(clusterActiveVersions) > 0 {
lowestControlPlane := clusterActiveVersions[len(clusterActiveVersions)-1].Version

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We allow z-streams of clusters to move backwards.

fast followup (as in Monday next week), correct this to read all clusterActiveVersions and find the actual lowest.

I bet hypershift handles the situation poorly.

  1. separate from this PR, write test that
    1. installs control plane 4.y.z-5
    2. upgrades control plane 4.y.z-3
    3. installs nodepool 4.y.z-3
    4. backlevels control plane to 4.y.z-5 (this is legal and explicitly agreed with Scott Dodson to enable fearless automated upgrade
    5. scales nodepool up
    6. changes nodepool to 4.y.z-5
  2. if it fails, HCP needs to update
  3. once HCP updates to acknowledge nodepools can be on later z-streams than control planes, challenge the restriction to be LE in the common case.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow up work #4481

Comment on lines +122 to +123
highestActive := nodePoolActiveVersions[0].Version
lowestActive := nodePoolActiveVersions[len(nodePoolActiveVersions)-1].Version

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scan the list and get this accurate even if we later allow backleveling (we will).

I'll accept a fast-followup on Monday.

}
// No major version change
// TODO: Add support for major version upgrades (e.g., 4.20 → 5.0) when needed
if newVersion.Major != lowestActive.Major {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start gating these block with the experimental AFEC. We want to be testing this in April to coordinate with OCP.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4479

Added, I also added a comment that it might require more additional work. We can continue the major change in that PR.

// Check if the newVersion <= control plane versions
// TODO: We may relax this constraint in the future
clusterActiveVersions := spCluster.Status.ControlPlaneVersion.ActiveVersions
if len(clusterActiveVersions) > 0 {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing a 4.y-2 check on all control plane versions.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check failing means the node pool is already under that skew (as we always allow upgrade to greater versions). I think this change should be part of the node pool creation and the control plane upgrade. I will create a followup PR to add it in those situations.

// Check if the newVersion <= control plane versions
// TODO: We may relax this constraint in the future
clusterActiveVersions := spCluster.Status.ControlPlaneVersion.ActiveVersions
if len(clusterActiveVersions) > 0 {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also check 4.y-2 on the desired version of the control plane.

@deads2k

deads2k commented Mar 13, 2026

Copy link
Copy Markdown
Collaborator

This should be also needed for clusters, and probably some modifications in CS as the imagerelease has changed for v5. I will create a jira issue so we can work on this after upgrades are functional.

I'm fine with also getting a jira card for v5, but for this exact one, if you get an IOU in this PR, I don't want to wait for someone to work on a jira card, I want this line gated.

If you can agree to the followups and timeframe (early next week for capability ones), I'm happy to approve now. If you can't, I'll hold this PR to get them updated prior to merge.

@ahitacat ahitacat force-pushed the ahitacat/frontend-np-upgrade branch 2 times, most recently from 619600b to 89dbea9 Compare March 16, 2026 10:53
@ahitacat

Copy link
Copy Markdown
Collaborator Author

/test e2e-parallel

@deads2k

deads2k commented Mar 16, 2026

Copy link
Copy Markdown
Collaborator

/lgtm
/approve

@ahitacat

Copy link
Copy Markdown
Collaborator Author

/test e2e-parallel

2 similar comments
@machi1990

Copy link
Copy Markdown
Collaborator

/test e2e-parallel

@ahitacat

Copy link
Copy Markdown
Collaborator Author

/test e2e-parallel

This commit also adds node pool upgrade frontend tests
and unit tests.

Signed-off-by: Alba Hita Catala <ahitacat@redhat.com>
@ahitacat ahitacat force-pushed the ahitacat/frontend-np-upgrade branch from 89dbea9 to 0d2c310 Compare March 17, 2026 09:49
@openshift-ci openshift-ci Bot removed the lgtm label Mar 17, 2026
@machi1990

Copy link
Copy Markdown
Collaborator

/lgtm

The new changes are rebasing following the merge of another PR #4433

@openshift-ci

openshift-ci Bot commented Mar 17, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahitacat, deads2k, machi1990

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ahitacat

Copy link
Copy Markdown
Collaborator Author

/test e2e-parallel

@openshift-merge-bot openshift-merge-bot Bot merged commit 85e5125 into main Mar 18, 2026
22 checks passed
@openshift-merge-bot openshift-merge-bot Bot deleted the ahitacat/frontend-np-upgrade branch March 18, 2026 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-assisted AI/LLM tool was used to help create this MR approved lgtm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants