feat: node pool upgrades frontend handler#4224
Conversation
|
Skipping CI for Draft Pull Request. |
dcfe2e6 to
8d6c54b
Compare
e3701e1 to
5767690
Compare
| // Include all standard node pool admission checks | ||
| errs = append(errs, AdmitNodePool(newNodePool, cluster)...) |
There was a problem hiding this comment.
updates should always be separate from create. Create only looks at the new object. Often a previously valid nodepool can become invalid without updating the nodepool. When this happens, we need to ensure all fields not being modified are allowed to remain invalid. A good example is version, where the control plane version will be changed without touching the API. The forced y-stream upgrade possibility remains, and thus it can happen here.
There was a problem hiding this comment.
Agree, I thought this AdmitNodePool were checks for both creation and update as this was called in the updateNodePool
https://github.com/Azure/ARO-HCP/pull/4224/changes#diff-30bc17cd7777df0f7bfd82d36a2ccaa4d5e9af7a620b82a115f95ed6ee5913aaL549
Should I create smaller validations one for channelGroup and another one for subnets and a new AdmitNodePoolCreate that will also call them?
There was a problem hiding this comment.
Somehow the cluster.version.channel is changed (doesn't matter how or why). Someone updates a nodepool AutoScaling section and nothing else. The update should be allowed because the autoscaling section change is allowed. This call causes a failure, correct? How would you resolve this problem in the code so that doesn't happen?
There was a problem hiding this comment.
This has been addressed
| } | ||
|
|
||
| // No major version change | ||
| if newVersion.Major != oldVersion.Major { |
There was a problem hiding this comment.
followup to make this possible via the experimental features. v5 is now available and ready to install.
There was a problem hiding this comment.
This should be also needed for clusters, and probably some modifications in CS as the imagerelease has changed for v5. I will create a jira issue so we can work on this after upgrades are functional.
There was a problem hiding this comment.
This should be also needed for clusters, and probably some modifications in CS as the imagerelease has changed for v5. I will create a jira issue so we can work on this after upgrades are functional.
I'm fine with also getting a jira card for v5, but for this exact one, if you get an IOU in this PR, I don't want to wait for someone to work on a jira card, I want this line gated.
There was a problem hiding this comment.
https://redhat.atlassian.net/browse/ARO-25229 the Jira card and the followup PR #4479
5767690 to
486ab9e
Compare
486ab9e to
6a89d15
Compare
|
/retest |
6a89d15 to
280ccd1
Compare
|
@deads2k to use well the service_providers I needed to refactor them and move it to the internal module. |
2ea2c76 to
5ddc3d7
Compare
| // TODO: We may relax this constraint in the future | ||
| clusterActiveVersions := spCluster.Status.ControlPlaneVersion.ActiveVersions | ||
| if len(clusterActiveVersions) > 0 { | ||
| lowestControlPlane := clusterActiveVersions[len(clusterActiveVersions)-1].Version |
There was a problem hiding this comment.
We allow z-streams of clusters to move backwards.
fast followup (as in Monday next week), correct this to read all clusterActiveVersions and find the actual lowest.
I bet hypershift handles the situation poorly.
- separate from this PR, write test that
- installs control plane 4.y.z-5
- upgrades control plane 4.y.z-3
- installs nodepool 4.y.z-3
- backlevels control plane to 4.y.z-5 (this is legal and explicitly agreed with Scott Dodson to enable fearless automated upgrade
- scales nodepool up
- changes nodepool to 4.y.z-5
- if it fails, HCP needs to update
- once HCP updates to acknowledge nodepools can be on later z-streams than control planes, challenge the restriction to be LE in the common case.
| highestActive := nodePoolActiveVersions[0].Version | ||
| lowestActive := nodePoolActiveVersions[len(nodePoolActiveVersions)-1].Version |
There was a problem hiding this comment.
scan the list and get this accurate even if we later allow backleveling (we will).
I'll accept a fast-followup on Monday.
| } | ||
| // No major version change | ||
| // TODO: Add support for major version upgrades (e.g., 4.20 → 5.0) when needed | ||
| if newVersion.Major != lowestActive.Major { |
There was a problem hiding this comment.
start gating these block with the experimental AFEC. We want to be testing this in April to coordinate with OCP.
There was a problem hiding this comment.
Added, I also added a comment that it might require more additional work. We can continue the major change in that PR.
| // Check if the newVersion <= control plane versions | ||
| // TODO: We may relax this constraint in the future | ||
| clusterActiveVersions := spCluster.Status.ControlPlaneVersion.ActiveVersions | ||
| if len(clusterActiveVersions) > 0 { |
There was a problem hiding this comment.
missing a 4.y-2 check on all control plane versions.
There was a problem hiding this comment.
This check failing means the node pool is already under that skew (as we always allow upgrade to greater versions). I think this change should be part of the node pool creation and the control plane upgrade. I will create a followup PR to add it in those situations.
| // Check if the newVersion <= control plane versions | ||
| // TODO: We may relax this constraint in the future | ||
| clusterActiveVersions := spCluster.Status.ControlPlaneVersion.ActiveVersions | ||
| if len(clusterActiveVersions) > 0 { |
There was a problem hiding this comment.
Also check 4.y-2 on the desired version of the control plane.
I'm fine with also getting a jira card for v5, but for this exact one, if you get an IOU in this PR, I don't want to wait for someone to work on a jira card, I want this line gated. If you can agree to the followups and timeframe (early next week for capability ones), I'm happy to approve now. If you can't, I'll hold this PR to get them updated prior to merge. |
619600b to
89dbea9
Compare
|
/test e2e-parallel |
|
/lgtm |
|
/test e2e-parallel |
2 similar comments
|
/test e2e-parallel |
|
/test e2e-parallel |
This commit also adds node pool upgrade frontend tests and unit tests. Signed-off-by: Alba Hita Catala <ahitacat@redhat.com>
89dbea9 to
0d2c310
Compare
|
/lgtm The new changes are rebasing following the merge of another PR #4433 |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahitacat, deads2k, machi1990 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-parallel |
ARO-23988
What
Implement frontend handling for node pool upgrades. Initial draft, to start discussion about the design and implementation.
Why
As a part of the node pool upgrades, we need to implement the handling from the customer petitions.
Special notes for your reviewer
For easy review I am adding commits. Once it is ready to merge I will squash all of them.