Skip to content

feat: add node pool vs control plane version skew validations during nodepool creation#4566

Open
machi1990 wants to merge 1 commit into
Azure:mainfrom
machi1990:feat/validate-node-control-plane-skew
Open

feat: add node pool vs control plane version skew validations during nodepool creation#4566
machi1990 wants to merge 1 commit into
Azure:mainfrom
machi1990:feat/validate-node-control-plane-skew

Conversation

@machi1990

@machi1990 machi1990 commented Mar 21, 2026

Copy link
Copy Markdown
Collaborator

What

feat: add node pool vs control plane version skew validations during nodepool creation

This PR adds node pool vs control plane version skew validations during nodepool creation and updates by integrating serviceProviderCluster data into the admission validation flow.

Dependencies: Requires #5634 to merge first (adds required test fixtures for integration tests).

Changes:

  • Implements admission logic for node pool creation validating version skew against cluster desired version and active control plane versions
  • Extends update validation to enforce n-2 minor skew vs cluster desired version
  • Fetches serviceProviderCluster data during nodepool operations to access active control plane versions
  • Supports cross-major version compatibility during OpenShift 4.x to 5.x transitions
  • Updates integration test artifacts for version validation scenarios

Why

While working on #4505 and further reviewing #4554, I realized the frontend was missing version skew validation during nodepool creation. Without this validation, users could create nodepools with versions incompatible with the control plane, leading to runtime failures.

Why serviceProviderCluster is required:
The validation needs access to active control plane versions (not just the cluster's desired version) because:

  • During control plane upgrades, multiple versions may be active simultaneously
  • Nodepools must remain compatible with the lowest active CP version (cannot exceed it)
  • Nodepools can be created during CP upgrades and must respect the active version range

The PR adds the skew validation by performing the following rules:

Node Pool Creation:

  • Validates node pool version is within n-2 minor version skew from cluster desired version
  • Validates node pool version is within n-2 minor version skew from highest active control plane version (when available)
  • Enforces cross-major version compatibility via allowlist

Node Pool Updates:

  • Validates against cluster desired version with n-2 skew
  • Validates against active control plane versions (prevents downgrades, skipping minor versions)
  • Enforces cross-major version compatibility via allowlist

Cross-Major Version Transitions:

  • Allowlist-based validation for OpenShift 4.x to 5.x transitions
  • Current allowlist: CP 5.0 supports node pools 4.21/4.22, CP 5.1 supports 4.22/4.23, CP 5.2 supports 4.23
  • Major version upgrades require FeatureExperimentalReleaseFeatures feature flag

Special notes for your reviewer

Merge order: This PR depends on #5634 merging first. PR #5634 adds the required serviceProviderCluster test fixtures that this validation logic consumes during integration tests.

Cross-major version transitions: Currently gated behind the FeatureExperimentalReleaseFeatures AFEC flag as implemented in #4479.

Key implementation details:

  • AdmitNodePool now requires ServiceProviderCluster in the admission context to validate against active control plane versions
  • For CREATE operations, validateNodePoolVersionOnCreate validates against both lowest CP version (cannot exceed it) and highest CP version (n-2 skew)
  • For UPDATE operations, ValidateNodePoolVersionChange performs additional checks including upgrade path validation and minor version skip prevention
  • Version skew logic consolidated in ValidateNodePoolVersionSkew with consistent error messages

What to review:

  • frontend/pkg/frontend/node_pool.go: Fetches serviceProviderCluster via GetOrCreateServiceProviderCluster before admission
  • internal/admission/admit_nodepool.go: New validateNodePoolVersionOnCreate function and refactored admission context
  • internal/validation/validators.go: Cross-major validation logic with allowlist-based compatibility checks
  • Integration test artifacts reflect new validation behavior (see fix: add serviceProviderCluster fixtures and fix test step ordering #5634 for test fixture additions)

@machi1990 machi1990 requested review from ahitacat and deads2k March 21, 2026 12:24
@openshift-ci openshift-ci Bot requested a review from mbarnes March 21, 2026 12:24
@machi1990 machi1990 force-pushed the feat/validate-node-control-plane-skew branch 2 times, most recently from 69d8a82 to 3db9304 Compare March 23, 2026 07:48
@machi1990

Copy link
Copy Markdown
Collaborator Author

/hold

investigating test failures

Comment thread internal/admission/admit_nodepool.go Outdated
@machi1990 machi1990 force-pushed the feat/validate-node-control-plane-skew branch 2 times, most recently from 5ea8a73 to 05658ee Compare March 23, 2026 10:18
@machi1990

Copy link
Copy Markdown
Collaborator Author

/unhold

integration + unit test happy

Comment thread frontend/pkg/frontend/node_pool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread frontend/pkg/frontend/node_pool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
@machi1990

Copy link
Copy Markdown
Collaborator Author

/hold

Needs e2e based on #4566 (comment)

Comment thread internal/admission/admit_nodepool_test.go Outdated
Comment thread internal/admission/admit_nodepool.go Outdated
@machi1990

Copy link
Copy Markdown
Collaborator Author

@cgiradkar will take over this PR and add integration/e2e tests and address review comments.

@cgiradkar cgiradkar force-pushed the feat/validate-node-control-plane-skew branch 2 times, most recently from 85cd703 to 3cc0bc4 Compare March 25, 2026 11:29
Comment thread test/e2e/nodepool_admission_validation.go Outdated
@openshift-ci openshift-ci Bot added the lgtm label Apr 24, 2026
@openshift-ci

openshift-ci Bot commented Apr 24, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: JameelB, machi1990
Once this PR has been reviewed and has the lgtm label, please assign mmazur for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment thread internal/admission/admit_nodepool.go Outdated
fieldPath := field.NewPath("properties", "version", "id")

// Always validate against cluster desired version with n-2 skew
errs = append(errs, validateNodePoolDesiredMinorSkew(np.Properties.Version.ID, cluster.CustomerProperties.Version.ID, 2, fieldPath, op)...)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comment explaining why this is for create only. this looks correct for an update too.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. I added this check to validateNodePoolVersionUpgrade also.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduced an additional error for "no downgrade check". The other position where this is checked is: https://github.com/Azure/ARO-HCP/blob/a32ec818b69b5b8774f7fd50c34249fff2886c1f/internal/validation/validators.go#L847
so I increased the expected error count in the unit test.

Comment thread internal/admission/admit_nodepool.go Outdated
// Node pool can trail by at most 2 minors from the highest active version (n-2)
errs = append(errs, validateNodePoolDesiredMinorSkew(np.Properties.Version.ID, highestClusterVersion.String(), 2, fieldPath, op)...)

// If control plane is upgrading (min != max), apply stricter n-1 skew for min version

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment and code are incorrect. it's still n-2.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

carefully looking into this, applying n-1 skew check doesnt make it stricter, it is redundant.
@machi1990 what was the intent behind the check:
// If control plane is upgrading (min != max), apply stricter n-1 skew for min version?

  1. n-1 relative to lowest active version (which mathematically doesnt make sense)
  2. n-1 relative to highest active version (which makes the check stricter )

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The n-1 skew was added going by hypershift-control-plane-version-status enhancement version skew strategy
Copying the content inline

NodePool version skew computation:

Walk history to find all versions with State == Partial (still active on some components). The union of these versions plus the last Completed version determines the range of active control plane versions.
Allowed NodePool versions are constrained by the n-1/n-2 skew policy relative to the lowest active control plane version (min) and the highest active version (max).

notice the n-1 skip relative to the lowest active control plane version.

However as we don't allow minor control plane upgrade skip; it would've only made sense if we allowed skipping a minor when upgrading the control plane.
The lowest-1 skew might be redundant and it'll always be a -2 in that regard overall. The check for version-2 skew is something we already perform with the highest active and customer desired (to catch for any ongoing upgrade).
Keeping it does no harm, but it is confusing: this thread is a proof of that confusion and this #4566 (comment)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed with this check being redundant. I'm in favor of removing it then.

Comment thread internal/admission/admit_nodepool.go Outdated

// If control plane active versions are available, perform additional checks
// Node pool version can't exceed the lowest active control plane version
errs = append(errs, validateNodePoolVersionNotAfterLowestActive(np.Properties.Version.ID, lowestClusterVersion, fieldPath)...)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateNodePoolDesiredMinorSkew also checks // Node pool cannot be newer than cluster, so why is this needed since we check highest and lowest?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks patch level version BUT now removing this check since OpenShift has committed to allow a node pool to be a z-stream ahead of the cluster within the same minor

@cgiradkar cgiradkar force-pushed the feat/validate-node-control-plane-skew branch from 44be4e4 to 1fbb7d1 Compare April 27, 2026 20:06
@openshift-ci openshift-ci Bot removed the lgtm label Apr 27, 2026
@openshift-ci

openshift-ci Bot commented Apr 27, 2026

Copy link
Copy Markdown

New changes are detected. LGTM label has been removed.

desiredVersion: "4.18.0",
expectError: "cannot downgrade",
expectErrorCount: 2,
expectErrorCount: 3,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raelga

raelga commented May 4, 2026

Copy link
Copy Markdown
Collaborator

/retest

@openshift-ci

openshift-ci Bot commented May 4, 2026

Copy link
Copy Markdown

@machi1990: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/cspr 2f8c9fe link true /test cspr

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copilot AI review requested due to automatic review settings May 6, 2026 13:04
@cgiradkar cgiradkar force-pushed the feat/validate-node-control-plane-skew branch from 1fbb7d1 to 72aea9c Compare May 6, 2026 13:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds node pool vs control plane version-skew admission validation to prevent invalid node pool versions during create/update, and updates integration artifacts to cover these scenarios.

Changes:

  • Introduces create-time admission validation for node pool version skew against cluster desired version and highest active control plane version.
  • Extends update-time validation to enforce n-2 minor skew vs cluster desired version in addition to existing upgrade constraints.
  • Updates/adds integration test artifacts for downgrade and “skew minus 3” rejection cases.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
internal/admission/admit_nodepool.go Adds create-time node pool version skew validation and refactors common admission logic for create/update.
internal/admission/admit_nodepool_test.go Updates tests for refactored admission helpers and adds create-time skew/cross-major test cases.
frontend/pkg/frontend/node_pool.go Hooks create path into new admission entrypoint and fetches ServiceProviderCluster for skew validation.
test-integration/frontend/artifacts/FrontendCRUD/NodePool/version-downgrade/04-httpCreate-nodepool/nodepool.json Updates the downgrade scenario’s initial node pool version used by the integration test.
test-integration/frontend/artifacts/FrontendCRUD/NodePool/version-downgrade/06-loadCosmos-serviceProviderNodePool/serviceProviderNodePool.json Adjusts stored service-provider node pool desired/active versions for the downgrade scenario.
test-integration/frontend/artifacts/FrontendCRUD/NodePool/version-downgrade/07-httpReplace-nodepool-downgrade/nodepool.json Changes the downgrade target version used by the integration test.
test-integration/frontend/artifacts/FrontendCRUD/NodePool/version-downgrade/07-httpReplace-nodepool-downgrade/expected-error.txt Updates the expected error message for the downgrade integration test.
test-integration/frontend/artifacts/FrontendCRUD/NodePool/reject-nodepool-creation-version-skew-minus-3/** Adds a new negative integration test case asserting version-skew rejection behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"version": {
"channelGroup": "stable",
"id": "4.20.8"
"id": "4.20.0"
@@ -1 +1 @@
cannot downgrade from version 4.21.0 to 4.20.8 No newline at end of file
cannot downgrade from version 4.20.8 to 4.20.0 No newline at end of file
Comment thread internal/admission/admit_nodepool.go Outdated
Comment on lines +84 to +85
lowestClusterVersion, highestClusterVersion := apihelpers.FindLowestAndHighestClusterVersion(spCluster.Status.ControlPlaneVersion.ActiveVersions)

Comment thread internal/admission/admit_nodepool.go Outdated
Comment on lines +179 to +182
// Check only if it is a creating nodepool or a change in the channelGroup
if (oldNodePool == nil || newNodePool.Properties.Version.ChannelGroup != oldNodePool.Properties.Version.ChannelGroup) &&
newNodePool.Properties.Version.ChannelGroup != cluster.CustomerProperties.Version.ChannelGroup {
errs = append(errs, field.Invalid(
field.NewPath("properties", "version", "channelGroup"),
newNodePool.Properties.Version.ChannelGroup,
channelGroupChanged := op.Type == operation.Create || newNodePool.Properties.Version.ChannelGroup != oldNodePool.Properties.Version.ChannelGroup
if channelGroupChanged && newNodePool.Properties.Version.ChannelGroup != cluster.CustomerProperties.Version.ChannelGroup {
errs = append(errs, field.Invalid(field.NewPath("properties", "version", "channelGroup"), newNodePool.Properties.Version.ChannelGroup,
Comment thread internal/admission/admit_nodepool.go Outdated
Comment on lines +239 to +244
// validateNodePoolVersionUpgrade validates that a node pool version change is a valid upgrade.
// It checks:
// - No downgrade: new version >= old version
// - No downgrade: new version >= desired version
// - No major version change: new major == old major (unless FeatureExperimentalReleaseFeatures is registered)
// - No skipping minor versions: new minor <= old minor + 1
// - Cannot exceed cluster version: new version <= cluster version
func validateNodePoolVersionUpgrade(newNodePool, oldNodePool *api.HCPOpenShiftClusterNodePool, spNodePool *api.ServiceProviderNodePool, spCluster *api.ServiceProviderCluster, op operation.Operation) field.ErrorList {
// - Within n-2 minor version skew of cluster's desired version
Comment thread frontend/pkg/frontend/node_pool.go Outdated
Comment on lines 283 to 291
serviceProviderCluster, err := database.GetOrCreateServiceProviderCluster(ctx, f.dbClient, resourceID.Parent)
if err != nil {
return utils.TrackError(err)
}

validationErrs := validation.ValidateNodePool(ctx, restOperation, newInternalNodePool, nil)
// in addition to static validation, we have validation based on the state of the hcp cluster
validationErrs = append(validationErrs, admission.AdmitNodePool(newInternalNodePool, nil, cluster)...)
validationErrs = append(validationErrs, admission.AdmitNodePoolCreate(newInternalNodePool, cluster, serviceProviderCluster, restOperation)...)
if err := arm.CloudErrorFromFieldErrors(validationErrs); err != nil {
Comment thread internal/admission/admit_nodepool.go Outdated

lowestClusterVersion, highestClusterVersion := apihelpers.FindLowestAndHighestClusterVersion(spCluster.Status.ControlPlaneVersion.ActiveVersions)

if lowestClusterVersion == nil || highestClusterVersion == nil {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't we checking the ones we can

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is check is because; they either will be set all of them or not set.
If one active version then; both are set & lowest == highest
If more than one then ; both are set & lowest < highest
If no active version; then none is set i.e lowest = nil & highest=nil

Comment thread internal/admission/admit_nodepool.go Outdated
}

// Node pool can trail by at most 2 minors from the highest active version (n-2)
errs = append(errs, validateNodePoolDesiredMinorSkew(np.Properties.Version.ID, highestClusterVersion.String(), 2, fieldPath)...)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to check the lowest. validateNodePoolDesiredMinorSkew is also checking for "not greater than" which is critical against lowest.

expectError: "skipping minor versions is not allowed",
},
{
name: "cannot exceed cluster version",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this test removed?

errSubstring: "cannot be more than 2 minors below",
},
{
name: "node pool version cannot exceed cluster version",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing the test where spCluster has actual versions that are lower

expectErrors: []utils.ExpectedError{},
},
// Multi-element activeVersions tests
{

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is removed?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Comment on lines +289 to 301
serviceProviderCluster, err := database.GetOrCreateServiceProviderCluster(ctx, f.resourcesDBClient, resourceID.Parent)
if err != nil {
return utils.TrackError(err)
}

restOperation := operation.Operation{
Type: operation.Create,
Options: validation.AFECsToValidationOptions(subscription.GetRegisteredFeatures()),
}
admissionContext, err := f.newNodePoolAdmissionContext(ctx, restOperation, cluster, nil, nil)
admissionContext, err := f.newNodePoolAdmissionContext(ctx, restOperation, cluster, serviceProviderCluster, nil)
if err != nil {
return utils.TrackError(err)
}
Comment on lines +832 to +903
func TestAdmitNodePoolCreate_UpgradingControlPlane(t *testing.T) {
t.Parallel()
const clusterVNetSubnetARM = "/subscriptions/12345678-1234-1234-1234-123456789012/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/vnet/subnets/snet"
clusterSubnetID := api.Must(azcorearm.ParseResourceID(clusterVNetSubnetARM))

generateCluster := func(versionID string) *api.HCPOpenShiftCluster {
return &api.HCPOpenShiftCluster{
CustomerProperties: api.HCPOpenShiftClusterCustomerProperties{
Version: api.VersionProfile{
ID: versionID,
ChannelGroup: "stable",
},
Platform: api.CustomerPlatformProfile{
SubnetID: clusterSubnetID,
},
},
}
}

tests := []struct {
name string
clusterVersionID string
nodePoolVersion string
cpVersions []string
wantErr bool
errSubstring string
}{
{
name: "upgrading CP: node pool at lowest active is valid",
clusterVersionID: "4.21",
nodePoolVersion: "4.20.9",
cpVersions: []string{"4.20.9", "4.21.0"},
wantErr: false,
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
ver := semver.MustParse("4.17.0")
cluster := generateCluster(tt.clusterVersionID)
nodePool := &api.HCPOpenShiftClusterNodePool{
Properties: api.HCPOpenShiftClusterNodePoolProperties{
Version: api.NodePoolVersionProfile{
ID: tt.nodePoolVersion,
ChannelGroup: "stable",
},
Platform: api.NodePoolPlatformProfile{
SubnetID: clusterSubnetID,
},
},
}
spCluster := serviceProviderClusterWithVersions(t, tt.cpVersions)
spNodePool := &api.ServiceProviderNodePool{
Spec: api.ServiceProviderNodePoolSpec{
NodePoolVersion: api.ServiceProviderNodePoolSpecVersion{
DesiredVersion: &ver,
},
},
Status: api.ServiceProviderNodePoolStatus{
NodePoolVersion: api.ServiceProviderNodePoolStatusVersion{
ActiveVersions: []api.HCPNodePoolActiveVersion{{Version: &ver}},
},
},
}
op := operation.Operation{Type: operation.Create}

errs := AdmitNodePool(context.Background(), &NodePoolAdmissionContext{
Cluster: cluster,
ServiceProviderNodePool: spNodePool,
ServiceProviderCluster: spCluster,
}, op, nodePool, nil)
Comment on lines +13 to +16
"version": {
"channelGroup": "stable",
"id": "4.19.7"
}

@ahitacat ahitacat left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

Comment thread frontend/pkg/frontend/node_pool.go Outdated
// For UPDATE operations, these parameters are required and the function will fail if they're nil.
// For CREATE operations, these can be nil since no prior state exists.
// For CREATE operations, spNodePool should be nil (no prior state exists), and spCluster
// is optional (used for channel group validation but not version skew checks).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serviceProvideCluster should be use for version skew validation, do not make it optional.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment thread internal/admission/admit_nodepool.go Outdated
// NodePoolAdmissionContext carries dependencies that node pool mutation/admission needs
// beyond the node pool object itself. It includes the parent cluster and optionally
// the service provider cluster and nodepool (for update-specific validations like version upgrades).
// the service provider cluster (for version skew validation on update) and

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the service provider cluster (for version skew validation on update) and
// the service provider cluster (for version skew validation) and

It should be used also for creation.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied suggestion

Comment thread internal/admission/admit_nodepool.go Outdated
lowestCPVersion, highestCPVersion := apihelpers.FindLowestAndHighestClusterVersion(spCluster.Status.ControlPlaneVersion.ActiveVersions)
if lowestCPVersion == nil || highestCPVersion == nil {
return errs
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. It will be different from nil.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment thread internal/admission/admit_nodepool.go Outdated
lowestCPVersion, highestCPVersion := apihelpers.FindLowestAndHighestClusterVersion(spCluster.Status.ControlPlaneVersion.ActiveVersions)
if lowestCPVersion == nil || highestCPVersion == nil {
return nil
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. It will be different from nil.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

}

func ValidateNodePoolVersionSkew(desiredVersion semver.Version, lowestCPVersion, highestCPVersion *semver.Version, allowMajorUpgrade bool) error {
if lowestCPVersion == nil || highestCPVersion == nil {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. It will be different from nil

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this defensive check is needed to pass the tests. The error arises due to activeVersions being empty

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

internal/admission/admit_nodepool.go:212

  • validateNodePoolVersionOnUpdate assumes both ServiceProviderNodePool and ServiceProviderCluster are non-nil. If the admission context is missing either dependency (e.g., a future caller that uses AdmitNodePool directly), this will panic once version validation runs.
	spNodePool, spCluster := admissionContext.ServiceProviderNodePool, admissionContext.ServiceProviderCluster
	// Skip validation if no version is specified or version didn't change
	if len(newObj.ID) == 0 || newObj.ID == oldObj.ID {
		return nil
	}

Comment on lines +932 to +943
// Cross-major change (upgrade or downgrade): gated behind FeatureExperimentalReleaseFeatures.
isCrossMajorUpgrade := lowest != nil && desiredVersion.Major > lowest.Major
isCrossMajorDowngrade := highest != nil && desiredVersion.Major < highest.Major
if isCrossMajorUpgrade || isCrossMajorDowngrade {
if !allowMajorUpgrade {
return fmt.Errorf("major version changes are not supported")
}
if isCrossMajorUpgrade {
return ValidateMajorUpgrade(*lowest, desiredVersion)
}
return ValidateCrossMajorNodePoolSkew(desiredVersion, *highestCPVersion)
}
Comment thread internal/admission/admit_nodepool.go Outdated
Comment on lines +186 to +191
spCluster := admissionContext.ServiceProviderCluster

lowestCPVersion, highestCPVersion := apihelpers.FindLowestAndHighestClusterVersion(spCluster.Status.ControlPlaneVersion.ActiveVersions)
// if lowestCPVersion == nil || highestCPVersion == nil {
// return errs
// }
Comment on lines 228 to +232
lowestCPVersion, highestCPVersion := apihelpers.FindLowestAndHighestClusterVersion(spCluster.Status.ControlPlaneVersion.ActiveVersions)
// if lowestCPVersion == nil || highestCPVersion == nil {
// return nil
// }

Comment on lines +173 to +175
// validateNodePoolVersionOnCreate validates the requested node pool version at CREATE time.
// At CREATE time, only basic validations apply (format, minimum version).
func validateNodePoolVersionOnCreate(ctx context.Context, admissionContext *NodePoolAdmissionContext, op operation.Operation, fldPath *field.Path, newObj *api.NodePoolVersionProfile) field.ErrorList {
Comment on lines +236 to 242
if spCluster == nil && op.Type == operation.Update {
return nil, fmt.Errorf("serviceProviderCluster is required for %v operations", op.Type)
}

if spNodePool == nil && op.Type == operation.Update {
return nil, fmt.Errorf("serviceProviderNodePool is required for %v operations", op.Type)
}
…plane versions

Implements node pool version validation requirements by adding service provider
cluster context to admission logic, enabling validation against active control
plane versions.

Validation Requirements:
- Enforce n-2 minor version skew between node pool and cluster desired version
- Validate against active control plane versions (post-install)
- Support cross-major version compatibility

AI assisted code

Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
Comment on lines +236 to +238
if spCluster == nil && op.Type == operation.Update {
return nil, fmt.Errorf("serviceProviderCluster is required for %v operations", op.Type)
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the comment above, should spCluster always be provided no matter the operation type?

Comment on lines +98 to +99
// the nodepool instance itself. For create operations, include ServiceProviderCluster in the admissionContext to enable
// version skew validation. For update operations with version changes, include both ServiceProviderNodePool and

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment makes it seem like the validation execution is based on these parameters not on the actual type of the operation. Instead, wdyt of updating it to something like this?

For update operations, ServiceProviderNodePool must be specified in the admissionContext
  • Make the doc more about the parameters required rather than validation enablement.
  • The ServiceProviderCluster must always be specified on both scenarios so no need to call it out here.

Comment on lines 982 to 984
if !allowMajorUpgrade {
return fmt.Errorf("major version changes are not supported")
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it checking if major upgrade is allowed here? what happens if this is called on create where cp is on 5.x and the np to be created is on 4.x?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit complex. Creating a cp with 5.x is only allowed under ExperimentalFeature flag. So the allowMajorUpgrade will be mostly true unless you unset that flag after the cluster creation.
What should we do in that case?

I think avoid to use it could be an option here, blocking the creation of a node pool if the subscription don't longer hold that AFEC.

On the other hand we can allow creating node pool in that case we can omit the AFEC flag check.

I don't have a strong opinion though.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the naming of allowMajorUpgrade confused me here since this is called on both create and update. Outside the context of update, it doesn't make sense.

What is the goal? Is it to block the following for subs without this afec flag:

  • creation of 5.x np?
  • upgrade from 4.x to 5.x?
  • downgrade from 5.x to 4.x?

If so, blocking the creation of a 4.x node pool for a 5.x cluster if the subscription no longer holds that afec doesn't make sense no? I would expect it to always be created since a sub without this afec flag should always be able to create a 4.x np, provided its within the valid skew, no matter the cp version. Maybe i'm missing something else?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After syncing with @ahitacat and @cgiradkar, this check is being done as we do not allow cross major np/cp creation, update or downgrade unless it is done with a sub that has that afec flag enabled. This is an existing behaviour and is the same for other offerings. Reached an agreement to change the name and update the comment above to clarify this requirement.

@openshift-ci

openshift-ci Bot commented Jun 13, 2026

Copy link
Copy Markdown

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

}
spCluster := admissionContext.ServiceProviderCluster

lowestCPVersion, highestCPVersion := apihelpers.FindLowestAndHighestClusterVersion(spCluster.Status.ControlPlaneVersion.ActiveVersions)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can mean it could return nil, nil. Are we aware of this chance, and what would be the resulting behavior in that case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants