Skip to content

fix: add serviceProviderCluster fixtures and fix test step ordering#5634

Open
cgiradkar wants to merge 2 commits into
mainfrom
svc-provider-cluster-fix
Open

fix: add serviceProviderCluster fixtures and fix test step ordering#5634
cgiradkar wants to merge 2 commits into
mainfrom
svc-provider-cluster-fix

Conversation

@cgiradkar

Copy link
Copy Markdown
Collaborator

Summary

This PR fixes integration test failures caused by missing test data required for nodepool version validation. The core issue was that nodepool creation tests lacked serviceProviderCluster fixtures containing cluster version information, causing validation to attempt dereferencing nil pointers and crashing the test server.

Context: Why These Changes Are Needed

When nodepool version validation was introduced (to prevent version skew between nodepools and control plane), the validation logic began requiring serviceProviderCluster data to check active cluster versions. Many existing integration tests predated this requirement and never had this fixture data, causing them to fail with cryptic EOF errors when the validation code panicked.

Problem Investigation Summary

The EOF errors were masking the real issue - the test HTTP server was panicking and closing connections. The panic occurred at validators.go:960 when dereferencing a nil *semver.Version pointer. This pointer was nil because:

  1. Tests lacked serviceProviderCluster fixtures entirely, OR
  2. Tests had the fixture but loaded it AFTER attempting nodepool creation (due to alphabetical step ordering)

Changes Explained

1. Added Missing Test Fixtures

What to review: New 03-loadCosmos-serviceProviderCluster/serviceProviderCluster.json files across multiple test scenarios.

Why this approach: Each test scenario needs cluster version data that matches its test cluster. The fixture structure mirrors what the backend creates when clusters are created via HTTP API - it contains:

  • active_versions array with semantic version strings (e.g., "4.20.0")
  • desired_version in the spec

2. Fixed Test Step Execution Order

What to review: Directory renames from 03-httpCreate-* to 04-httpCreate-* (or 05- in some cases).

Why this matters: The test framework processes steps in lexicographical order. When both 03-httpCreate-nodepool and 03-loadCosmos-serviceProviderCluster existed, the former executed first because 'h' < 'l' in ASCII. This meant nodepool validation ran before the required cluster version data was loaded.

Solution rationale: Rather than changing the test framework's ordering logic (which would be risky), I renumbered directories to enforce correct execution order: load serviceProviderCluster at step 03, then create nodepool at step 04+.

Tests with multiple nodepools: Some tests (like create-current) have multiple 04-httpCreate-* steps. This is intentional and safe - as long as they all run AFTER the 03-loadCosmos step, the alphabetical ordering within step 04 doesn't matter.

3. Programmatic Fixture Creation in version_compliance_test.go

What to review: New createServiceProviderClusterForVersionCompliance helper function.

Why this pattern: The version_compliance_test.go creates clusters programmatically rather than from fixtures. To maintain consistency with its testing approach, it also creates serviceProviderCluster documents programmatically rather than loading from fixture files.

Alternative considered: I could have made this test load from fixture files like FrontendCRUD tests do, but that would break the test's design pattern of being fully self-contained and not relying on external JSON files.

Testing & Verification

Manual verification performed:

  1. Ran full make test-integration - all tests pass
  2. Specifically verified previously failing tests now pass: missing-required-fields, invalid-version, create-current, delete-not-last (these were failing for the modified code for skew validation PR feat: add node pool vs control plane version skew validations during nodepool creation #4566 )

Rollout Considerations

No production impact: These are test-only changes. No business logic modified, no API behavior changed.

Review Focus Areas

  1. Step numbering consistency: Confirm all nodepool creation steps now occur after serviceProviderCluster load
  2. No behavioral changes: Confirm no changes to validation logic itself - only test infrastructure

Copilot AI review requested due to automatic review settings June 12, 2026 14:44
@openshift-ci openshift-ci Bot requested review from janboll and stevekuznetsov June 12, 2026 14:44
@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cgiradkar
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This comment was marked as outdated.

Added missing serviceProviderCluster test fixtures required for nodepool
version validation and corrected test step execution order to load these
fixtures before nodepool creation attempts.

AI assisted code

Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
Copilot AI review requested due to automatic review settings June 15, 2026 08:51
@cgiradkar cgiradkar force-pushed the svc-provider-cluster-fix branch from 7235a9b to 314ba71 Compare June 15, 2026 08:51

This comment was marked as spam.

ServiceProviderCluster documents must use deterministic UUIDs as document
IDs, generated by arm.ResourceIDStringToCosmosID(resourceID), not
subscription IDs or legacy pipe-escaped resource paths.

Why these values needed to be changed:

1. Document ID consistency: Cosmos DB lookups in production code use
   arm.ResourceIDStringToCosmosID() to generate document IDs. Test fixtures
   with incorrect IDs (subscription IDs, pipe-escaped strings, or random
   UUIDs) create documents that real CRUD operations cannot find, causing
   version validation and other lookups to fail.

2. Storage layer compatibility: The deterministic UUID scheme ensures
   the same resource ID always maps to the same document ID across all code
   paths (test helpers, production storage layer, backend controllers).
   Using wrong IDs breaks this contract.

3. Test helper portability: Test helpers were using concrete
   CosmosIntegrationTestInfo type assertions, which prevented them from
   working with mock storage implementations. Switching to the
   StorageIntegrationTestInfo interface allows tests to run with both real
   Cosmos and mock databases.

Changes:
- Fixed ServiceProviderCluster fixture files with wrong document IDs
- Fixed subscription ID mismatches
- Fixed cluster name mismatches
- Updated test helpers to use interface LoadContent() instead of concrete types
- Added resourceGroupName parameter to avoid hard-coded values

AI assisted code

Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown

@cgiradkar: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-parallel 839eb07 link true /test e2e-parallel

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants