CI: Refactor test pipelines to cover bare VM and container environments #1369

@leofang

Using #1311 as the playground, as of commit 4011bb8 and the CI logs at https://github.com/NVIDIA/cuda-python/actions/runs/19951181011, I verified that nv-gha-runners no longer requires a container for running jobs on GPU runners. GPU jobs now run fine directly on the bare, ephemeral VM, which would help us shorten job start times.
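
As a quick sanity check for a containerless job, a workflow step could confirm that the GPU driver is visible directly on the bare VM before the test suite starts. This is a minimal sketch, not what was used in #1311: it assumes `nvidia-smi` is on `PATH` and relies on `nvidia-smi -L` printing one `GPU <index>: ...` line per device. The helper names `parse_gpu_list` and `visible_gpus` are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_list(text: str) -> list[str]:
    """Keep only the 'GPU <index>: <name> (UUID: ...)' lines printed by `nvidia-smi -L`."""
    return [line.strip() for line in text.splitlines() if line.strip().startswith("GPU ")]


def visible_gpus(timeout: float = 30.0) -> list[str]:
    """Return the GPUs the driver exposes on this host, or [] if none can be queried."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No driver utilities on PATH: not a GPU runner, or the driver is missing.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=timeout, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        # nvidia-smi exists but cannot talk to the driver.
        return []
    return parse_gpu_list(result.stdout)
```

A step could call `visible_gpus()` and fail early when the list is empty, so a misconfigured runner surfaces as a clear preflight error rather than as scattered test failures later.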

The current test blocker is #1307. We recently added xfail markers to tests we did not think were runnable in CI. However, those tests did run in the bare VM setup, turning XFAIL into XPASS, which counts as a failure because we enabled strict mode. This can be easily fixed.
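
One possible fix, sketched under assumptions: make the xfail conditional on the environment so the mark applies only where the tests are actually expected to fail. This assumes the failures are specific to the container setup. `CI_TEST_ENVIRONMENT` is a hypothetical variable a workflow could set explicitly; the `/.dockerenv` and `/proc/1/cgroup` checks are common heuristics, not guarantees.

```python
import os
from pathlib import Path

# Hypothetical override a workflow could set to "container" or "bare".
ENV_OVERRIDE = "CI_TEST_ENVIRONMENT"


def running_in_container() -> bool:
    """Best-effort guess at whether the tests run inside a container."""
    override = os.environ.get(ENV_OVERRIDE, "").strip().lower()
    if override in ("container", "bare"):
        # An explicit setting from the workflow wins over any heuristic.
        return override == "container"
    # Docker creates this marker file at the container's root.
    if Path("/.dockerenv").exists():
        return True
    # cgroup v1 paths often name the runtime; cgroup v2 usually shows only "0::/".
    try:
        cgroup = Path("/proc/1/cgroup").read_text()
    except OSError:
        # Missing on Windows and macOS: treat as not containerized.
        return False
    return any(tag in cgroup for tag in ("docker", "containerd", "kubepods", "libpod"))
```

The result can be passed as the condition of `pytest.mark.xfail(running_in_container(), reason=..., strict=True)`. On the bare VM the condition is false, so the mark is not applied and a passing test is reported as a normal pass instead of a strict XPASS failure.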

In an internal discussion we concluded that we don't need to test against a set of containers, but it is nice to test both container and containerless (i.e. bare VM) environments. We currently have two test workflows:

  • test-wheel-linux.yml: Needed because Linux runners required a container (no longer the case)
  • test-wheel-windows.yml: Needed because Windows runners do not require any container

I suggest we rename and repurpose the two workflows as follows:

  • test-wheel-container.yml: This runs all existing Linux tests
  • test-wheel-containerless.yml: This runs all existing Linux + Windows tests
    • Piggybacking on this refactoring, we can probably get rid of the PowerShell usage in the workflow, since our CI relies heavily on Bash and Git Bash.
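
To illustrate how the two proposed workflows might share one job definition, the sketch below generates a per-workflow test matrix that a setup job could hand to `strategy.matrix` via `fromJSON`. The platform mapping, the Python versions, and the function names are hypothetical; only writing `name=value` lines to the `GITHUB_OUTPUT` file is standard GitHub Actions behavior.

```python
import json
import os

# Hypothetical mapping that mirrors the proposal: the container workflow keeps the
# existing Linux jobs, while the containerless workflow covers Linux and Windows.
WORKFLOW_PLATFORMS = {
    "test-wheel-container": ["linux"],
    "test-wheel-containerless": ["linux", "windows"],
}


def build_matrix(workflow: str, python_versions: list[str]) -> dict:
    """Return a matrix of the form {"include": [...]} for one workflow."""
    in_container = workflow == "test-wheel-container"
    return {
        "include": [
            {"os": os_name, "python": py, "container": in_container}
            for os_name in WORKFLOW_PLATFORMS[workflow]
            for py in python_versions
        ]
    }


def emit_matrix(workflow: str, python_versions: list[str]) -> str:
    """Serialize the matrix and, inside GitHub Actions, expose it as the `matrix` step output."""
    payload = json.dumps(build_matrix(workflow, python_versions))  # single line, no indent
    output_file = os.environ.get("GITHUB_OUTPUT")
    if output_file:
        with open(output_file, "a", encoding="utf-8") as fh:
            fh.write(f"matrix={payload}\n")
    return payload
```

For example, `build_matrix("test-wheel-containerless", ["3.12", "3.13"])` yields four entries, two per OS. Generating the matrix in one place keeps both workflow files thin and makes the Linux jobs identical apart from the `container` flag.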

Labels

  • CI/CD: CI/CD infrastructure
  • P0: High priority - Must do!
  • blocked: This task is currently blocked by other tasks
  • feature: New feature or request
