Skip to content

Validate and tighten macOS dependency cache gating in CI#539

Merged
fedorov merged 3 commits into
masterfrom
copilot/fix-dependency-cache-issue
May 29, 2026
Merged

Validate and tighten macOS dependency cache gating in CI#539
fedorov merged 3 commits into
masterfrom
copilot/fix-dependency-cache-issue

Conversation

Copilot AI commented May 29, 2026

Copy link
Copy Markdown
Contributor

The macOS CI job could take the cached dependency path even when restored cache contents were incomplete/corrupt, leading to missing ITKConfig.cmake and configure failure. This change makes cached-path selection contingent on cache usability, not cache-hit alone.

  • Cache robustness

    • Added Validate restored cache step that emits cache_ok=true|false after checking:
      • dcmqi-build/ITK-build/ITKConfig.cmake
      • dcmqi-build/DCMTK-build/DCMTKConfig.cmake
      • dcmqi-build/zlib-install/include
      • dcmqi-build/zlib-install/lib
  • Configure-path gating

    • Cached configure now runs only when:
      • cache-hit == 'true' and
      • cache_ok == 'true'
    • Non-cached/superbuild configure now runs when:
      • cache miss or
      • cache validation fails
  • Cache key hardening

    • Bumped cache namespace from cache-dcmtk-itk-zlib to cache-dcmtk-itk-zlib-v2
    • Removed broad restore-keys fallback to avoid reusing arbitrary older cache entries
- name: "Configure dcmqi w/ cached DCMTK/ITK/ZLIB"
  if: ${{ steps.cache-dcmtk-itk-zlib.outputs.cache-hit == 'true' && steps.validate-cache.outputs.cache_ok == 'true' }}

- name: "Configure dcmqi w/o cached DCMTK/ITK/ZLIB"
  if: ${{ steps.cache-dcmtk-itk-zlib.outputs.cache-hit != 'true' || steps.validate-cache.outputs.cache_ok != 'true' }}
Original prompt

The macOS CI workflow has a failing job in repository QIICR/dcmqi caused by the dependency cache being treated as valid when the restored cache contents are incomplete or corrupt.

Context:

  • Failing workflow run: C/C++ CI macOS
  • Workflow file: .github/workflows/cmake-macos.yml
  • Failing job URL: https://github.com/QIICR/dcmqi/actions/runs/26633738915/job/78517899566
  • Commit analyzed: 495da82985f89a039fd8816ed23b83396c5be809

Observed failure:

  • The workflow restores cached DCMTK, ITK, and zlib directories and then configures dcmqi using the cached dependency path.
  • The job log shows WARNING: Cache entry deserialization failed, entry ignored and later CMake fails inside SlicerExecutionModel because ITKConfig.cmake cannot be found.
  • Root CMakeLists.txt requires find_package(ITK NO_MODULE REQUIRED) and find_package(DCMTK NO_MODULE REQUIRED), so the cached path is only valid if the restored build directories actually contain those package config files.

Please create a pull request that updates .github/workflows/cmake-macos.yml to make the cache usage robust.

Required changes:

  1. Add a validation step after cache restore/reporting that verifies the restored cache is actually usable before taking the cached configure path.
    • Check for ${{ github.workspace }}/dcmqi-build/ITK-build/ITKConfig.cmake
    • Check for ${{ github.workspace }}/dcmqi-build/DCMTK-build/DCMTKConfig.cmake
    • Check that ${{ github.workspace }}/dcmqi-build/zlib-install/include and ${{ github.workspace }}/dcmqi-build/zlib-install/lib exist
    • Expose an output like cache_ok=true|false
  2. Change the Configure dcmqi w/ cached DCMTK/ITK/ZLIB step so it only runs when both:
    • steps.cache-dcmtk-itk-zlib.outputs.cache-hit == 'true'
    • cache validation output indicates the cache is usable
  3. Change the Configure dcmqi w/o cached DCMTK/ITK/ZLIB step so it runs when either:
    • cache was missed, or
    • cache validation failed
  4. Tighten cache reuse to reduce reuse of stale/incompatible caches:
    • bump the cache namespace from cache-dcmtk-itk-zlib to something versioned like cache-dcmtk-itk-zlib-v2
    • remove the broad restore-keys fallback, or otherwise avoid restoring arbitrary older caches from the same prefix
  5. Keep the change focused on fixing the failing macOS CI cache behavior. Avoid unrelated refactors.

Please make the PR title concise and descriptive, explaining that it validates and tightens the macOS dependency cache in CI.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Please find a solution for this failing job. Use the logs, job definition, and any referenced files where the failure occurred. Keep your response focused on the solution and include code suggestions when appropriate.

Assistant: The failure is caused by an invalid cache restore that reports a cache hit but does not restore a usable ITK package layout.

What failed

The macOS workflow takes the cached-deps path when cache-hit == 'true' and configures dcmqi with:

  • -DDCMTK_DIR=${{ github.workspace }}/dcmqi-build/DCMTK-build
  • -DITK_DIR=${{ github.workspace }}/dcmqi-build/ITK-build
  • -DZLIB_ROOT=${{ github.workspace }}/dcmqi-build/zlib-install

But during the build, SlicerExecutionModel fails in CMake with:

  • By not providing "FindITK.cmake"... asked CMake to find a package configuration file provided by "ITK"
  • missing ITKConfig.cmake / itk-config.cmake

That means the restored cache did not contain a valid ITK-build tree, even though the workflow treated it as usable. The log also shows:

  • WARNING: Cache entry deserialization failed, entry ignored

So the job is effectively using a broken/partial cache state.

Relevant files:

Why this happens

In CMakeLists.txt, non-superbuild configuration requires a valid ITK package:

  • find_package(ITK NO_MODULE REQUIRED) at CMakeLists.txt:276

The cached workflow path assumes ${{ github.workspace }}/dcmqi-build/ITK-build contains ITKConfig.cmake, but the cache restore warning indicates that assumption is false for this run.

Solution

Make the workflow validate the cache contents before taking the cached-deps path. If ITKConfig.cmake or DCMTKConfig.cmake is missing, force the full superbuild instead of using the cache.

Recommended workflow change

In .github/workflows/cmake-macos.yml, replace the current cache-hit-only gating with a validation step.

Add a validation step after Report cache status

    - name: Validate cached dependency config files
      id: validate-cache
      run: |
        CACHE_OK=true

        if [ ! -f "${{ github.workspace }}/dcmqi-build/ITK-build/ITKConfig.cmake" ]; t...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*
>

Copilot AI changed the title [WIP] Fix invalid dependency cache handling in macOS CI Validate and tighten macOS dependency cache gating in CI May 29, 2026
Copilot AI requested a review from fedorov May 29, 2026 14:33
@fedorov fedorov marked this pull request as ready for review May 29, 2026 15:55

@fedorov fedorov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The fix is correct and directly addresses the root cause. Logic is sound across all cache-hit/cache_ok combinations — approve with minor suggestions.

What this gets right

  • CACHE_OK=false default ensures cache_ok is always well-defined even on cache miss, so both configure conditions evaluate correctly.
  • Uses >> "$GITHUB_OUTPUT" (current non-deprecated syntax).
  • Validation step runs unconditionally, which is necessary since both configure steps depend on its output.
  • Logging on validation failure is clear and actionable.

Suggestions

1. Stale comment — The comment above the key: line still references the old cache name pattern:

# macOS-<arch>-build-cache-dcmtk-itk-<hash>

Should read cache-dcmtk-itk-zlib-v2 to match the new env var.

2. Use $GITHUB_WORKSPACE instead of template substitution in shell — Using ${{ github.workspace }} inside run: works but $GITHUB_WORKSPACE is the idiomatic form and avoids template expansion inside scripts:

if [ ! -f "$GITHUB_WORKSPACE/dcmqi-build/ITK-build/ITKConfig.cmake" ]; then

3. restore-keys removal tradeoff — Removing the fallback means any PR touching CMakeExternals/DCMTK.cmake, ITK.cmake, or zlib.cmake gets zero cache reuse and triggers a full superbuild. If build time is a concern, a version-scoped restore-key would still benefit from partial matches while keeping the namespace tight:

restore-keys: |
  ${{ runner.os }}-${{ matrix.arch }}-build-${{ env.cache-name }}-

(with cache-name: cache-dcmtk-itk-zlib-v2 this is now safe since v2 entries will always be valid builds — the validation step handles any edge cases). That said, the conservative no-fallback approach is fine given the corruption history.

4. Cache bust on first run — Bumping to -v2 will cold-start all caches on merge. Expected and unavoidable; just worth knowing the first CI run after merge will be slow.

Overall: this is a focused, correct fix. Happy to see it merged.

@fedorov

fedorov commented May 29, 2026

Copy link
Copy Markdown
Member

Review

The fix is correct and directly addresses the root cause. Logic is sound across all cache-hit/cache_ok combinations.

What this gets right

  • CACHE_OK=false default ensures cache_ok is always well-defined even on cache miss, so both configure conditions evaluate correctly.
  • Uses >> "$GITHUB_OUTPUT" (current non-deprecated syntax).
  • Validation step runs unconditionally, which is necessary since both configure steps depend on its output.
  • Logging on validation failure is clear and actionable.

Suggestions

1. Stale comment — The comment above the key: line still references the old cache name pattern:

# macOS-<arch>-build-cache-dcmtk-itk-<hash>

Should read cache-dcmtk-itk-zlib-v2 to match the new env var.

2. Use $GITHUB_WORKSPACE instead of template substitution in shell — Using ${{ github.workspace }} inside run: works but $GITHUB_WORKSPACE is the idiomatic form and avoids template expansion inside scripts:

if [ ! -f "$GITHUB_WORKSPACE/dcmqi-build/ITK-build/ITKConfig.cmake" ]; then

3. restore-keys removal tradeoff — Removing the fallback means any PR touching CMakeExternals/DCMTK.cmake, ITK.cmake, or zlib.cmake gets zero cache reuse and triggers a full superbuild. If build time is a concern, a version-scoped restore-key would still benefit from partial matches while staying within the safe v2 namespace:

restore-keys: |
  ${{ runner.os }}-${{ matrix.arch }}-build-${{ env.cache-name }}-

The validation step would catch any bad entries anyway. That said, the conservative no-fallback approach is fine given the corruption history.

4. Cold-cache on first run — Bumping to -v2 will invalidate all existing caches on merge. Expected and unavoidable; the first CI run after merge will be slow.

Overall: focused, correct fix. Happy to see it merged.

Copilot AI requested a review from fedorov May 29, 2026 16:06
@sonarqubecloud

Copy link
Copy Markdown

@fedorov fedorov merged commit cf08f32 into master May 29, 2026
9 checks passed
@fedorov fedorov deleted the copilot/fix-dependency-cache-issue branch May 29, 2026 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants