@ykdojo ykdojo commented Nov 18, 2025

Changes Made

Add disk cleanup step to integration-test-io-credentialed job to prevent disk space failures.

Related Issues

Fixes the MinIO storage full errors causing integration test failures on main (commits cce8b5a, def3836, and 88ea033).

Related to the broader disk space issue affecting CI jobs:

Test Plan

CI will run with the disk cleanup step, preventing the XMinioStorageFull errors that were occurring.

Verified in test PR #5611 where both integration-test-io-credentialed (3.10, native) and integration-test-io-credentialed (3.10, ray) passed successfully.

The integration-test-io-credentialed job was failing due to disk space
issues in the CI environment. The MinIO storage backend was running out
of space, causing tests to fail with XMinioStorageFull errors.

This adds the "Free Disk Space" step to the integration-test-io-credentialed
job, matching the disk cleanup already present in other integration test jobs
like integration-test-io, integration-test-catalogs, and unit-test.

codecov bot commented Nov 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.35%. Comparing base (def3836) to head (45f6577).
⚠️ Report is 2 commits behind head on main.

@@            Coverage Diff             @@
##             main    #5610      +/-   ##
==========================================
- Coverage   70.37%   70.35%   -0.03%     
==========================================
  Files        1012     1012              
  Lines      130614   130612       -2     
==========================================
- Hits        91918    91887      -31     
- Misses      38696    38725      +29     

see 6 files with indirect coverage changes
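The percentages in the diff follow directly from the hit and miss counts. A minimal sketch of that arithmetic, assuming coverage is simply hits divided by hits plus misses (the `coverage_percent` helper is hypothetical and is not part of Codecov):

```python
def coverage_percent(hits: int, misses: int) -> float:
    """Return line coverage as a percentage of all tracked lines."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total


# Hit and miss counts copied from the coverage diff above.
base = coverage_percent(hits=91918, misses=38696)
head = coverage_percent(hits=91887, misses=38725)

print(f"base: {base:.2f}%  head: {head:.2f}%")
# prints "base: 70.37%  head: 70.35%"
```

The small drop comes from indirect coverage changes, not from this PR's workflow-only edit.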

@ykdojo ykdojo marked this pull request as ready for review November 18, 2025 02:12

greptile-apps bot commented Nov 18, 2025

Greptile Summary

  • Added disk cleanup step to integration-test-io-credentialed job to prevent MinIO XMinioStorageFull errors
  • Uses the same jlumbroso/free-disk-space@main action configuration as other jobs in the workflow

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change adds a standard disk cleanup step that is already used successfully in 5 other jobs in the same workflow file. The configuration is identical to other jobs and addresses a real production issue (MinIO storage full errors). The fix has been verified in test PR #5611 ("test: force integration-test-io-credentialed to verify disk cleanup fix").
  • No files require special attention

Important Files Changed

| Filename | Overview |
|----------|----------|
| .github/workflows/pr-test-suite.yml | Added disk cleanup step to integration-test-io-credentialed job to prevent MinIO storage full errors, matching the pattern used in other jobs |

Sequence Diagram

sequenceDiagram
    participant GHA as "GitHub Actions"
    participant Runner as "Ubuntu Runner"
    participant Cleanup as "Free Disk Space"
    participant Docker as "Docker Compose"
    participant MinIO as "MinIO Service"
    participant Tests as "Integration Tests"
    
    GHA->>Runner: Start integration-test-io-credentialed job
    Runner->>Cleanup: Execute jlumbroso/free-disk-space action
    Cleanup->>Runner: Remove Android, Haskell, swap storage
    Runner->>Runner: Checkout code and setup environment
    Runner->>Docker: Spin up IO services
    Docker->>MinIO: Start MinIO container with /data volume
    Runner->>Tests: Run pytest integration tests
    Tests->>MinIO: Write test data during execution
    Tests->>Runner: Complete test execution
    Runner->>Docker: Tear down services

@greptile-apps greptile-apps bot left a comment

1 file reviewed, no comments

@samstokes samstokes left a comment

LGTM!

@ykdojo ykdojo merged commit 5343e9b into main Nov 18, 2025
42 checks passed
@ykdojo ykdojo deleted the fix/add-disk-cleanup-to-integration-tests branch November 18, 2025 02:20
ykdojo added a commit that referenced this pull request Dec 2, 2025
## Changes Made

Add disk cleanup step to `integration-test-ai` job in the PR test suite
workflow to prevent disk space failures when installing heavy AI
dependencies.

## Related Issues

Fixes the "no space left on device" errors causing `integration-test-ai`
failures starting December 2:
- Error: `No space left on device: '/home/runner/actions-runner/cached/_diag/Worker_...'`

Related to the broader disk space issue affecting CI jobs:
- #5589
- #5609
- #5610
- #5711

## Root Cause

GitHub Actions runners have limited disk space. The combination of:
1. Large Python dependencies (torch, tensorflow, vllm ~2GB+)
2. Rust compilation artifacts

...exceeds available disk space, causing the runner worker process to
crash.

Disk space became borderline after GitHub updated the `ubuntu-latest`
runner image from `ubuntu24/20251102.99` to `ubuntu24/20251112.124` in
November (see #5711). The `integration-test-ai` job failures are
intermittent due to borderline disk space - sometimes there's just
enough, sometimes not.
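
Because the margin is this thin, it can help to log free space before the heavy install steps so that a borderline run is visible in the job output. A minimal sketch, assuming a hypothetical `check_disk_space` helper and an arbitrary 10 GiB threshold; neither comes from the Daft workflow, and only the standard-library `shutil.disk_usage` call is used:

```python
import shutil


def free_gib(path: str = ".") -> float:
    """Return the free space on the filesystem containing `path`, in GiB."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 ** 3)


def check_disk_space(path: str = ".", min_free_gib: float = 10.0) -> bool:
    """Log free space and report whether it meets the (assumed) minimum."""
    free = free_gib(path)
    ok = free >= min_free_gib
    status = "ok" if ok else "LOW"
    print(
        f"[disk-check] {path}: {free:.1f} GiB free "
        f"(minimum {min_free_gib:.1f} GiB) -> {status}"
    )
    return ok


# Example: report on the current working directory's filesystem.
check_disk_space(".")
```

A CI step could call this and exit non-zero when it returns `False`, which would turn a mid-install crash into an early, clearly labelled failure. The threshold would need tuning against real runs.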

## Timeline Evidence

| Date | Run | Runner Image | Branch | Commit | integration-test-ai Result |
|------|-----|--------------|--------|--------|---------------------------|
| Dec 2, 04:14 | [19846881482](https://github.com/Eventual-Inc/Daft/actions/runs/19846881482) | `20251112.124.1` | main | `37623e1` | ✅ Success |
| Dec 2, 04:58 | [19847729052](https://github.com/Eventual-Inc/Daft/actions/runs/19847729052) | `20251112.124.1` | main | `699e213` | ✅ Success |
| Dec 2, 07:43 | [19851013864](https://github.com/Eventual-Inc/Daft/actions/runs/19851013864) | `20251112.124.1` | dataframe-hashing-function | `7b01545` | ✅ Success |
| Dec 2, 12:50 | [19859197086](https://github.com/Eventual-Inc/Daft/actions/runs/19859197086) | `20251112.124.1` | zhenchao-lance-stats | `e96d6cd` | ✅ Success |
| Dec 2, 17:26 | [19867603566](https://github.com/Eventual-Inc/Daft/actions/runs/19867603566) | `20251112.124.1` | slade/daft-arrow-crate | `00d2eba` | ✅ Success |
| **Dec 2, 18:33** | [19869482423](https://github.com/Eventual-Inc/Daft/actions/runs/19869482423) | `20251112.124.1` | main | `15adbc3` | ❌ **First Failure** |
| **Dec 2, 18:42** | [19869705631](https://github.com/Eventual-Inc/Daft/actions/runs/19869705631) | `20251112.124.1` | fix/macos-timeout (PR #5731) | `eb69f98` | ❌ Failure |

## Test Plan

- [ ] Verify `integration-test-ai` jobs pass on this PR

## Internal

Closes EVE-1300
ykdojo added a commit that referenced this pull request Dec 2, 2025
## Changes Made

Add disk cleanup step to `integration-test-io` job in the nightly
workflow to prevent disk space failures when pulling Docker images
(especially the large `google/cloud-sdk` image for the bigtable
emulator).

## Related Issues

Fixes the "no space left on device" errors causing intermittent nightly
integration test failures since November 14:
- Error: `failed to register layer: write /usr/lib/google-cloud-sdk/...: no space left on device`

Related to the broader disk space issue affecting CI jobs:
- #5589 - Fixed disk space in docgen workflow
- #5609 - Fixed disk space in doctests job
- #5610 - Fixed disk space in integration-test-io-credentialed job

## Root Cause

GitHub Actions runners have limited disk space. The combination of:
1. Large Python dependencies (torch, tensorflow, vllm ~2GB+)
2. The `google/cloud-sdk` Docker image for bigtable emulator (~1.3GB+)

...exceeds available disk space, causing Docker layer extraction to
fail.

This issue began when GitHub updated the `ubuntu-latest` runner image
from `ubuntu24/20251102.99` to `ubuntu24/20251112.124` on November 12,
which added .NET Core SDK 10.0.100 (~2-3GB), reducing available disk
space.

## Timeline Evidence

| Date | Nightly Run | Runner Image | integration-test-io Result |
|------|-------------|--------------|---------------------------|
| Nov 11 | [19255638323](https://github.com/Eventual-Inc/Daft/actions/runs/19255638323) | `20251102.99` (old) | ✅ Success |
| Nov 12 | [19287063045](https://github.com/Eventual-Inc/Daft/actions/runs/19287063045) | `20251102.99` (old) | ✅ Success |
| Nov 13 | [19321208767](https://github.com/Eventual-Inc/Daft/actions/runs/19321208767) | `20251102.99` (old) | ✅ Success |
| **Nov 14** | [19354956886](https://github.com/Eventual-Inc/Daft/actions/runs/19354956886) | `20251112.124` (new) | ❌ **First Failure** |
| Nov 15 | [19384849493](https://github.com/Eventual-Inc/Daft/actions/runs/19384849493) | `20251112.124` (new) | ❌ Failure |
| Nov 16 | [19400729617](https://github.com/Eventual-Inc/Daft/actions/runs/19400729617) | `20251112.124` (new) | ✅ Success |
| Nov 17 | [19419062335](https://github.com/Eventual-Inc/Daft/actions/runs/19419062335) | `20251112.124` (new) | ✅ Success |
| Nov 18 | [19454879276](https://github.com/Eventual-Inc/Daft/actions/runs/19454879276) | `20251112.124` (new) | ❌ Failure |
| Nov 19 | [19490527917](https://github.com/Eventual-Inc/Daft/actions/runs/19490527917) | `20251112.124` (new) | ❌ Failure |
| Nov 20 | [19526319067](https://github.com/Eventual-Inc/Daft/actions/runs/19526319067) | `20251112.124` (new) | ❌ Failure |
| Nov 22 | [19590724755](https://github.com/Eventual-Inc/Daft/actions/runs/19590724755) | `20251112.124` (new) | ✅ Success |
| Nov 23 | [19606289653](https://github.com/Eventual-Inc/Daft/actions/runs/19606289653) | `20251112.124` (new) | ✅ Success |
| Nov 24 | [19624009710](https://github.com/Eventual-Inc/Daft/actions/runs/19624009710) | `20251112.124` (new) | ✅ Success |
| Nov 25 | [19658955974](https://github.com/Eventual-Inc/Daft/actions/runs/19658955974) | `20251112.124` (new) | ❌ Failure |
| Nov 26 | [19693197927](https://github.com/Eventual-Inc/Daft/actions/runs/19693197927) | `20251112.124` (new) | ❌ Failure |
| Nov 28 | [19754690128](https://github.com/Eventual-Inc/Daft/actions/runs/19754690128) | `20251112.124` (new) | ❌ Failure |
| Nov 29 | [19779342609](https://github.com/Eventual-Inc/Daft/actions/runs/19779342609) | `20251112.124` (new) | ❌ Failure |
| Dec 1 | [19812067599](https://github.com/Eventual-Inc/Daft/actions/runs/19812067599) | `20251112.124` (new) | ❌ Failure |

**Note:** The failures are intermittent because disk space is
borderline: sometimes there's just enough, sometimes not.
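
The pattern in the timeline is easier to see as a per-image tally. A minimal sketch that counts the nightly results by runner image; the rows are transcribed from the table above, and the `RUNS` list and `tally` helper are illustrative names, not part of any Daft tooling:

```python
from collections import Counter

OLD, NEW = "20251102.99", "20251112.124"

# (date, runner image, result) rows transcribed from the timeline table.
RUNS = [
    ("Nov 11", OLD, "success"), ("Nov 12", OLD, "success"),
    ("Nov 13", OLD, "success"), ("Nov 14", NEW, "failure"),
    ("Nov 15", NEW, "failure"), ("Nov 16", NEW, "success"),
    ("Nov 17", NEW, "success"), ("Nov 18", NEW, "failure"),
    ("Nov 19", NEW, "failure"), ("Nov 20", NEW, "failure"),
    ("Nov 22", NEW, "success"), ("Nov 23", NEW, "success"),
    ("Nov 24", NEW, "success"), ("Nov 25", NEW, "failure"),
    ("Nov 26", NEW, "failure"), ("Nov 28", NEW, "failure"),
    ("Nov 29", NEW, "failure"), ("Dec 1", NEW, "failure"),
]


def tally(runs):
    """Group results by runner image and count successes and failures."""
    counts = {}
    for _date, image, result in runs:
        counts.setdefault(image, Counter())[result] += 1
    return counts


for image, c in tally(RUNS).items():
    total = c["success"] + c["failure"]
    print(f"{image}: {c['failure']}/{total} runs failed")
# prints:
# 20251102.99: 0/3 runs failed
# 20251112.124: 10/15 runs failed
```

The tally covers only the runs listed in the table, so it illustrates the trend and is not a statistical estimate.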

## Testing

Verified by running the nightly workflow on two branches:

| Branch | PR | Disk Cleanup | integration-test-io Result |
|--------|-----|--------------|---------------------------|
| `fix/nightly-io-disk-cleanup` | #5711 (this PR) | ✅ Yes | ✅ [Passed](https://github.com/Eventual-Inc/Daft/actions/runs/19834011344) |
| `verify-io-disk-failure` | #5720 | ❌ No | ❌ [Failed](https://github.com/Eventual-Inc/Daft/actions/runs/19837720310) (65 MB free, disk space error) |

This confirms the disk cleanup step resolves the issue.

Closes #5720