Labels: acknowledged, technical, user
Description
Reporting on behalf of Alex.
It seems that extended usage of the cluster fills up the root partition:
```
Hail version: 0.2.18-08ec699f0fd4
Error summary: DiskErrorException: No space available in any of the local directories.
```
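To confirm what is actually consuming the root partition before picking a fix, something like the following helps (the candidate paths are assumptions based on the directories mentioned in this issue; adjust for the actual node layout):

```shell
# Overall usage of the root filesystem
df -h /

# Rank likely culprits by size, staying on the root filesystem (-x)
du -xsh /opt /var/log /tmp 2>/dev/null | sort -rh
```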
You could build a new base image on a larger base instance, but once deployed, the increased root partition size will shrink the tmpfs that Spark uses to perform its computation.
There are some temporary fixes that could buy us time:
- Investigate why the Ansible `cleanup-base` role does/did not clean up the download directory `/opt/sanger.ac.uk/hgi/download`
- Replace Anaconda with Miniconda in the Ansible `anaconda-base` role. Anaconda has a 5 GB `pkgs` directory with more than 400 packages, most of which are never going to be used.
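As a manual stopgap along the lines of the two fixes above, the stale download directory and the Anaconda package cache can be trimmed by hand; a sketch, where the download path is taken from this issue and the Anaconda install prefix is an assumption:

```shell
# 1) List stale contents of the download directory the cleanup-base role missed
#    (dry run: append -delete only after reviewing the output)
sudo find /opt/sanger.ac.uk/hgi/download -mindepth 1 -print

# 2) Trim Anaconda's package cache until Miniconda replaces it
#    (conda clean is a standard conda subcommand; /opt/anaconda is an assumed prefix)
/opt/anaconda/bin/conda clean --all --yes
```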
Long term solutions would be:
- Ship the logs away (Ansible `common` role); however, this might not address the Spark `work` directory
- Encourage users to adopt the "time-limited" usage model, where they tear down the cluster whenever they are not using it. This would clean up everything.
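For the Spark `work` directory specifically, standalone workers can be told to purge old application data themselves via the standard `spark.worker.cleanup.*` properties; a sketch of a `spark-env.sh` fragment, with an illustrative one-day TTL:

```shell
# spark-env.sh fragment: let standalone Spark workers purge old app directories
# spark.worker.cleanup.* are standard Spark standalone properties;
# the interval (seconds between sweeps) and TTL values here are illustrative.
export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS \
  -Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=86400"
```

This only cleans directories of stopped applications, so it complements rather than replaces the tear-down usage model.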