Extended cluster usage fills up the root partition #7

@lucadevitis

Reporting on behalf of Alex.

It seems that extended usage of the cluster fills up the root partition:

Hail version: 0.2.18-08ec699f0fd4
Error summary: DiskErrorException: No space available in any of the local directories.
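
A quick way to confirm that the root partition is the one running out of space is to check its free fraction on each node. The following is a minimal sketch rather than anything from the cluster tooling: the function names and the 10% threshold are illustrative assumptions.

```python
import shutil


def free_fraction(path="/"):
    """Return the free fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_space(path="/", min_free=0.10):
    """True when less than `min_free` of the filesystem is free.

    The 10% default is an arbitrary illustrative threshold.
    """
    return free_fraction(path) < min_free


if __name__ == "__main__":
    print(f"/ free: {free_fraction('/'):.1%}, low: {is_low_on_space('/')}")
```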

You could build a new base image on a larger base instance, but once deployed, the larger root partition would shrink the tmpfs that Spark uses for its computation.

There are some temporary fixes that could help us buy time:

  1. Investigate why the Ansible cleanup-base role does not (or did not) clean up the download directory /opt/sanger.ac.uk/hgi/download.
  2. Replace Anaconda with Miniconda in the Ansible anaconda-base role. Anaconda has a 5 GB pkgs directory with more than 400 packages, most of which will never be used.
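
To see how much each of these candidates would actually free, the directories can be measured directly. The sketch below is illustrative only: `dir_size_bytes` and `report` are hypothetical helper names, and the only path taken from this issue is the download directory. The location of the Anaconda pkgs directory is not stated here, so it would have to be added by hand.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all files under `root` without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def report(paths):
    """Return (path, size) pairs for existing directories, largest first."""
    sizes = {p: dir_size_bytes(p) for p in paths if os.path.isdir(p)}
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Download directory named in this issue; append other directories
    # (for example the Anaconda pkgs directory) as needed.
    for path, size in report(["/opt/sanger.ac.uk/hgi/download"]):
        print(f"{size / 1e9:8.2f} GB  {path}")
```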

Long-term solutions would be:

  1. Ship the logs away (Ansible common role); however, this might not address the Spark work directory.
  2. Encourage users to adopt the "time-limited" usage model, where they tear down the cluster whenever they are not using it. This would clean up everything.
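
Since shipping logs away may leave the Spark work directory untouched on a long-lived cluster, one possible stopgap (not something proposed in this issue) is to prune old entries from that directory periodically. The sketch below makes several assumptions: the work directory location is not given here and must be passed in, the 7-day age limit is arbitrary, and `stale_entries` and `prune` are hypothetical names. It defaults to a dry run because deleting the directory of a still-running application would break it.

```python
import os
import shutil
import time


def stale_entries(work_dir, max_age_days=7, now=None):
    """List top-level entries of `work_dir` not modified for `max_age_days`.

    Only the mtime of each top-level entry is checked, so a directory whose
    nested files are still being written may be reported as stale.
    """
    if not os.path.isdir(work_dir):
        return []
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in os.scandir(work_dir):
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


def prune(work_dir, max_age_days=7, dry_run=True):
    """Return the stale entries and delete them unless `dry_run` is True."""
    stale = stale_entries(work_dir, max_age_days)
    if not dry_run:
        for path in stale:
            try:
                if os.path.isdir(path) and not os.path.islink(path):
                    shutil.rmtree(path, ignore_errors=True)
                else:
                    os.remove(path)
            except OSError:
                # The entry was already removed or cannot be deleted; skip it.
                pass
    return stale
```

Running `prune(work_dir)` first with the default `dry_run=True` shows what would be deleted before anything is removed.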

Labels: acknowledged (This issue has been acknowledged), technical (This is a technical point), user (This is an end-user issue)
