IO Error: 'too many open files' when removing many corrupted runs #3224

@Engrammae

🐛 Bug: Removal of many corrupted runs in one go

I ran a larger experiment that tracked many runs, and apparently ended up with quite a few corrupted runs (539 in my case).

I tried removing them all by calling aim runs rm --corrupted, but got the error "IO too many open files".
I could still remove single corrupted runs with aim runs rm ${hash}.
I tried increasing the open-file limit with ulimit -n, up to 2048, but to no effect.
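
For anyone reproducing this, it may help to confirm what the limit actually is in the shell that launches aim. The following is a minimal sketch: the helper names show_fd_limits and raise_fd_limit are hypothetical, and only the standard bash builtin ulimit is used. Whether a higher limit would let aim runs rm --corrupted succeed with 539 runs is not established by this report.

```shell
#!/bin/bash

# Print the soft and hard limits on open file descriptors for this shell.
show_fd_limits() {
    echo "soft: $(ulimit -Sn)"
    echo "hard: $(ulimit -Hn)"
}

# Try to raise the soft limit to the requested value (hypothetical helper).
# The change applies only to this shell and to processes started from it,
# and the soft limit cannot exceed the hard limit for an unprivileged user.
raise_fd_limit() {
    local target="$1"
    ulimit -Sn "${target}" 2>/dev/null || {
        echo "could not raise soft limit to ${target}" >&2
        return 1
    }
}
```

Calling show_fd_limits before and after raise_fd_limit 2048, in the same shell that then runs aim, shows whether the new value really took effect there.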

To reproduce

Somehow get a lot of corrupted runs and try to remove them at once with aim runs rm --corrupted

Expected behavior

Removal of runs should respect the limit on open files, so that aim runs rm --corrupted also works when there are many corrupted runs.

Environment

  • Aim v3.24.0
  • Python 3.11.6
  • pip 24.0
  • OS Ubuntu 22.04.4 LTS

Additional context

As a workaround I wrote a short bash script to remove the corrupted runs one by one, but this is still quite cumbersome.

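#!/bin/bash

# The corrupted run hashes are printed tab-separated on the first output line;
# split them into one hash per line.
aim runs ls --corrupted | head -n 1 | sed 's/\t/\n/g' > corrupted_runs

# Remove the runs one at a time instead of all in a single call.
while read -r run; do
    # Skip empty lines so that aim is never called without a hash.
    [ -n "${run}" ] || continue
    echo "Removing corrupted run: ${run}"
    aim runs rm "${run}" -y
done < corrupted_runs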
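
The same workaround can be sketched without the temporary file, as a pipeline of small functions. This is only an illustration under the same assumption as the script above, namely that aim runs ls --corrupted prints the hashes tab-separated on its first line. The function names and the AIM_CMD variable are hypothetical and not part of Aim; tr is used in place of sed because \t and \n in a sed replacement are a GNU extension.

```shell
#!/bin/bash

# AIM_CMD is a hypothetical override for the aim executable, used for dry runs.
AIM_CMD="${AIM_CMD:-aim}"

# Turn the first line of tab-separated hashes into one hash per line,
# dropping any empty lines.
split_hashes() {
    head -n 1 | tr '\t' '\n' | sed '/^[[:space:]]*$/d'
}

# Read hashes from stdin and remove each run in its own aim process.
# stdin of the command is redirected so it cannot consume the hash list.
remove_runs() {
    local run
    while read -r run; do
        echo "Removing corrupted run: ${run}"
        "${AIM_CMD}" runs rm "${run}" -y < /dev/null
    done
}

# List the corrupted runs and remove them one by one.
remove_corrupted_runs() {
    "${AIM_CMD}" runs ls --corrupted | split_hashes | remove_runs
}
```

Calling remove_corrupted_runs performs the removal. To preview the commands without touching the repository, something like printf 'abc\tdef\n' | split_hashes | AIM_CMD=echo remove_runs prints each rm invocation instead of running it.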

Labels: help wanted, type / bug