Changing the 1000 glaciers tarfiles to 100 glaciers tar files or 1 glacier tar file? #1902

@fmaussion

Related to #1896

Many, many years ago I made a few hasty decisions that have stuck until now.

Prehistoric glacier directories were stored directly in RGI region folders, which made Linux very slow when navigating the directory tree of regions with tens of thousands of glaciers. Therefore I chose to have sub-folders of 1000 glaciers. This is OK and could easily be changed to 100, or to a more incremental tree (e.g. RGIREG/1000/100/10), without much change in functionality. The only reason these sub-folders exist is that a folder with many files is slow.
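To make the layout options concrete, here is a minimal sketch of how a glacier ID could map to a sub-folder for a given batch size, and to the more incremental tree. The helper names (`id_prefix`, `batch_folder`, `nested_folder`) and the convention of naming a sub-folder after the ID with its trailing digits dropped are assumptions for illustration, not OGGM's actual code.

```python
import math
from pathlib import PurePosixPath


def id_prefix(rgi_id: str, batch_size: int) -> str:
    """Drop as many trailing digits as the batch size has zeros (hypothetical rule)."""
    n_drop = round(math.log10(batch_size)) if batch_size > 0 else -1
    if n_drop < 1 or 10 ** n_drop != batch_size:
        raise ValueError("batch_size must be a power of ten >= 10")
    return rgi_id[:-n_drop]


def batch_folder(rgi_id: str, batch_size: int = 1000) -> PurePosixPath:
    """Flat layout: REGION / batch sub-folder / glacier directory."""
    region = rgi_id.split(".")[0]
    return PurePosixPath(region) / id_prefix(rgi_id, batch_size) / rgi_id


def nested_folder(rgi_id: str, levels=(1000, 100, 10)) -> PurePosixPath:
    """Incremental layout: one nesting level per entry in `levels`."""
    path = PurePosixPath(rgi_id.split(".")[0])
    for size in levels:
        path /= id_prefix(rgi_id, size)
    return path / rgi_id


print(batch_folder("RGI60-11.00897", 1000))
print(batch_folder("RGI60-11.00897", 100))
print(nested_folder("RGI60-11.00897"))
```

Under this assumed rule, no folder holds more than 1000 entries in the flat layout with batches of 1000, and no more than 10 entries per level in the incremental tree.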

This choice of 1000 glaciers stuck when we started offering the gdirs for download. We create one zipped tar per glacier and then tar them again in batches of 1000. The decision here was mostly driven by unsubstantiated worries about having too many files to download and unpack. This was at a time when we had hash checks on files, for example, to detect failed downloads. Increasing the number of files to download felt like increasing the chances of failed or partial downloads.
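The two-step packaging described above can be sketched with the standard-library `tarfile` module. This is an illustration only: the function name `bundle_glaciers`, the `batch_NNNN.tar` naming, and batching by sorted position (rather than by glacier ID prefix) are assumptions, not how OGGM builds its files.

```python
import tarfile
from pathlib import Path


def bundle_glaciers(src_dir: Path, out_dir: Path, batch_size: int = 1000) -> list[Path]:
    """Gzip each glacier directory, then group the archives into plain tar bundles."""
    glacier_dirs = sorted(p for p in src_dir.iterdir() if p.is_dir())
    bundles = []
    for start in range(0, len(glacier_dirs), batch_size):
        # Hypothetical bundle name: a running batch index
        bundle_path = out_dir / f"batch_{start // batch_size:04d}.tar"
        with tarfile.open(bundle_path, "w") as bundle:
            for gdir in glacier_dirs[start:start + batch_size]:
                # Step 1: one compressed tar per glacier
                inner = out_dir / f"{gdir.name}.tar.gz"
                with tarfile.open(inner, "w:gz") as tar:
                    tar.add(gdir, arcname=gdir.name)
                # Step 2: store it uncompressed in the batch bundle
                bundle.add(inner, arcname=inner.name)
                inner.unlink()  # the per-glacier archive now lives in the bundle
        bundles.append(bundle_path)
    return bundles
```

Setting `batch_size=100` or `batch_size=1` changes only how many bundles are produced; the per-glacier archives are identical in every case.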

We should now revisit this choice. Batches of 1000 glaciers make a lot of sense for global runs, but exploratory runs or tutorials where you need only one glacier are clearly disadvantaged by this choice.

It would be very low-hanging fruit to change this in a backwards-compatible manner (OGGM could discover by itself whether it's a "1000" or a "100" layout). But we need a decision on what the right number is. The most obvious answer is "1", but then a global-run user would download 270k files, multiplied by the number of level layers implemented in #1900.
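Both points can be sketched in a few lines: the file-count trade-off, and one way the batch size could be discovered automatically. Everything here is hypothetical: the helper names, the bundle naming rule (glacier ID with trailing digits dropped, plus `.tar`), and the `exists` callback standing in for a check against a server or local cache. The count also ignores that batches would not span RGI regions, so it is an idealized lower bound.

```python
import math
from typing import Callable, Iterable, Optional

N_GLACIERS = 270_000  # rough global count quoted in this issue


def n_files(batch_size: int, n_levels: int = 1, n_glaciers: int = N_GLACIERS) -> int:
    """Idealized number of files a global run would download."""
    return math.ceil(n_glaciers / batch_size) * n_levels


def bundle_name(rgi_id: str, batch_size: int) -> str:
    """Hypothetical naming: drop one trailing ID digit per power of ten."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive power of ten")
    n_drop = round(math.log10(batch_size))
    if 10 ** n_drop != batch_size:
        raise ValueError("batch_size must be a positive power of ten")
    return rgi_id[:len(rgi_id) - n_drop] + ".tar"


def discover_batch_size(
    rgi_id: str,
    exists: Callable[[str], bool],
    candidates: Iterable[int] = (1000, 100, 1),
) -> Optional[int]:
    """Return the first candidate batch size whose bundle is available, else None."""
    for size in candidates:
        if exists(bundle_name(rgi_id, size)):
            return size
    return None


for size in (1000, 100, 1):
    print(size, n_files(size))
```

Per level, this gives 270 files for batches of 1000, 2700 for batches of 100, and 270000 for single-glacier files. Trying the candidates in a fixed order is what would keep existing "1000" archives usable next to any new layout.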

To be discussed, but easily implementable (cc @gampnico @pat-schmitt)
