feat: consolidating metadata for opening zarr files in read mode #82
cwognum
left a comment
To optimize performance, we don't want to consolidate the metadata every time.
From the Zarr docs:
>>> zarr.consolidate_metadata(store)
This creates a special key with a copy of all of the metadata from all of the metadata objects in the store.
The key here refers to a single file for us (i.e. by default named .zmetadata, although this can be changed with the metadata_key parameter in the consolidate methods).
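To make the "single file" point concrete, here is a minimal, standard-library-only sketch of what consolidation does conceptually. It is not Zarr's implementation: the plain dict standing in for a store, the `consolidate` helper, and the example keys are all hypothetical, and the JSON layout only loosely mirrors the Zarr v2 consolidated format.

```python
import json

# Hypothetical stand-in for a Zarr v2 store: keys are paths, values are bytes.
store = {
    ".zgroup": json.dumps({"zarr_format": 2}).encode(),
    "x/.zarray": json.dumps({"shape": [10], "chunks": [5], "dtype": "<i4"}).encode(),
    "x/.zattrs": json.dumps({"units": "nM"}).encode(),
    "x/0": b"\x00" * 20,  # chunk data, not metadata
    "x/1": b"\x00" * 20,
}

METADATA_SUFFIXES = (".zgroup", ".zarray", ".zattrs")


def consolidate(store, metadata_key=".zmetadata"):
    """Copy every metadata document in `store` into one JSON document.

    Illustrative only: this mimics the idea behind consolidation,
    it does not call into the zarr library.
    """
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.endswith(METADATA_SUFFIXES)
    }
    # All metadata now lives under a single key, so a reader needs one
    # lookup instead of one lookup per group and array.
    store[metadata_key] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": metadata}
    ).encode()
    return metadata


consolidated = consolidate(store)
```

After this runs, the store holds one extra key, and that key is the only thing that would need to travel along with the archive for consolidated reads to work.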
I think what we would want to do is:
- Consolidate the archive locally.
- Then copy over the consolidated archive to the Hub with zarr.convenience.copy_all.
- If this doesn't work (i.e. I assume it would copy over the .zmetadata file, but maybe not?), then we should find a way to copy over this single file manually.
Could you look into the above? I would be curious to know if this is possible!
Not having to consolidate everything on the Hub would make things a lot faster!
Now that #83 is merged, could you actually look into adding the consolidation in the flow for using Zarr datasets from the Hub:
cwognum
left a comment
Great work!
Let's assume that any Zarr archive that is loaded for a dataset has been consolidated. This means that we should also change these lines of code to load in consolidated mode!
cwognum
left a comment
Almost there!
The test cases are failing though because the test Zarr archive is not consolidated! The formatting also fails right now.
Changelogs
Incorporated the Zarr function that consolidates metadata when opening a Zarr group in read mode. This reduces the number of ls calls and speeds up reading from the Zarr group.
Profiling without consolidation
Profiling with consolidation