feat: store extra_metadata as top-level zattrs keys (alternative to #422)#424
feat: store extra_metadata as top-level zattrs keys (alternative to #422)#424ieivanov wants to merge 7 commits into
Conversation
process_single_position previously nested all provenance under a single "extra_metadata" key and overwrote it on each step. Store each entry as a top-level zattrs key instead, so successive steps (flat-field, deskew, ...) accumulate sibling keys (e.g. biahub-flat_field, biahub-deskew) without clobbering one another. create_empty_plate's metadata_sources copy already carries top-level custom keys forward. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Shorten the docstring example and emit a UserWarning before overwriting a pre-existing top-level zattrs key, so clobbering upstream provenance (e.g. metadata copied by create_empty_plate's metadata_sources) is not silent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@aofei-liu let me know what you think of this implementation as an alternative to #422 |
|
@srivarra could we get a review from you? @edyoshikun @talonchandler do you see any issues with changing where the "extra metadata" is stored? |
There was a problem hiding this comment.
Pull request overview
This PR changes how process_single_position persists per-step provenance metadata so that multi-step pipelines accumulate metadata without clobbering prior steps. Instead of writing a single extra_metadata zattrs key (which can be overwritten on subsequent steps), each entry in the provided extra_metadata dict is stored as its own top-level zattrs key on the output position.
Changes:
- Update
process_single_positionto write eachextra_metadataentry as a separate top-level zattrs key and warn on key collisions. - Adjust
create_empty_platedocumentation to reflect typical “provenance key” usage patterns for copied custom zattrs. - Update NGFF utility tests to assert the new top-level zattrs behavior for
extra_metadata.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/iohub/ngff/utils.py |
Flattens extra_metadata dict entries into top-level position zattrs keys and adds overwrite warnings; updates docstrings accordingly. |
tests/ngff/test_ngff_utils.py |
Updates assertions to check extra_metadata entries as top-level zattrs keys. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Since extra_metadata entries are written as top-level zattrs keys, a key colliding with a reserved OME-Zarr key (ome, multiscales, omero, labels, version) would corrupt the position metadata. Raise ValueError instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Address PR #424 review: - Require extra_metadata to be a Mapping (TypeError otherwise) and check 'is not None' rather than truthiness, so invalid falsy/non-mapping values fail fast instead of raising a downstream AttributeError. - Reject non-string keys with a TypeError. - Tests: assert the legacy 'extra_metadata' wrapper key is not written, and add regression tests for the overwrite warning and invalid-type rejection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Example {
"attributes": {
"ome": {
"multiscales": [
{
"version": "0.5",
"axes": [
{
"name": "T",
"type": "time",
"unit": "second"
},
{
"name": "C",
"type": "channel"
},
{
"name": "Z",
"type": "space",
"unit": "micrometer"
},
{
"name": "Y",
"type": "space",
"unit": "micrometer"
},
{
"name": "X",
"type": "space",
"unit": "micrometer"
}
],
"datasets": [
{
"path": "0",
"coordinateTransformations": [
{
"type": "scale",
"scale": [
1.0,
1.0,
0.16994999999999996,
0.1133,
0.1133
]
}
]
}
],
"name": "0"
}
],
"omero": {
"version": "0.5",
"id": 0,
"name": "fov0",
"channels": [
{
"active": true,
"coefficient": 1.0,
"color": "FFFFFF",
"family": "linear",
"inverted": false,
"label": "BF - Oblique",
"window": {
"start": 0.0,
"end": 65535.0,
"min": 0.0,
"max": 65535.0
}
},
{
"active": false,
"coefficient": 1.0,
"color": "00FF00",
"family": "linear",
"inverted": false,
"label": "GFP EX488 EM525-45",
"window": {
"start": 0.0,
"end": 65535.0,
"min": 0.0,
"max": 65535.0
}
},
{
"active": false,
"coefficient": 1.0,
"color": "FF00FF",
"family": "linear",
"inverted": false,
"label": "mCherry EX561 EM600-37",
"window": {
"start": 0.0,
"end": 65535.0,
"min": 0.0,
"max": 65535.0
}
}
],
"rdefs": {
"defaultT": 0,
"defaultZ": 0,
"model": "color",
"projection": "normal"
}
},
"version": "0.5"
},
"biahub-flat_field": {
"channel_names": [
"BF - Oblique"
],
"output_ome_zarr_version": null
},
"biahub-deskew": {
"pixel_size_um": 0.1133,
"ls_angle_deg": 30.0,
"px_to_scan_ratio": 0.755,
"scan_step_um": 0.15,
"keep_overhang": false,
"overhang_fill": 0,
"average_n_slices": 3,
"device": "cpu",
"output_ome_zarr_version": null
}
},
"zarr_format": 3,
"node_type": "group"
} |
|
@edyoshikun @talonchandler do you have any concerns about this PR? @gav-sturm, do you guys use metadata stored under the |
|
No concerns from me. FWIW WaveOrder currently saves metadata under a top-level We can change that behavior if needed. |
That's good! It matches the pattern I've proposed in this PR. Here is an example zarr.json file following the usage pattern in this PR. You can see that {
"attributes": {
"ome": {
"multiscales": [
{
"version": "0.5",
"axes": [
{
"name": "T",
"type": "time",
"unit": "second"
},
{
"name": "C",
"type": "channel"
},
{
"name": "Z",
"type": "space",
"unit": "micrometer"
},
{
"name": "Y",
"type": "space",
"unit": "micrometer"
},
{
"name": "X",
"type": "space",
"unit": "micrometer"
}
],
"datasets": [
{
"path": "0",
"coordinateTransformations": [
{
"type": "scale",
"scale": [
1.0,
1.0,
0.16994999999999996,
0.1133,
0.1133
]
}
]
}
],
"name": "0"
}
],
"omero": {
"version": "0.5",
"id": 0,
"name": "fov0",
"channels": [
{
"active": true,
"coefficient": 1.0,
"color": "FFFFFF",
"family": "linear",
"inverted": false,
"label": "Phase3D",
"window": {
"start": -0.2,
"end": 0.2,
"min": -10.0,
"max": 10.0
}
}
],
"rdefs": {
"defaultT": 0,
"defaultZ": 0,
"model": "color",
"projection": "normal"
}
},
"version": "0.5"
},
"biahub-flat_field": {
"channel_names": [
"BF - Oblique"
],
"output_ome_zarr_version": null
},
"biahub-deskew": {
"pixel_size_um": 0.1133,
"ls_angle_deg": 30.0,
"px_to_scan_ratio": 0.755,
"scan_step_um": 0.15,
"keep_overhang": false,
"overhang_fill": 0,
"average_n_slices": 3,
"device": "cpu",
"output_ome_zarr_version": null
},
"waveorder": {
"Phase3D": {
"input_channel_names": [
"BF - Oblique"
],
"time_indices": "all",
"reconstruction_dimension": 3,
"device": null,
"birefringence": null,
"phase": {
"transfer_function": {
"wavelength_illumination": 0.532,
"yx_pixel_size": 0.1133,
"z_pixel_size": 0.17,
"z_padding": 5,
"z_focus_offset": 0,
"index_of_refraction_media": 1.4,
"numerical_aperture_detection": 1.35,
"tilt_angle_zenith": 0.0,
"tilt_angle_azimuth": 0.0,
"numerical_aperture_illumination": 0.52,
"invert_phase_contrast": false
},
"apply_inverse": {
"reconstruction_algorithm": "Tikhonov",
"regularization_strength": 0.01,
"TV_rho_strength": 0.001,
"TV_iterations": 1
}
},
"fluorescence": null,
"optimization": null
}
}
},
"zarr_format": 3,
"node_type": "group"
} |
Summary
An alternative implementation to #422. Both PRs address the same root problem:
process_single_positionoverwrites the output position'sextra_metadatazattr, so in multi-step pipelines each step clobbers the previous step's provenance (and metadata copied bycreate_empty_plate'smetadata_sources, added in #419).Where #422 keeps the single
extra_metadatakey and merges dicts into it, this PR removes the shared wrapper key entirely:extra_metadatadict is written as a separate top-level key on the position's zattrs (e.g.biahub-flat_field,biahub-deskew) instead of being nested under oneextra_metadatakey.create_empty_plate'smetadata_sourcescopy already carries top-level custom (non-OME) keys forward, so upstream provenance is inherited without changes there.UserWarningis raised rather than silently clobbering (genuine re-processing or a key collision).Trade-offs vs #422
extra_metadatakey; provenance reads as flat, self-describing keys."<package>-<step>"convention, e.g.biahub-deskew) to avoid collisions. OME-spec keys are never touched.Context
Same cascade as #422 — wiring
metadata_sourcesthrough the biahub PR cascade. With this approach the biahub side needs no read-back workaround; steps just passextra_metadata={"biahub-<step>": settings.model_dump()}and the keys land side by side.Test plan
tests/ngff/test_ngff_utils.py— updated theprocess_single_positionassertion to check each entry as a top-level key; full file passes (30 passed).metadata_sourcestests unchanged and passing (they exercisecreate_empty_plate's generic top-level copy).biahub-flat_field; deskew output carries bothbiahub-flat_field(inherited) andbiahub-deskew; overwrite warning fires only on key collision.🤖 Generated with Claude Code