Skip to content

feat: store extra_metadata as top-level zattrs keys (alternative to #422)#424

Open
ieivanov wants to merge 7 commits into
mainfrom
feat/flat-extra-metadata
Open

feat: store extra_metadata as top-level zattrs keys (alternative to #422)#424
ieivanov wants to merge 7 commits into
mainfrom
feat/flat-extra-metadata

Conversation

@ieivanov

Copy link
Copy Markdown
Contributor

Summary

An alternative implementation to #422. Both PRs address the same root problem: process_single_position overwrites the output position's extra_metadata zattr, so in multi-step pipelines each step clobbers the previous step's provenance (and metadata copied by create_empty_plate's metadata_sources, added in #419).

Where #422 keeps the single extra_metadata key and merges dicts into it, this PR removes the shared wrapper key entirely:

  • Each item of the extra_metadata dict is written as a separate top-level key on the position's zattrs (e.g. biahub-flat_field, biahub-deskew) instead of being nested under one extra_metadata key.
  • Successive steps therefore accumulate sibling provenance keys; there is no shared dict for a later step to overwrite, so the duplicate-field / clobbering problem cannot arise by construction.
  • create_empty_plate's metadata_sources copy already carries top-level custom (non-OME) keys forward, so upstream provenance is inherited without changes there.
  • If a step writes a key that already exists on the output position, it is overwritten and a UserWarning is raised rather than silently clobbering (genuine re-processing or a key collision).

Trade-offs vs #422

  • Pro: no nesting, no merge logic, no possibility of two steps fighting over one extra_metadata key; provenance reads as flat, self-describing keys.
  • Con: custom keys live at the top level of zattrs alongside OME keys, so callers should namespace them (the "<package>-<step>" convention, e.g. biahub-deskew) to avoid collisions. OME-spec keys are never touched.

Context

Same cascade as #422 — wiring metadata_sources through the biahub PR cascade. With this approach the biahub side needs no read-back workaround; steps just pass extra_metadata={"biahub-<step>": settings.model_dump()} and the keys land side by side.

Test plan

  • tests/ngff/test_ngff_utils.py — updated the process_single_position assertion to check each entry as a top-level key; full file passes (30 passed).
  • metadata_sources tests unchanged and passing (they exercise create_empty_plate's generic top-level copy).
  • Manually verified end-to-end: flat-field output carries top-level biahub-flat_field; deskew output carries both biahub-flat_field (inherited) and biahub-deskew; overwrite warning fires only on key collision.

🤖 Generated with Claude Code

ieivanov and others added 3 commits June 9, 2026 16:56
process_single_position previously nested all provenance under a single
"extra_metadata" key and overwrote it on each step. Store each entry as a
top-level zattrs key instead, so successive steps (flat-field, deskew, ...)
accumulate sibling keys (e.g. biahub-flat_field, biahub-deskew) without
clobbering one another. create_empty_plate's metadata_sources copy already
carries top-level custom keys forward.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Shorten the docstring example and emit a UserWarning before overwriting a
pre-existing top-level zattrs key, so clobbering upstream provenance (e.g.
metadata copied by create_empty_plate's metadata_sources) is not silent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ieivanov

Copy link
Copy Markdown
Contributor Author

@aofei-liu let me know what you think of this implementation as an alternative to #422

@ieivanov

Copy link
Copy Markdown
Contributor Author

@srivarra could we get a review from you? @edyoshikun @talonchandler do you see any issues with changing where the "extra metadata" is stored?

@ieivanov ieivanov requested a review from talonchandler June 10, 2026 00:08

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes how process_single_position persists per-step provenance metadata so that multi-step pipelines accumulate metadata without clobbering prior steps. Instead of writing a single extra_metadata zattrs key (which can be overwritten on subsequent steps), each entry in the provided extra_metadata dict is stored as its own top-level zattrs key on the output position.

Changes:

  • Update process_single_position to write each extra_metadata entry as a separate top-level zattrs key and warn on key collisions.
  • Adjust create_empty_plate documentation to reflect typical “provenance key” usage patterns for copied custom zattrs.
  • Update NGFF utility tests to assert the new top-level zattrs behavior for extra_metadata.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/iohub/ngff/utils.py Flattens extra_metadata dict entries into top-level position zattrs keys and adds overwrite warnings; updates docstrings accordingly.
tests/ngff/test_ngff_utils.py Updates assertions to check extra_metadata entries as top-level zattrs keys.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/iohub/ngff/utils.py
Comment thread src/iohub/ngff/utils.py Outdated
Comment thread tests/ngff/test_ngff_utils.py
ieivanov and others added 2 commits June 9, 2026 17:13
Since extra_metadata entries are written as top-level zattrs keys, a key
colliding with a reserved OME-Zarr key (ome, multiscales, omero, labels,
version) would corrupt the position metadata. Raise ValueError instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/iohub/ngff/utils.py
Comment thread tests/ngff/test_ngff_utils.py
ieivanov and others added 2 commits June 9, 2026 17:22
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Address PR #424 review:
- Require extra_metadata to be a Mapping (TypeError otherwise) and check
  'is not None' rather than truthiness, so invalid falsy/non-mapping values
  fail fast instead of raising a downstream AttributeError.
- Reject non-string keys with a TypeError.
- Tests: assert the legacy 'extra_metadata' wrapper key is not written, and
  add regression tests for the overwrite warning and invalid-type rejection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ieivanov

Copy link
Copy Markdown
Contributor Author

Example zarr.json file:

{
  "attributes": {
    "ome": {
      "multiscales": [
        {
          "version": "0.5",
          "axes": [
            {
              "name": "T",
              "type": "time",
              "unit": "second"
            },
            {
              "name": "C",
              "type": "channel"
            },
            {
              "name": "Z",
              "type": "space",
              "unit": "micrometer"
            },
            {
              "name": "Y",
              "type": "space",
              "unit": "micrometer"
            },
            {
              "name": "X",
              "type": "space",
              "unit": "micrometer"
            }
          ],
          "datasets": [
            {
              "path": "0",
              "coordinateTransformations": [
                {
                  "type": "scale",
                  "scale": [
                    1.0,
                    1.0,
                    0.16994999999999996,
                    0.1133,
                    0.1133
                  ]
                }
              ]
            }
          ],
          "name": "0"
        }
      ],
      "omero": {
        "version": "0.5",
        "id": 0,
        "name": "fov0",
        "channels": [
          {
            "active": true,
            "coefficient": 1.0,
            "color": "FFFFFF",
            "family": "linear",
            "inverted": false,
            "label": "BF - Oblique",
            "window": {
              "start": 0.0,
              "end": 65535.0,
              "min": 0.0,
              "max": 65535.0
            }
          },
          {
            "active": false,
            "coefficient": 1.0,
            "color": "00FF00",
            "family": "linear",
            "inverted": false,
            "label": "GFP EX488 EM525-45",
            "window": {
              "start": 0.0,
              "end": 65535.0,
              "min": 0.0,
              "max": 65535.0
            }
          },
          {
            "active": false,
            "coefficient": 1.0,
            "color": "FF00FF",
            "family": "linear",
            "inverted": false,
            "label": "mCherry EX561 EM600-37",
            "window": {
              "start": 0.0,
              "end": 65535.0,
              "min": 0.0,
              "max": 65535.0
            }
          }
        ],
        "rdefs": {
          "defaultT": 0,
          "defaultZ": 0,
          "model": "color",
          "projection": "normal"
        }
      },
      "version": "0.5"
    },
    "biahub-flat_field": {
      "channel_names": [
        "BF - Oblique"
      ],
      "output_ome_zarr_version": null
    },
    "biahub-deskew": {
      "pixel_size_um": 0.1133,
      "ls_angle_deg": 30.0,
      "px_to_scan_ratio": 0.755,
      "scan_step_um": 0.15,
      "keep_overhang": false,
      "overhang_fill": 0,
      "average_n_slices": 3,
      "device": "cpu",
      "output_ome_zarr_version": null
    }
  },
  "zarr_format": 3,
  "node_type": "group"
}

Comment thread src/iohub/ngff/utils.py
@ieivanov

Copy link
Copy Markdown
Contributor Author

@edyoshikun @talonchandler do you have any concerns about this PR? @gav-sturm, do you guys use metadata stored under the extra_metadata key in zattrs? Is it OK if these metadata get moved one level up?

@talonchandler

Copy link
Copy Markdown
Contributor

No concerns from me.

FWIW WaveOrder currently saves metadata under a top-level waveorder key:
https://github.com/mehta-lab/waveorder/blob/22a0f72e6cdf93001a94431f7260d39ef026a819/waveorder/cli/apply_inverse_transfer_function.py#L378-L382

We can change that behavior if needed.

@ieivanov

Copy link
Copy Markdown
Contributor Author

FWIW WaveOrder currently saves metadata under a top-level waveorder key:

That's good! It matches the pattern I've proposed in this PR. Here is an example zarr.json file following the usage pattern in this PR. You can see that waveorder metadata and biahub-deskew are at the same level.

{
  "attributes": {
    "ome": {
      "multiscales": [
        {
          "version": "0.5",
          "axes": [
            {
              "name": "T",
              "type": "time",
              "unit": "second"
            },
            {
              "name": "C",
              "type": "channel"
            },
            {
              "name": "Z",
              "type": "space",
              "unit": "micrometer"
            },
            {
              "name": "Y",
              "type": "space",
              "unit": "micrometer"
            },
            {
              "name": "X",
              "type": "space",
              "unit": "micrometer"
            }
          ],
          "datasets": [
            {
              "path": "0",
              "coordinateTransformations": [
                {
                  "type": "scale",
                  "scale": [
                    1.0,
                    1.0,
                    0.16994999999999996,
                    0.1133,
                    0.1133
                  ]
                }
              ]
            }
          ],
          "name": "0"
        }
      ],
      "omero": {
        "version": "0.5",
        "id": 0,
        "name": "fov0",
        "channels": [
          {
            "active": true,
            "coefficient": 1.0,
            "color": "FFFFFF",
            "family": "linear",
            "inverted": false,
            "label": "Phase3D",
            "window": {
              "start": -0.2,
              "end": 0.2,
              "min": -10.0,
              "max": 10.0
            }
          }
        ],
        "rdefs": {
          "defaultT": 0,
          "defaultZ": 0,
          "model": "color",
          "projection": "normal"
        }
      },
      "version": "0.5"
    },
    "biahub-flat_field": {
      "channel_names": [
        "BF - Oblique"
      ],
      "output_ome_zarr_version": null
    },
    "biahub-deskew": {
      "pixel_size_um": 0.1133,
      "ls_angle_deg": 30.0,
      "px_to_scan_ratio": 0.755,
      "scan_step_um": 0.15,
      "keep_overhang": false,
      "overhang_fill": 0,
      "average_n_slices": 3,
      "device": "cpu",
      "output_ome_zarr_version": null
    },
    "waveorder": {
      "Phase3D": {
        "input_channel_names": [
          "BF - Oblique"
        ],
        "time_indices": "all",
        "reconstruction_dimension": 3,
        "device": null,
        "birefringence": null,
        "phase": {
          "transfer_function": {
            "wavelength_illumination": 0.532,
            "yx_pixel_size": 0.1133,
            "z_pixel_size": 0.17,
            "z_padding": 5,
            "z_focus_offset": 0,
            "index_of_refraction_media": 1.4,
            "numerical_aperture_detection": 1.35,
            "tilt_angle_zenith": 0.0,
            "tilt_angle_azimuth": 0.0,
            "numerical_aperture_illumination": 0.52,
            "invert_phase_contrast": false
          },
          "apply_inverse": {
            "reconstruction_algorithm": "Tikhonov",
            "regularization_strength": 0.01,
            "TV_rho_strength": 0.001,
            "TV_iterations": 1
          }
        },
        "fluorescence": null,
        "optimization": null
      }
    }
  },
  "zarr_format": 3,
  "node_type": "group"
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants