fix(examples): host deconvolution data on Zenodo + pooch auto-download#48

Merged
alxndrkalinin merged 7 commits into
mainfrom
fix/deconv-data-zenodo
Jun 2, 2026
@alxndrkalinin alxndrkalinin commented Jun 2, 2026

Owner

Closes #47.

Problem

Both Google Drive links in examples/data/README.md (deconvolution section) had stopped resolving, leaving examples/notebooks/deconvolution_iterations_3d.ipynb with no way to bootstrap its inputs (a 3D Hoechst-stained astrocyte stack and a theoretical PSF).

Fix

Republished both files on Zenodo under CC-BY-4.0 with the proper IEEE archived citation (DOI 10.5281/zenodo.20514102, concept DOI 10.5281/zenodo.20514101).

deconvolution_iterations_3d.ipynb now pulls both files via pooch with sha256 verification on first run, matching the pattern used by the other resolution / segmentation example notebooks. The examples/data/README.md entry is rewritten with filenames, acquisition / modeling details, and the Zenodo DOI.
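The fetch cell itself is not reproduced in this description. A minimal sketch of the pattern, assuming the record ID from the DOI and the file-content path shown in the reviewer's guide sequence diagram. The `zenodo.org` host, the `zenodo_file_url` helper, and this `fetch_data` signature are illustrative assumptions, and `known_hash` must be the sha256 value committed in the notebook:

```python
from pathlib import Path

ZENODO_RECORD = "20514102"  # record ID taken from DOI 10.5281/zenodo.20514102
DATA_DIR = Path("../data")  # examples/data as seen from examples/notebooks


def zenodo_file_url(record_id: str, fname: str) -> str:
    """Build the file-content URL for one file of a Zenodo record (assumed layout)."""
    return f"https://zenodo.org/api/records/{record_id}/files/{fname}/content"


def fetch_data(fname: str, known_hash: str, data_dir: Path = DATA_DIR) -> Path:
    """Download fname into data_dir unless a copy with a matching hash is present."""
    # Imported lazily so the URL helper above works without pooch installed.
    import pooch

    local_path = pooch.retrieve(
        url=zenodo_file_url(ZENODO_RECORD, fname),
        known_hash=known_hash,  # "sha256:<hex digest>"; pooch raises on a mismatch
        fname=fname,
        path=data_dir,
    )
    return Path(local_path)
```

On later runs `pooch.retrieve` finds the existing file, checks its hash, and returns the local path without downloading again.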

Tooling

Adds scripts/zenodo_upload.py and scripts/zenodo_metadata.json, the script and sidecar metadata used to create the deposition. Both are reusable for future cubic example datasets: point FILES_TO_UPLOAD and the metadata JSON at the new files and rerun. The deposition stays a draft unless --publish is passed.

Test plan

  • Open examples/notebooks/deconvolution_iterations_3d.ipynb from a fresh checkout (no local examples/data/astr_vpa_hoechst*.tif), run cells 1-4 — pooch should download both files into examples/data/ and the load cell should succeed.
  • Confirm sha256 hashes match the values committed in the new fetch cell.
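The second check can also be done by hand outside the notebook. A small sketch using only the standard library; `sha256_of` and `matches_known_hash` are hypothetical helper names, and the expected value would be the hash string committed in the fetch cell:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_hash(path: Path, known_hash: str) -> bool:
    """Compare a file against a 'sha256:<hex>' (or bare hex) hash string."""
    expected = known_hash.split(":", 1)[-1].lower()
    return sha256_of(path) == expected
```

Reading in chunks keeps memory use flat, which matters for multi-hundred-megabyte TIFF stacks.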

🤖 Generated with Claude Code

Summary by Sourcery

Switch the deconvolution example to auto-download its input data from Zenodo and add tooling to manage the Zenodo deposition.

New Features:

  • Enable automatic download and integrity checking of the 3D deconvolution example dataset from Zenodo using pooch.
  • Add a reusable script and metadata file for creating and publishing Zenodo depositions of example datasets.

Enhancements:

  • Update the deconvolution example notebook to use the new auto-downloaded data files rather than assuming they are present locally.

Documentation:

  • Revise the example data README to document the new Zenodo-hosted deconvolution dataset, including filenames and acquisition details.

alxndrkalinin and others added 2 commits June 2, 2026 11:29
Adds scripts/zenodo_upload.py and a sidecar metadata JSON used to
publish the cubic deconvolution example data (3D astrocyte nuclei
and theoretical PSF) on Zenodo. The script creates an empty
deposition via the REST API, attaches the metadata, streams each
file to the deposition bucket, and verifies the returned md5 against
the local md5 (in addition to a committed sha256 guard against
on-disk bit-rot). Defaults to leaving the deposition as a draft for
manual review; --publish publishes immediately.

Reusable for future cubic example datasets - point FILES_TO_UPLOAD
and the metadata JSON at the new files and rerun.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nload via pooch

The two Google Drive links in examples/data/README.md (3D astrocyte
stack + theoretical PSF) had stopped resolving, so the deconvolution
notebook had no way to bootstrap its inputs. Both files are now
deposited on Zenodo (DOI 10.5281/zenodo.20514102, concept DOI
10.5281/zenodo.20514101) under CC-BY-4.0.

examples/notebooks/deconvolution_iterations_3d.ipynb now fetches both
files via pooch with sha256 verification - matching the pattern
already used by the resolution and segmentation notebooks - so the
manual download step is gone. The README entry is rewritten to list
the filenames, acquisition/modeling details, and the Zenodo DOI.

Closes #47.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sourcery-ai

sourcery-ai Bot commented Jun 2, 2026

Contributor

Reviewer's Guide

Updates the 3D deconvolution example to auto-download its input data from a new Zenodo deposition using pooch, documents the new data source in the examples data README, and adds a reusable script plus metadata for creating and publishing the Zenodo deposition with checksum verification.

Sequence diagram for deconvolution notebook data fetch via pooch

sequenceDiagram
    actor User
    participant Notebook as deconvolution_iterations_3d.ipynb
    participant Pooch as pooch
    participant Zenodo as Zenodo_API
    participant FS as Data_Dir

    User->>Notebook: run fetch_data for astr_vpa_hoechst.tif
    Notebook->>Pooch: pooch.retrieve(url, known_hash, fname, path)
    alt file not in Data_Dir
        Pooch->>Zenodo: GET /api/records/20514102/files/astr_vpa_hoechst.tif/content
        Zenodo-->>Pooch: file bytes
        Pooch->>FS: save astr_vpa_hoechst.tif
        Pooch-->>Notebook: local path
    else file already present
        Pooch-->>Notebook: existing local path
    end

    User->>Notebook: run fetch_data for astr_vpa_hoechst_psf_na095_cropped.tif
    Notebook->>Pooch: pooch.retrieve(url, known_hash, fname, path)
    Pooch-->>Notebook: local path

    User->>Notebook: load data
    Notebook->>FS: imread(astr_vpa_hoechst.tif)
    FS-->>Notebook: image array
    Notebook->>FS: imread(astr_vpa_hoechst_psf_na095_cropped.tif)
    FS-->>Notebook: psf array
    Notebook->>Notebook: ascupy(image), ascupy(psf) if USE_GPU

Sequence diagram for zenodo_upload deposition workflow

sequenceDiagram
    participant User
    participant Script as zenodo_upload.py
    participant Zenodo as Zenodo_API
    participant FS as Local_Files

    User->>Script: python zenodo_upload.py [--sandbox] [--publish]
    Script->>FS: read zenodo_metadata.json
    Script->>FS: check FILES_TO_UPLOAD exist

    Script->>Zenodo: POST /api/deposit/depositions
    Zenodo-->>Script: deposition id, links.bucket, links.html

    Script->>Zenodo: PUT /api/deposit/depositions/{id} (metadata)
    Zenodo-->>Script: updated deposition

    loop for each file in FILES_TO_UPLOAD
        Script->>FS: hash_file(path)
        Script->>Script: compare with EXPECTED_SHA256
        Script->>Zenodo: PUT {bucket_url}/remote_name (file bytes)
        Zenodo-->>Script: uploaded file info (checksum)
        Script->>Script: compare md5 with local hash
    end

    alt --publish passed
        Script->>Zenodo: POST /api/deposit/depositions/{id}/actions/publish
        Zenodo-->>Script: published deposition (doi, conceptdoi)
    else no --publish
        Script->>User: print draft URL
    end
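
The same flow, sketched as Python against the endpoints named in the diagram. This is not the contents of scripts/zenodo_upload.py: the function names are hypothetical, the checksum comparisons are omitted for brevity, and it assumes the metadata JSON holds the bare metadata object and that `requests` is installed.

```python
import json
import os
from pathlib import Path
from typing import Sequence


def api_base(sandbox: bool) -> str:
    """Return the Zenodo API root for the sandbox or the production site."""
    host = "sandbox.zenodo.org" if sandbox else "zenodo.org"
    return f"https://{host}/api"


def upload(files: Sequence[Path], metadata_path: Path,
           sandbox: bool = True, publish: bool = False) -> str:
    """Create a deposition, attach metadata, upload files, optionally publish."""
    # Third-party dependency, imported lazily so api_base() needs no extras.
    import requests

    params = {"access_token": os.environ["ZENODO_TOKEN"]}
    base = api_base(sandbox)
    timeout = (10, 1800)  # (connect, read) seconds

    # 1. Create an empty deposition.
    r = requests.post(f"{base}/deposit/depositions",
                      params=params, json={}, timeout=timeout)
    r.raise_for_status()
    dep = r.json()
    dep_url = f"{base}/deposit/depositions/{dep['id']}"

    # 2. Attach the metadata (assumed to be the bare metadata object).
    metadata = json.loads(metadata_path.read_text())
    r = requests.put(dep_url, params=params,
                     json={"metadata": metadata}, timeout=timeout)
    r.raise_for_status()

    # 3. Stream each file to the deposition bucket.
    for path in files:
        with path.open("rb") as fh:
            r = requests.put(f"{dep['links']['bucket']}/{path.name}",
                             params=params, data=fh, timeout=timeout)
        r.raise_for_status()

    # 4. Publish only on request; otherwise return the draft URL for review.
    if publish:
        r = requests.post(f"{dep_url}/actions/publish",
                          params=params, timeout=timeout)
        r.raise_for_status()
        return r.json()["doi"]
    return dep["links"]["html"]
```

Defaulting `sandbox` to `True` here is a choice of this sketch, so that an accidental call cannot touch the production site.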

File-Level Changes

Change: Switch the deconvolution 3D notebook to use pooch-based auto-download from Zenodo instead of manual Google Drive links.
Details:
  • Extend the notebook imports to include pathlib.Path and pooch while preserving existing cubic and imaging imports.
  • Update the dataset description markdown to reference the Zenodo DOI and note that files are auto-downloaded on first run.
  • Add a new code cell defining DATA_DIR, a fetch_data() helper that wraps pooch.retrieve with sha256 verification, and two calls that download the image and PSF from the Zenodo record into examples/data.
  • Leave the image/PSF loading and GPU transfer logic unchanged apart from notebook JSON formatting.
Files: examples/notebooks/deconvolution_iterations_3d.ipynb

Change: Refresh the deconvolution data documentation to describe the files and point to the Zenodo record instead of Google Drive.
Details:
  • Rewrite the README intro to clarify that most notebooks auto-download data into examples/data via pooch.
  • Replace the deconvolution section’s Google Drive links with a description of each TIFF file, including acquisition and modeling details.
  • Add the Zenodo record URL and DOI as the authoritative source for the deconvolution example data.
Files: examples/data/README.md

Change: Introduce a reusable Zenodo upload script and metadata file to recreate or extend the deconvolution data deposition with checksum verification.
Details:
  • Add scripts/zenodo_upload.py, which creates a new (sandbox or production) Zenodo deposition, attaches metadata from zenodo_metadata.json, uploads the configured example TIFF files, verifies local vs returned checksums, and optionally publishes the deposition.
  • Configure FILES_TO_UPLOAD and EXPECTED_SHA256 in the script so subsequent runs can reuse or adapt the same upload flow for other cubic example datasets.
  • Add scripts/zenodo_metadata.json (not shown in the diff body) to store the deposition metadata consumed by the upload script.
Files: scripts/zenodo_upload.py, scripts/zenodo_metadata.json

Assessment against linked issues

Issue Objective Addressed Explanation
#47 Replace the broken Google Drive links for the deconvolution example datasets in examples/data/README.md with working links or sources so users can download the data.
#47 Ensure the deconvolution_iterations_3d example notebook can successfully obtain and use the required deconvolution datasets so the example is runnable.

@sourcery-ai sourcery-ai Bot left a comment

Contributor

Hey - I've found 1 issue, and left some high level feedback:

  • The Zenodo URLs and SHA256 hashes are duplicated between the notebook and scripts/zenodo_upload.py; consider centralizing these constants (e.g., a small shared module or JSON) so future updates only need to be made in one place.
  • The notebook hardcodes DATA_DIR = Path("../data"), which may break if the notebook is executed from a different working directory; consider resolving the path relative to the notebook file or using the same helper/path logic as your other pooch-based examples.
  • In zenodo_upload.py, both sandbox and production use the same ZENODO_TOKEN env var; if you plan to use separate tokens, it might be safer to support distinct variables (e.g., ZENODO_SANDBOX_TOKEN) or a --token argument to avoid accidentally using the wrong credentials.
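
For the second point, one way to make the data path independent of the working directory is to search upward for the data folder. A sketch under the assumption that the notebook runs somewhere inside the repository checkout; `find_data_dir` is a hypothetical helper, not something the PR adds (notebooks have no `__file__`, so the search starts from the current directory):

```python
from pathlib import Path
from typing import Optional


def find_data_dir(start: Optional[Path] = None,
                  marker: str = "examples/data") -> Path:
    """Walk upward from start until a directory containing marker is found."""
    start = (start or Path.cwd()).resolve()
    for parent in (start, *start.parents):
        candidate = parent / marker
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no '{marker}' directory found above {start}")


# In the notebook this would replace the hardcoded relative path:
# DATA_DIR = find_data_dir()
```

This resolves to the same folder whether the kernel starts in examples/notebooks or at the repository root.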
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="scripts/zenodo_upload.py" line_range="125-134" />
<code_context>
+            )
+            return 3
+        with path.open("rb") as fh:
+            r = requests.put(
+                f"{bucket_url}/{remote_name}",
+                params=params,
+                data=fh,
+                timeout=None,
+            )
+        r.raise_for_status()
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider using a finite timeout for the file upload request instead of `timeout=None`.

An infinite timeout can cause the upload to hang forever if the connection stalls. Using a finite (possibly large) timeout, or separate connect/read timeouts, makes failures predictable and better suited for automated runs. You could also expose the timeout as a CLI option for large uploads.

Suggested implementation:

```python
            r = requests.put(
                f"{bucket_url}/{remote_name}",
                params=params,
                data=fh,
                # Use a finite timeout to avoid hanging indefinitely on stalled connections.
                # This tuple is (connect_timeout, read_timeout); adjust as appropriate.
                timeout=(10, 600),
            )

```

To fully implement the suggestion of making this timeout configurable:
1. Add a CLI option (for example `--upload-timeout`) in the argument parser to allow the user to specify either a single timeout value or a connect/read timeout pair.
2. Parse that option into either a float/int or a `(connect_timeout, read_timeout)` tuple.
3. Replace the hard-coded `(10, 600)` in the `requests.put` call with the parsed timeout value (for example, a variable like `upload_timeout` that you pass through to the function containing this code).
</issue_to_address>
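
The three steps above could look roughly like this. The `--upload-timeout` flag name comes from the suggestion, while `parse_timeout`, `build_parser`, and the `CONNECT,READ` text format are assumptions made for illustration:

```python
import argparse
from typing import Tuple, Union

Timeout = Union[float, Tuple[float, float]]


def parse_timeout(text: str) -> Timeout:
    """Parse 'SECONDS' or 'CONNECT,READ' into a requests-style timeout value."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) == 1:
        return parts[0]  # a single number applies to both connect and read
    if len(parts) == 2:
        return (parts[0], parts[1])
    raise argparse.ArgumentTypeError(
        f"expected SECONDS or CONNECT,READ, got {text!r}"
    )


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the timeout option discussed here."""
    parser = argparse.ArgumentParser(description="Upload example data to Zenodo.")
    parser.add_argument(
        "--upload-timeout",
        type=parse_timeout,
        default=(10.0, 600.0),
        help="seconds, or CONNECT,READ seconds (default: 10,600)",
    )
    return parser
```

The parsed value can be passed straight through as `timeout=args.upload_timeout` in the `requests.put` call.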


Comment thread scripts/zenodo_upload.py

Copilot AI left a comment

Pull request overview

This PR restores the deconvolution example’s dataset availability by moving the input TIFFs from broken Google Drive links to a Zenodo record and updating the deconvolution notebook to auto-download the files via pooch (with SHA256 verification). It also adds a reusable helper script + metadata JSON for creating/uploading Zenodo depositions for future example datasets.

Changes:

  • Update deconvolution_iterations_3d.ipynb to fetch required TIFF inputs from Zenodo via pooch on first run.
  • Rewrite the deconvolution dataset entry in examples/data/README.md to document filenames, acquisition/modeling details, and Zenodo source.
  • Add scripts/zenodo_upload.py and scripts/zenodo_metadata.json to automate creating and optionally publishing a Zenodo deposition.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/zenodo_upload.py | Adds a CLI script that creates a Zenodo deposition, uploads files, verifies checksums, and optionally publishes. |
| scripts/zenodo_metadata.json | Provides the deposition metadata used by the upload script (title/description/citation/related identifiers). |
| examples/notebooks/deconvolution_iterations_3d.ipynb | Adds a pooch-based fetch cell to download the deconvolution dataset from Zenodo with SHA256 verification. |
| examples/data/README.md | Replaces broken Google Drive links with Zenodo-based dataset documentation for the deconvolution example. |

Comment thread scripts/zenodo_upload.py Outdated
Comment thread examples/data/README.md Outdated
alxndrkalinin and others added 5 commits June 2, 2026 11:41
The Zenodo upload tool used to fail with a bare ModuleNotFoundError
because requests is not part of cubic's runtime dependency closure.
Wrap the import and raise SystemExit with the pip/uv install line so
the failure mode is self-explanatory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
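
The commit's diff is not shown on this page. A generic sketch of the guarded-import pattern it describes, with a hypothetical `require` helper and an illustrative install hint:

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import module_name, or exit with a readable install instruction."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # SystemExit prints the message and exits non-zero without a traceback.
        raise SystemExit(
            f"This script needs '{module_name}', which is not installed.\n"
            f"Install it with: {install_hint}"
        ) from exc


# Example use at the top of a script:
# requests = require("requests", "pip install requests")
```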
timeout=None could let the script wait forever if the connection
stalled mid-upload. Use a (10s connect, 30min read) tuple — generous
enough for hundreds of MB of streamed data on slow uplinks, but
fails predictably on a frozen socket.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The docstring claimed the script was specific to the deconvolution
data; the commit message that introduced it described it as reusable
for future cubic example datasets. Reword the opener so future users
see the reuse path (edit FILES_TO_UPLOAD + EXPECTED_SHA256 + the
metadata JSON) without needing to read the commit history.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rce line

The previous wording linked the same DOI twice in one sentence
("Zenodo record" + "(DOI ...)"). One link is enough.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Harmless in Jupyter (it just suppresses repr) but inconsistent with
the preceding fetch_data call in the same cell.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alxndrkalinin alxndrkalinin merged commit 0494f4e into main Jun 2, 2026
9 checks passed
@alxndrkalinin alxndrkalinin deleted the fix/deconv-data-zenodo branch June 2, 2026 20:07
alxndrkalinin added a commit that referenced this pull request Jun 2, 2026
…ownload

Refreshes the notebook outputs from a clean GPU kernel run after the
switch to pooch/Zenodo fetching (#48). Cells 0-2 now show the
end-to-end bootstrap (GPU available, pooch download path, image/PSF
shapes), and execution counts are renumbered sequentially. All
reported resolution values are unchanged (PSNR iter 35, SSIM iter 37,
FSC iter 41, DCR XY 919->785 nm / Z 974->874 nm); only per-iteration
FSC improvement figures differ at ~1e-5 from GPU FP non-determinism.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
alxndrkalinin added a commit that referenced this pull request Jun 2, 2026
Covers the deconvolution example data move to Zenodo + pooch
auto-download (#48) on top of v0.7.0a9.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Example datasets for deconvolution are not available

2 participants