fix(examples): host deconvolution data on Zenodo + pooch auto-download #48
Adds scripts/zenodo_upload.py and a sidecar metadata JSON used to publish the cubic deconvolution example data (3D astrocyte nuclei and theoretical PSF) on Zenodo. The script creates an empty deposition via the REST API, attaches the metadata, streams each file to the deposition bucket, and verifies the returned md5 against the local md5 (in addition to a committed sha256 guard against on-disk bit-rot). Defaults to leaving the deposition as a draft for manual review; --publish publishes immediately. Reusable for future cubic example datasets - point FILES_TO_UPLOAD and the metadata JSON at the new files and rerun. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nload via pooch The two Google Drive links in examples/data/README.md (3D astrocyte stack + theoretical PSF) had stopped resolving, so the deconvolution notebook had no way to bootstrap its inputs. Both files are now deposited on Zenodo (DOI 10.5281/zenodo.20514102, concept DOI 10.5281/zenodo.20514101) under CC-BY-4.0. examples/notebooks/deconvolution_iterations_3d.ipynb now fetches both files via pooch with sha256 verification - matching the pattern already used by the resolution and segmentation notebooks - so the manual download step is gone. The README entry is rewritten to list the filenames, acquisition/modeling details, and the Zenodo DOI. Closes #47. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reviewer's Guide
Updates the 3D deconvolution example to auto-download its input data from a new Zenodo deposition using pooch, documents the new data source in the examples data README, and adds a reusable script plus metadata for creating and publishing the Zenodo deposition with checksum verification.
Sequence diagram for deconvolution notebook data fetch via pooch
sequenceDiagram
actor User
participant Notebook as deconvolution_iterations_3d.ipynb
participant Pooch as pooch
participant Zenodo as Zenodo_API
participant FS as Data_Dir
User->>Notebook: run fetch_data for astr_vpa_hoechst.tif
Notebook->>Pooch: pooch.retrieve(url, known_hash, fname, path)
alt file not in Data_Dir
Pooch->>Zenodo: GET /api/records/20514102/files/astr_vpa_hoechst.tif/content
Zenodo-->>Pooch: file bytes
Pooch->>FS: save astr_vpa_hoechst.tif
Pooch-->>Notebook: local path
else file already present
Pooch-->>Notebook: existing local path
end
User->>Notebook: run fetch_data for astr_vpa_hoechst_psf_na095_cropped.tif
Notebook->>Pooch: pooch.retrieve(url, known_hash, fname, path)
Pooch-->>Notebook: local path
User->>Notebook: load data
Notebook->>FS: imread(astr_vpa_hoechst.tif)
FS-->>Notebook: image array
Notebook->>FS: imread(astr_vpa_hoechst_psf_na095_cropped.tif)
FS-->>Notebook: psf array
Notebook->>Notebook: ascupy(image), ascupy(psf) if USE_GPU
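The fetch flow in the diagram above can be sketched roughly as follows. This is a minimal sketch rather than the notebook's actual cell: the `zenodo_file_url` helper, the signature of `fetch_data`, and the `https://zenodo.org` host prefix are assumptions made for illustration. The record ID and URL path come from the diagram, and the SHA256 digest has to be supplied by the caller.

```python
from pathlib import Path

ZENODO_RECORD_ID = "20514102"  # record ID as shown in the sequence diagram


def zenodo_file_url(record_id: str, filename: str) -> str:
    """Build the file-content URL for a Zenodo record (host prefix is an assumption)."""
    return f"https://zenodo.org/api/records/{record_id}/files/{filename}/content"


def fetch_data(filename: str, sha256: str, data_dir: Path) -> Path:
    """Download `filename` into `data_dir` unless a copy with a matching hash exists."""
    import pooch  # imported lazily so the URL helper works without pooch installed

    local_path = pooch.retrieve(
        url=zenodo_file_url(ZENODO_RECORD_ID, filename),
        known_hash=f"sha256:{sha256}",  # pooch raises if the download does not match
        fname=filename,
        path=data_dir,
    )
    return Path(local_path)
```

Calling `fetch_data` once per file gives the two branches in the diagram: a download on the first run and a local cache hit afterwards.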
Sequence diagram for zenodo_upload deposition workflow
sequenceDiagram
participant User
participant Script as zenodo_upload.py
participant Zenodo as Zenodo_API
participant FS as Local_Files
User->>Script: python zenodo_upload.py [--sandbox] [--publish]
Script->>FS: read zenodo_metadata.json
Script->>FS: check FILES_TO_UPLOAD exist
Script->>Zenodo: POST /api/deposit/depositions
Zenodo-->>Script: deposition id, links.bucket, links.html
Script->>Zenodo: PUT /api/deposit/depositions/{id} (metadata)
Zenodo-->>Script: updated deposition
loop for each file in FILES_TO_UPLOAD
Script->>FS: hash_file(path)
Script->>Script: compare with EXPECTED_SHA256
Script->>Zenodo: PUT {bucket_url}/remote_name (file bytes)
Zenodo-->>Script: uploaded file info (checksum)
Script->>Script: compare md5 with local hash
end
alt --publish passed
Script->>Zenodo: POST /api/deposit/depositions/{id}/actions/publish
Zenodo-->>Script: published deposition (doi, conceptdoi)
else no --publish
Script->>User: print draft URL
end
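The two checksum comparisons in the upload loop can be illustrated with the standard library alone. This is a hedged sketch, not the code in `zenodo_upload.py`: the `hash_file` signature and the `checksum_matches` helper are hypothetical, and the `md5:<hex>` format of the reported checksum is an assumption about the upload response.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, reported: str) -> bool:
    """Compare a local file against a checksum string such as 'md5:<hex>' (assumed format)."""
    algorithm, _, expected_hex = reported.partition(":")
    return hash_file(path, algorithm) == expected_hex.lower()
```

Checking the committed SHA256 before the upload and the returned MD5 after it covers two different failure modes: a corrupted local file and a corrupted transfer.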
Hey - I've found 1 issue, and left some high level feedback:
- The Zenodo URLs and SHA256 hashes are duplicated between the notebook and `scripts/zenodo_upload.py`; consider centralizing these constants (e.g., a small shared module or JSON) so future updates only need to be made in one place.
- The notebook hardcodes `DATA_DIR = Path("../data")`, which may break if the notebook is executed from a different working directory; consider resolving the path relative to the notebook file or using the same helper/path logic as your other pooch-based examples.
- In `zenodo_upload.py`, both sandbox and production use the same `ZENODO_TOKEN` env var; if you plan to use separate tokens, it might be safer to support distinct variables (e.g., `ZENODO_SANDBOX_TOKEN`) or a `--token` argument to avoid accidentally using the wrong credentials.
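One way the third suggestion could look is sketched below. The `resolve_token` function and its parameters are hypothetical; only the variable names `ZENODO_TOKEN` and `ZENODO_SANDBOX_TOKEN` and the `--token` idea come from the comment above.

```python
import os
from typing import Mapping, Optional


def resolve_token(
    sandbox: bool,
    cli_token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the API token: an explicit --token value first, then the per-environment variable."""
    if cli_token:
        return cli_token
    var = "ZENODO_SANDBOX_TOKEN" if sandbox else "ZENODO_TOKEN"
    token = env.get(var)
    if not token:
        # Exit with a readable message instead of sending an unauthenticated request.
        raise SystemExit(f"Set {var} or pass --token")
    return token
```

Keeping the sandbox and production variables separate means a sandbox run cannot silently pick up production credentials.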
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Zenodo URLs and SHA256 hashes are duplicated between the notebook and `scripts/zenodo_upload.py`; consider centralizing these constants (e.g., a small shared module or JSON) so future updates only need to be made in one place.
- The notebook hardcodes `DATA_DIR = Path("../data")`, which may break if the notebook is executed from a different working directory; consider resolving the path relative to the notebook file or using the same helper/path logic as your other pooch-based examples.
- In `zenodo_upload.py`, both sandbox and production use the same `ZENODO_TOKEN` env var; if you plan to use separate tokens, it might be safer to support distinct variables (e.g., `ZENODO_SANDBOX_TOKEN`) or a `--token` argument to avoid accidentally using the wrong credentials.
## Individual Comments
### Comment 1
<location path="scripts/zenodo_upload.py" line_range="125-134" />
<code_context>
+ )
+ return 3
+ with path.open("rb") as fh:
+ r = requests.put(
+ f"{bucket_url}/{remote_name}",
+ params=params,
+ data=fh,
+ timeout=None,
+ )
+ r.raise_for_status()
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider using a finite timeout for the file upload request instead of `timeout=None`.
An infinite timeout can cause the upload to hang forever if the connection stalls. Using a finite (possibly large) timeout, or separate connect/read timeouts, makes failures predictable and better suited for automated runs. You could also expose the timeout as a CLI option for large uploads.
Suggested implementation:
```python
r = requests.put(
    f"{bucket_url}/{remote_name}",
    params=params,
    data=fh,
    # Use a finite timeout to avoid hanging indefinitely on stalled connections.
    # This tuple is (connect_timeout, read_timeout); adjust as appropriate.
    timeout=(10, 600),
)
```
To fully implement the suggestion of making this timeout configurable:
1. Add a CLI option (for example `--upload-timeout`) in the argument parser to allow the user to specify either a single timeout value or a connect/read timeout pair.
2. Parse that option into either a float/int or a `(connect_timeout, read_timeout)` tuple.
3. Replace the hard-coded `(10, 600)` in the `requests.put` call with the parsed timeout value (for example, a variable like `upload_timeout` that you pass through to the function containing this code).
</issue_to_address>
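The three steps above could be sketched along these lines. The `parse_timeout` and `build_parser` names, the `SECONDS` or `CONNECT,READ` syntax, and the `(10, 600)` default are assumptions for illustration, not the script's actual interface.

```python
import argparse
from typing import Tuple, Union

Timeout = Union[float, Tuple[float, float]]


def parse_timeout(text: str) -> Timeout:
    """Parse '600' into 600.0, or '10,600' into (10.0, 600.0)."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) not in (1, 2):
        raise argparse.ArgumentTypeError("expected SECONDS or CONNECT,READ")
    if any(p <= 0 for p in parts):
        raise argparse.ArgumentTypeError("timeouts must be positive")
    return parts[0] if len(parts) == 1 else (parts[0], parts[1])


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the upload timeout as a CLI option."""
    parser = argparse.ArgumentParser(description="Upload files to a deposition")
    parser.add_argument(
        "--upload-timeout",
        type=parse_timeout,
        default=(10.0, 600.0),  # (connect, read) seconds, mirroring the suggestion
        help="SECONDS, or CONNECT,READ in seconds",
    )
    return parser
```

The parsed value can be passed straight to the `timeout=` argument of `requests.put`, which accepts either a single number or a `(connect, read)` tuple.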
Pull request overview
This PR restores the deconvolution example’s dataset availability by moving the input TIFFs from broken Google Drive links to a Zenodo record and updating the deconvolution notebook to auto-download the files via pooch (with SHA256 verification). It also adds a reusable helper script + metadata JSON for creating/uploading Zenodo depositions for future example datasets.
Changes:
- Update `deconvolution_iterations_3d.ipynb` to fetch required TIFF inputs from Zenodo via `pooch` on first run.
- Rewrite the deconvolution dataset entry in `examples/data/README.md` to document filenames, acquisition/modeling details, and Zenodo source.
- Add `scripts/zenodo_upload.py` and `scripts/zenodo_metadata.json` to automate creating and optionally publishing a Zenodo deposition.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `scripts/zenodo_upload.py` | Adds a CLI script that creates a Zenodo deposition, uploads files, verifies checksums, and optionally publishes. |
| `scripts/zenodo_metadata.json` | Provides the deposition metadata used by the upload script (title/description/citation/related identifiers). |
| `examples/notebooks/deconvolution_iterations_3d.ipynb` | Adds a pooch-based fetch cell to download the deconvolution dataset from Zenodo with SHA256 verification. |
| `examples/data/README.md` | Replaces broken Google Drive links with Zenodo-based dataset documentation for the deconvolution example. |
The Zenodo upload tool used to fail with a bare ModuleNotFoundError because requests is not part of cubic's runtime dependency closure. Wrap the import and raise SystemExit with the pip/uv install line so the failure mode is self-explanatory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
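The guarded import described in this commit might look roughly like the sketch below. The `require` helper is hypothetical; the commit only states that the import is wrapped and that `SystemExit` is raised with an install line.

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import a module, or exit with an actionable message instead of a traceback."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise SystemExit(
            f"{module_name!r} is required for this script; install it with: {install_hint}"
        ) from exc
```

In a script this would be used as `requests = require("requests", "pip install requests")` near the top of the file, so the failure surfaces before any work starts.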
timeout=None could let the script wait forever if the connection stalled mid-upload. Use a (10s connect, 30min read) tuple — generous enough for hundreds of MB of streamed data on slow uplinks, but fails predictably on a frozen socket. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The docstring claimed the script was specific to the deconvolution data; the commit message that introduced it described it as reusable for future cubic example datasets. Reword the opener so future users see the reuse path (edit FILES_TO_UPLOAD + EXPECTED_SHA256 + the metadata JSON) without needing to read the commit history. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rce line
The previous wording linked the same DOI twice in one sentence
("Zenodo record" + "(DOI ...)"). One link is enough.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Harmless in Jupyter (it just suppresses repr) but inconsistent with the preceding fetch_data call in the same cell. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ownload Refreshes the notebook outputs from a clean GPU kernel run after the switch to pooch/Zenodo fetching (#48). Cells 0-2 now show the end-to-end bootstrap (GPU available, pooch download path, image/PSF shapes), and execution counts are renumbered sequentially. All reported resolution values are unchanged (PSNR iter 35, SSIM iter 37, FSC iter 41, DCR XY 919->785 nm / Z 974->874 nm); only per-iteration FSC improvement figures differ at ~1e-5 from GPU FP non-determinism. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers the deconvolution example data move to Zenodo + pooch auto-download (#48) on top of v0.7.0a9. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #47.
Problem
Both Google Drive links in `examples/data/README.md` (deconvolution section) had stopped resolving, leaving `examples/notebooks/deconvolution_iterations_3d.ipynb` with no way to bootstrap its inputs (a 3D Hoechst-stained astrocyte stack and a theoretical PSF).
Fix
Republished both files on Zenodo under CC-BY-4.0 with the proper IEEE archived citation: 10.1109/ICCVW69036.2025.00608, IEEE Xplore URL, CVF open-access PDF, arXiv preprint, GitHub repo.
`deconvolution_iterations_3d.ipynb` now pulls both files via `pooch` with `sha256` verification on first run, matching the pattern used by the other resolution / segmentation example notebooks.
The `examples/data/README.md` entry is rewritten with filenames, acquisition / modeling details, and the Zenodo DOI.
Tooling
Adds `scripts/zenodo_upload.py` + `scripts/zenodo_metadata.json`, the script used to create the deposition. Reusable for future cubic example datasets (point `FILES_TO_UPLOAD` and the metadata JSON at the new files and rerun; deposition stays as a draft unless `--publish` is passed).
Test plan
From a fresh checkout (no local `examples/data/astr_vpa_hoechst*.tif`), run cells 1-4 of `examples/notebooks/deconvolution_iterations_3d.ipynb`: pooch should download both files into `examples/data/` and the load cell should succeed.
🤖 Generated with Claude Code
Summary by Sourcery
Switch the deconvolution example to auto-download its input data from Zenodo and add tooling to manage the Zenodo deposition.