surprising behavior of datasets opened by xpystac #45

@keewis

I tried to open an asset without specifying the chunks (i.e. chunks=None) and then called compute, but that resulted in surprising behavior:

import pystac
import xarray as xr

asset = pystac.Asset(
    href="https://data-marineinsitu.ifremer.fr/glo_multiparameter_nrt/monthly/TG/202205/NO_TS_TG_SwanagePierTG_202205.nc",
    media_type="application/netcdf",
)

ds = xr.open_dataset(asset, engine="stac")

This seemingly worked fine. However, once I called .compute() (or, as here, .load()) on it, the result was a dataset that still contained dask arrays:

with xr.set_options(display_expand_attrs=False):
    print(repr(ds.load()))

displays as

<xarray.Dataset> Size: 36kB
Dimensions:    (TIME: 741, DEPTH: 1)
Coordinates:
  * TIME       (TIME) datetime64[ns] 6kB 2022-05-09T12:30:00 ... 2022-05-31T2...
    LATITUDE   float32 4B 50.61
    LONGITUDE  float32 4B -1.949
    DEPH       (DEPTH) float32 4B dask.array<chunksize=(1,), meta=np.ndarray>
    STATION    |S64 64B b'SwanagePierTG'
Dimensions without coordinates: DEPTH
Data variables:
    TIME_QC    (TIME) int8 741B dask.array<chunksize=(741,), meta=np.ndarray>
    VTPK       (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VTPK_QC    (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VTM02      (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VTM02_QC   (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VZMX       (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VZMX_QC    (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    SLEV       (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    SLEV_QC    (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VHM0       (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VHM0_QC    (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
Attributes: (44)
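
To check for this condition programmatically rather than by reading the repr, something like the following might help. This is only a sketch: `chunked_variables` is a hypothetical helper name, not part of xarray or xpystac, and the example dataset is a small numpy-backed stand-in rather than the file from this report.

```python
import numpy as np
import xarray as xr


def chunked_variables(ds):
    """Return the names of variables whose data still reports chunks.

    Hypothetical helper: a variable backed by a plain in-memory array
    has ``chunks`` set to None, while a dask-backed one does not.
    """
    return [name for name, var in ds.variables.items() if var.chunks is not None]


# Stand-in dataset backed by numpy only, so nothing should be reported.
example = xr.Dataset(
    {"SLEV": (("TIME", "DEPTH"), np.zeros((3, 1), dtype="float32"))}
)
print(chunked_variables(example))  # []
```

On a dataset like the one shown above, the expectation would be that `chunked_variables(ds.load())` returns an empty list; a non-empty result would correspond to the surprising behavior described here.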

I think the reason for this is that xpystac, as a composite backend, itself calls xr.open_dataset with a chunks parameter. However, as currently implemented, xr.open_dataset wants to be in charge of anything related to dask, and thus the result of the call above is a Dataset object in which dask arrays are wrapped by xarray's lazy-loading arrays. Loading presumably then only unwraps the lazy-loading layer and leaves the dask arrays inside uncomputed.
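
The suspected mechanism can be illustrated with a toy model in plain Python. Everything here is made up for illustration: `DeferredArray` stands in for a dask array, `LazyBackendArray` for xarray's lazy-loading wrapper, and `load` for a single loading pass. None of these are real xarray or dask classes, and the real implementation may differ.

```python
class DeferredArray:
    """Toy stand-in for a dask array: values exist only after compute()."""

    def __init__(self, values):
        self._values = list(values)

    def compute(self):
        return list(self._values)


class LazyBackendArray:
    """Toy stand-in for a lazy-loading wrapper around backend data."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def get_array(self):
        # Hands back whatever the backend produced, without inspecting it.
        return self._wrapped


def load(variables):
    """One loading pass over a mapping of variable name to data."""
    loaded = {}
    for name, data in variables.items():
        if isinstance(data, DeferredArray):
            # Deferred data visible at the top level gets computed.
            loaded[name] = data.compute()
        elif isinstance(data, LazyBackendArray):
            # Lazy wrappers are only unwrapped, not computed.
            loaded[name] = data.get_array()
        else:
            loaded[name] = data
    return loaded


variables = {
    # The composite backend already returned deferred data, which then got wrapped.
    "SLEV": LazyBackendArray(DeferredArray([0.1, 0.2, 0.3])),
    # An ordinary variable: the wrapper holds in-memory values.
    "LATITUDE": LazyBackendArray([50.61]),
}

first = load(variables)
second = load(first)
print(type(first["SLEV"]).__name__)   # DeferredArray
print(type(second["SLEV"]).__name__)  # list
```

In this toy model the first pass only strips the wrapper, so the deferred array survives, and a second pass is needed to get actual values. Whether a second .load() behaves the same way on the real dataset is not verified here.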

As far as I can tell, the recommended fix would be to not use dask within the backend, but I guess that doesn't work if you're trying to perform computations within the backend. So I guess either we have to change something within xarray's backend machinery to officially support composite backends, or the xpystac documentation would have to warn that in-memory loading is not supported. Another option might be to change the API, for example to require open_datatree for item collections, which can then be aggregated into a mosaic with a separate call.
