Description
I've tried to open an asset without specifying the chunks (i.e. `chunks=None`) and then calling `.compute()`, but that resulted in surprising behavior:
```python
import pystac
import xarray as xr

asset = pystac.Asset(
    href="https://data-marineinsitu.ifremer.fr/glo_multiparameter_nrt/monthly/TG/202205/NO_TS_TG_SwanagePierTG_202205.nc",
    media_type="application/netcdf",
)
ds = xr.open_dataset(asset, engine="stac")
```

which worked seemingly fine. However, once I called `.compute()` on it, the result was a dataset still containing dask arrays:
```python
with xr.set_options(display_expand_attrs=False):
    print(repr(ds.load()))
```

which displays as
```
<xarray.Dataset> Size: 36kB
Dimensions:    (TIME: 741, DEPTH: 1)
Coordinates:
  * TIME       (TIME) datetime64[ns] 6kB 2022-05-09T12:30:00 ... 2022-05-31T2...
    LATITUDE   float32 4B 50.61
    LONGITUDE  float32 4B -1.949
    DEPH       (DEPTH) float32 4B dask.array<chunksize=(1,), meta=np.ndarray>
    STATION    |S64 64B b'SwanagePierTG'
Dimensions without coordinates: DEPTH
Data variables:
    TIME_QC    (TIME) int8 741B dask.array<chunksize=(741,), meta=np.ndarray>
    VTPK       (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VTPK_QC    (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VTM02      (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VTM02_QC   (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VZMX       (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VZMX_QC    (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    SLEV       (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    SLEV_QC    (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VHM0       (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
    VHM0_QC    (TIME, DEPTH) float32 3kB dask.array<chunksize=(741, 1), meta=np.ndarray>
Attributes: (44)
```
I think the reason for this is that xpystac, as a composite backend, itself calls `xr.open_dataset` with a `chunks` parameter. However, as currently implemented, `xr.open_dataset` wants to be in charge of anything dask-related, so the result of the call above is a `Dataset` in which the dask arrays are wrapped by xarray's lazy-loading arrays.
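For contrast, here is a minimal sketch of the behavior one would normally expect when a dataset is dask-backed (built from a plain dask array rather than the stac backend, so the variable name is purely illustrative): `.load()` should replace the dask arrays with in-memory numpy arrays.

```python
import numpy as np
import dask.array as da
import xarray as xr

# Build a dataset whose variable is backed by a chunked dask array,
# mirroring what a chunked open_dataset call would produce.
ds = xr.Dataset({"slev": (("time",), da.zeros(10, chunks=5))})
assert isinstance(ds["slev"].data, da.Array)

# .load() (like .compute()) is expected to materialize the data.
loaded = ds.load()
assert isinstance(loaded["slev"].data, np.ndarray)
```

In the xpystac case above, this expectation is violated: the repr after `ds.load()` still shows `dask.array<...>` entries.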
As far as I can tell, the recommended fix would be to not use dask within the backend, but I guess that doesn't work if the backend needs to perform computations itself. So either something within xarray's backend machinery would have to change to officially support composite backends, or xpystac's documentation would have to warn that in-memory loading is not supported (or the API could change, for example to require `open_datatree` for item collections, which could then be aggregated into a mosaic with a separate call).
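As a user-side stopgap in the meantime (my suggestion, not anything xpystac documents), `Dataset.as_numpy()` coerces every variable, including dask-backed ones that survive `.load()`, into plain in-memory numpy arrays:

```python
import numpy as np
import dask.array as da
import xarray as xr

# Stand-in for a dataset where dask arrays linger after loading;
# the variable name is illustrative.
ds = xr.Dataset({"vhm0": (("time",), da.ones(6, chunks=3))})

# as_numpy() computes and converts all variables to numpy backing.
ds_np = ds.as_numpy()
assert isinstance(ds_np["vhm0"].data, np.ndarray)
```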