open_dataset is not thread-safe #4100
Thanks for the clear report! I know we use backend-specific locks by default when opening netCDF files, so I was initially puzzled by this. But now that I've looked back over the implementation, this makes sense: we currently only guarantee thread safety when reading data after files have been opened. For example, you could write something like:

import threading
import xarray as xr
# SAVED_FILE_NAME, N_THREADS and do_something_with_xarray come from the MCVE; the file is opened once, before any thread starts.
dataset = xr.open_dataset(SAVED_FILE_NAME, engine="netcdf4")
threads = [
    threading.Thread(target=lambda: do_something_with_xarray(dataset))
    for _ in range(N_THREADS)
]
for t in threads: t.start()
for t in threads: t.join()

For many use cases (e.g., in dask), this is a sufficient form of parallelism, because xarray's file opening is lazy and only needs to read metadata, not array values. It would indeed be nice if…
There are also a few workarounds you might consider in the meantime:
Hey @shoyer, thanks very much for the quick response and for suggesting possible workarounds!
@shoyer, could you tell me more about what it means to compile HDF5 in "thread-safe" mode?
Take a look here: https://portal.hdfgroup.org/display/knowledge/Questions+about+thread-safety+and+concurrent+access. I haven't actually tried compiling in thread-safe mode myself.
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the…
Closing as no bug, but feel free to reopen with a suggested path forward.
👋 Hi, great library! I've been trying to use xarray from a flask server and have encountered frequent segfaults when trying to load a file.
MCVE Code Sample
Putting together this MCVE took a bit of time, but it was a good exercise for me. I enjoyed learning more about the MCVE philosophy! 😊 The only caveat is that this kind of threading issue is hard to reproduce, so I can't guarantee that the bug will show up on every run. If you have any suggestions for improving reproducibility, please let me know what I can do!
Expected Output
The program prints "No segfault!" and exits successfully.
Problem Description
The program sometimes segfaults. When running with the Python fault handler, I often get output that looks like this:
Versions
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.4 (default, Sep 7 2019, 18:27:02)
[Clang 10.0.1 (clang-1001.0.46.4)]
python-bits: 64
OS: Darwin
OS-release: 19.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3
xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.4
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.4.0
pip: 20.1.1
conda: None
pytest: None
IPython: None
sphinx: None