open_dataset is not thread-safe #4100

Closed
paulkernfeld opened this issue May 27, 2020 · 7 comments

@paulkernfeld

👋 Hi, great library! I've been trying to use xarray from a flask server and have encountered frequent segfaults when trying to load a file.

MCVE Code Sample

Putting together this MCVE took a bit of time, but it was a good exercise for me. I enjoyed learning more about the MCVE philosophy! 😊 The only caveat is that threading bugs like this one are hard to reproduce reliably, so I can't guarantee that the segfault will occur on every run. If you have any suggestions for improving reproducibility, please let me know!

import threading
import xarray as xr


SAVED_FILE_NAME = "saved.nc"

# Modifying these items may change the likelihood of hitting a segfault
N_ELEMENTS = 100
N_THREADS = 2

if __name__ == '__main__':
    xr.Dataset({'foo': ('x', range(N_ELEMENTS))}).to_netcdf(SAVED_FILE_NAME)

    threads = [
        threading.Thread(target=lambda: xr.load_dataset(SAVED_FILE_NAME, engine="netcdf4"))
        for _ in range(N_THREADS)
    ]
    for thread in threads:
        thread.start()

    for thread in threads:
        thread.join()

    print("No segfault!")

Expected Output

Program prints No segfault! and exits successfully

Problem Description

The program sometimes segfaults. When running with the Python fault handler, I often get an output that looks like this:

Fatal Python error: Segmentation fault

Thread 0x0000700002c2f000 (most recent call first):
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 204 in _acquire_with_cache_info
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 186 in acquire_context
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 112 in __enter__
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 362 in _acquire
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 211 in _acquire_with_cache_info
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 192 in acquire_context

Current thread 0x0000700004a41000 (most recent call first):
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 204 in _acquire_with_cache_info
  File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 186 in acquire_context
  File "/usr
zsh: segmentation fault  ./xarray-segfault
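
For anyone who wants to capture a similar trace, here is a minimal sketch of turning on the standard-library fault handler from inside a script. It only uses the `faulthandler` module; nothing here is specific to xarray, and where you put the call in a real server is up to you.

```python
import faulthandler

# Ask Python to dump the traceback of every thread to stderr when the
# process receives a fatal signal such as SIGSEGV.
faulthandler.enable()

# Confirm that the handler is installed.
print(faulthandler.is_enabled())
```

The same effect is available without editing the script by running it with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER` environment variable.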

Versions

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.4 (default, Sep 7 2019, 18:27:02)
[Clang 10.0.1 (clang-1001.0.46.4)]
python-bits: 64
OS: Darwin
OS-release: 19.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.4
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.4.0
pip: 20.1.1
conda: None
pytest: None
IPython: None
sphinx: None

@shoyer
Member

shoyer commented May 28, 2020

Thanks for the clear report!

I know we use backend-specific locks by default when opening netCDF files, so I was initially puzzled by this. But now that I've looked back over the implementation, this makes sense.

We currently only guarantee thread safety when reading data after files have been opened. For example, you could write something like:

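    # Open once in the main thread; do_something_with_xarray is a placeholder.
    dataset = xr.open_dataset(SAVED_FILE_NAME, engine="netcdf4")
    threads = [
        threading.Thread(target=lambda: do_something_with_xarray(dataset))
        for _ in range(N_THREADS)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()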

For many use-cases (e.g., in dask), this is a sufficient form of parallelism, because xarray's file opening is lazy and only needs to read metadata, not array values.

It would indeed be nice if open_dataset() itself were thread safe. Mostly I think this could be achieved by making use of the existing lock attribute found on NetCDF4DataStore and most other DataStore classes.

@shoyer
Member

shoyer commented May 28, 2020

There are also a few workarounds you might consider in the meantime:

  1. If you're reading netCDF4 files, HDF5 can be compiled in "thread safe" mode (which just adds its own global lock).
  2. If you're reading netCDF3 files, the "scipy" backend is thread safe.
  3. Other file formats like "zarr" don't have this issue at all, and more gracefully scale to very large datasets.

@paulkernfeld
Author

Hey @shoyer, thanks very much for the quick response and for suggesting possible workarounds!

@paulkernfeld
Author

@shoyer could you tell me more about what it means to compile HDF5 in "thread safe" mode?

@shoyer
Member

shoyer commented May 28, 2020

Take a look here: https://portal.hdfgroup.org/display/knowledge/Questions+about+thread-safety+and+concurrent+access

I haven't actually tried compiling in thread-safe mode myself.

@stale

stale bot commented Apr 19, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

stale bot added the stale label on Apr 19, 2022
shoyer changed the title from "Segfault loading a file from multiple threads" to "Opening files is not thread-safe" on Apr 19, 2022
stale bot removed the stale label on Apr 19, 2022
shoyer changed the title from "Opening files is not thread-safe" to "open_dataset is not thread-safe" on Apr 19, 2022

@max-sixty
Collaborator

Closing as no bug, but feel free to reopen with a suggested path format
