Reuse of keys in blockwise fusion can cause spurious KeyErrors on distributed cluster #9888
If I understood correctly:
I don't think this should be in dask/dask. This is (yet another) fragility of the cancelled state, and the dask/dask graph shouldn't work around it. It should be OK to submit to the scheduler a new, different task with the same key as a released future.
Actually - I don't think you need to change the graph. You can get this error if you resubmit the same exact tasks.
This requires x to go through its whole lifecycle between client, scheduler, and worker b before a has had the chance of doing even a single round of the event loop. The easiest way to achieve this is to hamstring a's event loop by spilling many GiBs worth of data.
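A minimal sketch of that resubmission pattern (the original snippet referencing `x` and workers `a`/`b` is not reproduced in this excerpt; this is an assumed setup using deterministic `da.ones` so that both submissions produce identical keys):

```python
import dask.array as da
from dask.distributed import Client

client = Client(n_workers=2)

x = da.ones((10_000, 10_000), chunks=(1_000, 1_000)).persist()
del x  # release the futures on the client...

# ...and immediately resubmit the exact same graph: the keys are identical,
# so the second submission can race against the still in-flight release of
# the first one on the scheduler and workers.
x = da.ones((10_000, 10_000), chunks=(1_000, 1_000)).persist()
```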
TLDR: The current system assumes strongly, in many places, that a key uniquely identifies a task, not just its output. Reusing a key for a different task violates this, and distributed can fail on multiple levels. The scheduler has no way to understand that the tasks are new and different. The problem has nothing to do with the Worker; this error is just surfacing there.

As stated above, if validation is on, this raises in various places on the scheduler (I found three different assertions that could be triggered depending on queuing/no-queuing and some timing; here, here, and I forgot where the third one was).

If you inspect closely what happens when you execute the above script (the del is not necessary, it is merely there to be explicit), the second computation resubmits keys the scheduler already knows, but with different task definitions. The scheduler already knows a certain task A that doesn't have any dependencies (first persist, with task fusion). A subsequent submission then reuses the same key for a task that does have dependencies. It may be true that the worker could handle this better, but the scheduler is also corrupted already.
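A minimal sketch of that situation with raw dict graphs (the keys here are hypothetical, not taken from the reproducer): the same key is first submitted as a dependency-free task and later as a task with a dependency, and the two definitions tokenize differently even though the key is identical.

```python
from dask.base import tokenize

# First submission: "a" is a fused task with no dependencies.
fused = {"a": (sum, [1, 2, 3])}

# Second submission: the same key "a" now depends on "dep".
unfused = {"dep": (list, [1, 2, 3]), "a": (sum, "dep")}

# Same key, different task definition -- the scheduler only sees the key
# and has no way to tell that "a" is now a different task.
assert tokenize(fused["a"]) != tokenize(unfused["a"])
```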
I believe this is where the first fallacy is hidden. In the code above, what actually happens is
i.e. the scheduler doesn't even know that the futures were released. This timing is unavoidable due to how refcounting works. Even if you disregard this release chain, the same issue can very easily be constructed with a second client and different graphs.
@crusaderky I can't truly follow your example here, but it is a different issue. I'm talking about the guarantee that a task has to be uniquely identifiable by its key, which is violated here and causes all sorts of failures in distributed. If you found something else, please open a new ticket. I believe this issue is caused by this line reusing the existing key:

Line 1589 in a8327a3
@rjzamora @jrbourbeau Can either one of you motivate why blockwise fusion layers cannot generate a new key? @jrbourbeau you mentioned this recently as a UX problem that could be solved by a "better version of HLGs". I feel like I'm missing some context.
Agree that this is a significant problem with HLGs. The reason is the simple fact that there is no (formal/robust) mechanism to "replay" the construction of a graph after the names/properties for a subset of its tasks have been changed. That is, you can't really change the name of a layer without regenerating everything that depends on it.
So this is a limitation because there might've been an "accidental" materialization before the fusing?
Yes and no. The limitation is that there is no API to replace a layer in the HLG (e.g. ...).
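For illustration, a minimal sketch of what such a replacement would have to touch, introspecting the `HighLevelGraph.layers` / `.dependencies` attributes (the concrete layer names are version-dependent and only illustrative):

```python
import dask.array as da

x = da.ones(10, chunks=5) + 1
hlg = x.__dask_graph__()   # a HighLevelGraph

print(list(hlg.layers))    # layer names, e.g. ['ones_like-...', 'add-...']
print(hlg.dependencies)    # {layer name: set of layer names it depends on}

# Swapping in a layer under a *new* name means rewriting every entry in
# `dependencies` that points at the old name *and* regenerating each
# dependent Layer so its tasks reference the new keys -- and there is no
# Layer API for that second step.
```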
xref #8635. Like @rjzamora said:
I'm still trying to wrap my head around it but I'm getting there. IIUC this is mostly about layer dependencies? Best case, fixing layer dependencies is simply exchanging a single key; worst case would be if the layer depending on that Blockwise layer is a MaterializedLayer, in which case we'd need to walk the entire graph to rename the key. Is this roughly correct?
Yes. The problem is that you would need to regenerate all dependent layers, and there is no clear way to do this.

Why there is no clear way to do this: there is no formal mechanism to change the dependencies of an existing Layer. The required Layer API is very limited and does not provide a way to set the dependencies. Therefore, different logic would need to be used to change dependencies for different layer types. For the ...

Given that the logic here would be "ugly" with the existing Layer API, the ideal solution is probably to update the Layer API to include a clear mechanism for tracking dependencies. However, we have been waiting to clean up serialization. I have personally been waiting to remove the ...
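To make the "worst case" from the previous comment concrete, a hypothetical sketch (not a dask API) of renaming a key in a fully materialized, dict-based graph: every task has to be visited and rewritten, here using dask's internal `dask.core.subs` helper.

```python
from dask.core import subs  # internal helper: substitute a key inside a task


def rename_key(dsk, old, new):
    """Return a copy of a materialized graph with `old` renamed to `new`."""
    out = {}
    for key, task in dsk.items():
        out[new if key == old else key] = subs(task, old, new)
    return out


dsk = {"a": 1, "b": (sum, ["a", "a"]), "c": (str, "b")}
print(rename_key(dsk, "a", "a-fused"))
# every reference to "a" is rewritten, so the whole graph must be walked
```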
Thanks, that clears things up for me.
Found a nice textbook example in the distributed tests: https://github.com/dask/distributed/pull/8185/files#r1479864806
(repost from dask/distributed#8185 (comment))

```python
ddf = dd.from_pandas(df, npartitions=4)
with dask.config.set({"dataframe.shuffle.method": "p2p"}):
    ddf = ddf.set_index("a").sort_values("b")
    result = ddf.compute()
```

This code: ...

Some keys are the same, but run_spec changes; this occasionally triggers a race condition where the scheduler didn't have the time to forget those keys yet. This is (fortuitously) not causing failures because the key is always close to its released->forgotten transition when this happens (although in other cases it may be in memory; I'm not sure). If I add a line ...
Hi, if there is no workaround/alternative, I have the time to contribute to this issue (it would be my first in Dask, I have only contributed to Xarray).

```python
import xarray as xr
import dask.array as da
from dask.distributed import LocalCluster, Client

# Change the base_path
base_path = "YOUR_PATH"

sizes = [("a", 3), ("b", 5), ("c", 2)]


def update_dataset(factor):
    coords = {
        dim: list(range(size))
        for dim, size in sizes
    }
    dataset = xr.Dataset(
        {
            "data": xr.DataArray(
                da.ones(
                    shape=[size for _, size in sizes],
                    chunks=(1, 1, 1)
                ) * factor,
                coords=coords,
                dims=[dim for dim, _ in sizes]
            )
        }
    )
    path = f"{base_path}/test_zarr_{factor}"
    dataset.to_zarr(path, mode="w", compute=True)
    return xr.open_zarr(path)


def calculate1(x, y):
    path = f"{base_path}/test_zarr_calculate1"
    result = x * y
    result.to_zarr(path, mode="w", compute=True)
    return result


def calculate2(x, y):
    path = f"{base_path}/test_zarr_calculate2"
    result = x / y
    result.to_zarr(path, mode="w", compute=True)
    return result


dag = {
    "update_arr1": (update_dataset, 2),
    "update_arr2": (update_dataset, 4),
    "calculate1": (calculate1, "update_arr1", "update_arr2"),
    "calculate2": (calculate2, "update_arr1", "update_arr2"),
    "report": (lambda x, y: True, "calculate1", "calculate2"),
}

with LocalCluster(
    n_workers=1,
    threads_per_worker=2,
    memory_limit="1GB",
) as cluster:
    with cluster.get_client() as client:
        print(client.get(dag, "report"))
```

Warning raised:

```
2024-08-24 16:13:02,764 - distributed.scheduler - WARNING - Detected different `run_spec` for key 'original-open_dataset-data-c801781151f576ac3a1c261dc894c1d0' between two consecutive calls to `update_graph`. This can cause failures and deadlocks down the line. Please ensure unique key names. If you are using a standard dask collections, consider releasing all the data before resubmitting another computation. More details and help can be found at https://github.com/dask/dask/issues/9888.

Debugging information
---------------------
old task state: processing
old run_spec: (<function execute_task at 0x0000023975DFF600>, (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=<xarray.backends.zarr.ZarrArrayWrapper object at 0x00000239772DD380>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))),), {})
new run_spec: (<function execute_task at 0x0000023975DFF600>, (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=<xarray.backends.zarr.ZarrArrayWrapper object at 0x0000023977322A40>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))),), {})
old token: ('tuple', [('913ceb5b5beb463a9010ec0790bc30002ca34164', []), ('tuple', [('7a2b6b38794817e675b96ba80026355719825fb1', ['6693512388b528fbf156d0cc4c7b9449588c44bf'])]), ('dict', [])])
new token: ('tuple', [('913ceb5b5beb463a9010ec0790bc30002ca34164', []), ('tuple', [('7a2b6b38794817e675b96ba80026355719825fb1', ['45ef8ea51c5d50300dd28843ec098db9cbb85566'])]), ('dict', [])])
old dependencies: set()
new dependencies: set()
```

I'm using Xarray 2024.07.0 and Dask 2024.08.01.
@josephnowak this looks like a bug that's been reported on the xarray tracker (pydata/xarray#9325), which was fixed in #11320.
@fjetter Thanks for your response. I tried using Dask 2024.08.01, which in theory contains the fix that you are mentioning, but ...
Hi all, I'm running dask.array.linalg.svd_compressed on a dask array and getting an error linking to this issue, with ... and ... and basically the same warning tens of times with different keys (sometimes it's "assign-...", sometimes it's "unique-...", etc.). I am wondering whether that's because of a bug somewhere, or because I'm doing something kind of weird to build my dask array from a dask dataframe initially?

Context is recommender systems and the typical user-items matrix decomposition that goes with it. I've got a dask dataframe with user-item interaction rows (essentially equivalent to a sparse COO matrix, I believe), and when trying to turn it into a dask array (so I can use dask.array.linalg.svd_compressed on it), I had to normalise the user ids to be between 0 and n_users, but also for each chunk of the dask array I had to start from 0 again and do the bookkeeping myself:

```python
import numpy as np
import sparse  # pydata/sparse

# division_size, n_items and views_sorted are defined elsewhere in my code.


def partition_to_coo(partition):
    users_from_this_partition = (partition["encoded_userid"] % division_size).values
    items_from_this_partition = partition["encoded_itemid"].values
    values = partition["value"].values
    np_array = sparse.COO(
        (values, (users_from_this_partition, items_from_this_partition)),
        shape=(max(users_from_this_partition) + 1, n_items),
    ).todense()
    return np_array


dense_user_items_matrix = views_sorted.map_partitions(
    partition_to_coo, meta=np.ndarray(shape=(0, 0), dtype=int)
)
```

Thanks!
Subsequent Blockwise layers are currently fused into a single layer. This reduces the number of tasks and the overhead, and is generally a good thing to do. Currently, the fused output does not generate unique key names, which is a problem from a UX perspective but can also cause severe failure cases when executed on the distributed scheduler, since distributed assumes that a task key is a unique identifier for the entire task. While it is true that the data output of the fused key and the non-fused key is identical, the run_spec and the local topology are intentionally very different. Specifically, a fused task may not have any dependencies while the non-fused task does.
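As a schematic illustration of that claim (a minimal sketch, not the reproducer described below; exact behaviour depends on the dask version): fusion shrinks the graph but keeps the output key names.

```python
import dask
import dask.array as da

x = (da.ones(10, chunks=5) + 1) * 2

before = x.__dask_graph__()   # unoptimized: one layer per operation
(opt,) = dask.optimize(x)     # applies blockwise fusion, among other things
after = opt.__dask_graph__()

print(x.__dask_keys__() == opt.__dask_keys__())  # True: key names are reused
print(len(before), len(after))                   # fewer tasks after fusion
```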
An example where this matters is the following (async code is not necessary, but the condition is a bit difficult to trigger and this helps; paste this code into a Jupyter notebook and run it a couple of times).
Note how the initial shuffle is persisted and a slightly different version of this graph is computed again below. From what I can gather, the initial persist fuses the keys while the latter computation does not (or maybe it's the other way round, I'm not sure; either way, that is a different issue).
This specific reproducer actually triggers (not every time) a `KeyError` in a worker's data buffer while trying to read data. This is caused by the dependency relations between tasks no longer being accurate on the scheduler: it considers a task "ready", i.e. all dependencies in memory, too soon, causing a failure on the worker.
When `validate` is activated, the scheduler catches these cases earlier and raises appropriate AssertionErrors. This is not checked at runtime for performance reasons and is typically not necessary, since we rely on the assumption that keys identify a task uniquely.

Apart from this artificial example, we do have internal reports about such a spurious KeyError in combination with an xgboost workload.