Description
We're using the dask distributed scheduler with multiprocessing workers on an EC2 cluster (dask 0.15.4, distributed 1.19.3).
I'm trying to publish a named dataset (a dataframe) and then retrieve it and continue working on it. Basically:

```python
import dask.dataframe as dd

frame = dd.read_csv(url, ...)          # url elided here
client.publish_dataset(ds_name=frame)  # the keyword is the dataset name
ds = client.get_dataset('ds_name')
client.compute(ds)
```
This results in a 'TypeError: can't pickle thread.lock objects' error.
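For context, the error itself just reflects that a `threading.Lock` cannot be pickled, so it surfaces whenever a lock ends up inside a graph that dask tries to serialize. A minimal stdlib illustration (the exact message wording differs between Python 2 and 3):

```python
import pickle
import threading

# threading.Lock instances are not picklable; serializing one raises
# TypeError, which is what appears when a lock is embedded in a task
# graph that dask sends over the wire.
try:
    pickle.dumps(threading.Lock())
except TypeError as exc:
    print(exc)  # e.g. "cannot pickle '_thread.lock' object" on Python 3
```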
I suppose this might be related to:
#780
dask/dask#1683
#539
I don't know how to work around this issue because read_csv() doesn't seem to accept a lock argument.
Full traceback: traceback.txt