"can't pickle thread.lock objects" when working with published dataframe #1556
I'm not able to reproduce with this simple example:

In [1]: import dask.dataframe as dd
In [2]: from distributed import Client
In [3]: client = Client()
In [4]: df = dd.read_csv("s3://dask-data/airline-data/1987.csv", storage_options={'anon': True})
In [5]: client.publish_dataset(ds_name=df)
In [6]: ds = client.get_dataset('ds_name')
In [7]: ds.compute()

Could you try adapting that example until you reproduce the failure?
|
You can reproduce it on a local cluster, but you need to load CSV data from S3. |
Thanks, that's valuable information. I've updated the example. |
Same issue here. Any update? |
It would be useful to see a full traceback from a minimal example. |
A simple example that reads from an S3 file and persists it on the cluster. We use server-side encryption for the S3 bucket.

import dask.dataframe as dd
from distributed import Client
from botocore import session  # assuming botocore's session module here

# get_dask_scheduler_file() and configure_session() are internal helpers that
# locate our scheduler file and attach our credentials to the session
client = Client(scheduler_file=get_dask_scheduler_file())
boto_session = session.Session()
boto_session = configure_session(boto_session, credential)
df = dd.read_csv('s3://data.csv', storage_options={'botocore_session': boto_session})
df = client.persist(df)
|
My first guess is that the boto session object is what's failing to pickle. |
I can try other ways to pass the credentials, but this is the recommended way inside our company. Do you think dask's SerializableLock could help? I see you have solved similar issues for read_hdf().
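For reference, a minimal illustration of what I have in mind (not from our codebase): a plain threading lock can't be pickled, but dask's SerializableLock can.

import pickle
import threading
from dask.utils import SerializableLock

lock = threading.Lock()
# pickle.dumps(lock)  # raises TypeError: can't pickle thread.lock objects

slock = SerializableLock()
slock2 = pickle.loads(pickle.dumps(slock))  # round-trips fine
with slock2:
    pass  # behaves like a regular lock; instances sharing a token share one underlying lock
|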
My guess is that the boto library has a lock in it somewhere that we're not going to be able to touch. You could ask them upstream, but they'll probably say "why would you want to pass around session objects? This may be unsafe."

Alternatively, what I often see in production is that some other mechanism is used to manage security, so that when workers go to grab the default credentials they already have them automatically. For example, maybe environment variables or .boto files are pre-populated. Moving credentials around within a computational framework (like dask, hadoop, spark, ...) is sometimes considered unsafe.
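For example, something along these lines (bucket name illustrative), relying on each worker already holding default credentials via environment variables or a pre-populated config file:

# no session object is shipped; s3fs/botocore resolve the default credential
# chain (environment variables, ~/.aws/credentials, ...) on every worker
import dask.dataframe as dd
from distributed import Client

client = Client(scheduler_file=get_dask_scheduler_file())  # your helper, as above
df = dd.read_csv('s3://your-bucket/data.csv')  # no storage_options needed
df = client.persist(df)
|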
Thanks for the suggestions! We're still at an early proof-of-concept stage, so we'll think about other ways to handle the credentials. |
I seem to encounter the same problem with a published dataset. |
We're using the dask distributed scheduler with multiprocessing workers on an EC2 cluster.
dask 0.15.4 and distributed 1.19.3
I'm trying to publish a named dataset (a dataframe), then retrieve it and continue working with it.
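A minimal sketch of that workflow (the dataset name, scheduler file, and S3 path are placeholders):

import dask.dataframe as dd
from distributed import Client

client = Client(scheduler_file='scheduler.json')  # connect to the cluster
df = dd.read_csv('s3://bucket/data.csv')

client.publish_dataset(my_dataset=df)   # publish under a name on the scheduler

df2 = client.get_dataset('my_dataset')  # retrieve it, possibly from another client
df2.compute()                           # this is where we hit the TypeError below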
This results in a "TypeError: can't pickle thread.lock objects" error. I suppose this might be related to:
#780
dask/dask#1683
#539
I don't know how to work around this, because read_csv() doesn't seem to accept a lock argument.
Full traceback: traceback.txt