
"can't pickle thread.lock objects" when working with published dataframe #1556

Open
@konrad-roze

Description

We're using the dask distributed scheduler with multiprocessing workers on an EC2 cluster (dask 0.15.4, distributed 1.19.3).

I'm trying to publish a named dataset (a dataframe) and then retrieve it and continue working on it. Basically:

import dask.dataframe as dd

frame = dd.read_csv(url, ...)
client.publish_dataset(ds_name=frame)
ds = client.get_dataset("ds_name")
client.compute(ds)

This results in a `TypeError: can't pickle thread.lock objects`.

I suppose this might be related to:
#780
dask/dask#1683
#539

I don't know how to work around this issue because read_csv() doesn't seem to accept a lock argument.

Full traceback: traceback.txt
