Custom serializer is not transferred to the workers #5561

Closed

holdenk opened this issue Dec 2, 2021 · 5 comments

@holdenk

holdenk commented Dec 2, 2021

What happened:

I got an exception:
distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/core.py", line 111, in loads
return msgpack.loads(
File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/core.py", line 103, in _decode_default
return merge_and_deserialize(
File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 475, in merge_and_deserialize
return deserialize(header, merged_frames, deserializers=deserializers)
File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 407, in deserialize
return loads(header, frames)
File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 170, in serialization_error_loads
raise TypeError(msg)
TypeError: Could not serialize object of type ConnectionClass.
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 49, in dumps
result = pickle.dumps(x, **dump_kwargs)
_pickle.PicklingError: Can't pickle <class '__main__.ConnectionClass'>: attribute lookup ConnectionClass on __main__ failed

What you expected to happen:

To get the result of the computation back.

Minimal Complete Verifiable Example:

import socket
from typing import Dict, List, Tuple

import cloudpickle
import dask
from distributed.protocol import dask_serialize, dask_deserialize

class ConnectionClass:
    def __init__(self, host, port):
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.connect((host, port))

@dask.delayed
def bad_fun(x):
    return ConnectionClass("www.scalingpythonml.com", 80)

@dask_serialize.register(ConnectionClass)
def serialize(conn: ConnectionClass) -> Tuple[Dict, List[bytes]]:
    host, port = conn.socket.getpeername()
    header = {}
    frames = [cloudpickle.dumps({"host": host, "port": port})]
    return header, frames

@dask_deserialize.register(ConnectionClass)
def deserialize(header: Dict, frames: List[bytes]) -> ConnectionClass:
    info = cloudpickle.loads(frames[0])
    return ConnectionClass(info["host"], info["port"])

# Now we can sort of serialize the connection, but the registration above only
# exists in this process; running against a distributed Client with separate
# worker processes still fails as shown in the traceback above.
dask.compute(bad_fun(1))

Anything else we need to know?:

Environment:

Packages are:

aiohttp @ file:///drone/src/build_artifacts/aiohttp_1623684437452/work
alembic @ file:///home/conda/feedstock_root/build_artifacts/alembic_1633571415941/work
anyio @ file:///home/conda/feedstock_root/build_artifacts/anyio_1634574746487/work/dist
argon2-cffi @ file:///drone/src/build_artifacts/argon2-cffi_1633994177304/work
async-generator==1.10
async-timeout==3.0.1
attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1620387926260/work
Babel @ file:///home/conda/feedstock_root/build_artifacts/babel_1619719576210/work
backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work
backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1618230623929/work
bleach @ file:///home/conda/feedstock_root/build_artifacts/bleach_1629908509068/work
blinker==1.4
blosc @ file:///drone/src/build_artifacts/python-blosc_1630634123013/work
bokeh @ file:///drone/src/build_artifacts/bokeh_1634390110209/work
brotlipy==0.7.0
cachetools @ file:///home/conda/feedstock_root/build_artifacts/cachetools_1633010882559/work
cachey @ file:///home/conda/feedstock_root/build_artifacts/cachey_1594854602465/work
certifi==2021.10.8
certipy==0.1.3
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1631636376126/work
chardet @ file:///home/conda/feedstock_root/build_artifacts/chardet_1610093542801/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1626371162869/work
click @ file:///drone/src/build_artifacts/click_1634004931134/work
cloudpickle @ file:///home/conda/feedstock_root/build_artifacts/cloudpickle_1631273254894/work
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1602866480661/work
conda==4.10.3
conda-package-handling @ file:///drone/src/build_artifacts/conda-package-handling_1618231462646/work
cryptography @ file:///drone/src/build_artifacts/cryptography_1634241277464/work
cudf==21.10.1
cupy @ file:///home/conda/feedstock_root/build_artifacts/cupy_1633142984802/work
cytoolz==0.11.0
dask @ file:///home/conda/feedstock_root/build_artifacts/dask-core_1632274741088/work
dask-cuda @ file:///home/conda/feedstock_root/build_artifacts/dask-cuda_1633753968703/work
dask-cudf==21.10.1
dask-kubernetes @ git+https://github.com/dask/dask-kubernetes.git@1af05396ced5a581e530222c02f3bd9dc0a5a7b3
dask-labextension==5.1.0
debugpy @ file:///drone/src/build_artifacts/debugpy_1627077093858/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1631346842025/work
defusedxml @ file:///home/conda/feedstock_root/build_artifacts/defusedxml_1615232257335/work
distributed @ file:///home/conda/feedstock_root/build_artifacts/distributed_1632281034958/work
entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1602702108613/work/dist/entrypoints-0.3-py2.py3-none-any.whl
fastavro @ file:///home/conda/feedstock_root/build_artifacts/fastavro_1632376622933/work
fastrlock==0.6
fsspec @ file:///home/conda/feedstock_root/build_artifacts/fsspec_1634317979613/work
google-auth==2.3.0
greenlet @ file:///drone/src/build_artifacts/greenlet_1632930616667/work
HeapDict==1.0.1
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1609836280497/work
importlib-metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1630267579445/work
importlib-resources @ file:///home/conda/feedstock_root/build_artifacts/importlib_resources_1634509907544/work
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1631291277055/work/dist/ipykernel-6.4.1-py3-none-any.whl
ipython @ file:///drone/src/build_artifacts/ipython_1632769053287/work
ipython-genutils==0.2.0
ipywidgets @ file:///home/conda/feedstock_root/build_artifacts/ipywidgets_1631590360471/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1610146886964/work
Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1633656206378/work
json5 @ file:///home/conda/feedstock_root/build_artifacts/json5_1600692310011/work
jsonschema @ file:///home/conda/feedstock_root/build_artifacts/jsonschema_1633875207482/work
jupyter-client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1633454794268/work
jupyter-core @ file:///drone/src/build_artifacts/jupyter_core_1631855060205/work
jupyter-server @ file:///home/conda/feedstock_root/build_artifacts/jupyter_server_1633398189934/work
jupyter-server-proxy @ file:///home/conda/feedstock_root/build_artifacts/jupyter-server-proxy_1625381556128/work
jupyter-telemetry @ file:///home/conda/feedstock_root/build_artifacts/jupyter_telemetry_1605173804246/work
jupyterhub @ file:///drone/src/build_artifacts/jupyterhub-feedstock_1626471664957/work
jupyterlab @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_1634218716496/work
jupyterlab-pygments @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_pygments_1601375948261/work
jupyterlab-server @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_server_1632590716858/work
jupyterlab-widgets @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_widgets_1631590465624/work
kubernetes==18.20.0
kubernetes-asyncio==18.20.0
llvmlite==0.36.0
locket==0.2.0
lz4 @ file:///home/conda/feedstock_root/build_artifacts/lz4_1611332488229/work
Mako @ file:///home/conda/feedstock_root/build_artifacts/mako_1629523042001/work
mamba @ file:///drone/src/build_artifacts/mamba_1634177359406/work
MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1621455763868/work
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1631080358261/work
mistune @ file:///home/conda/feedstock_root/build_artifacts/mistune_1624941427850/work
msgpack @ file:///home/conda/feedstock_root/build_artifacts/msgpack-python_1610121794835/work
multidict @ file:///drone/src/build_artifacts/multidict_1633332807805/work
nbclassic @ file:///home/conda/feedstock_root/build_artifacts/nbclassic_1631880505492/work
nbclient @ file:///home/conda/feedstock_root/build_artifacts/nbclient_1629120697898/work
nbconvert @ file:///drone/src/build_artifacts/nbconvert_1632537385166/work
nbformat @ file:///home/conda/feedstock_root/build_artifacts/nbformat_1617383142101/work
nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1617163391303/work
notebook @ file:///home/conda/feedstock_root/build_artifacts/notebook_1631733685426/work
numba @ file:///drone/src/build_artifacts/numba_1623568561288/work
numpy @ file:///drone/src/build_artifacts/numpy_1626650155929/work
nvtx @ file:///drone/src/build_artifacts/nvtx_1612462327208/work
oauthlib @ file:///home/conda/feedstock_root/build_artifacts/oauthlib_1622563202229/work
olefile @ file:///home/conda/feedstock_root/build_artifacts/olefile_1602866521163/work
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1625323647219/work
pamela==1.0.0
pandas==1.3.0
pandocfilters @ file:///home/conda/feedstock_root/build_artifacts/pandocfilters_1631603243851/work
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1617148930513/work
partd @ file:///home/conda/feedstock_root/build_artifacts/partd_1617910651905/work
pexpect==4.8.0
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602535881525/work
Pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1630696775644/work
prometheus-client @ file:///home/conda/feedstock_root/build_artifacts/prometheus_client_1622586138406/work
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1629903925368/work
protobuf==3.16.0
psutil @ file:///drone/src/build_artifacts/psutil_1610136644938/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pyarrow==5.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycosat @ file:///drone/src/build_artifacts/pycosat_1610104060453/work
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1593275161868/work
pycurl==7.44.1
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1629119114968/work
PyJWT @ file:///home/conda/feedstock_root/build_artifacts/pyjwt_1634405536383/work
pynvml @ file:///home/conda/feedstock_root/build_artifacts/pynvml_1622742739951/work
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1633192417276/work
pyparsing==2.4.7
pyrsistent @ file:///home/conda/feedstock_root/build_artifacts/pyrsistent_1610146904898/work
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1610291513426/work
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
python-json-logger @ file:///home/conda/feedstock_root/build_artifacts/python-json-logger_1602545356084/work
pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1633452062248/work
PyYAML==6.0
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1631793438762/work
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1626393743643/work
requests-oauthlib==1.3.0
requests-unixsocket==0.2.0
rmm==21.10.1
rsa==4.7.2
ruamel-yaml-conda @ file:///drone/src/build_artifacts/ruamel_yaml_1611943934841/work
ruamel.yaml @ file:///drone/src/build_artifacts/ruamel.yaml_1630317445803/work
ruamel.yaml.clib @ file:///drone/src/build_artifacts/ruamel.yaml.clib_1610161221437/work
Send2Trash @ file:///home/conda/feedstock_root/build_artifacts/send2trash_1628511208346/work
simpervisor @ file:///home/conda/feedstock_root/build_artifacts/simpervisor_1609865618711/work
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
sniffio @ file:///drone/src/build_artifacts/sniffio_1611360841510/work
sortedcontainers @ file:///home/conda/feedstock_root/build_artifacts/sortedcontainers_1621217038088/work
SQLAlchemy @ file:///home/conda/feedstock_root/build_artifacts/sqlalchemy_1632383569186/work
streamz @ file:///home/conda/feedstock_root/build_artifacts/streamz_1633363454258/work
tblib @ file:///home/conda/feedstock_root/build_artifacts/tblib_1616261298899/work
terminado @ file:///home/conda/feedstock_root/build_artifacts/terminado_1631128279723/work
testpath @ file:///home/conda/feedstock_root/build_artifacts/testpath_1621261527237/work
toolz @ file:///home/conda/feedstock_root/build_artifacts/toolz_1600973991856/work
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1610094765207/work
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1632160078689/work
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1630423529112/work
typing-extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1632313171031/work
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1632350318291/work
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1600965781394/work
webencodings==0.5.1
websocket-client @ file:///drone/src/build_artifacts/websocket-client_1607981704757/work
widgetsnbextension @ file:///drone/src/build_artifacts/widgetsnbextension_1605475527516/work
yarl @ file:///drone/src/build_artifacts/yarl_1633683920101/work
zict==2.0.0
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1633302054558/work

• Operating System: Ubuntu
@jakirkham
Member

Yeah unfortunately this doesn't work as well as we might like today.

Basically, what one would need to do is put this registration code in a module and import that module on all Workers (running something like client.run(importlib.import_module, "mycustomobjser")). A better workaround would be to do this registration in a preload script, which would be more robust to Workers scaling up and down.
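
As a rough sketch of that preload approach (the module name conn_serializers.py, the mypackage import, and the worker invocation are hypothetical, and it assumes ConnectionClass is importable on the workers rather than defined in __main__):

# conn_serializers.py: hypothetical preload module. Start each worker with
# something like `dask-worker scheduler:8786 --preload conn_serializers.py`
# so this registration runs in every worker process.
from typing import Dict, List, Tuple

import cloudpickle
from distributed.protocol import dask_serialize, dask_deserialize

from mypackage import ConnectionClass  # assumed importable on the workers

@dask_serialize.register(ConnectionClass)
def serialize_connection(conn: ConnectionClass) -> Tuple[Dict, List[bytes]]:
    host, port = conn.socket.getpeername()
    return {}, [cloudpickle.dumps({"host": host, "port": port})]

@dask_deserialize.register(ConnectionClass)
def deserialize_connection(header: Dict, frames: List[bytes]) -> ConnectionClass:
    info = cloudpickle.loads(frames[0])
    return ConnectionClass(info["host"], info["port"])

def dask_setup(worker):
    # Optional preload hook; the registrations above already ran at import time.
    pass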

Dask has a bunch of serializers that get registered in this __init__.py, which gets imported by all workers, though at the moment there isn't a good way to add to them.

Issue #3831 discusses how we could better handle registration of third-party serializers (possibly with entrypoints).

@holdenk
Author

holdenk commented Dec 3, 2021

Legit. The docs could maybe use a cleanup then, since it's not very clear from them that you need to add this registration on the workers. The current easy workaround is to extend the class and add custom pickling via __getstate__/__setstate__, but it's a bit of a pain when you're wrapping someone else's library.
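
For reference, a minimal sketch of that workaround (the subclass name and the reconnect-on-unpickle behaviour are illustrative, not from this issue):

class PicklableConnection(ConnectionClass):
    def __getstate__(self):
        # Pickle only the peer address, not the live socket.
        host, port = self.socket.getpeername()
        return {"host": host, "port": port}

    def __setstate__(self, state):
        # Re-run __init__ to open a fresh socket wherever we are unpickled.
        self.__init__(state["host"], state["port"])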

holdenk closed this as completed Dec 3, 2021
@jakirkham
Member

Yeah, that's a fair criticism. If you have thoughts on how this should be worded or where it should go, please feel free to share.

Yep, I think improvements on the pickling side of the story have made this less pressing. For example, we handle out-of-band pickling, which avoids the additional copies that older pickle protocols incurred. This, combined with the fact that NumPy supports this protocol (numpy/numpy#12011), already gives one a lot of mileage with pickle. For example, Pandas' use of NumPy makes things just work (pandas-dev/pandas#34244).
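
For context, a rough sketch of what out-of-band pickling with protocol 5 looks like using just the standard library and NumPy (not Dask-specific):

import pickle
import numpy as np

data = np.arange(1_000_000)
buffers = []
# Large buffers are handed to the callback instead of being copied into the
# pickle stream; the receiver passes them back in via `buffers=`.
payload = pickle.dumps(data, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(payload, buffers=buffers)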

Maybe you already know this and it doesn't work for some reason, but one could register a pickler for a third-party object. If that doesn't work, though, I would be curious to know more 🙂
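
One reading of "register a pickler" is the standard-library copyreg hook; whether Dask's pickling path honours it depends on which pickler is in use, so treat this as a sketch only:

import copyreg

def reduce_connection(conn):
    # Return (callable, args) so unpickling calls ConnectionClass(host, port)
    # and reconnects on the other side.
    host, port = conn.socket.getpeername()
    return (ConnectionClass, (host, port))

copyreg.pickle(ConnectionClass, reduce_connection)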

@holdenk
Author

holdenk commented Dec 3, 2021

Oh interesting, although I suspect that it also wouldn't get picked up and sent through to the workers unless the registration manipulates the running class.

@jakirkham
Member

Hmm... yeah, we don't have a dispatch table shared between Workers, though this is an interesting idea in its own right.
