Increase version flexibility of Dask for initial deployment #37
Comments
That's unfortunate : /
Agreed that short term this seems okay. Using just
Note duplicating We might try updating coiled-examples to run on Binder for the short-to-medium term. Users would need to do
Another option would be to make a version of cloudpickle that is version-flexible and then have I tried this for about half an hour today and didn't manage to get it to work, but it's an option. @llllllllll do you have any thoughts on how hard it would be to make a cloudpickle fork that was generic/dumb enough to support moving most things between Python versions?
I think there are a few challenges with a cross-version-compatible cloudpickle. The first is that cloudpickle sends code objects, including the bytecode itself. The actual format of Python bytecode changes across versions, and instructions are added and removed. This might be solvable with some sort of "upgrade/downgrade" converters that can rewrite bytecode from one version to the next. There is code in codetransformer that handles multiple versions of Python which might serve as a good starting point for these converters. Another problem is that the standard library changes in each minor version. For example:

import sys

if sys.version_info < (3, 8):
    def func(data):
        ...  # use fallback code
else:
    def func(data):
        ...  # use code that depends on some 3.8 specific feature

# execute is a placeholder for whatever ships func to the remote workers
execute(func, data)

My initial strategy for trying this would be:
Hrm, that does sound challenging
Agreed. I'm not actually that worried about data movement. It's more often an issue with dynamically generated code. From what you say above it sounds like this is probably a no-go anyway.
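To make the bytecode concern above concrete, here is a minimal sketch using only the standard library. It inspects what a by-value function serializer has to carry: the raw `co_code` bytes, whose encoding belongs to the interpreter that produced them. The function `add_one` is purely illustrative and not from this project.

```python
import dis
import importlib.util
import sys


def add_one(x):
    return x + 1


code = add_one.__code__

# Raw bytecode: an opaque bytes string encoded for this interpreter version.
raw_bytecode = code.co_code

# Instruction names as this interpreter sees them; the sequence can differ
# between minor versions because opcodes are added, removed, and renumbered.
opnames = [instr.opname for instr in dis.get_instructions(add_one)]

# Magic number stamped into .pyc files; it changes when the bytecode format does.
magic = importlib.util.MAGIC_NUMBER

print(sys.version_info[:2], magic, opnames)
```

Running this under two different minor versions and comparing the printed instruction lists is a quick way to see how much a bytecode converter would have to translate.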
The reason to worry about data when sending code is that you need to capture the closure and the globals, which may include some data structures. I assume it is rare to have a large dataframe or ndarray in the globals or closure, so that is likely not a big deal. Another idea for sending code would be to use something like uncompyle6 (possible license issue) to convert the bytecode to source code and then send that over to be recompiled on the server. This seems to work by decompiling the bytecode instead of reading the source, so it will work in a REPL or notebook.
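A rough sketch of the "send source, recompile on the server" half of that idea follows. It assumes the source text is already in hand (the step a decompiler would supply) and does not use uncompyle6 itself. The helper `rebuild_function` and the example function `scale` are hypothetical names for illustration.

```python
# "Client" side: the function travels as plain source text, not bytecode.
source = '''
def scale(values, factor=2):
    return [v * factor for v in values]
'''


def rebuild_function(source_text, name):
    """Compile source on the receiving interpreter and return the named function."""
    namespace = {}
    # Compiling here produces bytecode native to whichever Python runs this line.
    exec(compile(source_text, "<remote>", "exec"), namespace)
    return namespace[name]


# "Server" side: rebuild the function and call it.
scale = rebuild_function(source, "scale")
result = scale([1, 2, 3])
print(result)
```

This sidesteps bytecode differences but not the closure and globals problem described above: anything the function refers to outside its own body would still need to be shipped separately. Like unpickling, it also executes whatever code it is handed.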
Hello, I'm going through the open issues on this repo and closing some of them. I'm assuming that the conversation moved somewhere else, so I'm closing this issue.
OK, so Dask itself is now relatively robust to different versions of Python and compression.
However, as @jrbourbeau predicted, cloudpickle is not. This stops users from being able to do things like send along lambdas.
This turns out to be somewhat debilitating. For example, our basic example fails because top-level Pandas functions themselves are not reliably pickle-serializable (see pandas-dev/pandas#35611). I suspect that this happens in other cases in our examples as well.
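As a small illustration of the distinction at play, the sketch below uses only the standard library `pickle` module. An importable top-level function is pickled by reference (just its module and name), which is independent of bytecode. A lambda has no importable name, so plain pickle refuses it, and that is the case where cloudpickle falls back to shipping the code object by value. The choice of `os.path.join` is arbitrary.

```python
import os.path
import pickle

# Importable top-level function: pickle stores only the module and qualified name.
by_reference = pickle.dumps(os.path.join)
restored = pickle.loads(by_reference)

# A lambda cannot be looked up by name, so plain pickle raises an error.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(restored is os.path.join, lambda_pickled)
```

A function that pickles by reference only needs the same module to be importable on the other side, which is why "pickle-serializable" functions survive a Python version mismatch far better than ones that have to be sent by value.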
So what to do?
Short term we could duplicate every software environment by Python version. Short term, if we're only supporting coiled/default for quickstart purposes then this probably isn't horrible. We would set the default value for the configuration= keyword to coiled/default or coiled/default-37 depending on the Python version at import time, and then change the quickstart to coiled.Cluster() and use that default.
We could stick with coiled install and be explicit about requiring users to install things. I'm somewhat against this as a startup process. It's unpleasant.
Long term I'd like to see us increase hygiene in Dask and upstream about using pickle-serializable functions. This is a good driver for that. If we were very ambitious we could try to modify cloudpickle to be Python-version agnostic, but putting on my cloudpickle-maintainer hat I'll probably vote against that.
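For the first option, choosing the default at import time could look roughly like the sketch below. The function name `default_configuration` and the constant are hypothetical, and the mapping of coiled/default-37 to Python 3.7 (with coiled/default for everything else) is an assumption based on the names, not something confirmed in this thread.

```python
import sys


def default_configuration(version_info=None):
    """Pick a software environment name matching the local Python version."""
    if version_info is None:
        version_info = sys.version_info
    # Assumption: "coiled/default-37" targets Python 3.7 and
    # "coiled/default" covers every other version.
    if tuple(version_info[:2]) == (3, 7):
        return "coiled/default-37"
    return "coiled/default"


# Evaluated once at import time, then used as the keyword's default value.
DEFAULT_CONFIGURATION = default_configuration()
print(DEFAULT_CONFIGURATION)
```

Accepting an explicit `version_info` argument keeps the selection testable without having to run under each interpreter.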
Other thoughts? @jrbourbeau ?