Passing dask array to zarr leads to a TypeError #962
Comments
Hi @michael-sutherland. Can you include which version of dask? |
dask: 2021.12.0 |
numpy docs for astype: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html |
Thanks for the info, @michael-sutherland. Looking at the stacktrace again, any objections to migrating this to github.com/ome/ome-zarr-py ? cc: @sbesson @will-moore |
Oops! Sorry if I misidentified which library the problem was in. Would you like me to migrate the issue or would you prefer to do it? |
No worries. Hmmm.... looks like I can't transfer automatically between orgs anyway. See ome/ome-zarr-py#169 |
As mentioned on #962, the example can be simplified to remove
|
Updated @will-moore's example to define |
Ok. I assume this is related to @madsbk's #934 in the sense that typically dask arrays are assumed to wrap zarr arrays rather than vice versa. cc: @jakirkham |
Hi @michael-sutherland, is there a reason you don't want to use the dask.array.to_zarr() function? I imagine there might be some cases where it makes sense, and this example could have lost that context when it was simplified to make debugging easier. |
I poked around with this a little, and there seem to be two places where things go wrong: one in dask, and one in zarr.
If it were changed to this, I think we'd fix the dask part of the problem: extra = set(kwargs) - {"casting", "copy", "order"}
> /Users/genevieb/mambaforge/envs/dask-dev/lib/python3.9/site-packages/zarr/core.py(2189)_encode_chunk()
2187
2188 # check object encoding
-> 2189 if ensure_ndarray(chunk).dtype == object:
2190 raise RuntimeError('cannot write object array without object codec')

Zarr could potentially work around this second issue by trying to

Josh says above that dask is typically expected to wrap zarr, and not the other way around. Making changes to (2) above would be a bit in conflict with that expectation. |
On the dask side of things, I've opened an issue and PR:
As discussed above, this won't completely fix the problem here. |
Thanks @GenevieveBuckley , hopefully, this will solve the issue on my side. |
Yes, let us know! |
cc: @madsbk just in case his recent work will or could handle this. |
I agree, calling |
I'm +1 on redirecting everyone back to use the |
…9317) Allows the `order` kwarg to be passed in to the dask `astype` method without triggering an error.

Our friends over at zarr have found a small bug in dask. While the numpy `astype` method allows the user to use an `order` keyword argument ([docs here](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html)), the corresponding dask `astype` method produces an error.

Why we should do this:
1. It's not always as obvious as telling the user to edit the line in their code that uses `astype`, since it's often used very indirectly. Here's one example: zarr-developers/zarr-python#962 (comment)
2. It reflects better on dask not to have odd errors popping up, even if this PR won't completely solve the issue being discussed over at zarr zarr-developers/zarr-python#962
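The keyword check discussed in this thread can be illustrated with a small standalone sketch. This is not dask's actual code: `check_astype_kwargs` and `ALLOWED_ASTYPE_KWARGS` are hypothetical names, and only the set-difference line mirrors what was quoted above.

```python
# Hypothetical helper, not dask's implementation: it only mirrors the
# set-difference check quoted earlier in the thread.
ALLOWED_ASTYPE_KWARGS = {"casting", "copy", "order"}


def check_astype_kwargs(**kwargs):
    """Raise TypeError for keyword arguments outside the allowed set."""
    extra = set(kwargs) - ALLOWED_ASTYPE_KWARGS
    if extra:
        raise TypeError(
            "astype does not take the following keyword arguments: "
            + ", ".join(sorted(extra))
        )
    return kwargs


# `order` is accepted once it is part of the allowed set.
check_astype_kwargs(order="K", copy=False)

# Any other keyword is still rejected by this sketch.
try:
    check_astype_kwargs(subok=True)
except TypeError as err:
    message = str(err)
```

With `"order"` missing from the allowed set, the first call would raise the same kind of `TypeError` that this issue reports.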
@michael-sutherland, can you let us know if you've found a solution that works for you? |
I ended up moving to a custom solution using an H5 file with a similar structure. My application involved displaying a 2D mosaic that was being generated from a tool (in this case an X-ray microscope) in "real time". I don't know the area ahead of time and I needed to be able to expand the arrays and update the pyramid as I went. It is all working now, although I'd prefer a more standard format if possible. Also, expanding "to the left and up" is painful and involves copying, which I might be able to work around in a zarr file structure. If I can get the time, I'd like to try porting it to zarr. Sorry if you were only doing this for me. I think supporting dask and other numpy-like arrays is important, although I think doing a custom call to "compute()" isn't the answer since that is so dask specific. Maybe wrapping in a call to np.array(), which will pull in data from h5py or dask or whatever lazy loading system someone might be using, would be better? It also won't make a copy if it is already a numpy array (as far as I know). |
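A small sketch of the conversion idea in the comment above, assuming NumPy is installed. `LazyRange` is a hypothetical stand-in for a lazy container, not a class from dask, h5py, or zarr. Note that `np.array()` copies by default, so the sketch uses `np.asarray()`, which returns an existing ndarray unchanged.

```python
import numpy as np


class LazyRange:
    """Hypothetical lazy container: data is built only when requested."""

    def __init__(self, n):
        self.n = n

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when asked to convert the object.
        return np.arange(self.n, dtype=dtype)


in_memory = np.arange(4)
same = np.asarray(in_memory)        # already an ndarray: no copy is made
loaded = np.asarray(LazyRange(4))   # __array__ materializes the data
```

Dask arrays also implement `__array__`, so `np.asarray` on a dask array computes it into memory without the caller naming `compute()` explicitly.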
No worries @michael-sutherland, it is always good to discuss design decisions :) I am not sure that converting Dask Arrays to a local |
cc: @mrocklin just to clarify that it wasn't just lack of Dask-:heart: but the deeper question of to |
Typically we handle this by using protocols like |
Chatting live with @joshmoore. Some thoughts:

Maybe things work now?
@GenevieveBuckley did some work, maybe stuff works. We should test this (I don't know this, Josh is suggesting it, don't blame if he's wrong 🙂)

Maybe call np.asarray on each sliced chunk?
If it doesn't work, make sure that the sliced chunk you're about to write is concrete and in memory (probably a numpy array, but maybe something else, like a cupy array or some other buffer thing). In dumb code, this probably looks like this:

    for slice in ...:
        chunk = array[slice]
        chunk = np.asarray(chunk)
        write(chunk, path)

Maybe use Dask Array?
This is a bit circular, but Zarr could use dask array:

    def give_array_to_zarr(self, arr, path):
        if arr.shape == self.chunks:
            ...  # do numpy thing
        else:
            # Dask
            x = da.from_array(arr)
            x.to_zarr(...)

We need to be a little careful to avoid an infinite zarr-dask-zarr loop, but I think the first if statement handles it. There's also some concern about this code being split between two repositories. I generously offer that you all own this code in the future 🙂 |
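The chunk-by-chunk idea above can be sketched without committing to any particular array library. This is a minimal illustration under stated assumptions, not zarr's write path: `chunk_slices` and `write_chunked` are hypothetical helpers, and `materialize` and `write` are callables supplied by the caller (for NumPy-like inputs, `materialize` would typically be `np.asarray`).

```python
from itertools import product


def chunk_slices(shape, chunks):
    """Yield one tuple of slices per chunk of a regular chunk grid."""
    per_axis = [
        [slice(start, min(start + size, dim)) for start in range(0, dim, size)]
        for dim, size in zip(shape, chunks)
    ]
    yield from product(*per_axis)


def write_chunked(array, shape, chunks, materialize, write):
    """Pull each chunk into memory, then hand it to `write`."""
    for selection in chunk_slices(shape, chunks):
        # Only one chunk is concrete at a time; the source may stay lazy.
        chunk = materialize(array[selection])
        write(selection, chunk)
```

Only one chunk is held in memory at a time, which is the point of slicing before converting rather than converting the whole array up front.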
Minimal, reproducible code sample, a copy-pastable example if possible
Problem description
I get an error when trying to write a dask array to a zarr file. I took the proposed example from ome/ome-zarr-py#121 and wrapped the numpy array in a dask array. It appears that the dask array method astype doesn't support the "order" parameter. The example code works for me when using a numpy array. Here's the resulting traceback:
Version and installation information
Please provide the following:
zarr.__version__: 2.10.3
numcodecs.__version__: 0.9.1
Also, if you think it might be relevant, please provide the output from pip freeze or conda env export, depending on which was used to install Zarr.