direct-to-GPU decoding? #316
Exactly! I am not surprised that these exist, but numcodecs doesn't know about them, and there's no way to say the "lz4" codec, but the "gpu" version. We should think about this. |
We are thinking about this ( zarr-developers/community#19 (comment) ) 😀 We should probably leverage the entrypoint hooks that you added ( #300 ) in KvikIO ( rapidsai/kvikio#66 ) |
Perhaps another set of entrypoints? Perhaps a global config or context that says "make GPU (cupy?) arrays" or something like that? |
OK, going to disentangle a few things here. First, there is the big question: how do we use Zarr with other arrays (like CuPy)? There are changes in Numcodecs ( #305 ) and pending changes for Zarr ( zarr-developers/zarr-python#934 ) to address this use case. Second, what does the end-user workflow look like for users working with Zarr on GPUs? This would involve using Third, how can we make this process smooth/seamless for the end user? Maybe some additional flags are needed in APIs and/or a global config to select between different backends for opening files. This could also be an iterative process with users. |
I can see how making a new GPU implementation of zarr might be easier than putting options throughout the existing code. Just a thought. Or you might say that making zarr agnostic to the buffer and array implementations is essential, but I don't yet know how hard that is. |
We've been going with the latter approach. It's been ok so far. |
cc @akshaysubr |
Agreed! How can we achieve this, though? Maybe a new |
There's another issue with GPU support and numcodecs that can potentially be solved with an interface change, which I'd like to bring up. Typically, we would want to schedule work on the GPU asynchronously, but because of the single-buffer-in, single-buffer-out interface of numcodecs, if you get compressed data on the GPU, you still need to know how much space to allocate for the decompressed buffer before you can schedule the decompression work. This makes the API synchronous, since you would need to pull the header of the compressed buffer onto the CPU, synchronize, use that data to figure out how much space to allocate for the decompressed buffer, and then schedule the decompression work. This is made worse for compression formats that do not include the decompressed size in the header, like LZ4. Numcodecs currently gets around this issue for LZ4 on the CPU by adding a 4-byte header with the decompressed size, which makes the compressed buffer incompatible with other LZ4 implementations: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/lz4.pyx#L91. For the zarr use case though, the |
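A minimal illustration of the framing described above: the numcodecs LZ4 codec prepends a 4-byte little-endian integer holding the decompressed size, so a decoder can allocate its output buffer before decompressing. This sketch uses zlib as a stand-in compressor purely to stay self-contained — the point is the size-prefix framing, not the compression algorithm.

```python
# Sketch of the 4-byte size-header framing used by numcodecs' LZ4 codec.
# zlib stands in for LZ4 here so the example runs with only the stdlib.
import struct
import zlib

def encode_with_size_header(raw: bytes) -> bytes:
    # Prefix the compressed payload with the decompressed size (<I = LE uint32).
    return struct.pack("<I", len(raw)) + zlib.compress(raw)

def decode_with_size_header(framed: bytes) -> bytes:
    # The header tells us exactly how much output space to allocate,
    # before we ever touch the compressed payload.
    (nbytes,) = struct.unpack("<I", framed[:4])
    out = zlib.decompress(framed[4:])
    assert len(out) == nbytes
    return out
```

This framing is exactly what makes the stored bytes incompatible with a stock LZ4 (or here, zlib) decoder: the first four bytes are metadata, not compressed data.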
Note that the algorithms within blosc always have blosc's framing around them, so the size should be known, and other algorithms like zstd and snappy include the decompressed size in the standard (not necessarily guaranteed, but usually).
This is only true if the compression is the only codec - but there could in principle be more in a chain. |
Wonder if we could just store these size(s) somehow when doing compression so they can be more easily retrieved prior to decompression. How might we encode this size metadata effectively? |
Since this is a per-chunk thing, you'd either have to have a separate source of information for each chunk (like kerchunk might), or store it as bytes at the start/end of the chunk, thus making it a non-standard implementation. The former possibility is of course something I have been thinking about because of kerchunk, and other per-chunk information is conceivable such as scale factor for scale/offset encoding. For sharded chunks, the shard index could play this role maybe. Or of course, insist on using codecs that do know the size, as I mentioned above. |
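The "separate source of information for each chunk" idea could be sketched as follows: record each chunk's decompressed size in a small JSON sidecar at encode time, so a decoder can allocate output buffers without reading compressed headers on the device. The sidecar layout is invented for illustration (kerchunk's actual reference format differs), and zlib again stands in for the real codec.

```python
# Hypothetical sketch: a per-chunk size sidecar, written at encode time.
# The JSON layout here is invented for illustration only.
import json
import zlib

def encode_chunks(chunks):
    """Compress chunks and collect per-chunk decompressed sizes in a sidecar."""
    stored = {}
    sidecar = {}
    for key, raw in chunks.items():
        stored[key] = zlib.compress(raw)
        sidecar[key] = {"nbytes": len(raw)}
    return stored, json.dumps(sidecar)

def decompressed_size(sidecar_json, key):
    """What a decoder would consult (on the CPU) before scheduling async work."""
    return json.loads(sidecar_json)[key]["nbytes"]
```

Because the sidecar is lightweight and lives apart from the chunk bytes, it can be read once on the CPU for orchestration while the chunk payloads stream straight to the GPU.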
There are two issues with having the size be part of the chunk header (either like in the LZ4 codec or like in zstd):
The kerchunk approach is a nice solution to these issues since that second stream of relatively lightweight information can be read into the CPU and is mainly used for orchestration/control. Do you see a way this can be generalized? Maybe this information can be stored at encode time at the array level after a reduction for max size at each codec stage? This would allow decoders to allocate memory once and use that for multiple chunks. |
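The "reduction for max size at each codec stage" suggestion could look roughly like this: at encode time, track the largest buffer produced at every stage of the codec chain across all chunks, so a decoder can allocate one worst-case scratch buffer per stage and reuse it. The function name and shape are hypothetical, written here just to make the idea concrete.

```python
# Hypothetical sketch: reduce over all chunks at encode time to find the
# maximum buffer size at each stage of a codec chain, so a decoder can
# allocate once per stage and reuse buffers across chunks.
import zlib

def max_stage_sizes(chunks, codec_chain):
    """chunks: iterable of raw bytes; codec_chain: encode callables, in order.

    Returns a list of len(codec_chain) + 1 maxima: the raw size plus the
    size after each successive encode stage.
    """
    maxima = [0] * (len(codec_chain) + 1)
    for raw in chunks:
        buf = raw
        maxima[0] = max(maxima[0], len(buf))
        for i, encode in enumerate(codec_chain):
            buf = encode(buf)
            maxima[i + 1] = max(maxima[i + 1], len(buf))
    return maxima
```

Stored once at the array level, these maxima bound the allocation for every chunk, trading a little memory slack for fully asynchronous decode scheduling.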
Yeah, agree on having the size information separate. Maybe in the metadata, or perhaps in some small adjacent binary metadata file (cc @joshmoore, as we discussed something like this a while back) |
What would it take to make codecs that can interact with cupy arrays (or TF tensors, etc.) as the origin or output of zarr? I assume it would be a simple change in zarr, but would all the codecs need to be rewritten in CUDA (or numba or ...)?
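One array-agnostic pattern for the question above is to dispatch on the type of the output array rather than hard-coding NumPy. This is only a sketch of that idea — the `decode_dispatch` name is invented, the CuPy branch is a placeholder, and only the NumPy path actually runs here.

```python
# Hedged sketch of array-agnostic codec dispatch: pick an implementation
# based on the output array's module. Only the numpy path is real here;
# the cupy branch marks where a GPU codec/kernel would be invoked.
import numpy as np

def decode_dispatch(buf, out):
    module = type(out).__module__.split(".")[0]
    if module == "cupy":
        raise NotImplementedError("a GPU decode kernel would be called here")
    # CPU path: a plain copy stands in for a real codec's decode step.
    np.copyto(out, np.frombuffer(buf, dtype=out.dtype).reshape(out.shape))
    return out
```

This kind of dispatch is one way to avoid a parallel "GPU zarr": the same codec object routes to device or host implementations depending on the buffers it is handed.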