Proposal: Add Array.blocks using new BlockIndexer (Prototype Code Included) #991
Comments
Hi @tasansal. I do have the feeling that this issue comes up frequently. (I have a vague sense that both @GenevieveBuckley and @thewtex have dealt with related issues.) Are the code snippets above the entirety of the prototype? If so, would it make sense to get them into a PR with tests? (And, would those tests require dask? If so, it could potentially be added as an optional dev requirement as with |
@joshmoore It passes my simple manual tests, but unit tests definitely need to be written, especially for edge cases such as negative indices. If there is an existing mechanism for calculating chunk boundaries, it could be reused, but I didn't see one. To be more specific, this part:

    start, stop, _ = dim_sel.indices(dim_len // dim_chunk_len)
    block_start = start * dim_chunk_len
    block_stop = stop * dim_chunk_len
    block_slice = slice(block_start, block_stop)
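For discussion, here is a minimal standalone sketch of that per-dimension boundary calculation, including the negative-index case. The function name `block_slice_to_array_slice` is hypothetical and not part of zarr. It also makes two assumptions that go beyond the snippet above: the block count uses ceiling division so a partial trailing chunk is counted, and the stop is clamped to the dimension length.

```python
import math


def block_slice_to_array_slice(dim_sel, dim_len, dim_chunk_len):
    """Map a slice over block indices to a slice over array indices (one dimension)."""
    # Assumption: ceiling division, so a partial trailing chunk counts as a block.
    # With dim_len a multiple of dim_chunk_len this equals dim_len // dim_chunk_len.
    nblocks = math.ceil(dim_len / dim_chunk_len)
    # slice.indices() resolves negative and omitted bounds against nblocks.
    start, stop, step = dim_sel.indices(nblocks)
    if step != 1:
        raise IndexError("this sketch only handles block slices with step 1")
    block_start = start * dim_chunk_len
    # Assumption: clamp so a partial last chunk does not run past the array.
    block_stop = min(stop * dim_chunk_len, dim_len)
    return slice(block_start, block_stop)


# Blocks 1..3 of a length-20 dimension with chunk length 4.
print(block_slice_to_array_slice(slice(1, 4), 20, 4))
# Last block of a length-10 dimension with chunk length 4 (partial chunk).
print(block_slice_to_array_slice(slice(-1, None), 10, 4))
```

Letting `slice.indices` do the normalization keeps the negative-index handling in one place, which is where most of the edge cases mentioned above would be tested.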
👍 Sounds good, @tasansal.
@joshmoore, do we need more input from others or should I go ahead and start?
For my part, I'd say go for it. 👍
Problem description
I have a use case downstream where we want to access "blocks" of chunks. I have implemented a prototype that functions like dask.array.Array.blocks. It uses the existing zarr.indexing machinery and matches the API of existing indexers.
This allows us to pull a "block" of chunks from data using slicing logic.
For instance, if we have an array with shape (10, 20, 30) and chunk sizes (5, 4, 10):
array.blocks[0] maps to array[:5]
array.blocks[..., 0] maps to array[:, :, :10]
array.blocks[1, 1, 1] maps to array[5:10, 4:8, 10:20]
array.blocks[:, 1:4, :] maps to array[:, 4:16, :]
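The mapping in this example can be reproduced with a small pure-Python sketch. This is not the prototype itself: `blocks_to_slices` is a hypothetical helper, and it assumes integers select a single block while keeping the dimension, a single Ellipsis is allowed, and only step-1 slices are supported.

```python
import math


def blocks_to_slices(selection, shape, chunks):
    """Translate a block selection into a tuple of array slices (illustrative only)."""
    if not isinstance(selection, tuple):
        selection = (selection,)
    # Expand one Ellipsis into enough full slices to cover the missing dimensions.
    if any(s is Ellipsis for s in selection):
        i = next(k for k, s in enumerate(selection) if s is Ellipsis)
        fill = len(shape) - (len(selection) - 1)
        selection = selection[:i] + (slice(None),) * fill + selection[i + 1:]
    if len(selection) > len(shape):
        raise IndexError("too many indices for array")
    # Dimensions not mentioned are selected in full.
    selection += (slice(None),) * (len(shape) - len(selection))

    out = []
    for sel, dim_len, chunk_len in zip(selection, shape, chunks):
        nblocks = math.ceil(dim_len / chunk_len)
        if isinstance(sel, int):
            if sel < 0:
                sel += nblocks
            if not 0 <= sel < nblocks:
                raise IndexError("block index out of bounds")
            # Assumption: an integer picks one block and keeps the dimension.
            sel = slice(sel, sel + 1)
        start, stop, step = sel.indices(nblocks)
        if step != 1:
            raise IndexError("this sketch only handles block slices with step 1")
        out.append(slice(start * chunk_len, min(stop * chunk_len, dim_len)))
    return tuple(out)


shape, chunks = (10, 20, 30), (5, 4, 10)
for sel in (0, (Ellipsis, 0), (1, 1, 1), (slice(None), slice(1, 4), slice(None))):
    print(sel, "->", blocks_to_slices(sel, shape, chunks))
```

The returned slices are written out in full (for example `slice(0, 20)` rather than `:`), so they select the same region as the shorthand in the list above.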
Why not just use a dask array and its method?
We don't want dask as a hard requirement in our library.
We want to avoid the dask.array + scheduling overhead.
We don't want to add dask for just this functionality, to keep our library lightweight.
If there are no objections I can start adding this in a new PR as soon as possible.
If we don't want this in zarr, I can keep this in our library as an extension to zarr.
Once the code below is implemented, all of the following evaluate to True, as expected.
Tests and docs still need to be written, of course; I don't have those yet.
Implementation Details
Methods and attributes that would go into zarr.core.Array
BlockIndexer class (compare to zarr.indexing.OrthogonalIndexer)
Blocks property class for slicing (compare to zarr.indexing.VIndex or zarr.indexing.OIndex)