Skip to content

Expand efficient row iteration with possibility to start iterating anywhere #615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
yetyetanotherusername opened this issue Sep 23, 2020 · 5 comments

Comments

@yetyetanotherusername
Copy link
Contributor

yetyetanotherusername commented Sep 23, 2020

As sort of a follow up to #398, would it be possible to expand this useful feature with a way to set the start point for the iterator? This way, when one wants to efficiently iterate over only a subset of the array with a generator, you don't have to walk over the unneeded parts of the array.

@yetyetanotherusername
Copy link
Contributor Author

yetyetanotherusername commented Sep 23, 2020

If I am not mistaken, this small modification to #398 should already do the trick:

    def islice(self, start=None, end=None):
        if len(self.shape) == 0:
            # Same error as numpy
            raise TypeError("iteration over a 0-d array")
        if start is None:
            start = 0
        if end is None:
            end = self.shape[0]
        # Avoid repeatedly decompressing chunks by iterating over the chunks
        # in the first dimension.
        chunk_size = self.chunks[0]
        chunk = None
        for j in range(start, end):
            if j % chunk_size == 0:
                chunk = self[j: j + chunk_size]
            # init chunk if we start offset of chunk borders
            elif chunk is None:
                chunk_start = j - j % chunk_size
                chunk_end = chunk_start + chunk_size
                chunk = self[chunk_start:chunk_end]
            yield chunk[j % chunk_size]

__iter__ could then just call this function without passing start and end.

With a little guidance, I would offer to make a pull request.

@yetyetanotherusername
Copy link
Contributor Author

I've gone ahead and implemented this in a fork together with a test. I can still make a pull request, but would like some confirmation that it has a chance of getting accepted before I proceed.

@jeromekelleher
Copy link
Member

It looks good to me @yetyetanotherusername, I'm happy to review the PR. Seems like a fairly uncontroversial addition, but I'm not one of the core devs (who may disagree).

@yetyetanotherusername
Copy link
Contributor Author

Thanks @jeromekelleher! I've opened a pull request. I still have some stuff to do regarding documentation and I'm a bit unhappy about the tests for __iter__ and islice partially duplicating each other. For starters I left the original test unchanged since I wanted to make extra sure I don't break anything.

I'm not intimately familiar with the rest of the zarr interface and its error handling, so if you feel that something should be caught that isn't right now, I'd be very happy about any input. For example, I don't support reverse iteration right now, so probably trying it should raise a NotImplementedError.

@yetyetanotherusername
Copy link
Contributor Author

Closed by #621

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants