Add visitor pattern methods #122


Merged
merged 3 commits into from
Feb 24, 2017

Conversation

jakirkham
Member

Fixes https://github.com/alimanfoo/zarr/issues/92

Provides `visit` and `visititems` that behave identically to their h5py counterparts. This should make it easier to traverse the full hierarchy of a Zarr group. It should also make it easier for users with h5py code to try out Zarr with fewer changes.

@jakirkham
Member Author

Only has doctests ATM. Could use some unit tests as well. That said, please let me know what you think.

@jakirkham jakirkham mentioned this pull request Feb 22, 2017
Member

@alimanfoo alimanfoo left a comment

Looks good to me. Only very minor suggestions.


Examples
--------
>>> from __future__ import print_function, unicode_literals
Member

I don't think this is needed in the example. Zarr docstrings are standardised to PY3(6).

Member Author

Yeah, I figured it couldn't hurt.

>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> d1 = g1.create_dataset('baz', shape=100, chunks=10, fill_value=0)
>>> d2 = g1.create_dataset('quux', shape=200, chunks=20, fill_value=0)
Member

Maybe use 'foo/quux' as the dataset name or something like that, to show what happens when you have a hierarchy?

Member Author

We can do that. Was thinking about doing something similar. Though the tests explore this as well.

May also drop datasets (or at least disable compression) as the compression levels seem to vary, which would make the doctest annoying to keep.

>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> d1 = g1.create_dataset('baz', shape=100, chunks=10, fill_value=0)
>>> d2 = g1.create_dataset('quux', shape=200, chunks=20, fill_value=0)
Member

Again suggest to have an example showing what happens with a hierarchy, e.g., 'foo/quux'.

Member Author

@@ -414,6 +416,78 @@ def arrays(self):
                            chunk_store=self._chunk_store,
                            synchronizer=self._synchronizer)

    def visit(self, func):
        """Run callable on each object name.
Member

Suggest to replace callable with `func`. Also maybe worth adding a note about how the return value is handled, e.g., from h5py source: "Returning None continues iteration; returning anything else aborts iteration and returns that value."
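
A minimal sketch of the return-value rule quoted above, in plain Python. It assumes a nested dict stands in for a group hierarchy, and `visit_names` is a hypothetical helper written for illustration, not the Zarr or h5py implementation.

```python
def visit_names(tree, func, prefix=""):
    """Call func(name) on every name in a nested dict, depth-first.

    Assumption for illustration: a nested dict plays the role of a group
    hierarchy, and any non-dict value is a leaf (e.g. an array).
    """
    for key in sorted(tree):
        name = prefix + key
        # Returning None continues iteration; anything else aborts the
        # walk and is handed back to the caller.
        result = func(name)
        if result is not None:
            return result
        child = tree[key]
        if isinstance(child, dict):
            result = visit_names(child, func, prefix=name + "/")
            if result is not None:
                return result
    return None


tree = {"foo": {"quux": 1, "bar": {}}, "baz": 2}

# list.append returns None, so every name is visited.
seen = []
visit_names(tree, seen.append)

# A non-None return value stops the walk at the first match.
first_match = visit_names(tree, lambda n: n if n.endswith("quux") else None)
```

A docstring note along these lines would tell users that a callback such as `print` or `list.append` visits everything, while a callback that returns a value acts as a search.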

        return value

    def visititems(self, func):
        """Run callable on each object name and object.
Member

As for visit(), suggest to replace callable with `func` and adding a note about how the return value is handled.

Member Author


Examples
--------
>>> from __future__ import print_function, unicode_literals
Member

Not needed I don't think.

Member Author

Same as above.

        def _visit(obj, func=func):
            keys = sorted(getattr(obj, "keys", lambda: [])())
            for each_key in keys:
                for value in _visit(obj[each_key], func=func):
Member

Think there's a call to func(each_key) missing here?

Member Author

Yeah, I was playing with stuff and pushed a bad commit. Unfortunately have been AFK until now. Should be fixed.


        def _visit(obj, func=func):
            keys = sorted(getattr(obj, "keys", lambda: [])())
            for each_key in keys:
Member

Missing call to func(each_key, obj[each_key])?

Member Author

Same as above.

@jakirkham jakirkham changed the title from "WIP: Add visitor pattern methods" to "Add visitor pattern methods" Feb 23, 2017
@jakirkham
Member Author

Ok, addressed comments and fixed bugs. This is ok on my end.

@jakirkham jakirkham mentioned this pull request Feb 23, 2017
13 tasks
@alimanfoo
Member

Something weird happening with CI, maybe related to install of appdirs? Try upgrading to appdirs==1.4.1 in requirements_dev.txt? Or maybe remove appdirs altogether from requirements_dev.txt as it's probably not important to pin it, and let latest get brought in by whichever package requires it?

@jakirkham
Member Author

Yeah, this cropped up with the release of 1.4.1. Have raised upstream to see if they have any idea.

xref: ActiveState/appdirs#89

@jakirkham
Member Author

Put together PR ( https://github.com/alimanfoo/zarr/pull/124 ) to try and fix the CI issues.

@jakirkham jakirkham mentioned this pull request Feb 23, 2017
Member

@alimanfoo alimanfoo left a comment

Couple of small things, otherwise looks good.


Examples
--------
>>> from __future__ import print_function, unicode_literals
Member

Could you remove this, just for consistency with other docstrings?

Member Author

Sure, done.


Examples
--------
>>> from __future__ import print_function, unicode_literals
Member

Same as above.

Member Author

Also done.


"""

def _visit(obj):
Member

This is now a duplicate of _visit above. Suggest to factor out into a single private function.

Member Author

Yep, was actually thinking of refactoring it differently though. I think we could have a visitvalues function that both visit and visititems use. Have a WIP branch for this, but it isn't quite done yet.

Member Author

Have pushed a refactoring using visitvalues, which seems to work nicely.
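
For readers following along, here is a rough sketch of what such a refactoring could look like: one traversal method, with `visit` and `visititems` as thin wrappers over it. The `Node` class is a hypothetical stand-in for a group, not the Zarr `Group` class, and the method bodies are an assumption about the shape of the change rather than the code that was merged.

```python
class Node:
    """Hypothetical stand-in for a group; not the Zarr Group class."""

    def __init__(self, path=""):
        self.path = path
        self.children = {}

    def add(self, name):
        # Build the child's path from the parent's path and its own name.
        child_path = name if not self.path else self.path + "/" + name
        child = Node(child_path)
        self.children[name] = child
        return child

    def visitvalues(self, func):
        """Call func(obj) on each descendant; stop at the first non-None result."""
        def _walk(obj):
            for key in sorted(obj.children):
                child = obj.children[key]
                yield child
                yield from _walk(child)

        for obj in _walk(self):
            value = func(obj)
            if value is not None:
                return value
        return None

    def visit(self, func):
        # Thin wrapper: pass only the name to func.
        return self.visitvalues(lambda obj: func(obj.path))

    def visititems(self, func):
        # Thin wrapper: pass both the name and the object to func.
        return self.visitvalues(lambda obj: func(obj.path, obj))


# Called on the root here, so the full path and the relative name coincide.
root = Node()
foo = root.add("foo")
foo.add("quux")
root.add("bar")

names = []
root.visit(names.append)

items = []
root.visititems(lambda name, obj: items.append((name, type(obj).__name__)))
```

Keeping the traversal and early-exit logic in one place means the two public methods cannot drift apart, which is the duplication concern raised above.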

@jakirkham
Member Author

Also made changes so that all of these will show up in the docs. Built the docs locally to make sure.

@alimanfoo
Member

Thanks for the changes, looks good.

So I'm just reading the h5py docs, and apparently the name passed into the visiting function "will be the name of the object relative to the current group". As currently coded, this PR will pass the object path, which is the name relative to the root group. I guess if we are aiming for h5py compatibility where possible then we need to follow the same behaviour? Unless you have a reason to do it differently?

@jakirkham
Member Author

Just to make sure we are on the same page, by current group, I believe they mean the group we called visit* on. At least that is what the h5py's visit* methods seem to do.

That's a good point. Thanks for raising it. This was definitely an oversight on my part. Not intentional.

Do you have any suggestions for how we should do this? If not, I'm thinking we could just strip out the first x characters that correspond to the parent group path, or replace the first match.

@alimanfoo
Member

No problem, yes that's how I read the docs, name is supposed to be relative to the group we called visit* on. I think it would be fine to do something like slice the .path property of the object being visited, using the length of the .path property on the group we called visit* on as the slice start coordinate.
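
The slicing idea could be sketched roughly as below. `relative_name` is a hypothetical helper, and the sketch assumes paths are "/"-separated with no leading slash and that the root group's path is the empty string; check both against the actual `.path` values before relying on it.

```python
def relative_name(obj_path, group_path):
    """Return obj_path relative to group_path by slicing off the prefix.

    Assumptions (for illustration): paths are "/"-separated, carry no
    leading slash, and the root group's path is the empty string.
    """
    if not group_path:
        # Visiting from the root: the path is already relative.
        return obj_path
    prefix = group_path + "/"
    if not obj_path.startswith(prefix):
        raise ValueError("%r is not inside %r" % (obj_path, group_path))
    # Slice start is the length of the group's path plus the separator.
    return obj_path[len(prefix):]


from_root = relative_name("foo/bar/baz", "")
from_foo = relative_name("foo/bar/baz", "foo")
from_foo_bar = relative_name("foo/bar/baz", "foo/bar")
```

Slicing by length avoids the pitfall of replacing the first match, which could hit the wrong segment when a group name repeats deeper in the path.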

Provides `visit` and `visititems` that behave identically to their h5py
counterparts. This should make it easier to traverse the full hierarchy
of a Zarr group. It should also make it easier for users with h5py code
to try out Zarr with fewer changes.

This method (`visitvalues`) seems to be at the core of `visit` and
`visititems`, so it makes sense to refactor it out. It also provides
usable functionality of its own, so it makes sense to expose it as part
of the API too.
@jakirkham
Member Author

Alright, I have now updated the visit and visititems functions to provide relative paths. Added an example of this to the documentation. Also included a bunch of related unit tests. Hopefully that addresses it.

@alimanfoo alimanfoo merged commit f8da93a into zarr-developers:master Feb 24, 2017
@alimanfoo
Member

Great, thank you!

@alimanfoo alimanfoo modified the milestone: v2.2 Feb 24, 2017
@jakirkham
Member Author

Thanks for the good review.

@jakirkham jakirkham deleted the add_visitor_patterns branch February 24, 2017 14:27
@alimanfoo alimanfoo mentioned this pull request Oct 24, 2017
4 tasks
@alimanfoo alimanfoo added enhancement New features or improvements release notes done Automatically applied to PRs which have release notes. labels Nov 20, 2017