-
-
Notifications
You must be signed in to change notification settings - Fork 330
Add tree #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tree #140
Conversation
zarr/hierarchy.py
Outdated
|
||
def gen_tree(g): | ||
r = OrderedDict() | ||
d = r.setdefault(self.name.strip("/"), OrderedDict()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a question of what to do with the root node, /
. Right now I have opted to leave it blank, but could see an argument for making it /
. Tweaking how the name is handled here should allow that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to have /
.
Thanks for this, big thumbs up! I'm happy to merge this in then do some work on it. For my own reference, three things worth considering: (1) show root node as "/" as suggested; (2) expose a style argument so output can be customised; (3) consider moving get_tree out as first class method. |
Oh, and one other thing: (4) add an option to include some indicator showing whether a node is a group or a dataset. |
Glad you like it.
SGTM.
Was thinking about giving
How do you mean? Like a function instead?
Good call. Does the Edit: Clarified response to 4. |
On Fri, Mar 3, 2017 at 2:35 PM, jakirkham ***@***.***> wrote:
Glad you like it.
(1) show root node as "/" as suggested;
SGTM.
(2) expose a style argument so output can be customised
Was thinking about giving kwargs that just get forwarded to that
function. Maybe could allow user to pass the argument to draw. Whatever
works really. 👍
(3) consider moving get_tree out as first class method.
How do you mean? Like a function instead?
I was just thinking that get_tree could be a method on the Group class,
rather than being an inner function, which would make the code a bit more
straightforward.
(4) add an option to include some indicator showing whether a node is a
group or a dataset.
Good call. Does the tree command have some way of showing this? If so,
maybe we could mimic it. If not, putting say [] around the name is easy.
Yes some simple convention for adding something to the label would be good.
…--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
|
Good point. Actually we could even skip the stylizing with asciitree altogether and leave that for the examples. Alternatively if we keep it, we could have a function or two for particular styles and let users either take those or make their own. |
I think it would be nice to keep something that allows printing of a
default ascii tree with minimal typing. I imagine this is something I'll do
a lot while exploring datasets, so I liked the short and sweet 'tree()'. I
almost wondered if calling 'tree()' should actually do the printing as well
as generating the ascii text, so I could just write 'g.tree()' rather than
'print(g.tree())' to see the output.
…On Tue, Mar 7, 2017 at 4:14 PM, jakirkham ***@***.***> wrote:
I was just thinking that get_tree could be a method on the Group class,
rather than being an inner function, which would make the code a bit more
straightforward.
Good point. Actually we could even skip the stylizing with asciitree
altogether and leave that for the examples. Alternatively if we keep it, we
could have a function or two for particular styles and let users either
take those or make their own.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/alimanfoo/zarr/pull/140#issuecomment-284768995>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAq8QrCZ1zoUW5hHpt2GX3dKzyLyy2x4ks5rjYJggaJpZM4MRuec>
.
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
|
Just to clarify my suggestion, what I think we should do is have |
bar | ||
├── baz | ||
└── quux | ||
└── baz[...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opted to try displaying the datasets like this. Please let me know what you think.
Have updated to address points 1 and 4. As point 1 is straightforward, there is likely not much to say about it. Point 4 I opted to add |
Hi @jakirkham, thinking about this some more, how about the following API... The The The This is enough to allow users to roll their own display function via The The
This API would mean that all a user has to do is type What do you think? |
Unfortunately I have to run to a meeting soon. So don't have lots of time to think/respond to this ATM, but plan on giving it more thought. Can give some gut reactions though. Having an object with repr methods sounds like a great idea. Any thoughts on having this same object also expose a dict-like interface as well? Doing this would allow us to pass the object to |
Cool. Re exposing a dict-like interface, I would say whatever is simplest,
happy if you want to try something out and see how it goes.
…On Wednesday, April 26, 2017, jakirkham ***@***.***> wrote:
Unfortunately I have to run to a meeting soon. So don't have lots of time
to think/respond to this ATM, but to plan on giving it more thought. Can
give some gut reactions though.
Having an object with repr methods sounds like a great idea.
Any thoughts on having this same object also expose a dict-like interface
as well? Doing this would allow us to pass the object to asciitree
directly. Also this saves us having two similar methods in the Group
interface. Can add some methods to it for changing the Group/Dataset
representation as well.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/alimanfoo/zarr/pull/140#issuecomment-297474103>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAq8QsyeZ_e4C5nfR4lwvI8fqU6my6zBks5rz3bCgaJpZM4MRuec>
.
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
|
Do we want to try and clean this up for the next release or do we want to table it for after? I'm ok with either. |
On the HTML front, there are not any libraries that I see that meet our constraints (e.g. Python, not GPL'd, etc.), but maybe one could use a strategy like this one to roll their own. |
I think it would be great to get this into 2.2 if possible. The HTML example you link to looks great. I don't know enough about jupyter notebook to know how easy it is to have objects with html repr with javascript dependencies. FWIW I'd be happy with an ascii implementation for now if adding html is not trivial. |
Ok, so I tried to update this roughly based on what we talked about before. We now have an object with a |
I tried the HTML formatting in another branch, but I'm running into issues trying to get what I want from CSS. Not sure if I'm just missing something or if the notebook is overriding some CSS values in such a way that it is messing up my rendering. |
Also the code here has seen a lot of changes in the past 24-hrs. Namely we override what |
Now with an HTML representation and tests! 🎉 Given some code like that below, we can see the regular output and a screenshot of the HTML output. Basically we use an HTML list and apply a specialized CSS to get a tree looking layout. Code: import zarr
g1 = zarr.group()
g3 = g1.create_group('bar')
g3.create_group('baz')
g5 = g3.create_group('quux')
g5.create_dataset('baz', shape=100, chunks=10)
g7 = g3.create_group('zoo') Vanilla
...then Edit: Refreshed based on the latest commit ( alimanfoo@017de3f ). |
Added a notebook demonstrating how this works. |
Should add that GitHub blocks CSS when rendering things like Jupyter Notebooks'. So one needs to either look locally or take a look via the nbviewer webservice. Here's the link to new notebook generated with the nbviewer webservice based off of commit ( alimanfoo@017de3f ). |
Made some improvements and simplifications to the CSS. Have updated the comments above to reflect the new state. To my eyes this looks pretty good, but would welcome feedback from others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great, thank you.
zarr/hierarchy.py
Outdated
@@ -523,6 +523,34 @@ def visititems(self, func): | |||
base_len = len(self.name) | |||
return self.visitvalues(lambda o: func(o.name[base_len:].lstrip("/"), o)) | |||
|
|||
def tree(self): | |||
"""Provide a ``print`-able display of the hierarchy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra backtick here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Yep, will fix.
>>> g4 = g3.create_group('baz') | ||
>>> g5 = g3.create_group('quux') | ||
>>> d1 = g5.create_dataset('baz', shape=100, chunks=10) | ||
>>> print(g1.tree()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest just:
>>> g1.tree()
I.e., no need to call print().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a strong thought by any means, but print
does force str
-> repr
. Other circumstances (e.g. in a Jupyter Notebook) not using print
would have a different effect. Though again not a strong thought happy to change if it is preferred without print
.
zarr/hierarchy.py
Outdated
| +-- quux | ||
| +-- baz[...] | ||
+-- foo | ||
>>> print(g3.tree()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same point. Again am flexible on this.
@@ -523,6 +523,34 @@ def visititems(self, func): | |||
base_len = len(self.name) | |||
return self.visitvalues(lambda o: func(o.name[base_len:].lstrip("/"), o)) | |||
|
|||
def tree(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a depth
argument here which defaults to None
(i.e., render the whole subtree) but can be an integer which would limit the tree depth - useful where you only want to view a few levels down, but whole tree might be unwieldy.
zarr/util.py
Outdated
|
||
result += ( | ||
"""</li>\n""".format( | ||
indent, traverser.get_text(group) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No format placeholders in this string, did you mean to call .format() on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! That's likely a remnant from code changes. Will clear that out.
zarr/util.py
Outdated
for l in custom_html_sublist(c, indent).splitlines(): | ||
result += "{0}{0}{1}\n".format(indent, l) | ||
if children: | ||
result += "{0}{0}</ul>\n{0}".format(indent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section maybe more concise as:
children = traverser.get_children(group)
if children:
result += """\n{0}{0}<ul>\n""".format(indent)
for c in children:
for l in custom_html_sublist(c, indent).splitlines():
result += "{0}{0}{1}\n".format(indent, l)
result += "{0}{0}</ul>\n{0}".format(indent)
|
||
def get_text(self, node): | ||
name = node.name.split("/")[-1] or "/" | ||
name += "[...]" if hasattr(node, "dtype") else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that '[...]' is a concise way to indicate an array. I did wonder if instead people would prefer to see array shape and dtype, which serves to indicate array and also provides some more diagnostics. I'm in two minds. Any thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong feelings either way. Went with this initially as it was easy and concise, but am happy to change it if you would prefer something else. We could also let this function be overridden by a user.
zarr/util.py
Outdated
result += "</style>\n\n" | ||
|
||
# Insert the HTML list | ||
result += """<div class="zarrTree">\n""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, could you make the CSS class "zarr-tree" for consistency of naming with "zarr-info".
zarr/util.py
Outdated
return result | ||
|
||
|
||
class TreeHierarchy(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very small point, but could we call this "TreeViewer" or something like that, just to be clear this is a class for viewing/visualising the tree?
zarr/util.py
Outdated
|
||
class TreeHierarchy(object): | ||
|
||
def __init__(self, group): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to earlier comment, would it be possible to limit this to a given depth, to help with visualising very large trees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Might have to revisit implementing it later, but am open to the idea.
Here's what I see via nbviewer as at 017de3f: Would it be possible to tweak CSS so the list aligns to the left of the output area? Also, there's quite a bit of vertical space between parent and sublist. Could we make this more compact? |
Provides a quick demo showing how to use `tree` to get text-based and HTML-based representations of a Zarr `Group` object.
Works around some Python 2/3 pains by implementing `__bytes__` and `__unicode__`. This allows a user to get a nicer Unicode text-based representation of `tree` on Python 2 even though `__repr__` will not permit it. Also allows Python 3 to get the nicer Unicode representation for free. Though users can still force the `bytes` representation should they prefer it for some reason.
As all Python `object`s will fallback to `__repr__` if `__str__` is undefined on both Python 2 and Python 3, there is no need for us to implement `__str__` in basically the same way as `__repr__`. So this just drops `__str__` so that `__repr__` can fill in as needed.
After a bit of thrashing I have an example working with jstree providing styling and interactive expand/collapse of groups. Here's a screenshot from jupyter notebook: Here's the notebook with example code. This also works in nbviewer. What do you think @jakirkham, would you be happy to update your PR to use jstree? It seems worth it to me. As a side note I think that nbviewer has a problem in that it loads jquery but not via require, and so I had to make sure it was defined because jstree declares a dependency on jquery. Probably worth raising this with the nbviewer folks. |
xref jupyter/nbviewer#736 |
Hi @jakirkham, I'm getting near done with other work for the 2.2 release, I'd like to include this too. I think we should go for the jstree implementation for |
Sorry. The past few weeks have been a bit busy and am now getting over a cold. So might not be super helpful. Have tried to resolve the merge conflicts here at least. Using |
Thank you, I'll merge now and add in the jstree implementation. I think we can leave the depth option for now. Hope you feel better! |
SGTM. Thanks. |
Fixes https://github.com/alimanfoo/zarr/issues/82
Adds a
tree
method toGroup
that constructs a pretty-printed hierarchy listing of everything below that group. Similar in nature to the Unixtree
command. Borrowed the styling from @alimanfoo's example.