Add tree #140

Merged
merged 22 commits into from
Nov 14, 2017
jakirkham
Member

Fixes https://github.com/alimanfoo/zarr/issues/82

Adds a tree method to Group that constructs a pretty-printed hierarchy listing of everything below that group. Similar in nature to the Unix tree command. Borrowed the styling from @alimanfoo's example.

@jakirkham jakirkham mentioned this pull request Mar 3, 2017

def gen_tree(g):
    r = OrderedDict()
    d = r.setdefault(self.name.strip("/"), OrderedDict())
Member Author

There is a question of what to do with the root node, /. Right now I have opted to leave it blank, but could see an argument for making it /. Tweaking how the name is handled here should allow that.

Member Author

Changed to have /.

@alimanfoo
Member

Thanks for this, big thumbs up!

I'm happy to merge this in then do some work on it. For my own reference, three things worth considering: (1) show root node as "/" as suggested; (2) expose a style argument so output can be customised; (3) consider moving get_tree out as first class method.

@alimanfoo
Member

Oh, and one other thing: (4) add an option to include some indicator showing whether a node is a group or a dataset.

@jakirkham
Member Author

jakirkham commented Mar 3, 2017

Glad you like it.

(1) show root node as "/" as suggested;

SGTM.

(2) expose a style argument so output can be customised

Was thinking about giving kwargs that just get forwarded to that function. Maybe could allow user to pass the argument to draw. Whatever works really. 👍

(3) consider moving get_tree out as first class method.

How do you mean? Like a function instead?

(4) add an option to include some indicator showing whether a node is a group or a dataset.

Good call. Does the tree command have some way of showing this (i.e. between directories and files)? If so, maybe we could mimic it. If not, putting say [] around the name is easy.

Edit: Clarified response to 4.

@alimanfoo
Member

alimanfoo commented Mar 7, 2017 via email

@jakirkham
Member Author

I was just thinking that get_tree could be a method on the Group class, rather than being an inner function, which would make the code a bit more straightforward.

Good point. Actually we could even skip the stylizing with asciitree altogether and leave that for the examples. Alternatively if we keep it, we could have a function or two for particular styles and let users either take those or make their own.

@alimanfoo
Member

alimanfoo commented Mar 7, 2017 via email

@jakirkham
Member Author

Just to clarify my suggestion, what I think we should do is have g.tree() (or possibly a different name) just return the dictionary representation not even the string. Then we can have some function like show_tree(g) call g.tree() and feed it to asciitree with this configuration to get the string. That way we don't need to muck with exposing asciitree parameters as users can roll their own with a few lines. The net result is this is less brittle for testing and more flexible for displaying without increasing the maintenance burden much.
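
The split suggested here can be sketched without any dependencies. This is only an illustration under stated assumptions: `build_tree_dict` and `show_tree` are hypothetical names, the input is a plain list of paths rather than a Zarr group, and the small renderer stands in for asciitree rather than reproducing its API.

```python
from collections import OrderedDict


def build_tree_dict(paths, root="/"):
    """Return a nested OrderedDict keyed by path segment (hypothetical helper)."""
    tree = OrderedDict()
    top = tree.setdefault(root, OrderedDict())
    for path in paths:
        node = top
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, OrderedDict())
    return tree


def show_tree(tree):
    """Render the nested dict as text; a stand-in for asciitree, not its API."""
    lines = []

    def walk(children, prefix):
        items = list(children.items())
        for i, (name, sub) in enumerate(items):
            last = i == len(items) - 1
            lines.append(prefix + ("└── " if last else "├── ") + name)
            walk(sub, prefix + ("    " if last else "│   "))

    for name, sub in tree.items():
        lines.append(name)
        walk(sub, "")
    return "\n".join(lines)


tree = build_tree_dict(["bar/baz", "bar/quux/baz"])
print(show_tree(tree))
```

Because the data structure and the drawing are separate, tests can compare the dictionary directly and leave the string formatting to whichever display function a user prefers.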

bar
├── baz
└── quux
    └── baz[...]
Member Author

Opted to try displaying the datasets like this. Please let me know what you think.

@jakirkham
Member Author

jakirkham commented Mar 7, 2017

Have updated to address points 1 and 4. Point 1 is straightforward, so there is likely not much to say about it. For point 4, I opted to append [...] to the name of a dataset. We could probably add a flag to control whether this is added or not.

@alimanfoo alimanfoo mentioned this pull request Apr 6, 2017
@alimanfoo
Member

Hi @jakirkham, thinking about this some more, how about the following API...

The Group class gets a hierarchy() method. This method returns a nested dictionary data structure representing the hierarchy, of the kind that can be used as input to asciitree.

The hierarchy() method accepts a depth argument which is None by default but can be an integer specifying the maximum depth.

The hierarchy() method also accepts a label argument which is None by default but can be a function that returns a label given a group or dataset, to allow user customisation of how the text labels for each node in the hierarchy are constructed. If None the default label for groups is the name of the group (i.e., last path segment, e.g., 'foo') and the default label for arrays is the name plus dtype and shape (e.g., 'bar int64 (200, 10)').

This is enough to allow users to roll their own display function via asciitree. However, as a user I would very much like a convenience method which can allow me to visualise the hierarchy when working in an interactive terminal or jupyter notebook with minimal typing on the keyboard, and with some reasonable defaults regarding visual representation. The minimal typing is important, because I anticipate I will use this quite a lot. To satisfy this I propose the following.

The Group class also gets a tree() method. This method returns an instance of a new Tree class. The Tree class implements __repr__ which constructs a text representation of the tree via asciitree with some defaults matching the output of the command line tree utility. The Tree class could also implement _repr_html_() and/or _repr_svg_() to generate a representation for the jupyter notebook, although we could leave implementing this for a future release. I particularly like the idea that we could in future generate an HTML representation that had some simple dynamic behaviour where tree nodes can be expanded or collapsed, allowing exploration of the tree without having to view the entire expanded tree.

The Tree class would accept the hierarchy data in the constructor. So the implementation of the Group.tree() method would just be something like:

class Group(object):

    ...

    def hierarchy(self, depth=None, label=None):
        # build dictionary data structure
        ...
        return data

    def tree(self, depth=None, label=None):
        data = self.hierarchy(depth=depth, label=label)
        return Tree(data)


class Tree(object):

    def __init__(self, data):
        self.data = data

    def __repr__(self):
        # pass self.data to asciitree and return
        ...

    def _repr_html_(self):
        # for future work, use self.data to construct an HTML representation
        ...

This API would mean that all a user has to do is type mygrp.tree() at the terminal or last line of a notebook cell to view the tree.

What do you think?
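
As a rough, runnable illustration of the proposal above, the sketch below builds the nested dictionary with `depth` and `label` arguments and wraps it in a `Tree` object. Several things here are assumptions made for the example: `Node` is a hypothetical stand-in for a Zarr group or array, `hierarchy` is a module-level function rather than a `Group` method, and `__repr__` uses plain indentation instead of asciitree.

```python
class Node:
    """Hypothetical stand-in for a Zarr group or array."""

    def __init__(self, name, children=(), shape=None, dtype=None):
        self.name = name
        self.children = list(children)
        self.shape = shape
        self.dtype = dtype


def default_label(node):
    # Groups show their name; arrays add dtype and shape, as proposed above.
    if node.shape is None:
        return node.name
    return "{} {} {}".format(node.name, node.dtype, node.shape)


def hierarchy(node, depth=None, label=None):
    """Build a nested dict, stopping after `depth` levels below `node`."""
    label = label or default_label

    def build(n, remaining):
        if remaining is not None and remaining <= 0:
            return {}
        next_remaining = None if remaining is None else remaining - 1
        return {label(c): build(c, next_remaining) for c in n.children}

    return {label(node): build(node, depth)}


class Tree:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        # Indentation-only rendering; asciitree would draw branches instead.
        lines = []

        def walk(d, level):
            for text, sub in d.items():
                lines.append("    " * level + text)
                walk(sub, level + 1)

        walk(self.data, 0)
        return "\n".join(lines)


root = Node("/", [Node("foo", [Node("bar", shape=(200, 10), dtype="int64")])])
print(Tree(hierarchy(root)))
print(Tree(hierarchy(root, depth=1)))
```

Using labels as dictionary keys means two siblings with the same label would collide, which is one reason a real implementation might prefer a different node structure.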

@jakirkham
Member Author

jakirkham commented Apr 26, 2017

Unfortunately I have to run to a meeting soon. So don't have lots of time to think/respond to this ATM, but plan on giving it more thought. Can give some gut reactions though.

Having an object with repr methods sounds like a great idea.

Any thoughts on having this same object also expose a dict-like interface as well? Doing this would allow us to pass the object to asciitree directly. Also this saves us having two similar methods in the Group interface. Can add some methods to it for changing the Group/Dataset representation as well.

@alimanfoo
Member

alimanfoo commented Apr 26, 2017 via email

@jakirkham
Member Author

Do we want to try and clean this up for the next release or do we want to table it for after? I'm ok with either.

@jakirkham
Member Author

On the HTML front, there are not any libraries that I see that meet our constraints (e.g. Python, not GPL'd, etc.), but maybe one could use a strategy like this one to roll their own.

@alimanfoo
Member

I think it would be great to get this into 2.2 if possible.

The HTML example you link to looks great. I don't know enough about jupyter notebook to know how easy it is to have objects with html repr with javascript dependencies. FWIW I'd be happy with an ascii implementation for now if adding html is not trivial.

@jakirkham
Member Author

Ok, so I tried to update this roughly based on what we talked about before. We now have an object with a __repr__ method. The object takes arguments to configure how it may be viewed. One can also tweak the view after the fact using update_ascii_kwargs, which returns the object. This allows one to use the REPL to tweak the view on the fly as well. Names are rough and am open to suggestions for them. The tests need tweaking, but the general idea seems ok and works fine in IPython.

@jakirkham jakirkham changed the title Add tree WIP: Add tree Oct 25, 2017
@jakirkham jakirkham changed the title WIP: Add tree Add tree Oct 25, 2017
@jakirkham
Member Author

I tried the HTML formatting in another branch, but I'm running into issues trying to get what I want from CSS. Not sure if I'm just missing something or if the notebook is overriding some CSS values in such a way that it is messing up my rendering.

@jakirkham
Member Author

Also the code here has seen a lot of changes in the past 24 hours. Namely, we now override what asciitree calls a Traversal to specify how to iterate through a Zarr Group, instead of creating a dict from the Zarr Group. This simplifies the code a bit and should make it easier to customize the generated tree.
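
The traversal idea can be sketched in plain Python. This is a dependency-free illustration, not asciitree's actual classes: `GroupTraversal`, `FakeGroup` and `render` are hypothetical names, and the renderer draws a simplified `+--` layout.

```python
class FakeGroup:
    """Hypothetical stand-in for a Zarr Group with a full path as its name."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class GroupTraversal:
    """Tells a renderer how to walk nodes, so no intermediate dict is built."""

    def get_children(self, node):
        return node.children

    def get_text(self, node):
        # Last path segment, or "/" for the root.
        return node.name.split("/")[-1] or "/"


def render(root, traversal):
    """Draw the tree by asking the traversal object about each node."""
    lines = [traversal.get_text(root)]

    def walk(node, indent):
        for child in traversal.get_children(node):
            lines.append(indent + "+-- " + traversal.get_text(child))
            walk(child, indent + "    ")

    walk(root, "")
    return "\n".join(lines)


class UpperTraversal(GroupTraversal):
    """Customization happens by overriding a single method."""

    def get_text(self, node):
        return super().get_text(node).upper()


root = FakeGroup("/", [FakeGroup("/bar", [FakeGroup("/bar/baz")])])
print(render(root, GroupTraversal()))
print(render(root, UpperTraversal()))
```

The renderer never needs to know what a group is; changing how nodes are labelled or which children are visited only touches the traversal class.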

@jakirkham
Member Author

jakirkham commented Oct 25, 2017

Now with an HTML representation and tests! 🎉

Given some code like that below, we can see the regular output and a screenshot of the HTML output. Basically we use an HTML list and apply a specialized CSS to get a tree looking layout.

Code:

import zarr

g1 = zarr.group()
g3 = g1.create_group('bar')
g3.create_group('baz')
g5 = g3.create_group('quux')
g5.create_dataset('baz', shape=100, chunks=10)
g7 = g3.create_group('zoo')

Vanilla __repr__:

/
 +-- bar
     +-- baz
     +-- quux
     |   +-- baz[...]
     +-- zoo

...then _repr_html_:

[Screenshot: HTML repr]

Edit: Refreshed based on the latest commit ( alimanfoo@017de3f ).
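
The nested-list approach might look roughly like the sketch below. It is an assumption-laden illustration: `html_list` is a hypothetical helper operating on `(name, children)` tuples rather than Zarr groups, the CSS that produces the tree-like layout is omitted, and the `zarr-tree` class name is only a placeholder.

```python
from html import escape


def html_list(node, indent="  ", level=0):
    """Return nested <ul>/<li> markup for a (name, children) tuple."""
    name, children = node
    pad = indent * level
    out = "{}<li>{}".format(pad, escape(name))
    if children:
        out += "\n{}<ul>\n".format(pad + indent)
        for child in children:
            out += html_list(child, indent, level + 2) + "\n"
        out += "{}</ul>\n{}".format(pad + indent, pad)
    return out + "</li>"


tree = ("/", [("bar", [("baz", []), ("quux", [("baz[...]", [])]), ("zoo", [])])])
markup = (
    '<div class="zarr-tree">\n<ul>\n'
    + html_list(tree, level=1)
    + "\n</ul>\n</div>"
)
print(markup)
```

Escaping each name keeps unusual group names from being interpreted as markup when the result is displayed in a notebook.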

@jakirkham
Member Author

Added a notebook demonstrating how this works.

@jakirkham
Member Author

jakirkham commented Oct 25, 2017

Should add that GitHub blocks CSS when rendering things like Jupyter Notebooks. So one needs to either look locally or take a look via the nbviewer webservice. Here's the link to the new notebook rendered with the nbviewer webservice, based off of commit ( alimanfoo@017de3f ).

@jakirkham
Member Author

Made some improvements and simplifications to the CSS. Have updated the comments above to reflect the new state. To my eyes this looks pretty good, but would welcome feedback from others.

Member

@alimanfoo alimanfoo left a comment

This is looking great, thank you.

@@ -523,6 +523,34 @@ def visititems(self, func):
base_len = len(self.name)
return self.visitvalues(lambda o: func(o.name[base_len:].lstrip("/"), o))

def tree(self):
"""Provide a ``print`-able display of the hierarchy.
Member

Extra backtick here?

Member Author

Good catch! Yep, will fix.

>>> g4 = g3.create_group('baz')
>>> g5 = g3.create_group('quux')
>>> d1 = g5.create_dataset('baz', shape=100, chunks=10)
>>> print(g1.tree())
Member

Suggest just:

>>> g1.tree()

I.e., no need to call print().

Member Author

Not a strong opinion by any means, but print does force str -> repr. In other circumstances (e.g. in a Jupyter Notebook), not using print would have a different effect. Again, this is not a strong opinion; happy to change it if the version without print is preferred.
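
The difference being discussed can be seen with a tiny object. `TreeView` below is a hypothetical class written for this illustration, not the class in this PR.

```python
class TreeView:
    """Hypothetical viewer object with both a text and an HTML form."""

    def __repr__(self):
        return "/\n +-- bar"

    def _repr_html_(self):
        # Jupyter's rich display looks for this method when the object is the
        # last expression in a cell; print() never calls it.
        return "<ul><li>/<ul><li>bar</li></ul></li></ul>"


view = TreeView()

# No __str__ is defined, so str() and print() fall back to __repr__.
assert str(view) == repr(view)
print(view)
```

In a doctest, `print(g1.tree())` therefore always exercises the text form, while a bare `g1.tree()` shows the text form in a plain REPL and the HTML form in a notebook.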

 |   +-- quux
 |       +-- baz[...]
 +-- foo
>>> print(g3.tree())
Member

See comment above.

Member Author

Same point. Again am flexible on this.

@@ -523,6 +523,34 @@ def visititems(self, func):
base_len = len(self.name)
return self.visitvalues(lambda o: func(o.name[base_len:].lstrip("/"), o))

def tree(self):
Member

Consider adding a depth argument here which defaults to None (i.e., render the whole subtree) but can be an integer which would limit the tree depth - useful where you only want to view a few levels down, but whole tree might be unwieldy.

zarr/util.py Outdated

result += (
"""</li>\n""".format(
indent, traverser.get_text(group)
Member

No format placeholders in this string, did you mean to call .format() on this?

Member Author

Good catch! That's likely a remnant from code changes. Will clear that out.
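
For reference, a call like this is harmless but misleading, because `str.format` silently ignores positional arguments that no replacement field uses. The variable names below are made up for the demonstration.

```python
indent = "  "
text = "bar"

# No replacement fields: the extra positional arguments are silently ignored.
closing = """</li>\n""".format(indent, text)
assert closing == "</li>\n"

# With fields, the same arguments are substituted.
line = "{0}<li>{1}</li>\n".format(indent, text)
assert line == "  <li>bar</li>\n"
```

Since no error is raised, leftovers like this tend to survive refactoring until a reviewer spots them.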

zarr/util.py Outdated
for l in custom_html_sublist(c, indent).splitlines():
result += "{0}{0}{1}\n".format(indent, l)
if children:
result += "{0}{0}</ul>\n{0}".format(indent)
Member

This section may be more concise as:

children = traverser.get_children(group)
if children:
    result += """\n{0}{0}<ul>\n""".format(indent)
    for c in children:
        for l in custom_html_sublist(c, indent).splitlines():
            result += "{0}{0}{1}\n".format(indent, l)
    result += "{0}{0}</ul>\n{0}".format(indent)


def get_text(self, node):
name = node.name.split("/")[-1] or "/"
name += "[...]" if hasattr(node, "dtype") else ""
Member

I like that '[...]' is a concise way to indicate an array. I did wonder if instead people would prefer to see array shape and dtype, which serves to indicate array and also provides some more diagnostics. I'm in two minds. Any thoughts on this?

Member Author

No strong feelings either way. Went with this initially as it was easy and concise, but am happy to change it if you would prefer something else. We could also let this function be overridden by a user.

zarr/util.py Outdated
result += "</style>\n\n"

# Insert the HTML list
result += """<div class="zarrTree">\n"""
Member

Nit, could you make the CSS class "zarr-tree" for consistency of naming with "zarr-info"?

zarr/util.py Outdated
return result


class TreeHierarchy(object):
Member

Very small point, but could we call this "TreeViewer" or something like that, just to be clear this is a class for viewing/visualising the tree?

zarr/util.py Outdated

class TreeHierarchy(object):

def __init__(self, group):
Member

Related to earlier comment, would it be possible to limit this to a given depth, to help with visualising very large trees?

Member Author

Sure. Might have to revisit implementing it later, but am open to the idea.

@alimanfoo
Member

Here's what I see via nbviewer as at 017de3f:

[screenshot]

Would it be possible to tweak CSS so the list aligns to the left of the output area?

Also, there's quite a bit of vertical space between parent and sublist. Could we make this more compact?
E.g. something like:

[screenshot]

@alimanfoo
Member

Just to see what it would look like, here's a version reporting shape and dtype to indicate array nodes:

[screenshot]

I think I like this better than using '[...]'?

Provides a quick demo showing how to use `tree` to get text-based and
HTML-based representations of a Zarr `Group` object.

Works around some Python 2/3 pains by implementing `__bytes__` and
`__unicode__`. This allows a user to get a nicer Unicode text-based
representation of `tree` on Python 2 even though `__repr__` will not
permit it. Also allows Python 3 to get the nicer Unicode representation
for free. Users can still force the `bytes` representation should
they prefer it for some reason.

As all Python `object`s will fall back to `__repr__` if `__str__` is
undefined on both Python 2 and Python 3, there is no need for us to
implement `__str__` in basically the same way as `__repr__`. So this
just drops `__str__` so that `__repr__` can fill in as needed.
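
A Python 3 only sketch of the bytes/Unicode split described in these commit messages follows. `TreeView` and `_draw` are hypothetical names for this illustration, and Python 2's `__unicode__` hook is not shown.

```python
class TreeView:
    """Hypothetical viewer with separate ASCII and Unicode renderings."""

    def __init__(self, names):
        self.names = list(names)

    def _draw(self, mid, last):
        lines = ["/"]
        for i, name in enumerate(self.names):
            branch = last if i == len(self.names) - 1 else mid
            lines.append(branch + name)
        return "\n".join(lines)

    def __bytes__(self):
        # ASCII-only drawing, encoded so that bytes(view) works.
        return self._draw("+-- ", "+-- ").encode("ascii")

    def __repr__(self):
        # Python 3 strings are Unicode, so box-drawing characters are safe.
        return self._draw("├── ", "└── ")


view = TreeView(["bar", "foo"])
print(view)          # uses __repr__ because __str__ is not defined
print(bytes(view))   # explicit request for the ASCII form
```

The ASCII encoding step would fail for node names containing non-ASCII characters, so a fuller version would need to decide how to handle those.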
@alimanfoo
Member

After a bit of thrashing I have an example working with jstree providing styling and interactive expand/collapse of groups. Here's a screenshot from jupyter notebook:

[screenshot]

Here's the notebook with example code. This also works in nbviewer.

What do you think @jakirkham, would you be happy to update your PR to use jstree? It seems worth it to me.

As a side note I think that nbviewer has a problem in that it loads jquery but not via require, and so I had to make sure it was defined because jstree declares a dependency on jquery. Probably worth raising this with the nbviewer folks.

@alimanfoo
Member

xref jupyter/nbviewer#736

@alimanfoo
Member

Hi @jakirkham, I'm getting near done with other work for the 2.2 release, I'd like to include this too. I think we should go for the jstree implementation for _repr_html_(), I figure if we're going to do _repr_html_() we might as well provide the user with something over and above what you can get from __repr__(). If you're busy I'm happy to merge this PR as-is and go from there, but let me know what you'd prefer.

@jakirkham
Member Author

Sorry. The past few weeks have been a bit busy and am now getting over a cold. So might not be super helpful. Have tried to resolve the merge conflicts here at least.

Using jstree sounds nice. Saw it before, but was trying to start simple to begin with. Happy to switch over to jstree. The one with the "spreadsheet" icons is a nice touch. Might be worth thinking about whether we still want depth if we can collapse layers. Though it may still make sense if we expect the HTML to get unwieldy with the number of items added to it. Otherwise I trust your judgement.

@alimanfoo
Member

Thank you, I'll merge now and add in the jstree implementation. I think we can leave the depth option for now. Hope you feel better!

@alimanfoo alimanfoo merged commit 4a5847c into zarr-developers:master Nov 14, 2017
@jakirkham jakirkham deleted the add_tree branch November 15, 2017 00:26
@jakirkham
Member Author

SGTM. Thanks.
