Skip to content

Implement BaseNode.visit API #68

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stasm opened this issue Jun 1, 2018 · 9 comments · Fixed by #96
Closed

Implement BaseNode.visit API #68

stasm opened this issue Jun 1, 2018 · 9 comments · Fixed by #96
Assignees
Labels
fluent.syntax Issues related to the fluent.syntax module.

Comments

@stasm
Copy link
Contributor

stasm commented Jun 1, 2018

BaseNode.traverse can be costly because it clones the entire tree under the node it ran on. We should add a new method just for non-destructive visiting. This will be useful for the equals methods, as well as for the compare-locales checks.

@Pike Pike changed the title Implement BaseNode.vitit API Implement BaseNode.visit API Jun 1, 2018
@Pike
Copy link
Contributor

Pike commented Dec 7, 2018

Here's a WIP I did to investigate some performance optimizations in compare-locales, the comment is obviously wrong:

def traverse(node, fun):
    """Postorder-traverse this node and apply `fun` to all child nodes.

    Traverse this node depth-first applying `fun` to subnodes and leaves.
    Children are processed before parents (postorder traversal).

    Return a new instance of the node.
    """

    def visit(value):
        """Call `fun` on `value` and its descendants."""
        if isinstance(value, ftl.BaseNode):
            traverse(value, fun)
            return
        if isinstance(value, list):
            map(visit, value)
        else:
            fun(value)

    if not fun(node):
        return
    # If signaled by the visitor, descend into this node
    props = vars(node).items()
    for name, value in props:
        visit(value)

cc @jotes, as he was looking for something light-weight.

@jotes
Copy link

jotes commented Dec 12, 2018

@Pike Thanks, I'll take a look after bug 1407016 lands on staging/prod.

@jotes
Copy link

jotes commented Jan 2, 2019

@stasm @Pike
Hey,
I've started working on this and I want to confirm the end goal.
The expected result is to have a faster traversal method which could later could be utilized in places like e.g. equals, compare-locales and pontoon.

I've created a small benchmark to get some baseline numbers in order to verify if my changes will affect the performance.
https://github.com/projectfluent/python-fluent/compare/master...jotes:issue-68-axel-traverse-perf-comparison?expand=1#diff-2c9837314490d8a89308e750a026145c
Results are visible at the bottom of file.

It looks like the lack of cloning improves the performance by 5-10% on my machine.
I'm not sure if that's the improvement we're looking for (alternatively, I could misunderstood your intentions).

Thanks in advance for all your support :-)

@Pike
Copy link
Contributor

Pike commented Jan 8, 2019

There's two things here:

One is that microbenchmarks are always tricky. A bit more realistic could be to run the tests over fluent.js/tools/perf/workload-low.ftl.

The other thing is that my example code has an early return. For example, my word-counter code on my branch looks like

    def count_words(self):
        if self._word_count is None:
            self._word_count = 0

            def count_words(node):
                if isinstance(node, ftl.TextElement):
                    self._word_count += len(node.value.split())
                return isinstance(node, (ftl.Message, ftl.Term, ftl.Attribute, ftl.VariantList, ftl.Pattern, ftl.Placeable, ftl.SelectExpression, ftl.Variant))

            traverse(self.root_node, count_words)

        return self._word_count

If you'd adapt that code to run over the workload resource to count all words, I'd expect to see a noticeable difference.

Ride-along question, you added a call to fun around map(visit, value), do you have a particular use-case in mind for that?

@jotes
Copy link

jotes commented Jan 10, 2019

@Pike thanks! I'll change my branchmark to use workload.ftl

@stasm
Copy link
Contributor Author

stasm commented Jan 11, 2019

My goal for filing this issue was to offer an API suitable for applying the visitor pattern to the AST. This could be helpful for many different use-cases which need to walk the tree of nodes. I don't know however if it would be faster than what @Pike suggested in #68 (comment).

Here's a rough draft of what I had in mind. Alternatively, rather than using getattr as I did in this quick example, we could add a visit method to each subclass of BaseNode.

diff --git a/fluent/syntax/ast.py b/fluent/syntax/ast.py
index 7ecd182..ff3db7e 100644
--- a/fluent/syntax/ast.py
+++ b/fluent/syntax/ast.py
@@ -43,16 +43,28 @@ def scalars_equal(node1, node2, ignored_fields):
 
 class BaseNode(object):
     """Base class for all Fluent AST nodes.
 
     All productions described in the ASDL subclass BaseNode, including Span and
     Annotation.  Implements __str__, to_json and traverse.
     """
 
+    def visit(self, visitor, state=None, cont=None):
+        try:
+            fn = getattr(visitor, self.__class__.__name__)
+            fn(self, state, cont)
+        except AttributeError:
+            pass
+
+    def walk(self, visitor, state=None):
+        def cont(node, state):
+            return node.visit(visitor, state, cont)
+        return cont(self, state)
+
     def traverse(self, fun):
         """Postorder-traverse this node and apply `fun` to all child nodes.
 
         Traverse this node depth-first applying `fun` to subnodes and leaves.
         Children are processed before parents (postorder traversal).
 
         Return a new instance of the node.
         """

Then, it would be possible to write custom visitors, specialized for their particular purposes. For instance, the following visitors prints names of variables.

from fluent.syntax import ast, parse

# Methods for Terms, SelectExpressions and Variants omitted for clarity.
class Visitor(object):
    def Resource(self, node, state, cont):
        for entry in node.body:
            cont(entry, state)

    def Message(self, node, state, cont):
        cont(node.value, state)

    def Pattern(self, node, state, cont):
        for element in node.elements:
            cont(element, state)

    def Placeable(self, node, state, cont):
        cont(node.expression, state)

    def VariableReference(self, node, state, cont):
        print(node.id.name)


resource = parse("hello = Hi, {$username}!")
resource.walk(Visitor())

# → "username"

One possible gain is that this approach avoids a lot of if checks run on each node of the tree, and instead leverages the visitor pattern to simulate the multiple dispatch.

@jotes
Copy link

jotes commented Jan 13, 2019

@stasm 🙇‍♂️ I wrote similar visitor in my past, I'll ping you tomorrow with PoC of PR.

@Pike
Copy link
Contributor

Pike commented Jan 14, 2019

I'd prefer a generic visitor to be there to subclass, like in https://docs.python.org/2/library/ast.html#ast.NodeVisitor. Which might get back into introspection to make maintenance easier.

There are two parts to that:

One is if iteration is opt-in or opt-out. I think that opt-out is easier to implement for subclassing.

The other is that of implementation difficulty and maintenance of clients. With opt-in, you need to understand each node on the way to where you're going, with opt-out, you don't. Also, in case of changes to the AST, you need to rewrite your code for each change to each node you're interested in.

@stasm
Copy link
Contributor Author

stasm commented Jan 14, 2019

That's a great suggestion. Even in my minimal example the Visitor class already felt too long for what I was trying to demonstrate.

@stasm stasm added the fluent.syntax Issues related to the fluent.syntax module. label Jan 24, 2019
@Pike Pike self-assigned this Feb 7, 2019
@Pike Pike closed this as completed in #96 Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fluent.syntax Issues related to the fluent.syntax module.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants