Treesitter challenges #808
Comments
Random musing - it's possible that what we actually want would be broken into two passes
Pass 1 - A helper for "Am I somewhere in an Arguments node?", which would return
Pass 2 - Now that we know we are inside an Arguments node, we need to find the child of that Arguments node that directly precedes the user's cursor. This may not necessarily span the user's cursor (like the
Pass 1 is roughly
Pass 2 is roughly
We could likely end up with various flavors of Pass 1 helpers for
I'm somewhat convinced that each feature (different types of completions, hover, signature, etc) has its own specific set of requirements about the exact node you care about starting from. So while it has proven useful for us to cache a
Another way of thinking about this is that tree-sitter isn't inherently broken or anything. We likely just have the wrong set of abstractions built on top of it for our required tasks. But now that we have a bunch of examples of the tasks we'd like to do, we can probably build better tree-sitter tooling.
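The two-pass idea can be sketched on a toy tree. This is only an illustration under stated assumptions: `Node`, `enclosing_arguments`, `child_before_cursor`, and `sample_tree` are hypothetical names, the tree is built by hand rather than by tree-sitter, and positions are plain byte offsets instead of [row, column] points. Pass 2 is read here as "the last child that starts before the cursor", which is one possible interpretation of "directly precedes".

```rust
/// Toy stand-in for a syntax node (not the tree-sitter API):
/// a kind plus a byte span with inclusive start and exclusive end.
struct Node {
    kind: &'static str,
    start: usize,
    end: usize,
    children: Vec<Node>,
}

impl Node {
    fn leaf(kind: &'static str, start: usize, end: usize) -> Node {
        Node { kind, start, end, children: Vec::new() }
    }

    /// An inner node spans from its first child's start to its last child's end.
    fn branch(kind: &'static str, children: Vec<Node>) -> Node {
        let start = children.first().map_or(0, |c| c.start);
        let end = children.last().map_or(0, |c| c.end);
        Node { kind, start, end, children }
    }
}

/// Pass 1 (hypothetical helper): the innermost "arguments" node that has the
/// cursor strictly inside its span, if any.
fn enclosing_arguments(node: &Node, offset: usize) -> Option<&Node> {
    if !(node.start < offset && offset < node.end) {
        return None;
    }
    // Prefer a deeper match so nested calls resolve to the innermost one.
    let deeper = node.children.iter().find_map(|c| enclosing_arguments(c, offset));
    deeper.or(if node.kind == "arguments" { Some(node) } else { None })
}

/// Pass 2 (hypothetical helper): the last child that starts before the cursor,
/// whether or not its span reaches the cursor.
fn child_before_cursor(arguments: &Node, offset: usize) -> Option<&Node> {
    arguments.children.iter().rev().find(|c| c.start < offset)
}

/// Hand-built tree for the made-up source `f(a =  )` (two spaces before `)`).
fn sample_tree() -> Node {
    let argument = Node::branch(
        "argument",
        vec![Node::leaf("identifier", 2, 3), Node::leaf("=", 4, 5)],
    );
    let arguments = Node::branch(
        "arguments",
        vec![Node::leaf("(", 1, 2), argument, Node::leaf(")", 7, 8)],
    );
    let call = Node::branch("call", vec![Node::leaf("identifier", 0, 1), arguments]);
    Node::branch("program", vec![call])
}
```

On this toy tree, a cursor at byte offset 6 (between the two spaces) resolves to the "arguments" node in Pass 1 and to the "argument" node for `a =` in Pass 2, even though that child's span ends before the cursor.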
Great summary:
I had a feeling that somehow the direction from which we look from the cursor is important and that part of our struggles is how "closest node at point" is insensitive to direction. I reviewed approaches in Ruff and Rust-Analyzer and found interesting patterns:
So my general feeling regarding actions to take:
My recent foray into the language server and, more specifically, completions has given me Some Opinions™️, which I discussed with @lionel- and @DavisVaughan. We agreed I'd capture some of this friction I'm noticing as a newcomer to the codebase. This issue is inspired by work on #778 and #805.
As I get to know the completions codebase, I've been surprised at the widespread, very low-level interaction with the treesitter syntax tree. In hindsight, it's clear I expected activities to be framed in terms of nodes of type, e.g., "call", "arguments", or "argument". I didn't expect so much logic around anonymous nodes, such as `"("`, `")"`, `","`, and `"="`.
To a considerable extent, you cannot avoid working with these low-level nodes. I mean, someone has to do it! Therefore one coping strategy is to keep building out a nice layer of well-tested wrapping around treesitter and to increase usage of this wrapping (i.e. try to eliminate bespoke low-level tree handling in functions that do high-level tasks).
But it may also be true that this is a legitimate downside of using the treesitter tree or parser directly (or at all?). This issue is a place to record challenges that come from treesitter.
Whitespace is hard
The fact that whitespace is basically not accounted for in the syntax tree is quite painful, because the cursor quite often has whitespace on one or both sides. In the language server, we constantly need to determine which node is "most associated" with the cursor. It's accurate-ish to say that treesitter's treatment of whitespace makes the "most associated" node almost undefined in these cases. It certainly puts you in a gray area.
Let's look at an example! Consider this code, where `@` indicates the cursor:
Here's treesitter's view of that code. On the left, I overlay treesitter coordinates and on the right is the resulting syntax tree.
To paraphrase the treesitter docs about [i, j] coordinates:
(It's really bytes, not characters, but that's not important for this discussion.)
A treesitter position:
In the example, the cursor `@` is at position [1, 6].
So which node is the cursor "in"?
If your job is to provide completions, which bit of syntax are you helping the user to fill in?
IMO there are two reasonable answers. You're either in:
`a =`.
I view these as equivalent, because if you chose option 1, you would then have logic to bring yourself to option 2. That's just a matter of how you design the interface.
If you use bare treesitter tooling, here's the node you are "in":
`"("` and the `")"`.
You can read this off the tree, because the "arguments" node is the smallest node with a span that contains position [1, 6].
Selecting the "arguments" node is very unfavorable for providing completions, though. It's too high.
What if there were no space between the `"="` and the cursor?
Bare treesitter tooling would still say the cursor is in the "arguments" node.
Ark already has some wrappers around treesitter where we have (somewhat) fixed this up. `find_closest_node_to_point()` would latch on to the `"="` in this case.
(And quite a bit of existing logic expects to solve problems in this "bottom up" way, although I'm not sure it has to be this way.)
This is a good place to record the capturing behaviour at node boundaries.
In treesitter, a node span is sticky / inclusive on the left and not sticky / exclusive on the right.
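The edge behaviour comes down to a single comparison operator. The sketch below uses a hypothetical `Span` type with byte offsets; `contains` mirrors the left-inclusive, right-exclusive rule described above, and `touches` is an assumed cursor-friendly variant that is sticky on both edges.

```rust
/// A byte span in the source text (hypothetical type for illustration).
#[derive(Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    /// Span membership as described above: sticky on the left edge,
    /// not sticky on the right edge.
    fn contains(self, offset: usize) -> bool {
        self.start <= offset && offset < self.end
    }

    /// Cursor-oriented membership: sticky on both edges, so a cursor
    /// sitting immediately after the node still counts as being in it.
    fn touches(self, offset: usize) -> bool {
        self.start <= offset && offset <= self.end
    }
}
```

For a one-byte node such as `Span { start: 4, end: 5 }`, a cursor at offset 4 satisfies both predicates, while a cursor at offset 5 satisfies only `touches`.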
Concretely, where `@` indicates the cursor position and `[ ... ]` indicates a node's span:
`find_closest_node_to_point()` would say the cursor is in the node in both cases.
Executive summary
The treatment of whitespace makes treesitter syntax trees tricky to use directly for language server tasks.
You generally need to walk up and/or down to identify the node that really drives your actions.
It feels like ark's language server currently has these tricky gymnastics inlined throughout the codebase.
In the future, it would be nice to give ourselves a more ergonomic interface to the tree.