-
-
Notifications
You must be signed in to change notification settings - Fork 35
Trivia interface #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think the rust-analyzer folks consider their choice a mistake in hindsight. At least that is how I am interpreting matklad's comment helix-editor/helix#5231 (comment) from December (he seems to be the person who contributed the vast majority of rust-analyzer initially). I think the discussion in rust-lang/rust-analyzer#6584 is probably less clear on that point because they of course face the choice of disrupting a lot of work if they change now, years after the initial choice. To me this all kind of points towards following the Roslyn model... At least right now I have the impression that most of the folks that have really dug into this question and then built large scale tools on top of things seem to think that the Roslyn design is the way to go. I think the fact that rust-analyzer tried out the other option and now considers that a mistake is particularly telling. I guess one other option would be that we try to implement a Rosly like tree structure just for the LS side of things? |
That's an interesting comment for sure. Thanks so much for pointing it out :) I have several thoughts about this.
I doubt there is "one true trivia attachment to rule them all", or if there is, it's quite heuristic - not necessarily something you want to hardcode into the parser, I'd say. However, it's possible to support various whitespace attachment algorithms with the existing parser, because we can shuffle whitespace between nodes after parsing but before tree building. This is what I did in the prototype conversion to CSTParser.jl data structures here JuliaSyntax.jl/prototypes/cst_conversion.jl Line 185 in 9c21de0
I think we can support "whitespace attached to AST leaf nodes", (the Roslyn model?), as a generalization of our current rules; a fairly small tweak to how things work right now. Right now,
How about, instead:
I think this provides a data model which supports the various kinds of trivia attachment. I somewhat suspect we may need a tree type or utilities which is somewhere between Related to the example shown in helix-editor/helix#5231 (comment), I think this is a good example, though I've got some other thoughts about that: I feel like moving around code like that generally requires indentation or other whitespace changes, and thus it's a hopeless task to think that a "proper" attachment of trivia would ever solve this problem. Instead, it would be better to focus on really good code reformatting which tries to preserve the ambient style while also adjusting to the location where the code is pasted. (I've been thinking about formatting a lot lately. I feel like training an ML model of whitespace is the right way to go here - with a data driven approach there's some hope of going beyond a bunch of complicated heuristics and into the land of "good taste in formatting is following these examples but knowing when to break them". Which is how humans express style guides. I've yet to see any formatting tool achieve this to my satisfaction.) |
I was thinking a bit about the "right" interface to trivia.
The rust-analyzer people are discussing it over at rust-lang/rust-analyzer#6584 so they've got some good background reading there. It seems generally awkward with no obviously right answer.
IIUC there's two common interfaces:
The rust-analyzer model is appealing because it leads to simpler data structures with less internal structure. Also it's more general because the trivia might be naturally interspersed with nontrivia children but without a natural attachment to any of the children. But we could go for either approach, or something else entirely.
Whitespace trivia
A useful observation: we can't attaching whitespace so that
In general for a refactoring pass, I guess whitespace will become inconsistent during refactoring and will need to be regenerated. This is obviously true for moving blocks but it's even true for refactoring as simple as renaming identifiers. For example, renaming elements of expressions which span multiple lines:
So I'm kind of convinced that there's no natural representation of whitespace within the green tree, so we may as well do whatever is efficient and simple to implement.
Symbols
Consider a simple thing like
(b + c) + (b + c)^2
and a pass which identifies common subexpressions to getHere we can and should remove the parentheses (which are trivia after parsing, due to being used for grouping only). What do we even do here? Like whitespace, it seems refactorings will regularly break this kind of trivia and require that it's regenerated from a model of the precedence rules.
Comments
What about comments? This is much more relevant and I think we should aim for "comments are likely to survive symbolic refactoring and remain attached in the right places".
It seems likely there's cases where one or other model wins here, depending on the situation. Some prototyping with simple example refactoring passes might be necessary to get a feel for the pros and cons.
Impact on the parser
One big benefit we have in the
ParseStream
interface is that trivia is mostly invisible to the parser. So in theory we can adjust trivia attachment heuristics (within whichever model is chosen) independently of the parser code. Julia is sensitive to whitespace and newlines in selected situations, but after parsing is done this information is no longer needed and it may be consistent to split and recombine trivia however we like by floating the boundaries of nodes across the trivia tokens.The text was updated successfully, but these errors were encountered: