Optimize syntax coloring #4383

Closed

georgewfraser
Contributor

Rust-analyzer semantic syntax coloring is fantastic---the best implementation of semantic coloring in a language server. However, there are several medium-sized opportunities for optimization, as well as changes that would bring the coloring in line with VSCode best practices. This PR introduces the following changes:

Use semantic coloring for the tricky parts, use TextMate grammar for the simple parts

The current implementation applies semantic coloring to every part of a Rust source file. This is contrary to the guidance from the VSCode docs:

Semantic tokenization allows language servers to provide additional token information based on the language server's knowledge on how to resolve symbols in the context of a project. Themes can opt-in to use semantic tokens to improve and refine the syntax highlighting from grammars. The editor applies the highlighting from semantic tokens on top of the highlighting from grammars.

This isn't a strictly academic concern. When editing large Rust source files, the current implementation generates large messages that contain semantic coloring for every token in the file. This causes small but perceptible delays in updates to the coloring.
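To give a rough sense of why the messages grow with file size, here is a minimal sketch of the flat, position-relative encoding that LSP semantic tokens use: five integers per token. The `AbsToken` struct and the `encode` function are hypothetical names made up for illustration, not rust-analyzer's actual code, and the sketch assumes the tokens are already sorted by position.

```rust
/// A highlighted token with an absolute position.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AbsToken {
    pub line: u32,
    pub start: u32,
    pub len: u32,
    pub token_type: u32,
    pub modifiers: u32,
}

/// Encode tokens as [deltaLine, deltaStart, length, tokenType, tokenModifiers]
/// per token. Assumes `tokens` is sorted by (line, start).
pub fn encode(tokens: &[AbsToken]) -> Vec<u32> {
    let mut out = Vec::with_capacity(tokens.len() * 5);
    let (mut prev_line, mut prev_start) = (0u32, 0u32);
    for t in tokens {
        let delta_line = t.line - prev_line;
        // The start column is relative only when the token stays on the same line.
        let delta_start = if delta_line == 0 { t.start - prev_start } else { t.start };
        out.extend_from_slice(&[delta_line, delta_start, t.len, t.token_type, t.modifiers]);
        prev_line = t.line;
        prev_start = t.start;
    }
    out
}
```

The output length is always five times the number of tokens, so coloring every token in a large file produces a proportionally large payload.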

This PR optimizes semantic syntax coloring by using it only for the tricky parts, like type names.
At the same time, this PR replaces VSCode's built-in TextMate grammar with a more conservative grammar that only colors the simple parts, like keywords. You can see how this works by running the extension with semantic coloring turned off. The simplified TextMate grammar only colorizes the basics:

Screen Shot 2020-05-08 at 5 32 41 PM

Conversely, when you run the extension with the TextMate coloring turned off, semantic coloring only colorizes the tricky parts:

Screen Shot 2020-05-08 at 5 33 31 PM

Together, the two modes produce syntax coloring that is accurate and fast:

Screen Shot 2020-05-08 at 5 32 22 PM

Use the "variable" color more selectively

Because VSCode has historically used only TextMate grammars, which are very inaccurate, the existing grammars tend to color everything that isn't a type or a keyword as "variable" (light blue in the dark+ theme). This is contrary to how most other IDEs do syntax coloring---IntelliJ and Visual Studio, for example, use semantic information to do more selective syntax coloring. Now that we have semantic information, we should take advantage of it to visually differentiate variables from fields and namespaces:

Screen Shot 2020-05-08 at 6 26 08 PM

Mark mutable and static variables with underline and italic

VSCode has the capability to indicate additional semantic information using underline and italic, similar to IntelliJ, but TextMate grammars aren't smart enough to take advantage of it. Now that we have semantic information available, we can provide these additional hints to the programmer:

Screen Shot 2020-05-08 at 6 28 13 PM
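As a rough illustration of the idea, semantic token modifiers can be treated as a bitset and mapped to font styles. This is only a sketch: the bit positions, the `FontStyle` struct, and the `font_style` function are assumptions made up for this example (in practice the styling is decided by the editor theme, not by code like this).

```rust
// Hypothetical modifier bits; real bit positions depend on the token legend.
pub const MUTABLE: u32 = 1 << 0;
pub const STATIC: u32 = 1 << 1;

/// Font decorations applied on top of the token color (illustrative type).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct FontStyle {
    pub underline: bool,
    pub italic: bool,
}

/// Map a modifier bitset to a style: mutable is underlined, static is italic.
pub fn font_style(modifiers: u32) -> FontStyle {
    FontStyle {
        underline: (modifiers & MUTABLE) != 0,
        italic: (modifiers & STATIC) != 0,
    }
}
```

A `static mut` item would carry both bits and so would be rendered both underlined and italic under this mapping.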

Removed features

rust-analyzer currently injects Rust syntax coloring into raw strings that are used as arguments to functions when the function argument name starts with ra_fixture. This feature is harder to support in this PR, because we're relying on the TextMate coloring for most tokens. This "language injection" seems like a micro-feature that is only useful in one specific repository (this one), so I dropped the feature in this PR. I can revive this feature; it will just make things a bit more complicated---LMK if this is really important and I should bring it back.

@georgewfraser
Contributor Author

Also, LMK what you want in the way of tests. Happy to write them, just wanted to give you a chance to take a look before I do that work.

@matklad
Member

matklad commented May 9, 2020

I'll give this a closer look later, but I want to share a couple of design principles we use:

  • I think TextMate grammar (or, rather, any approximate language grammar) should not be used for anything. Thou shalt not parse programming languages with regular expressions :-) So, I'd love to explore ways to solve the specific problems within an editor-agnostic and precise framework:

    • SemanticTokens have support for viewports, so ideally the editor should ask only for a bounded amount of data for any latency-critical interaction (it's OK to color the whole file "in the background").
    • SemanticTokens have a capability to send a diff of highlighting, to reduce bandwidth. We don't use this today, but we should. Note that this is strictly a bandwidth optimization -- we always compute the full set of highlights, but then, during encoding, we compress them against the previous full set.
  • It's true that injection right now directly serves only a very narrow use case (highlighting tests in rust-analyzer itself). However, the injection feature in general is important, and by keeping it in we make sure that the architecture is well-prepared for handling injections. A specific case where we'll certainly need a similar capability is highlighting of doctests. But, for example, highlighting SQL inside sqlx via some kind of a plugin is also something we could do in the future.
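The "compress against the previous full set" idea from the list above can be sketched roughly as follows. This is a minimal illustration, assuming the highlights are already encoded as a flat array of integers; it trims the common prefix and suffix and emits at most one edit. The `Edit` struct and the `diff` and `apply` functions are hypothetical names, not rust-analyzer's implementation, although the fields are modeled loosely on the start/delete-count/data shape of a semantic tokens edit.

```rust
/// One replacement of a run of integers in the previous token array
/// (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct Edit {
    pub start: usize,
    pub delete_count: usize,
    pub data: Vec<u32>,
}

/// Compare two full encodings and return a single edit covering the changed
/// middle section, or `None` when nothing changed.
pub fn diff(prev: &[u32], next: &[u32]) -> Option<Edit> {
    let prefix = prev.iter().zip(next.iter()).take_while(|(a, b)| a == b).count();
    if prefix == prev.len() && prefix == next.len() {
        return None;
    }
    let prev_rest = &prev[prefix..];
    let next_rest = &next[prefix..];
    // Count matching elements from the end, without overlapping the prefix.
    let suffix = prev_rest
        .iter()
        .rev()
        .zip(next_rest.iter().rev())
        .take_while(|(a, b)| a == b)
        .count();
    Some(Edit {
        start: prefix,
        delete_count: prev_rest.len() - suffix,
        data: next_rest[..next_rest.len() - suffix].to_vec(),
    })
}

/// Apply an edit on the receiving side to rebuild the new array.
pub fn apply(prev: &[u32], edit: &Edit) -> Vec<u32> {
    let mut out = prev[..edit.start].to_vec();
    out.extend_from_slice(&edit.data);
    out.extend_from_slice(&prev[edit.start + edit.delete_count..]);
    out
}
```

Note that the full set of highlights is still computed on every request in this sketch; only the amount of data sent over the wire shrinks, which matches the "strictly a bandwidth optimization" caveat.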

@matklad
Member

matklad commented May 9, 2020

Oh, it's also worth noting that IntelliJ does highlighting in two passes:

  • It has lexer-based highlighting (but it uses a real lexer and not a TextMate grammar)
  • It has semantic highlighting on top

We currently only have the second pass. IntelliJ's lexer is just faster, and it is also trivially incremental (any point where the lexer is in a default state can be a suspend/resume point).

@flodiebold
Member

editor-agnostic

I think that's worth highlighting -- dividing the highlighting between client and server this way seems like it would make it a lot more complicated to support in other clients. I don't subscribe to the idea that there shouldn't need to be any language-specific code on the client, but it's worth keeping in mind how other editors would implement this.

@bjorn3
Member

bjorn3 commented May 9, 2020

This PR optimizes semantic syntax coloring by using it only for the tricky parts, like type names.
At the same time, this PR replaces VSCode's built-in TextMate grammar with a more conservative grammar that only colors the simple parts, like keywords.

This means that while rust-analyzer is starting, or if it can't handle a file outside of a workspace, or if for some other reason it can't generate semantic highlighting information, the syntax highlighting is degraded compared to the current state.

@georgewfraser
Contributor Author

@bjorn3

This means that while rust-analyzer is starting, or if it can't handle a file outside of a workspace, or if for some other reason it can't generate semantic highlighting information, the syntax highlighting is degraded compared to the current state.

This is true today---the TextMate syntax coloring is different than the semantic syntax coloring, because it's so much less accurate. A reasonable compromise (the same one made by IntelliJ) is to make the first-pass syntax coloring (the one without semantic coloring) a bit more conservative. It can still color most things---keywords, strings and other literals, builtin types---but it leaves the tricky things uncolored---user-defined types, enum constants, function references.
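A conservative first pass of this kind can be sketched as a tiny lexical classifier that colors only what it can identify without name resolution and leaves every other identifier plain. This is an illustrative sketch only: the `Class` enum, the `KEYWORDS` list (deliberately incomplete), and the `classify` function are made-up names, and the sketch ignores comments, raw strings, and character literals.

```rust
/// Coarse lexical classes; anything uncertain stays `Plain`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Class {
    Keyword,
    String,
    Number,
    Plain,
}

// Deliberately incomplete keyword list, for illustration only.
const KEYWORDS: &[&str] = &[
    "fn", "let", "mut", "struct", "enum", "impl", "pub", "use", "static", "if", "else", "match",
];

fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Split `src` into (text, class) pairs, skipping whitespace.
pub fn classify(src: &str) -> Vec<(&str, Class)> {
    let bytes = src.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let c = bytes[i];
        let class = if c.is_ascii_whitespace() {
            i += 1;
            continue;
        } else if c == b'"' {
            // Scan to the closing quote, skipping backslash escapes.
            i += 1;
            while i < bytes.len() && bytes[i] != b'"' {
                if bytes[i] == b'\\' {
                    i += 1;
                }
                i += 1;
            }
            i = (i + 1).min(bytes.len());
            Class::String
        } else if c.is_ascii_digit() {
            while i < bytes.len() && is_ident_byte(bytes[i]) {
                i += 1;
            }
            Class::Number
        } else if is_ident_byte(c) {
            while i < bytes.len() && is_ident_byte(bytes[i]) {
                i += 1;
            }
            let word = &src[start..i];
            // User-defined names are left for the semantic pass.
            if KEYWORDS.iter().any(|k| *k == word) { Class::Keyword } else { Class::Plain }
        } else {
            // Punctuation or a non-ASCII character: advance one whole char.
            i += 1;
            while !src.is_char_boundary(i) {
                i += 1;
            }
            Class::Plain
        };
        out.push((&src[start..i], class));
    }
    out
}
```

Here `Foo` in `let x: Foo = 1;` comes out as `Plain`: deciding whether it is a struct, an enum, or a type alias is exactly the kind of tricky case left to the semantic pass.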

@georgewfraser
Contributor Author

@matklad

I think TextMate grammar (or, rather, any approximate language grammar) should not be used for anything. Thou shalt not parse programming languages with regular expressions :-)

I mean, I completely agree with you, and the approach to highlighting in rust-analyzer is approximately 10 billion times better than the TextMate grammar used in VSCode. But VSCode is the way it is, and that means the "first-pass" syntax coloring is going to be a TextMate grammar. So, it's worth being thoughtful about what we want to happen in this first pass.

SemanticTokens have support for viewports ... SemanticTokens have a capability to send a diff of highlighting

I had no idea! I agree this is the correct way to optimize the current communication bottleneck in large files.

Given your feedback, I propose to make a couple of new PRs accomplishing the goals of this PR:

  1. Make sure the VSCode TextMate coloring and the semantic coloring "play nice" with each other.
  2. Small improvements to the syntax coloring: only apply "variable" to variables, color template parameters the same way TypeScript does, apply appropriate scopes to keywords versus control keywords.
  3. New features in the syntax coloring: underlined mutables, italic statics.
  4. Optimize communication by implementing viewports or incremental tokens or both.

I'm thinking I'll do 1+2, then 3, then 4. If you want to see something different, let me know.

@georgewfraser georgewfraser marked this pull request as draft May 9, 2020 16:49
@georgewfraser
Contributor Author

OK, here's the first PR, which tries to minimize change and just make the TextMate grammar and the semantic coloring cooperate better: #4397

@georgewfraser
Contributor Author

Second PR, adds coloring for attributes, statics and mutables: #4400

Third one is going to take a little longer, need to really dig in and understand the incremental sync mechanism.


4 participants