Optimize syntax coloring #4383

georgewfraser · 2020-05-09T01:39:23Z

Rust-analyzer's semantic syntax coloring is fantastic: it is the best implementation of semantic coloring in a language server. However, there are several medium-sized opportunities for optimization, plus changes that would bring the coloring in line with VSCode best practice. This PR introduces the following changes:

Use semantic coloring for the tricky parts, use TextMate grammar for the simple parts

The current implementation applies semantic coloring to every part of a Rust source file. This is contrary to the guidance from the VSCode docs:

> Semantic tokenization allows language servers to provide additional token information based on the language server's knowledge on how to resolve symbols in the context of a project. Themes can opt-in to use semantic tokens to improve and refine the syntax highlighting from grammars. The editor applies the highlighting from semantic tokens on top of the highlighting from grammars.

This isn't a strictly academic concern. When editing large Rust source files, the current implementation generates large messages that contain semantic coloring for every token in the file. This causes small but perceptible delays in updates to the coloring.
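To make the message size concrete: as far as I understand the LSP semantic tokens format, each token is sent as five integers (deltaLine, deltaStart, length, tokenType, tokenModifiers), so the payload grows linearly with the number of tokens in the file. Here is a minimal sketch of that encoding. `Token` and `encode` are names invented for this illustration; this is not rust-analyzer's actual code.

```rust
/// Hypothetical token with absolute positions (illustration only).
#[derive(Debug, Clone, Copy)]
pub struct Token {
    pub line: u32,
    pub start: u32,
    pub len: u32,
    pub token_type: u32,
    pub modifiers: u32, // bit set of token modifiers
}

/// Encode tokens, assumed sorted by position, as a flat array of
/// five integers per token, with positions relative to the previous token.
pub fn encode(tokens: &[Token]) -> Vec<u32> {
    let mut data = Vec::with_capacity(tokens.len() * 5);
    let (mut prev_line, mut prev_start) = (0u32, 0u32);
    for t in tokens {
        let delta_line = t.line - prev_line;
        // The start column is relative only when the token stays on the same line.
        let delta_start = if delta_line == 0 { t.start - prev_start } else { t.start };
        data.extend_from_slice(&[delta_line, delta_start, t.len, t.token_type, t.modifiers]);
        prev_line = t.line;
        prev_start = t.start;
    }
    data
}
```

Under this encoding, a file with 10,000 tokens produces an array of 50,000 integers on every full refresh, which is why sending fewer tokens (or only the changed ones) matters for large files.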

This PR optimizes semantic syntax coloring by using it only for the tricky parts, like type names.
At the same time, this PR replaces VSCode's built-in TextMate grammar with a more conservative grammar that only colors the simple parts, like keywords. You can see how this works by running the extension with semantic coloring turned off. The simplified TextMate grammar only colorizes the basics:

And then running the extension with the TextMate coloring turned off. Semantic coloring only colorizes the tricky parts:

Together, the two modes produce syntax coloring that is accurate and fast:

Use the "variable" color more selectively

Because VSCode has historically used only TextMate grammars, which are very inaccurate, the existing grammars tend to color everything that isn't a type or a keyword as "variable" (light blue in the dark+ theme). This is contrary to how most other IDEs do syntax coloring---IntelliJ and Visual Studio, for example, use semantic information to do more selective syntax coloring. Now that we have semantic information, we should take advantage of it to visually differentiate variables from fields and namespaces:

Mark mutable and static variables with underline and italic

VSCode has the capability to indicate additional semantic information using underline and italic, similar to IntelliJ, but TextMate grammars aren't smart enough to take advantage of it. Now that we have semantic information available, we can provide these additional hints to the programmer:

Removed features

rust-analyzer currently injects Rust syntax coloring into raw strings that are passed as function arguments when the parameter name starts with ra_fixture. This feature is harder to support in this PR, because we're relying on the TextMate coloring for most tokens. This "language injection" seems like a micro-feature that is only useful in one specific repository (this one), so I dropped the feature in this PR. I can revive this feature; it will just make things a bit more complicated. LMK if this is really important and I should bring it back.

georgewfraser · 2020-05-09T01:42:47Z

Also, LMK what you want in the way of tests. Happy to write them, just wanted to give you a chance to take a look before I do that work.

matklad · 2020-05-09T09:05:22Z

I'll give this a closer look later, but I want to share a couple of design principles we use

I think textmate grammar (or, rather, any approximate language grammars) should not be used for anything. Thou shalt not parse programming languages with regular expressions :-) So, I'd love to explore ways to solve the specific problems within an editor-agnostic and precise framework:
- SemanticTokens have support for viewports, so ideally the editor should ask only for a bounded amount of data for any latency-critical interaction (it's OK to color the whole file "in the background").
- SemanticTokens have a capability to send a diff of highlighting, to reduce bandwidth. We don't use this today, but we should. Note that this is strictly a bandwidth optimization -- we always compute the full set of highlights, but then, during encoding, we compress them against the previous full set.
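
A rough sketch of the diff idea from the last bullet: compare the previous and the new flat token arrays, trim the common prefix and suffix, and send only the middle as a single edit. `TokensEdit`, `diff_tokens`, and `apply_edit` are hypothetical names chosen for this example. The field layout only mirrors my understanding of the LSP edit shape (start, deleteCount, data), and a real implementation might well produce several smaller edits instead.

```rust
/// Hypothetical edit against the previous flat token array.
#[derive(Debug, PartialEq)]
pub struct TokensEdit {
    pub start: usize,        // index where the edit begins
    pub delete_count: usize, // number of old integers to remove
    pub data: Vec<u32>,      // integers to insert at `start`
}

/// Compute one edit turning `old` into `new`, or `None` if they are equal.
pub fn diff_tokens(old: &[u32], new: &[u32]) -> Option<TokensEdit> {
    let prefix = old.iter().zip(new).take_while(|(a, b)| a == b).count();
    if prefix == old.len() && prefix == new.len() {
        return None;
    }
    // The suffix must not overlap the prefix in either array.
    let max_suffix = old.len().min(new.len()) - prefix;
    let suffix = old
        .iter()
        .rev()
        .zip(new.iter().rev())
        .take(max_suffix)
        .take_while(|(a, b)| a == b)
        .count();
    Some(TokensEdit {
        start: prefix,
        delete_count: old.len() - prefix - suffix,
        data: new[prefix..new.len() - suffix].to_vec(),
    })
}

/// What the client would do: apply the edit to its cached copy.
pub fn apply_edit(old: &[u32], edit: &TokensEdit) -> Vec<u32> {
    let mut out = old[..edit.start].to_vec();
    out.extend_from_slice(&edit.data);
    out.extend_from_slice(&old[edit.start + edit.delete_count..]);
    out
}
```

Note that the full array is still computed on the server each time; only the amount of data sent over the wire shrinks.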
It's true that injection right now directly serves only a very narrow use case (highlighting tests in rust-analyzer itself). However, the injection feature in general is important, and by keeping it in we make sure that the architecture is well-prepared for handling injections. A specific case where we'll certainly need a similar capability is highlighting of doctests. But, for example, highlighting SQL inside sqlx via some kind of a plugin is also something we could do in the future.

matklad · 2020-05-09T09:07:50Z

Oh, it's also worth noting that IntelliJ does highlighting in two passes:

1. It has lexer-based highlighting (but it uses a real lexer and not a textmate grammar)
2. It has semantic highlighting on top

We currently only have 2. IntelliJ's lexer is just faster, and it is also trivially incremental (any point where the lexer is in a default state can be a suspend/resume point).
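
To illustrate the suspend/resume point: if the lexer state at the end of each line is cached, then after an edit it is enough to re-lex from the edited line until the end state matches the cached one again. The sketch below is a deliberately tiny model with only two states and non-nested `/* */` comments; strings, line comments, and nested comments are ignored. All names are hypothetical and this is not IntelliJ's or rust-analyzer's implementation.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum LexState {
    Default,
    InBlockComment,
}

/// Lex one line starting in `state`; return the state at the end of the line.
pub fn lex_line(line: &str, mut state: LexState) -> LexState {
    let bytes = line.as_bytes();
    let mut i = 0;
    while i + 1 < bytes.len() {
        match state {
            LexState::Default if bytes[i] == b'/' && bytes[i + 1] == b'*' => {
                state = LexState::InBlockComment;
                i += 2;
            }
            LexState::InBlockComment if bytes[i] == b'*' && bytes[i + 1] == b'/' => {
                state = LexState::Default;
                i += 2;
            }
            _ => i += 1,
        }
    }
    state
}

/// Full pass: the lexer state at the end of every line.
pub fn lex_all(lines: &[&str]) -> Vec<LexState> {
    let mut state = LexState::Default;
    lines.iter().map(|l| { state = lex_line(l, state); state }).collect()
}

/// Re-lex after line `edited` changed in place (line count unchanged).
/// Stops as soon as an end state matches the cache; returns lines re-lexed.
pub fn relex_after_edit(lines: &[&str], end_states: &mut [LexState], edited: usize) -> usize {
    let mut state = if edited == 0 { LexState::Default } else { end_states[edited - 1] };
    let mut relexed = 0;
    for i in edited..lines.len() {
        state = lex_line(lines[i], state);
        relexed += 1;
        if state == end_states[i] {
            break; // everything after this line is unaffected
        }
        end_states[i] = state;
    }
    relexed
}
```

In the common case an edit does not change the end state of its own line, so only that one line is re-lexed.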

flodiebold · 2020-05-09T09:34:05Z

> editor-agnostic

I think that's worth highlighting -- dividing the highlighting between client and server this way seems like it would make it a lot more complicated to support in other clients. I don't subscribe to the idea that there shouldn't need to be any language-specific code on the client, but it's worth keeping in mind how other editors would implement this.

bjorn3 · 2020-05-09T11:50:57Z

> This PR optimizes semantic syntax coloring by using it only for the tricky parts, like type names.
> At the same time, this PR replaces VSCode's built-in TextMate grammar with a more conservative grammar that only colors the simple parts, like keywords.

This means that while rust-analyzer is starting, if it couldn't handle a file outside of a workspace or for some other reason it couldn't generate semantic highlighting information, the syntax highlighting is degraded compared to the current state.

georgewfraser · 2020-05-09T16:26:01Z

@bjorn3

> This means that while rust-analyzer is starting, if it couldn't handle a file outside of a workspace or for some other reason it couldn't generate semantic highlighting information, the syntax highlighting is degraded compared to the current state.

This is true today---the TextMate syntax coloring is different than the semantic syntax coloring, because it's so much less accurate. A reasonable compromise (the same one made by IntelliJ) is to make the first-pass syntax coloring (the one without semantic coloring) a bit more conservative. It can still color most things---keywords, strings and other literals, builtin types---but it leaves the tricky things uncolored---user-defined types, enum constants, function references.
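
As a sketch of what I mean by a conservative first pass: classify only the tokens that can be recognized without any semantic information, and return nothing for the rest so they stay uncolored until the semantic pass arrives. `Basic` and `classify` are names made up for this example, the keyword and type lists are deliberately incomplete, and the real first pass in VSCode would be a TextMate grammar rather than Rust code.

```rust
/// Categories a first pass can assign without semantic information.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Basic {
    Keyword,
    BuiltinType,
    Number,
    StringLiteral,
}

/// Classify one already-lexed token.
/// `None` means "leave it uncolored for the semantic pass".
pub fn classify(token: &str) -> Option<Basic> {
    // Incomplete lists, for illustration only.
    const KEYWORDS: &[&str] = &[
        "fn", "let", "mut", "struct", "enum", "impl", "match", "if", "else", "return", "use", "pub",
    ];
    const BUILTINS: &[&str] = &[
        "bool", "char", "str", "u8", "u32", "u64", "i32", "i64", "usize", "f32", "f64",
    ];
    if KEYWORDS.contains(&token) {
        Some(Basic::Keyword)
    } else if BUILTINS.contains(&token) {
        Some(Basic::BuiltinType)
    } else if token.len() >= 2 && token.starts_with('"') && token.ends_with('"') {
        Some(Basic::StringLiteral)
    } else if token.starts_with(|c: char| c.is_ascii_digit()) {
        Some(Basic::Number)
    } else {
        // User-defined types, enum constants, function references, etc.
        None
    }
}
```

With this split, `fn`, `u32`, and `"hi"` are colored immediately, while something like `MyStruct` stays plain until semantic tokens are available.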

georgewfraser · 2020-05-09T16:39:49Z

@matklad

> I think textmate grammar (or, rather, any approximate language grammars) should not be used for anything. Thou shalt not parse programming languages with regular expressions :-)

I mean, I completely agree with you, and the approach to highlighting in rust-analyzer is approximately 10 billion times better than the TextMate grammar used in VSCode. But VSCode is the way it is, and that means the "first-pass" syntax coloring is going to be a TextMate grammar. So, it's worth being thoughtful about what we want to happen in this first pass.

> SemanticTokens have support for viewports ... SemanticTokens have a capability to send a diff of highlighting

I had no idea! I agree this is the correct way to optimize the current communication bottleneck in large files.

Given your feedback, I propose to make a couple of new PRs accomplishing the goals of this PR:

1. Make sure the VSCode TextMate coloring and the semantic coloring "play nice" with each other.
2. Small improvements to the syntax coloring: only apply "variable" to variables, color template parameters the same way TypeScript does, apply appropriate scopes to keywords versus control keywords.
3. New features in the syntax coloring: underlined mutables, italic statics.
4. Optimize communication by implementing viewports or incremental tokens or both.

I'm thinking I'll do 1+2, then 3, then 4. If you want to see something different, let me know.

georgewfraser · 2020-05-09T20:08:43Z

OK, here's the first PR, which tries to minimize change and just make the TextMate grammar and the semantic coloring cooperate better: #4397

georgewfraser · 2020-05-09T20:28:46Z

Second PR, adds coloring for attributes, statics and mutables: #4400

Third one is going to take a little longer, need to really dig in and understand the incremental sync mechanism.

Optimize syntax coloring

cf167b7

georgewfraser mentioned this pull request May 9, 2020

Semantic colors don't match other languages #4335

Closed

georgewfraser added 5 commits May 8, 2020 18:51

Remove unused

1aab736

Fix format specifier coloring

7d5fe35

Update tests

8493c44

Color fn references

9f95b2e

Clean up unused tags

c8309b0

georgewfraser marked this pull request as draft May 9, 2020 16:49

georgewfraser closed this May 10, 2020
