-
Notifications
You must be signed in to change notification settings - Fork 194
Compiler: Modernize the js lexer, now utf-8 aware #1386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c50608c
to
dfc45e6
Compare
I'm not familiar enough with the code base to go through these changes to approve them and don't know how much non US-ASCII entry points there are in the wild JavaScript world but the fact that needless string munging is avoided is very nice. |
Minification and variable renaming is currently broken with this PR. Idents are currently not normalized and even contain escape sequence verbatim.
|
Did you check what regular JavaScript minifier do ? I suspect they don't do these things. Note that there are technologies like |
I had a quick look at terser and flow. I didn't see any unicode normalization. However, they both decode escape sequence in ident. |
According to the ECMAScript specification, one should not normalize but just decode escape sequences (and check that this still results in a valid identifier). |
The last commit decodes escape sequence in ident |
2949e31
to
cf47b57
Compare
cf47b57
to
1bfeab5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a look at the last commit. Don't claim I undersood everything of it but the tests look right :-)
fd666e6
to
d9438e7
Compare
In particular, it allows to recognize and emit utf8 identifiers.
This lexer now uses sedlex. The implementation was taken from flow and cleaned to remove unused features.