-
Notifications
You must be signed in to change notification settings - Fork 278
Rewritten lexer using Megaparsec supporting hex and bytes literals #1750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…unction behavior (untested)
Previously, it was popping the layout stack even when the parser was in the 'opening' state. Correct behavior when in 'opening' state is just to push onto the stack and run the parser. Popping and/or emitting virtual semis should only happen when not in the 'opening' state.
…print these properly
Sounds good. Agree we should change Name to something structured and then support escaping segments. And then pull out syntax into an interface :) |
Okay, I think this is all set. I did some work to improve the error reporting and added a transcript to demonstrate (error-messages.md). Feel free to add other failing code to that transcript to check for sensible error messages. @mitchellwrosen you can take a look if you like, otherwise I will merge in a day or two assuming no objections from anyone else. |
|
||
num :: Maybe String -> Integer -> Lexeme | ||
num sign n = Numeric (fromMaybe "" sign <> show n) | ||
sign = P.optional (lit "+" <|> lit "-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the result of this is always fromMaybe ""
'd - if so, it could be refactored to
sign = lit "+" <|> lit "-" <|> pure ""
LGTM! My main feedback is that the implementation looks a little bit "stringy" - like if you had to get in there and amend something, you might have to lean on a comprehensive test suite rather than the type checker to make sure you've handled all the cases. But for a lexer, that's probably expected :) |
(You can start reviewing, but hold off on merging: I have one todo left, making lexer errors print out nicely.)
This is a drop in replacement for the old lexer, totally rewritten using Megaparsec. This was done to support new documentation syntax which will require some lexer fanciness. The code isn't really much shorter, but it's now more straightforward parser combinator code. The layout handling, however, is still pretty finicky.
I added a couple features during the rewrite:
0xdeadbeef
or0o012723
, which are parsed as the correspondingNat
.0xs
, for instance0xs9ab38bcf83ffabb9012
, which are used by the pretty-printer.Interesting/controversial decisions
I was going to add a syntax for identifiers that include spaces or other special characters, using surrounding double backticks (or any number of enclosing backticks that's greater than 1). The lexer change was very easy, but the pretty-printer change seemed annoying and hard to do properly because of how names are just an opaque text value currently. So I left this feature commented out for now.
Test coverage
There's good coverage from all the existing tests and transcripts. The new bytes literals feature has some testing from existing transcripts. I added some tests of the hex and octal literals as well to the term printer test suite.
I added a new transcript (
error-messages.md
) demonstrating some of the new error messages.Loose ends
I wasn't able to delete some of the old wordy and symboly id parsers which are currently being used in path parsing which I didn't want to mess with for this PR. But I'd like to clean that up at some point.