Skip to content

Rewritten lexer using Megaparsec supporting hex and bytes literals #1750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 42 commits into from
Nov 21, 2020

Conversation

pchiusano
Copy link
Member

@pchiusano pchiusano commented Nov 19, 2020

(You can start reviewing, but hold off on merging: I have one todo left, making lexer errors print out nicely.)

This is a drop in replacement for the old lexer, totally rewritten using Megaparsec. This was done to support new documentation syntax which will require some lexer fanciness. The code isn't really much shorter, but it's now more straightforward parser combinator code. The layout handling, however, is still pretty finicky.

I added a couple features during the rewrite:

  • Hexadecimal and octal literals, like 0xdeadbeef or 0o012723, which are parsed as the corresponding Nat.
  • Bytes literals, which start with 0xs, for instance 0xs9ab38bcf83ffabb9012, which are used by the pretty-printer.

Interesting/controversial decisions

I was going to add a syntax for identifiers that include spaces or other special characters, using surrounding double backticks (or any number of enclosing backticks that's greater than 1). The lexer change was very easy, but the pretty-printer change seemed annoying and hard to do properly because of how names are just an opaque text value currently. So I left this feature commented out for now.

Test coverage

There's good coverage from all the existing tests and transcripts. The new bytes literals feature has some testing from existing transcripts. I added some tests of the hex and octal literals as well to the term printer test suite.

I added a new transcript (error-messages.md) demonstrating some of the new error messages.

Loose ends

I wasn't able to delete some of the old wordy and symboly id parsers which are currently being used in path parsing which I didn't want to mess with for this PR. But I'd like to clean that up at some point.

@pchiusano pchiusano marked this pull request as ready for review November 20, 2020 01:34
@aryairani
Copy link
Contributor

Sounds good. Agree we should change Name to something structured and then support escaping segments. And then pull out syntax into an interface :)

@pchiusano
Copy link
Member Author

Okay, I think this is all set.

I did some work to improve the error reporting and added a transcript to demonstrate (error-messages.md). Feel free to add other failing code to that transcript to check for sensible error messages.

@mitchellwrosen you can take a look if you like, otherwise I will merge in a day or two assuming no objections from anyone else.

@pchiusano pchiusano added the ready-to-merge Apply this to a PR and it will get merged automatically once CI passes and 1 reviewer has approved label Nov 20, 2020

num :: Maybe String -> Integer -> Lexeme
num sign n = Numeric (fromMaybe "" sign <> show n)
sign = P.optional (lit "+" <|> lit "-")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the result of this is always fromMaybe ""'d - if so, it could be refactored to

sign = lit "+" <|> lit "-" <|> pure ""

@mitchellwrosen
Copy link
Member

LGTM! My main feedback is that the implementation looks a little bit "stringy" - like if you had to get in there and amend something, you might have to lean on a comprehensive test suite rather than the type checker to make sure you've handled all the cases. But for a lexer, that's probably expected :)

@mergify mergify bot merged commit 2eccdbc into trunk Nov 21, 2020
@mergify mergify bot deleted the topic/lexer2 branch November 21, 2020 01:23
@mergify mergify bot removed the ready-to-merge Apply this to a PR and it will get merged automatically once CI passes and 1 reviewer has approved label Nov 21, 2020
@pchiusano pchiusano mentioned this pull request May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants