Option to change whitespace in token parsing. #44
@Chobbes, take a look at @minad's solution from #41; this is one option. But as he points out, this is not enough to elegantly parse languages where indentation matters. This issue is one of our goals in Megaparsec. I propose you close the PR because, with respect, #41 provides a solution that is not ideal but is much better.
@mrkkrp #41 is a better solution, but not if you wish to maintain backwards compatibility. It will break any existing token parsers that manually construct a LanguageDef. I'm fine with the change in #41 (and would prefer it), but I intentionally avoided it because I thought this option was less destructive and, as a result, more likely to be merged. I will leave this for the maintainer to decide.

"But as he points out this is not enough to elegantly parse languages where indentation matters."

@mrkkrp #41 makes no mention of this aside from references to Layout.hs and the IndentParser package. Layout.hs is nearly identical to what the indents package does (https://hackage.haskell.org/package/indents) and will suffer from the same issues with respect to the Python example I mentioned. IndentParser provides its own token library and appears to be the best alternative so far.

Megaparsec does not appear to solve this problem currently, and at the moment it no longer seems to handle comments the way the Token module does in Parsec: mrkkrp/megaparsec@3661da9#diff-60a69ee7900f16e4c15e0edaf4cce9a3R548. Is this a long-term goal of Megaparsec, or does it provide a solution now?
@Chobbes, Megaparsec is less than one month old and is not finished yet, although it already does many things much better than Parsec, see Anyway, good luck with this PR. I would be glad if Parsec finally moved forward and extended its functionality in any way. But as I see it, Parsec is really old and its development is dead. It lacks tests; I have studied all its issues, its changelog, and so on. The goal of Parsec now is to preserve its functionality without breaking anything. So if I were the maintainer of Parsec, I would first fix its well-known and easily reproducible bugs (see the issue tracker) rather than add new functionality. To move forward, it needs a really passionate maintainer who would, for starters, write a complete test suite for it. I hope that when we finish our test suite for Megaparsec, Parsec can adopt a variant of it to test its own code (although they will need to adapt it manually; I'm not interested in doing that for Parsec without any guarantee that my work won't just be ignored by this sleepy project). If you need to solve this particular issue quickly, do it in your own project (possibly in an ugly way) and you will be fine.
@Chobbes, I've started work on this (and related) issues in Megaparsec. See branch You could try out our lexer and give us your feedback. By the way, it's possible that we will release Megaparsec sooner than this PR is merged.
@Chobbes, Megaparsec 4.0.0 is ready and it provides a solution now. It will be tagged and released tomorrow, I think. You can clone the repo and try it.
Currently there is no way to alter what the token parser considers to be whitespace. This is an issue if one wishes to parse certain indentation-sensitive languages, for instance with the following package:
https://hackage.haskell.org/package/indents
If newlines are consumed by the lexeme parsers in Text.Parsec.Token, then it is difficult to work with languages that depend upon indentation and newlines, such as Python 3.
https://docs.python.org/3/reference/grammar.html
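To make the problem concrete, here is a minimal sketch of the behaviour described above. It builds a token parser from emptyDef and shows that the newline and the indentation after an identifier are gone once the lexeme parser returns. The names lexer, identThenRest and demo are made up for this illustration and are not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- A token parser built from the stock, comment-free language definition.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- Parse one identifier, then return whatever input is left over.
-- (identThenRest is a hypothetical helper for this demonstration.)
identThenRest :: Parser (String, String)
identThenRest = do
  name <- Tok.identifier lexer
  rest <- getInput
  pure (name, rest)

-- The lexeme parser skips the newline and the four spaces that follow "foo",
-- so the layout information is lost before the next parser runs.
demo :: Either ParseError (String, String)
demo = parse identThenRest "<demo>" "foo\n    bar"
-- Right ("foo","bar")
```

After identifier succeeds, the remaining input is just "bar", so a later parser cannot tell whether bar was on the same line or on an indented new line.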
As an example, if statements in Python either require semicolon-separated statements on the same line after the conditional, e.g.:
if blah: stmt1; stmt2
Or the statements must occur on a new line and be indented further. Not both. This makes it difficult to use Text.Parsec.Token.makeTokenParser in its current form: it uses Data.Char.isSpace by default to decide what is whitespace, so its lexeme parsers silently skip every newline.
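For comparison, here is a rough sketch of the kind of behaviour this option is meant to enable, written with hand-rolled combinators instead of makeTokenParser. It is not the code in this PR. The names inlineSpace, lexeme, symbol, name and inlineIf are hypothetical, and treating only spaces and tabs as whitespace is an assumption made for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Skip spaces and tabs only; newlines stay in the input (assumed policy).
inlineSpace :: Parser ()
inlineSpace = skipMany (oneOf " \t")

-- Hypothetical stand-in for the token module's lexeme combinator.
lexeme :: Parser a -> Parser a
lexeme p = p <* inlineSpace

symbol :: String -> Parser String
symbol = lexeme . try . string

-- A toy identifier; real Python names and keywords are more involved.
name :: Parser String
name = lexeme (many1 (alphaNum <|> char '_'))

-- "if cond: stmt1; stmt2" on a single line, ended by a newline or end of input.
inlineIf :: Parser (String, [String])
inlineIf = do
  _ <- symbol "if"
  cond <- name
  _ <- symbol ":"
  stmts <- name `sepBy1` symbol ";"
  _ <- (() <$ endOfLine) <|> eof
  pure (cond, stmts)
```

With this whitespace policy, parse inlineIf "" "if blah: stmt1; stmt2\n" accepts the one-line form, while "if blah:\n    stmt1\n" is rejected because the newline after the colon is still visible. A separate indentation-aware branch could then handle the block form.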