Some red flags with the current grammar for reserved-statement #635
Comments
I think you're reading the grammar fine, but drawing a perhaps-incorrect conclusion.
Too lax for what purpose? We deliberately made the grammar of […]

Basically, this says "consume characters until you see an (unescaped) […]".

How would you change it? We need to freeze the syntax.
@mihnita I'm sorry I missed bringing this up in the F2F call. Can you clarify the request or, if appropriate, close this issue if it has been addressed? Thanks.
Per discussion in the 2024-02-26 call, closing.
This is how our escaping of […]

Legend: […]

When writing a parser, it is common to write a tokenizer that groups characters into tokens, and then the parser works at a higher level, on tokens. But to write a tokenizer it is best not to need much context: just consume characters. On occasion the tokenizer might enter a different mode, for example in strings ([…]). But these are single-level mode changes.

For our grammar the tokenizer must know when we are inside a reserved section, but to get there it needs a […]

Worse, inside a reserved sequence we are still looking for (and recognizing) a literal (block 5 above). This is unnecessarily complex. Our grammar forces the tokenizer to:
"I wrote a parser for it" is not an argument. And think about what this does for our users.

I think this was a misunderstanding.
Kind of related, I think that there is no way to use
You mean writing a left-to-right (from the beginning to the end) parser, right? I will concede that for our grammar, writing a parser that starts somewhere in the middle and needs to understand the context by looking left is not as straightforward as writing a left-to-right parser. But that is probably true of many (most) language grammars.
There are two ways to write a parser:
Yes. As said above, this can be achieved with 1 bit of context.
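To make the "1 bit of context" claim concrete, here is a minimal Python sketch. It is not taken from any implementation discussed in this thread. It is a character-level lexer whose only extra state is a boolean `in_reserved` that the parser passes in on each call. The class `Lexer`, the token kinds, and the simplified character classes are hypothetical and do not reproduce the actual MessageFormat 2 ABNF. Inside a reserved body, the lexer consumes characters up to an unescaped brace. It skips over `|quoted|` literals with a purely local `in_quote` flag, so recognizing the literal needs no further context.

```python
from dataclasses import dataclass

# Hypothetical, simplified character class; NOT the real MessageFormat 2 ABNF.
NAME_CHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_.-"
)


@dataclass
class Token:
    kind: str  # "{", "}", "NAME", "WS", "RESERVED", "CHAR" or "EOF"
    text: str  # raw source text; escapes are left as written


class Lexer:
    """Lexer whose only context is the in_reserved flag supplied by the parser."""

    def __init__(self, src: str) -> None:
        self.src = src
        self.pos = 0

    def next_token(self, in_reserved: bool) -> Token:
        src, i = self.src, self.pos
        if i >= len(src):
            return Token("EOF", "")
        ch = src[i]
        if ch in "{}":
            # Braces are delimiters in both modes.
            self.pos = i + 1
            return Token(ch, ch)
        if in_reserved:
            # Consume up to an unescaped brace that is outside a |quoted| literal.
            j, in_quote = i, False
            while j < len(src):
                c = src[j]
                if c == "\\":
                    j += 2  # skip the backslash and the escaped character
                    continue
                if c == "|":
                    in_quote = not in_quote
                elif c in "{}" and not in_quote:
                    break
                j += 1
            if in_quote:
                raise ValueError(f"unterminated quoted literal starting at index {i}")
            j = min(j, len(src))  # a trailing backslash may overshoot the end
            self.pos = j
            return Token("RESERVED", src[i:j])
        if ch.isspace():
            j = i
            while j < len(src) and src[j].isspace():
                j += 1
            self.pos = j
            return Token("WS", src[i:j])
        if ch in NAME_CHARS:
            j = i
            while j < len(src) and src[j] in NAME_CHARS:
                j += 1
            self.pos = j
            return Token("NAME", src[i:j])
        self.pos = i + 1
        return Token("CHAR", ch)
```

In this design the parser, not the lexer, decides when `in_reserved` is true, for example right after it has consumed a reserved keyword such as `.foo`. Given the input `.foo a\{b |x}y| {z}`, calling `next_token(True)` after the keyword and the whitespace yields a single `RESERVED` token for `a\{b |x}y| `. The escaped brace and the brace inside the quoted literal do not end the body.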
I disagree. As stated above, all the tokenizer needs to know is whether the parser currently is inside a
There is no problem with literals in the grammar. Maybe you designed a tokenizer that attempts to return literals as a single token? If so, that is a problem with the design of that tokenizer, not a problem with the grammar. I mean, you would have the same problem in an ISO C parser if you wrote a tokenizer that returns
I think this is intentional, because otherwise there would be a parsing ambiguity inside an
could be parsed as
or as
This is 100% intentional. A reserved annotation is part of

Since we removed

See above comment.
Reserved statements:
If I read this correctly, then these strings are valid:
Do we really want to allow `.` in the reserved body?

Escaped text:
To me this looks like a red flag, not only for the implementers of a parser, but also for the users of the final syntax.
In most cases escaped strings can be handled at a fairly low level (in the tokenizer).
But in this case, knowing that we can now accept a `reserved-escape` requires a lot of context.

And that allows us to support strings like this:
So I find the `reserved-statement` to be too lax. Or am I reading the grammar wrong?
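The concern about escapes can be illustrated with a small sketch. It assumes that the draft grammar defines a different set of escapable characters for plain text, for quoted literals, and for reserved bodies. The three sets below are our reading of `text-escape`, `literal-escape`, and `reserved-escape`, not quotations from the spec, so check them before relying on them. Under that assumption, a scanner cannot decide whether `\|` is legal without knowing which of the three contexts it is in. The function `unescape` and its `context` parameter are hypothetical.

```python
# ASSUMED escape sets per context, modelled on the MessageFormat 2 draft
# grammar (text-escape, literal-escape, reserved-escape); verify against the spec.
ESCAPES = {
    "text": frozenset("\\{}"),
    "literal": frozenset("\\|"),
    "reserved": frozenset("\\{|}"),
}


def unescape(s: str, context: str) -> str:
    """Resolve backslash escapes in s; which escapes are legal depends on context."""
    allowed = ESCAPES[context]
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\":
            if i + 1 < len(s) and s[i + 1] in allowed:
                out.append(s[i + 1])  # keep only the escaped character
                i += 2
                continue
            raise ValueError(f"invalid escape at index {i} in {context!r} context")
        out.append(s[i])
        i += 1
    return "".join(out)
```

With these assumed sets, the same input `a\|b` unescapes to `a|b` in the `"reserved"` context but is rejected in the `"text"` context. That is the kind of context dependence the issue flags.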