-
Notifications
You must be signed in to change notification settings - Fork 90
ANTLR-isation of parse/syntactic grammar #351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
IMPORTANT: As PR#342 which ANTLR-ises the lexical & preprocessor grammars is not yet merged those changes are NOT in the grammar in this PR. For the files changed by this PR this means you'll see `Identifier` rather than `identifier` etc. This PR makes the small number of changes to ANTLR-ise all but `primary_no_array_creation_expression` which is being left in its mutual left recursive state for now/ever... There is very little change to the descriptive text, these are mainly small grammar tweaks to remove mutual left recursion.
@@ -26,7 +26,12 @@ enum_declaration | |||
; | |||
|
|||
enum_base | |||
: ':' struct_type | |||
: ':' integral_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes narrow down enum_base
to avoid bad stuff happening. Text below has a small addition to describe the two alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems fine
@@ -1129,6 +1129,8 @@ primary_no_array_creation_expression | |||
; | |||
``` | |||
|
|||
> *Note*: These grammar rules are not ANTLR-ready as they are part of a set of mutually left-recursive rules (`primary_expression`, `primary_no_array_creation_expression`, `member_access`, `invocation_expression`, `element_access`, `post_increment_expression`, `post_decrement_expression`, `pointer_member_access` and `pointer_element_access`) which ANTLR does not handle. Standard techniques can be used to transform the grammar to remove the mutual left-recursion. This has not been done as not all parsing strategies require it (e.g. an LALR parser would not) and doing so would obfuscate the structure and description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the one mutual left recursive set of rules being left. To factor the rules in the list can be inlined into primary_no_array_creation_expression
(see the Parse_CSharp.g4 file sent out). If you've an opinion on removing/leaving the MLR please comment, or would like propose an alternative grammar please do so!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your comment in the change that a refactoring would obscure the structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree as well.
@@ -2329,14 +2331,13 @@ nameof_expression | |||
; | |||
|
|||
named_entity | |||
: simple_name | |||
| named_entity_target '.' identifier type_argument_list? | |||
: named_entity_target ('.' identifier type_argument_list?)* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trivial MLR removal
@@ -139,14 +139,18 @@ A value type is either a struct type or an enumeration type. C# provides a set o | |||
|
|||
```ANTLR | |||
value_type | |||
: non_nullable_value_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rules here formed an MLR set, just a bit or re-org fixed that.
standard/unsafe-code.md
Outdated
@@ -99,8 +99,8 @@ A *pointer_type* is written as an *unmanaged_type* ([§9.8](types.md#98-unmanage | |||
|
|||
```ANTLR | |||
pointer_type | |||
: unmanaged_type '*' | |||
| 'void' '*' | |||
: value_type '*'+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is a consequence of changing unmanaged_type
, it needed to be changed to allow pointers to pointers. The version included does this with the +
operator, while this is used elsewhere in the grammar it is usually applied to non-terminals and here it is to a literal ('*'+
). An alternative would be to make the rule left-recursive, which ANTLR can handle, by dropping the +
's and adding an alternative pointer_type '*'
. The version here produces a "flat-wide" parse/node, the alternative produces a "tree" parse/of nodes. I choose "flat-wide", but one could argue '*'+
is a little hard to read... Express your preference!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think putting parentheses, as in ('*')+
, would be enough to make it more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MadsTorgersen – That's harmless enough ;-) Done.
… of the lexical grammar was merged into draft-v6. There are a few minor changes to lexicial-structure.md – fixing incorrect text resulting from having parallel PRs doing slightly different things, the odd grammar fix, etc. The grammar.md file (auto produced by the tooling) reflects as much ANTLR-isation of the grammar as we currently intend to put into the nromative text of the Standard. It doesn't quite get through ANTLR as is, internal tooling to make the few changes required will probably be released to TG2 "soon".
It will be automatically replaced by a tooling generated one at some point. This doesn't change anything in the PR (it was extracted from the other files).
@@ -26,7 +26,12 @@ enum_declaration | |||
; | |||
|
|||
enum_base | |||
: ':' struct_type | |||
: ':' integral_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems fine
@@ -1129,6 +1129,8 @@ primary_no_array_creation_expression | |||
; | |||
``` | |||
|
|||
> *Note*: These grammar rules are not ANTLR-ready as they are part of a set of mutually left-recursive rules (`primary_expression`, `primary_no_array_creation_expression`, `member_access`, `invocation_expression`, `element_access`, `post_increment_expression`, `post_decrement_expression`, `pointer_member_access` and `pointer_element_access`) which ANTLR does not handle. Standard techniques can be used to transform the grammar to remove the mutual left-recursion. This has not been done as not all parsing strategies require it (e.g. an LALR parser would not) and doing so would obfuscate the structure and description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your comment in the change that a refactoring would obscure the structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work @Nigel-Ecma
I approve as well. I have one question, just to make sure we don't lose anything: While reviewing, I think I could map all the changes in the grammar.md Annex to changes you made in each of the normative clauses. Can you verify that? I'm concerned that we would lose edits when the tool overwrites the annex with the text from each clause.
I'll merge once you that I didn't miss anything in review.
Nigel verified that all grammar changes were made in the normative clauses. Merging now. |
IMPORTANT: As PR#342 which ANTLR-ises the lexical & preprocessor grammars is not yet merged those
changes are NOT in the grammar in this PR. For the files changed by this PR this means you'll see
Identifier
rather thanidentifier
etc.This PR makes the small number of changes to ANTLR-ise all but
primary_no_array_creation_expression
which is being left in its mutual left recursive state for now/ever...
There is very little change to the descriptive text, these are mainly small grammar tweaks to remove
mutual left recursion.
I'll add a GitHub comment or two to explain the changes.