ANTLR-isation of parse/syntactic grammar #351

Nigel-Ecma · 2021-07-02T05:29:08Z

IMPORTANT: PR #342, which ANTLR-ises the lexical and preprocessor grammars, is not yet merged, so those
changes are NOT in the grammar in this PR. For the files changed by this PR this means you'll see
`Identifier` rather than `identifier`, etc.

This PR makes the small number of changes needed to ANTLR-ise everything except `primary_no_array_creation_expression`,
which is being left in its mutually left-recursive state for now/ever...

There is very little change to the descriptive text; these are mainly small grammar tweaks to remove
mutual left recursion.

I'll add a GitHub comment or two to explain the changes.


Nigel-Ecma · 2021-07-02T05:31:46Z

standard/enums.md

@@ -26,7 +26,12 @@ enum_declaration
    ;

 enum_base
-    : ':' struct_type
+    : ':' integral_type


These changes narrow down `enum_base`, replacing `struct_type` with `integral_type`, to avoid bad stuff happening. The text below has a small addition to describe the two alternatives.

That seems fine

Nigel-Ecma · 2021-07-02T05:35:51Z

standard/expressions.md

@@ -1129,6 +1129,8 @@ primary_no_array_creation_expression
    ;
 ```

+> *Note*: These grammar rules are not ANTLR-ready as they are part of a set of mutually left-recursive rules (`primary_expression`, `primary_no_array_creation_expression`, `member_access`, `invocation_expression`, `element_access`, `post_increment_expression`, `post_decrement_expression`, `pointer_member_access` and `pointer_element_access`) which ANTLR does not handle. Standard techniques can be used to transform the grammar to remove the mutual left-recursion. This has not been done as not all parsing strategies require it (e.g. an LALR parser would not) and doing so would obfuscate the structure and description.


This is the one set of mutually left-recursive rules being left in place. To factor out the recursion, the rules in the list can be inlined into `primary_no_array_creation_expression` (see the Parse_CSharp.g4 file sent out). If you have an opinion on removing or leaving the MLR, please comment; if you would like to propose an alternative grammar, please do so!

I agree with your comment in the change that a refactoring would obscure the structure.

I agree as well.
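
As a rough illustration of the inlining mentioned in this thread: ANTLR 4 rejects mutual left recursion but accepts direct left recursion, so folding the helper rules into the recursive rule is one way to get a grammar through the tool. The sketch below uses hypothetical rule names (`expr`, `member`, `call`, `ID`, and the grammar name `InlineSketch` are illustrative only, not rules from the Standard or from Parse_CSharp.g4).

```ANTLR
grammar InlineSketch;   // hypothetical grammar, for illustration only

// Before (rejected by ANTLR 4): expr -> member -> expr and
// expr -> call -> expr are mutually left-recursive.
//
// expr   : ID | member | call ;
// member : expr '.' ID ;
// call   : expr '(' ')' ;

// After inlining: only direct left recursion remains, which ANTLR 4 accepts.
expr
    : ID
    | expr '.' ID       // was the separate rule: member
    | expr '(' ')'      // was the separate rule: call
    ;

ID : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS : [ \t\r\n]+ -> skip ;
```

The cost is visible even in this small case: `member` and `call` no longer exist as named rules that descriptive text can refer to, which is the loss of structure the note in the diff is concerned about.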

Nigel-Ecma · 2021-07-02T05:36:29Z

standard/expressions.md

@@ -2329,14 +2331,13 @@ nameof_expression
    ;

 named_entity
-    : simple_name
-    | named_entity_target '.' identifier type_argument_list?
+    : named_entity_target ('.' identifier type_argument_list?)*


Trivial MLR removal

Nigel-Ecma · 2021-07-02T05:37:15Z

standard/types.md

@@ -139,14 +139,18 @@ A value type is either a struct type or an enumeration type. C# provides a set o

 ```ANTLR
 value_type
+    : non_nullable_value_type


The rules here formed an MLR set; just a bit of re-org fixed that.

Nigel-Ecma · 2021-07-02T05:43:55Z

standard/unsafe-code.md

@@ -99,8 +99,8 @@ A *pointer_type* is written as an *unmanaged_type* ([§9.8](types.md#98-unmanage

 ```ANTLR
 pointer_type
-    : unmanaged_type '*'
-    | 'void' '*'
+    : value_type '*'+


This change is a consequence of changing `unmanaged_type`: `pointer_type` needed to change to allow pointers to pointers. The version included does this with the `+` operator. While `+` is used elsewhere in the grammar, it is usually applied to non-terminals, and here it is applied to a literal (`'*'+`). An alternative would be to make the rule left-recursive, which ANTLR can handle, by dropping the `+`s and adding an alternative `pointer_type '*'`. The version here produces a "flat-wide" parse node; the alternative produces a "tree" of parse nodes. I chose "flat-wide", but one could argue `'*'+` is a little hard to read... Express your preference!

I think putting parentheses, as in ('*')+, would be enough to make it more readable.

@MadsTorgersen – That's harmless enough ;-) Done.
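
For comparison, a sketch of the two shapes discussed in this thread. The rule names `pointer_type_flat` and `pointer_type_tree`, the grammar name `PointerSketch`, and the two-keyword `value_type` are hypothetical, chosen so that both forms can sit side by side in one small grammar; the Standard has a single `pointer_type` rule and a much larger `value_type`.

```ANTLR
grammar PointerSketch;   // hypothetical grammar, for illustration only

// "Flat-wide": one node whose children are the base type and every '*'.
pointer_type_flat
    : value_type ('*')+
    ;

// "Tree": direct left recursion, which ANTLR 4 accepts.
// Each additional '*' adds one level of nesting.
pointer_type_tree
    : value_type '*'
    | pointer_type_tree '*'
    ;

// Hypothetical stand-in for the real value_type rule.
value_type
    : 'int'
    | 'char'
    ;

WS : [ \t\r\n]+ -> skip ;
```

For an input such as `int**`, the flat rule yields a single node with the children `int`, `*`, `*`, whereas the left-recursive rule yields a node for `int*` nested inside the node that carries the second `*`.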

… of the lexical grammar was merged into draft-v6. There are a few minor changes to lexical-structure.md – fixing incorrect text resulting from having parallel PRs doing slightly different things, the odd grammar fix, etc. The grammar.md file (auto produced by the tooling) reflects as much ANTLR-isation of the grammar as we currently intend to put into the normative text of the Standard. It doesn't quite get through ANTLR as is; internal tooling to make the few changes required will probably be released to TG2 "soon".


MadsTorgersen · 2021-09-24T17:52:14Z

standard/enums.md


That seems fine

MadsTorgersen · 2021-09-24T17:53:41Z

standard/expressions.md


I agree with your comment in the change that a refactoring would obscure the structure.


BillWagner

This is great work @Nigel-Ecma

I approve as well. I have one question, just to make sure we don't lose anything: While reviewing, I think I could map all the changes in the grammar.md Annex to changes you made in each of the normative clauses. Can you verify that? I'm concerned that we would lose edits when the tool overwrites the annex with the text from each clause.

I'll merge once you confirm that I didn't miss anything in review.

BillWagner · 2021-09-27T20:29:59Z

Nigel verified that all grammar changes were made in the normative clauses. Merging now.

Nigel-Ecma requested review from gafter and MadsTorgersen July 2, 2021 05:29

Nigel-Ecma self-assigned this Jul 2, 2021

Nigel-Ecma commented Jul 2, 2021


Nigel-Ecma marked this pull request as ready for review July 20, 2021 03:48

Nigel-Ecma added 4 commits August 2, 2021 11:10

Merge branch 'dotnet:draft-v6' into antlr-parser

82fecf6

This is just a temporary manually generated grammar.md

aeee09a

It will be automatically replaced by a tooling generated one at some point. This doesn't change anything in the PR (it was extracted from the other files).

Just fixing an edit/merge/pull snafu – diff is your friend!

3e0ecf6

Nigel-Ecma added the meeting: discuss This issue should be discussed at the next TC49-TG2 meeting label Sep 17, 2021

MadsTorgersen approved these changes Sep 24, 2021


Add some parens (for visual purposes) to pointer_type as per @Mads suggestion

87af7ac

BillWagner approved these changes Sep 27, 2021


BillWagner merged commit 866d1be into dotnet:draft-v6 Sep 27, 2021

RexJaeschke mentioned this pull request Oct 10, 2021

ANTLR: Deciding on how far to go with this #37

Closed

RexJaeschke mentioned this pull request Nov 25, 2021

19.2 Enum base type #399

Closed
