Skip to content

ANTLR-isation of parse/syntactic grammar #351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 27, 2021
Merged

ANTLR-isation of parse/syntactic grammar #351

merged 6 commits into from
Sep 27, 2021

Conversation

Nigel-Ecma
Copy link
Contributor

IMPORTANT: As PR#342 which ANTLR-ises the lexical & preprocessor grammars is not yet merged those
changes are NOT in the grammar in this PR. For the files changed by this PR this means you'll see
Identifier rather than identifier etc.

This PR makes the small number of changes to ANTLR-ise all but primary_no_array_creation_expression
which is being left in its mutual left recursive state for now/ever...

There is very little change to the descriptive text, these are mainly small grammar tweaks to remove
mutual left recursion.

I'll add a GitHub comment or two to explain the changes.

IMPORTANT: As PR#342 which ANTLR-ises the lexical & preprocessor grammars is not yet merged those
changes are NOT in the grammar in this PR. For the files changed by this PR this means you'll see
`Identifier` rather than `identifier` etc.

This PR makes the small number of changes to ANTLR-ise all but `primary_no_array_creation_expression`
which is being left in its mutual left recursive state for now/ever...

There is very little change to the descriptive text, these are mainly small grammar tweaks to remove
mutual left recursion.
@Nigel-Ecma Nigel-Ecma self-assigned this Jul 2, 2021
@@ -26,7 +26,12 @@ enum_declaration
;

enum_base
: ':' struct_type
: ':' integral_type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes narrow down enum_base to avoid bad stuff happening. Text below has a small addition to describe the two alternatives.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine

@@ -1129,6 +1129,8 @@ primary_no_array_creation_expression
;
```

> *Note*: These grammar rules are not ANTLR-ready as they are part of a set of mutually left-recursive rules (`primary_expression`, `primary_no_array_creation_expression`, `member_access`, `invocation_expression`, `element_access`, `post_increment_expression`, `post_decrement_expression`, `pointer_member_access` and `pointer_element_access`) which ANTLR does not handle. Standard techniques can be used to transform the grammar to remove the mutual left-recursion. This has not been done as not all parsing strategies require it (e.g. an LALR parser would not) and doing so would obfuscate the structure and description.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one mutual left recursive set of rules being left. To factor the rules in the list can be inlined into primary_no_array_creation_expression (see the Parse_CSharp.g4 file sent out). If you've an opinion on removing/leaving the MLR please comment, or would like propose an alternative grammar please do so!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your comment in the change that a refactoring would obscure the structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree as well.

@@ -2329,14 +2331,13 @@ nameof_expression
;

named_entity
: simple_name
| named_entity_target '.' identifier type_argument_list?
: named_entity_target ('.' identifier type_argument_list?)*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial MLR removal

@@ -139,14 +139,18 @@ A value type is either a struct type or an enumeration type. C# provides a set o

```ANTLR
value_type
: non_nullable_value_type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rules here formed an MLR set, just a bit or re-org fixed that.

@@ -99,8 +99,8 @@ A *pointer_type* is written as an *unmanaged_type* ([§9.8](types.md#98-unmanage

```ANTLR
pointer_type
: unmanaged_type '*'
| 'void' '*'
: value_type '*'+
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is a consequence of changing unmanaged_type, it needed to be changed to allow pointers to pointers. The version included does this with the + operator, while this is used elsewhere in the grammar it is usually applied to non-terminals and here it is to a literal ('*'+). An alternative would be to make the rule left-recursive, which ANTLR can handle, by dropping the +'s and adding an alternative pointer_type '*'. The version here produces a "flat-wide" parse/node, the alternative produces a "tree" parse/of nodes. I choose "flat-wide", but one could argue '*'+ is a little hard to read... Express your preference!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think putting parentheses, as in ('*')+, would be enough to make it more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MadsTorgersen –  That's harmless enough ;-) Done.

@Nigel-Ecma Nigel-Ecma marked this pull request as ready for review July 20, 2021 03:48
… of the lexical grammar

was merged into draft-v6.

There are a few minor changes to lexicial-structure.md – fixing incorrect text resulting from
having parallel PRs doing slightly different things, the odd grammar fix, etc.

The grammar.md file (auto produced by the tooling) reflects as much ANTLR-isation of the grammar
as we currently intend to put into the nromative text of the Standard. It doesn't quite get through
ANTLR as is, internal tooling to make the few changes required will probably be released to TG2
"soon".
It will be automatically replaced by a tooling generated one at some point.
This doesn't change anything in the PR (it was extracted from the other files).
@Nigel-Ecma Nigel-Ecma added the meeting: discuss This issue should be discussed at the next TC49-TG2 meeting label Sep 17, 2021
@@ -26,7 +26,12 @@ enum_declaration
;

enum_base
: ':' struct_type
: ':' integral_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine

@@ -1129,6 +1129,8 @@ primary_no_array_creation_expression
;
```

> *Note*: These grammar rules are not ANTLR-ready as they are part of a set of mutually left-recursive rules (`primary_expression`, `primary_no_array_creation_expression`, `member_access`, `invocation_expression`, `element_access`, `post_increment_expression`, `post_decrement_expression`, `pointer_member_access` and `pointer_element_access`) which ANTLR does not handle. Standard techniques can be used to transform the grammar to remove the mutual left-recursion. This has not been done as not all parsing strategies require it (e.g. an LALR parser would not) and doing so would obfuscate the structure and description.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your comment in the change that a refactoring would obscure the structure.

Copy link
Member

@BillWagner BillWagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work @Nigel-Ecma

I approve as well. I have one question, just to make sure we don't lose anything: While reviewing, I think I could map all the changes in the grammar.md Annex to changes you made in each of the normative clauses. Can you verify that? I'm concerned that we would lose edits when the tool overwrites the annex with the text from each clause.

I'll merge once you that I didn't miss anything in review.

@BillWagner
Copy link
Member

Nigel verified that all grammar changes were made in the normative clauses. Merging now.

@BillWagner BillWagner merged commit 866d1be into dotnet:draft-v6 Sep 27, 2021
@RexJaeschke RexJaeschke mentioned this pull request Nov 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meeting: discuss This issue should be discussed at the next TC49-TG2 meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants