Updates to §6.2.5 Grammar ambiguities

There are issues with the clause §6.2.5 *Grammar ambiguities* that need addressing:

- It has not kept pace with the language: new constructs that do, or do not, need
disambiguating have been added without the clause being updated.

- While *how* the ambiguities should be resolved within those constructs has been updated
over time, it appears that the translation from the source material to the Standard (and the
two can be quite different in approach) has unintentionally resulted in incorrect rules.

The suggestions below have been tested against the current (at the time of writing!)
draft-v8 of the Standard using the grammar checker; the latter has been updated to include
additional test samples and revised support code.

A draft PR should follow shortly.

The “source material” mentioned above, and later, is the usual collection of design and
implementation documents used for the Standard, which includes the Roslyn compiler.

For those who wish to consult Roslyn while reading this, look for the
`ScanTypeArgumentList` method in `LanguageParser.cs` and work your
way from there.

> Note: rather than using `roslyn-Visual-Studio-2019-Version-16.3 (C# 8).zip`, use
> `roslyn-Visual-Studio-2019-Version-16.8 (C# 9).zip`. The latter includes a C# v8 change
> not included in the former, presumably introduced somewhere in 16.4-16.7.

---

## A. The grammar rules requiring disambiguation

§6.2.5 lists the rules that require disambiguation in three places: right at the start (which
misses one), after the first example, and after the bulleted list. They are:

- *simple_name* (§12.8.4);
- *member_access* (§12.8.7); and
- *pointer_member_access* (§23.6.3).

The common characteristic of these rules is that they all appear in an expression context
and each has one or more alternatives which end in `identifier type_argument_list?`. Being
in an expression context means that the containing expression may continue after the
*type_argument_list* and this may produce an ambiguity between recognising the optional
*type_argument_list* and recognising the ‘`<`’ and ‘`>`’ that would enclose it as
operators within the larger expression. The first example in §6.2.5 covers this.
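To make the ambiguity concrete, here is a deliberately simplified, hypothetical matcher that only answers “do these tokens *match* a *type_argument_list*?”. It is loosely in the spirit of, but not a transcription of, Roslyn's `ScanTypeArgumentList`; the function name, the token-as-string representation and the restriction of type arguments to (possibly generic) identifiers are all assumptions for illustration.

```python
from typing import List, Optional


def scan_type_argument_list(tokens: List[str], start: int) -> Optional[int]:
    """Hypothetical, simplified matcher for a candidate type_argument_list.

    tokens[start] must be '<'. Each type argument is modelled as an
    identifier optionally followed by its own nested list; arrays, tuples,
    nullable types, etc. are ignored. Assumes the lexer yields separate '>'
    tokens (so 'A<B<C>>' ends in two '>' tokens, not one '>>').

    Returns the index just past the matching '>' if the tokens match the
    rule, or None if they do not. Whether a match is *accepted* is decided
    separately by the disambiguation rules.
    """
    if start >= len(tokens) or tokens[start] != "<":
        return None
    i = start + 1
    while True:
        # One type argument: an identifier...
        if i >= len(tokens) or not tokens[i].isidentifier():
            return None
        i += 1
        # ...optionally followed by a nested type argument list.
        if i < len(tokens) and tokens[i] == "<":
            nested_end = scan_type_argument_list(tokens, i)
            if nested_end is None:
                return None
            i = nested_end
        if i >= len(tokens):
            return None
        if tokens[i] == ",":
            i += 1            # another type argument follows
        elif tokens[i] == ">":
            return i + 1      # matched the closing '>'
        else:
            return None


# F(G<A, B>(7)): after 'G', the tokens '< A , B >' match; the next token is '('.
tokens = ["F", "(", "G", "<", "A", ",", "B", ">", "(", "7", ")", ")"]
end = scan_type_argument_list(tokens, 3)
print(end, tokens[end])  # 8 (
```

The matcher alone cannot choose between readings: for `F(G < A, B > 7)` the same tokens after `G` also match, and only the token that follows (`7` rather than `(`) can tell the two apart.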

Other rules share these properties, so might also require disambiguation and need
to be included. These are:

- *base_access* (§12.8.15);
- *null_conditional_member_access* (§12.8.8); and
- *dependent_access* (§12.8.8).

## B. The grammar rules NOT requiring disambiguation

§6.2.5 also lists the rules that do not require disambiguation; for these the
*type_argument_list* should always be parsed if present. This list is in a *Note*
just before the examples:

- *namespace_or_type_name* (§7.8)

(Yes, not quite a list, yet…) Here the common characteristic is ending in
`identifier type_argument_list?` but occurring in a type context. There are other
rules with the same properties which now need to be included:

- *named_entity* (§12.8.23);
- *null_conditional_projection_initializer* (§12.8.8); and
- *qualified_alias_member* (§14.8.1).
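Taken together, A and B amount to a lookup from rule name to how a trailing `identifier type_argument_list?` is handled. The following is only a bookkeeping sketch of the proposed lists; the set names and the `type_argument_list_policy` function are hypothetical and not taken from the Standard or the grammar checker.

```python
# Hypothetical bookkeeping for the lists in A and B; names are illustrative.

# (A) Expression-context rules ending in `identifier type_argument_list?`:
# a matching type_argument_list is subject to the disambiguation rules.
DISAMBIGUATE = {
    "simple_name", "member_access", "pointer_member_access",  # current
    "base_access", "null_conditional_member_access",          # proposed
    "dependent_access",                                        # proposed
}

# (B) Type-context rules ending in `identifier type_argument_list?`:
# a matching type_argument_list is always parsed.
ALWAYS_PARSE = {
    "namespace_or_type_name",                                  # current
    "named_entity", "null_conditional_projection_initializer", # proposed
    "qualified_alias_member",                                  # proposed
}

# A rule must not appear in both lists.
assert DISAMBIGUATE.isdisjoint(ALWAYS_PARSE)


def type_argument_list_policy(rule: str) -> str:
    """How a trailing `identifier type_argument_list?` in `rule` is handled."""
    if rule in DISAMBIGUATE:
        return "disambiguate"
    if rule in ALWAYS_PARSE:
        return "always parse"
    raise KeyError(f"{rule!r} is in neither list")


print(type_argument_list_policy("base_access"))             # disambiguate
print(type_argument_list_policy("qualified_alias_member"))  # always parse
```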

---

## C. The disambiguation rules

A & B above are simple housekeeping; this one is more involved. I’ll get a bit
more waffly than my usual waffly level here; hopefully it helps…

A grammar rule is normally recognised when the input tokens match the rule. The
disambiguation rules effectively specify when *not* to recognise the input as a
*type_argument_list* even when the input matches the rule.

The rules are expressed in the positive: a *type_argument_list* should be recognised if and only if the input
tokens match the grammar **and** the disambiguation rules pass. The disambiguation rules
are based on the following, and sometimes preceding, tokens. If the following token is [§6.2.5]:

> - One of `(  )  ]  }  :  ;  ,  .  ?  ==  !=  |  ^  &&  ||  &  [`; or
> - One of the relational operators `<  <=  >=  is as`; or
> - A contextual query keyword appearing inside a query expression; or
> - In certain contexts, *identifier* is treated as a disambiguating token. Those contexts are where the sequence of tokens being disambiguated is immediately preceded by one of the keywords `is`, `case` or `out`, or arises while parsing the first element of a tuple literal (in which case the tokens are preceded by `(` or `:` and the identifier is followed by a `,`) or a subsequent element of a tuple literal.

then the *type_argument_list* should be recognised, otherwise not (and the input tokens
left to be recognised by other grammar rules).
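The first three bullets depend only on the token that follows the matched list (plus, for the third, whether we are inside a *query_expression*), so they reduce to a membership test. In this sketch the first two token sets are copied from the quoted text and the third is the set of C# contextual query keywords; the function name, its parameters and the token-as-string representation are assumptions, and the fourth bullet is deliberately left out.

```python
# Token sets copied from the §6.2.5 text quoted above (bullets 1 and 2).
FOLLOW_TOKENS = {"(", ")", "]", "}", ":", ";", ",", ".", "?",
                 "==", "!=", "|", "^", "&&", "||", "&", "["}
RELATIONAL_TOKENS = {"<", "<=", ">=", "is", "as"}

# Bullet 3: the C# contextual query keywords, significant only inside a
# query_expression.
QUERY_KEYWORDS = {"from", "let", "where", "join", "on", "equals", "into",
                  "orderby", "ascending", "descending", "select", "group", "by"}


def accept_type_argument_list(next_token: str,
                              in_query_expression: bool = False) -> bool:
    """Bullets 1-3 only: given the token after a matched type_argument_list,
    should that list be recognised? (The fourth bullet is not modelled.)"""
    if next_token in FOLLOW_TOKENS or next_token in RELATIONAL_TOKENS:
        return True
    return in_query_expression and next_token in QUERY_KEYWORDS


print(accept_type_argument_list("("))       # True:  F(G<A, B>(7)) is a generic call
print(accept_type_argument_list("7"))       # False: F(G < A, B > 7) is two comparisons
print(accept_type_argument_list("select"))  # False outside a query expression
print(accept_type_argument_list("select", in_query_expression=True))  # True
```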

> **Remember:** this disambiguation *only* applies when recognising the grammar rules in (A)
> above. For the rules in (B) there is no disambiguation: if the tokens match the
> *type_argument_list* rule they are to be recognised as such.

The first and second bullets list the following tokens that can be used to
disambiguate. This list has changed as the language has evolved, and will change again. It
fits in with the Standard; no issues here.

The third and fourth bullets are recent additions.

The third deals with *query_expression*s, and is essentially a contextual extension of the
first and second bullets. Within a *query_expression*, embedded *expression*s, which elsewhere
would be terminated by one of the tokens listed in the first and second bullets,
may be terminated by a contextual query keyword. This fits in with the Standard just as the first
two bullets do.

The fourth bullet is a different beast. It adds rules for when the following token is
an *identifier* and the input tokens matching *type_argument_list*, **together with** the
preceding *identifier* (remember all the rules this applies to end in
`identifier type_argument_list?`), are either:

- preceded by `is`, `case` or `out`; or
- combined with the following *identifier*, the whole of a tuple element.
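Unlike the first three bullets, this check needs context on both sides of the candidate list. A hedged sketch of the check as worded in §6.2.5 follows; the function and parameter names are hypothetical, `parsing_tuple_literal` stands in for parser state the sketch does not have, and only the *first* tuple element case is modelled.

```python
# Hypothetical sketch of the fourth bullet as worded in §6.2.5.
DECLARATION_PREFIXES = {"is", "case", "out"}


def fourth_bullet_accepts(preceding: str, following: str,
                          after_following: str,
                          parsing_tuple_literal: bool) -> bool:
    """preceding:       token before the leading identifier (e.g. 'is')
    following:       token after the closing '>' of the candidate list
    after_following: token after `following`
    parsing_tuple_literal: stand-in for parser state (an assumption)

    Only the first element of a tuple literal is modelled; the rule for
    subsequent elements is left out of this sketch.
    """
    # str.isidentifier() is a rough stand-in for a C# identifier token;
    # it does not exclude keywords.
    if not following.isidentifier():
        return False
    if preceding in DECLARATION_PREFIXES:
        return True
    return (parsing_tuple_literal and preceding in {"(", ":"}
            and after_following == ",")


# x is List<int> list;   -> preceded by 'is'
print(fourth_bullet_accepts("is", "list", ";", False))  # True
# (A<B> c, ...)          -> first tuple element
print(fourth_bullet_accepts("(", "c", ",", True))       # True
# y = A<B> c;            -> neither context
print(fourth_bullet_accepts("=", "c", ";", False))      # False
```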

What is being recognised here?

We have: *identifier* *type_argument_list* *identifier*; either following
`is`/`case`/`out` or forming a tuple element.

And that sequence is matched by (give yourself a drum roll if you’ve figured it out) [§12.17]:

```ANTLR
    declaration_expression
        : local_variable_type identifier
        ;
```

The fourth bullet is about correctly recognising *declaration_expression*s containing generic types.

So what?

The grammar recognises this sequence of tokens using the following rules:

- *declaration_expression* => *local_variable_type* *identifier*
- *local_variable_type* => *type*
- *type* => … => *type_name* (via *value_type* or *reference_type* and so on)
- *type_name* => *namespace_or_type_name*
- *namespace_or_type_name* => *identifier* *type_argument_list*?

Which gets us to:

- The type part of a *declaration_expression* is recognised by the *namespace_or_type_name*
rule;
- that rule is in (B) above, the list of rules for which disambiguation is not done; and
- so this disambiguation rule is pointless in the specification as it is unreachable.
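The argument can be made mechanical: treat each step of the derivation listed above as a map from a rule to the rule it derives, follow it from *declaration_expression*, and check which rule actually matches `identifier type_argument_list?`. This is a toy sketch; the dictionary and function names are hypothetical, and the elided steps between *type* and *type_name* are collapsed into one, as in the list.

```python
# The derivation chain listed above, one step per entry. The steps between
# `type` and `type_name` (via value_type, reference_type, ...) are elided,
# as in the list.
DERIVES = {
    "declaration_expression": "local_variable_type",
    "local_variable_type": "type",
    "type": "type_name",
    "type_name": "namespace_or_type_name",
}

# List (B): rules whose type_argument_list is always parsed, never
# disambiguated. Only the entry needed here is included.
NO_DISAMBIGUATION = {"namespace_or_type_name"}


def rule_matching_type_arguments(start: str) -> str:
    """Follow the chain from `start` to the rule that actually matches
    `identifier type_argument_list?`."""
    rule = start
    while rule in DERIVES:
        rule = DERIVES[rule]
    return rule


owner = rule_matching_type_arguments("declaration_expression")
print(owner)                       # namespace_or_type_name
print(owner in NO_DISAMBIGUATION)  # True: the fourth bullet is never consulted
```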

**Important:**

> The rule is **not wrong** *per se*! An implementation following the Standard
may *choose* to use this approach, and one does, but it does not *need* to.

So the fourth bullet is an implementation detail. It also contradicts, or at least overlaps
with, the statements on which grammar rules these disambiguation rules apply to, and so
should not be there. This was missed when we used the source material to write this clause.

---

Yes, it’s the end, for now… Recent work on the grammar checker did turn up a few other issues,
which I’m sure you will wait for with bated breath! 😉

As mentioned above, I’ll follow with a draft PR shortly; till then, comment away!


Updates to §6.2.5 Grammar ambiguities #1283
