
Updates to §6.2.5 Grammar ambiguities #1283

Open
Nigel-Ecma opened this issue Mar 7, 2025 · 0 comments

There are issues with the clause §6.2.5 Grammar ambiguities that need addressing:

  • It has not kept up with the language: new constructs have been added, some of which
    need to be disambiguated and some of which need not be.

  • While the way ambiguities should be resolved within those constructs has been updated
    over time, it appears that the translation from source material to the Standard – and the
    two can be quite different in approach – has unintentionally resulted in incorrect rules.

The suggestions below have been tested against the current (at the time of writing!)
draft-v8 of the Standard using the grammar checker; the latter has been updated to include
additional test samples and revised support code.

A draft PR should follow shortly.

The “source material” mentioned above, and later, is the usual collection of design and
implementation documents used for the Standard, which include the Roslyn compiler.

For those who wish to consult Roslyn while reading this, look for the
ScanTypeArgumentList method, which can be found in LanguageParser.cs, and work your
way from there.

Note: rather than use the roslyn-Visual-Studio-2019-Version-16.3 (C# 8).zip go with
roslyn-Visual-Studio-2019-Version-16.8 (C# 9).zip. The latter includes a C# v8 change
not included in the former; presumably introduced somewhere in 16.4-16.7.


A. The grammar rules requiring disambiguation

§6.2.5 lists the rules that require disambiguation in three places: right at the start (which
misses one), after the first example, and after the bulleted list. They are:

  • simple_name (§12.8.4);
  • member_access (§12.8.7); and
  • pointer_member_access (§23.6.3).

The common characteristic of these rules is that they all appear in an expression context
and each has one or more alternatives which end in identifier type_argument_list?. Being
in an expression context means that the containing expression may continue after the
type_argument_list and this may produce an ambiguity between recognising the optional
type_argument_list and recognising the ‘<’ and ‘>’ that would enclose it as
operators within the larger expression. The first example in §6.2.5 covers this.
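
To make the ambiguity concrete, here is a minimal Python sketch; it is not taken from the Standard or from Roslyn. It checks whether the tokens after an identifier have the flat shape `<` identifier (`,` identifier)* `>`. The helper name `type_argument_list_end` and the pre-split token lists are assumptions for illustration only, and real type arguments can be arbitrary types rather than bare identifiers. Both inputs, written in the style of `G<A, B>(7)` and `G<A, B>7`, match the shape, which is why matching alone cannot decide between the two readings.

```python
import re

# Simplified identifier pattern (assumption: ASCII-only, for illustration).
IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*$")


def type_argument_list_end(tokens, start):
    """Return the index just past a flat '<' id (',' id)* '>' beginning at
    tokens[start], or None if the tokens do not have that shape."""
    i = start
    if i >= len(tokens) or tokens[i] != "<":
        return None
    i += 1
    while True:
        if i >= len(tokens) or not IDENT.match(tokens[i]):
            return None
        i += 1
        if i < len(tokens) and tokens[i] == ",":
            i += 1
            continue
        break
    if i < len(tokens) and tokens[i] == ">":
        return i + 1
    return None


# G<A, B>(7): intended as a generic method call.
generic_call = ["G", "<", "A", ",", "B", ">", "(", "7", ")"]
# G<A, B>7: intended as two comparisons, G < A and B > 7.
two_comparisons = ["G", "<", "A", ",", "B", ">", "7"]

end_call = type_argument_list_end(generic_call, 1)
end_cmp = type_argument_list_end(two_comparisons, 1)

# Both match the shape; only the token that follows differs.
print(generic_call[end_call], two_comparisons[end_cmp])
```

The only difference the sketch can observe is the token after the closing `>`, which is the information the disambiguation rules rely on.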

There are other rules with the same properties which therefore might also require
disambiguation and thus need to be included. These are:

  • base_access (§12.8.15);
  • null_conditional_member_access (§12.8.8); and
  • dependent_access (§12.8.8).

B. The grammar rules NOT requiring disambiguation

§6.2.5 also contains the list of rules which do not require disambiguation but rather
the type_argument_list should always be parsed if present. This list is in a Note
just before the examples:

  • namespace_or_type_name (§7.8)

(Yes, not quite a list, yet…) Here the common characteristic is ending in
identifier type_argument_list? but occurring in a type context. There are other
rules with the same properties which now need to be included:

  • named_entity (§12.8.23);
  • null_conditional_projection_initializer (§12.8.8); and
  • qualified_alias_member (§14.8.1).

C. The disambiguation rules

A & B above are simple housekeeping; this one is a bit more involved. I’ll get a bit
more waffly than my usual waffly level here; hopefully it helps…

A grammar rule is normally recognised when the input tokens match the rule. The
disambiguation rules effectively specify when not to recognise the input as a
type_argument_list even when the input matches the rule.

The rules are expressed in the positive: the rule should be recognised if and only if the input
tokens match the grammar and the disambiguation rules pass. The disambiguation rules
are based on the following, and sometimes preceding, tokens. If the following token is [§6.2.5]:

  • One of ( ) ] } : ; , . ? == != | ^ && || & [; or
  • One of the relational operators < <= >= is as; or
  • A contextual query keyword appearing inside a query expression; or
  • In certain contexts, identifier is treated as a disambiguating token. Those contexts are where the sequence of tokens being disambiguated is immediately preceded by one of the keywords is, case or out, or arises while parsing the first element of a tuple literal (in which case the tokens are preceded by ( or : and the identifier is followed by a ,) or a subsequent element of a tuple literal.

then the type_argument_list should be recognised, otherwise not (and the input tokens
left to be recognised by other grammar rules).

Remember: this disambiguation only applies when recognising the grammar rules in (A)
above. For those rules in (B) there is no disambiguation: if the tokens match the
type_argument_list rule they are to be recognised as such.
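
The first three bullets can be sketched as a following-token check. This is a hedged illustration, not the Standard's wording or Roslyn's code: the function name `recognise_type_argument_list`, the `in_query_expression` flag and the `QUERY_KEYWORDS` set are assumptions introduced here (the issue text does not enumerate the contextual query keywords). The fourth bullet is left out on purpose, and the check is only meant for the rules listed in (A).

```python
# First bullet, as quoted above.
PUNCTUATION = {
    "(", ")", "]", "}", ":", ";", ",", ".", "?",
    "==", "!=", "|", "^", "&&", "||", "&", "[",
}

# Second bullet, exactly as quoted above.
RELATIONAL = {"<", "<=", ">=", "is", "as"}

# Assumption: the contextual query keywords; this set is not given in the text.
QUERY_KEYWORDS = {
    "from", "where", "select", "group", "into", "orderby", "join",
    "let", "in", "on", "equals", "by", "ascending", "descending",
}


def recognise_type_argument_list(next_token, in_query_expression=False):
    """Return True if tokens that already match type_argument_list should be
    recognised as one, judged only by the token that follows them."""
    if next_token in PUNCTUATION or next_token in RELATIONAL:
        return True
    # Third bullet: query keywords only count inside a query expression.
    if in_query_expression and next_token in QUERY_KEYWORDS:
        return True
    return False


print(recognise_type_argument_list("("))   # follows G<A, B> in G<A, B>(7)
print(recognise_type_argument_list("7"))   # follows G<A, B> in G<A, B>7
```

Keeping the query keywords behind a flag mirrors the "appearing inside a query expression" condition: outside a query, a word such as `select` is an ordinary identifier and does not disambiguate.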

The first and second bullets list the following tokens that can be used to
disambiguate; the list has changed as the language has evolved, and will change again. It
fits in with the Standard; no issues here.

The third and fourth bullets are recent additions.

The third deals with query_expressions, and is essentially a contextual extension of the
first and second bullets. In query_expressions, embedded expressions, which would elsewhere be
terminated by one of the tokens listed in the first and second bullets,
may be terminated by a contextual query keyword. This fits in with the Standard just as the first
two bullets do.

The fourth bullet is a different beast: it adds rules for when the following token is
an identifier and the input tokens matching type_argument_list, together with the
preceding identifier (remember all the rules this applies to end in
identifier type_argument_list?), are either:

  • preceded by is, case or out; or
  • together with the following identifier, the whole of a tuple element.

What is being recognised here?

We have: identifier type_argument_list identifier; either following
is/case/out or forming a tuple element.

And that sequence is matched by (give yourself a drum roll if you’ve figured it out) [§12.17]:

    declaration_expression
        : local_variable_type identifier
        ;

The fourth bullet is about correctly recognising declaration_expressions containing generic types.

So what?

The grammar recognises this sequence of tokens using the following rules:

  • declaration_expression => local_variable_type identifier
  • local_variable_type => type
  • type => … => type_name (via value_type or reference_type and so on)
  • type_name => namespace_or_type_name
  • namespace_or_type_name => identifier type_argument_list?

Which gets us to:

  • The type part of a declaration_expression is recognised by the namespace_or_type_name
    rule;
  • that rule is in (B) above, the list of rules for which disambiguation is not done; and
  • so this disambiguation rule is pointless in the specification as it is unreachable.
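
The reachability argument above can be checked mechanically with a small sketch. The dictionary and set below only restate the derivation steps and the (B) list from this issue, including the proposed additions; the names `DERIVES`, `NO_DISAMBIGUATION` and `final_rule` are hypothetical. The hop from type to type_name passes through intermediate rules that the text elides, so it is recorded here as a single step.

```python
# Section (B): rules for which type_argument_list is always parsed if present.
NO_DISAMBIGUATION = {
    "namespace_or_type_name",
    "named_entity",
    "null_conditional_projection_initializer",
    "qualified_alias_member",
}

# Derivation steps as listed above. "type" reaches "type_name" via
# intermediate rules that are elided in the text; modelled as one hop.
DERIVES = {
    "declaration_expression": "local_variable_type",
    "local_variable_type": "type",
    "type": "type_name",
    "type_name": "namespace_or_type_name",
}


def final_rule(rule):
    """Follow the derivation chain until a rule with no further step."""
    while rule in DERIVES:
        rule = DERIVES[rule]
    return rule


recogniser = final_rule("declaration_expression")
# The fourth bullet could only apply if the recognising rule were subject
# to disambiguation, i.e. not in the (B) list.
fourth_bullet_reachable = recogniser not in NO_DISAMBIGUATION

print(recogniser, fourth_bullet_reachable)
```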

Important:

The rule is not wrong per se! An implementation following the Standard
may choose to use this approach, and one does, but it does not need to.

So the fourth bullet is implementation detail. It also contradicts/overlaps with the
statements on which grammar rules these disambiguation rules apply to, so it should not
be there. This was missed when the source material was used to write this clause.


Yes it’s the end, for now… recent work on the grammar checker did turn up a few other issues
which I’m sure you will wait for with bated breath! 😉

As mentioned above I’ll follow with a draft PR in a short while, till then comment away!

Nigel-Ecma added this to the C# 8.0 milestone Mar 7, 2025
Nigel-Ecma self-assigned this Mar 7, 2025
Nigel-Ecma changed the title from “Updates to *Grammar ambiguities* (§6.2.5)” to “Updates to §6.2.5 Grammar ambiguities” Mar 7, 2025
Nigel-Ecma added a commit to Nigel-Ecma/csharpstandard-forked-draft-v8 that referenced this issue Mar 10, 2025
Updates §6.2.5 as per issue and also changes the descriptive style to the proscriptive requirement of the Standard
jskeet pushed a commit that referenced this issue Mar 19, 2025
Updates §6.2.5 as per issue and also changes the descriptive style to the proscriptive requirement of the Standard