There are issues with the clause §6.2.5 Grammar ambiguities that need addressing:
It has not kept up as new constructs which need to be, or need not to be, disambiguated
have been added to the language.
While the way ambiguities should be resolved within those constructs has been updated
over time, it appears that the translation from source material to the Standard (and the
two can be quite different in approach) has unintentionally resulted in incorrect rules.
The suggestions below have been tested against the current (at the time of writing!)
draft-v8 of the Standard using the grammar checker; the latter has been updated to include
additional test samples and revised support code.
A draft PR should follow shortly.
The “source material” mentioned above, and later, is the usual collection of design and
implementation documents used for the Standard, which include the Roslyn compiler.
For those who wish to consult Roslyn while reading this, look for the ScanTypeArgumentList method, which can be found in LanguageParser.cs, and work your
way from there.
Note: rather than use the roslyn-Visual-Studio-2019-Version-16.3 (C# 8).zip go with roslyn-Visual-Studio-2019-Version-16.8 (C# 9).zip. The latter includes a C# v8 change
not included in the former; presumably introduced somewhere in 16.4-16.7.
A. The grammar rules requiring disambiguation
§6.2.5 lists the rules that require disambiguation in three places: right at the start (which
misses one), after the first example, and after the bulleted list. They are:
simple_name (§12.8.4);
member_access (§12.8.7); and
pointer_member_access (§23.6.3).
The common characteristic of these rules is that they all appear in an expression context
and each has one or more alternatives which end in identifier type_argument_list?. Being
in an expression context means that the containing expression may continue after the type_argument_list and this may produce an ambiguity between recognising the optional type_argument_list and recognising the ‘<’ and ‘>’ that would enclose it as
operators within the larger expression. The first example in §6.2.5 covers this.
There are other rules with the same properties which therefore also might require
disambiguation and thus need to be included; these are:
base_access (§12.8.15);
null_conditional_member_access (§12.8.8); and
dependent_access (§12.8.8).
B. The grammar rules NOT requiring disambiguation
§6.2.5 also contains the list of rules which do not require disambiguation; for these
the type_argument_list should always be parsed if present. This list is in a Note
just before the examples:
namespace_or_type_name (§7.8)
(Yes, not quite a list, yet…) Here the common characteristic is ending in identifier type_argument_list? but occurring in a type context. There are other
rules with the same properties which now need to be included:
named_entity (§12.8.23);
null_conditional_projection_initializer (§12.8.8); and
qualified_alias_member (§14.8.1).
C. The disambiguation rules
A & B above are simple housekeeping; this one is a bit more involved. I’ll get a bit
more waffly than my usual waffly level here; hopefully it helps…
A grammar rule is normally recognised when the input tokens match the rule. The
disambiguation rules effectively specify when not to recognise the input as a type_argument_list even when the input matches the rule.
The rules are expressed in the positive: the rule should be recognised if and only if the input
tokens match the grammar and the disambiguation rules pass. The disambiguation rules
are based on the following, and sometimes preceding, tokens. If the following token is [§6.2.5]:
One of ( ) ] } : ; , . ? == != | ^ && || & [; or
One of the relational operators < <= >= is as; or
A contextual query keyword appearing inside a query expression; or
In certain contexts, identifier is treated as a disambiguating token. Those contexts are where the sequence of tokens being disambiguated is immediately preceded by one of the keywords is, case or out, or arises while parsing the first element of a tuple literal (in which case the tokens are preceded by ( or : and the identifier is followed by a ,) or a subsequent element of a tuple literal.
then the type_argument_list should be recognised, otherwise not (and the input tokens
left to be recognised by other grammar rules).
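As a rough illustration of the first three bullets, here is a minimal Python sketch of the look-ahead check. It is a toy model, not Roslyn's code: the function name, the `in_query_expression` flag and the list of contextual query keywords are assumptions made for this example, and the token sets are copied from the bullets as quoted above.

```python
# Toy model of the look-ahead test in the first three bullets.
# All names here are hypothetical; this is not how any real parser is written.

FIRST_BULLET = {"(", ")", "]", "}", ":", ";", ",", ".", "?",
                "==", "!=", "|", "^", "&&", "||", "&", "["}
SECOND_BULLET = {"<", "<=", ">=", "is", "as"}
# Assumed list of contextual query keywords for the third bullet.
QUERY_KEYWORDS = {"from", "where", "join", "on", "equals", "into", "let",
                  "orderby", "ascending", "descending", "select", "group", "by"}


def recognise_type_argument_list(following, in_query_expression=False):
    """Return True if tokens that already match type_argument_list should be
    recognised as one, given the token that follows the closing '>'."""
    if following in FIRST_BULLET or following in SECOND_BULLET:
        return True
    # Third bullet: query keywords only count inside a query expression.
    return in_query_expression and following in QUERY_KEYWORDS


# F(G<A, B>(7))  -> '(' follows '>', so G<A, B> is a generic name.
call_with_type_args = recognise_type_argument_list("(")
# F(G<A, B>7)    -> '7' follows '>', so '<' and '>' are operators.
two_comparisons = recognise_type_argument_list("7")
```

The two sample inputs are the familiar pair used to show this ambiguity: the only difference between them is the token after the `>`.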
Remember: this disambiguation only applies when recognising the grammar rules in (A)
above. For those rules in (B) there is no disambiguation; if the tokens match the type_argument_list rule they are to be recognised as such.
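The split between (A) and (B) can be sketched in the same toy style. This is only an illustration under stated assumptions: the function and set names are hypothetical, the rule sets include the additions proposed in (A) and (B), and only the first two bullets are modelled.

```python
# Toy model: which grammar rule is being recognised decides whether the
# look-ahead test is consulted at all. Names are hypothetical.

EXPRESSION_RULES = {  # (A): disambiguation applies
    "simple_name", "member_access", "pointer_member_access",
    "base_access", "null_conditional_member_access", "dependent_access",
}
TYPE_RULES = {  # (B): type_argument_list is always recognised
    "namespace_or_type_name", "named_entity",
    "null_conditional_projection_initializer", "qualified_alias_member",
}
DISAMBIGUATING_TOKENS = {  # first and second bullets only
    "(", ")", "]", "}", ":", ";", ",", ".", "?",
    "==", "!=", "|", "^", "&&", "||", "&", "[",
    "<", "<=", ">=", "is", "as",
}


def recognise(rule, following):
    """Assumes the input tokens already match type_argument_list."""
    if rule in TYPE_RULES:
        return True  # no disambiguation in a type context
    if rule in EXPRESSION_RULES:
        return following in DISAMBIGUATING_TOKENS
    raise ValueError(f"rule does not end in identifier type_argument_list?: {rule}")
```

With an identifier `x` as the following token, `recognise("namespace_or_type_name", "x")` accepts the type_argument_list while `recognise("simple_name", "x")` does not.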
The first and second bullets are the list of following tokens that can be used to
disambiguate. This list has changed as the language has evolved, and will change again. It
fits in with the Standard, no issues here.
The third and fourth bullets are recent additions.
The third deals with query_expressions, and is essentially a contextual extension of the
first and second bullets. In query_expressions, embedded expressions, which would elsewhere be
terminated by one of the tokens listed in the first and second bullets,
may be terminated by a contextual query keyword. This fits in with the Standard just as the first
two bullets do.
The fourth bullet is a different beast: it adds rules for when the following token is
an identifier and the sequence formed by the preceding identifier (remember all the rules
this applies to end in identifier type_argument_list?) and the input tokens matching type_argument_list is either:
preceded itself by is, case or out; or
together with the following identifier form the whole of a tuple element.
What is being recognised here?
We have: identifier type_argument_list identifier; either following is/case/out or forming a tuple element.
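And that sequence is matched by (give yourself a drum roll if you’ve figured it out) a declaration_expression [§12.17].
The fourth bullet is about correctly recognising declaration_expressions containing generic types.
So what?
The grammar recognises this sequence of tokens as follows:
The type part of a declaration_expression is recognised by the namespace_or_type_name
rule;
that rule is in (B) above, the list of rules for which disambiguation is not done; and
so this disambiguation rule is pointless in the specification as it is unreachable.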
Important:
The rule is not wrong per se! An implementation following the Standard
may choose to use this approach, and one does, but it does not need to.
So the fourth bullet is implementation detail, and it also contradicts/overlaps with
statements on which grammar rules these disambiguation rules apply to, so should not
be there. This was missed when we used the source material while writing this clause.
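To make the overlap concrete, here is a small Python sketch comparing the two routes for an input such as `o is G<A> x`. Both functions and their names are hypothetical, and the tuple-element contexts of the fourth bullet are left out; it only illustrates the argument above and does not describe any actual parser.

```python
# Toy comparison of two ways to accept `G<A>` in `o is G<A> x`.
# Hypothetical names; tuple-element contexts are deliberately omitted.

DECLARATION_KEYWORDS = {"is", "case", "out"}


def via_fourth_bullet(preceding, following_is_identifier):
    """Implementation-style route: treat the tokens as being in an expression
    context, then let the fourth bullet accept the type_argument_list."""
    return preceding in DECLARATION_KEYWORDS and following_is_identifier


def via_grammar(type_rule):
    """Standard-style route: the type of a declaration_expression is matched
    by namespace_or_type_name, a rule in (B), so nothing is disambiguated."""
    return type_rule == "namespace_or_type_name"


# Both routes recognise `<A>` as a type_argument_list for `o is G<A> x`.
both_agree = via_fourth_bullet("is", True) == via_grammar("namespace_or_type_name")
```

Since the grammar route already accepts the type_argument_list without any look-ahead, the fourth bullet adds nothing at the level of the specification.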
Yes it’s the end, for now… recent work on the grammar checker did turn up a few other issues
which I’m sure you will wait for with bated breath! 😉
As mentioned above I’ll follow with a draft PR in a short while, till then comment away!