Add a new chapter to define concepts related to pattern-matching. #757

Merged
merged 14 commits into from
Apr 11, 2023
Binary file not shown.
2 changes: 1 addition & 1 deletion .github/workflows/grammar-validator.yaml
@@ -34,7 +34,7 @@ jobs:
# Install build grammar global tool
- name: Install BuildGrammar tool
run: |
dotnet tool install --version 1.0.0-alpha.1 --global --add-source ./.github/workflows/dependencies/ EcmaTC49.BuildGrammar
dotnet tool install --version 1.0.0-alpha.2 --global --add-source ./.github/workflows/dependencies/ EcmaTC49.BuildGrammar


- name: run validate
1 change: 1 addition & 0 deletions standard/clauses.json
@@ -14,6 +14,7 @@
"types.md",
"variables.md",
"conversions.md",
"patterns.md",
"expressions.md",
"statements.md",
"namespaces.md",
17 changes: 16 additions & 1 deletion standard/expressions.md
@@ -3783,6 +3783,7 @@ relational_expression
| relational_expression '<=' shift_expression
| relational_expression '>=' shift_expression
| relational_expression 'is' type
| relational_expression 'is' pattern
| relational_expression 'as' type
;

@@ -4152,7 +4153,11 @@ The tuple equality operator `x != y` is evaluated as follows:

### 11.12.12 The is operator

The `is` operator is used to check if the run-time type of an object is compatible with a given type. The check is performed at runtime. The result of the operation `E is T`, where `E` is an expression and `T` is a type other than `dynamic`, is a Boolean value indicating whether `E` is non-null and can successfully be converted to type `T` by a reference conversion, a boxing conversion, an unboxing conversion, a wrapping conversion, or an unwrapping conversion.
There are two forms of the `is` operator. One is the *is-type operator*, which has a type on the right-hand-side. The other is the *is-pattern operator*, which has a pattern on the right-hand-side.

#### The is-type operator

The *is-type operator* is used to check if the run-time type of an object is compatible with a given type. The check is performed at runtime. The result of the operation `E is T`, where `E` is an expression and `T` is a type other than `dynamic`, is a Boolean value indicating whether `E` is non-null and can successfully be converted to type `T` by a reference conversion, a boxing conversion, an unboxing conversion, a wrapping conversion, or an unwrapping conversion.

The operation is evaluated as follows:

@@ -4190,6 +4195,15 @@ User defined conversions are not considered by the `is` operator.
>
> *end note*

#### The is-pattern operator

The *is-pattern operator* is used to check if the value computed by an expression *matches* a given pattern (XREF TO DEF OF "PATTERN MATCHES"). The check is performed at runtime. The result of the is-pattern operator is true if the value matches the pattern; otherwise it is false.

For an expression of the form `E is P`, where `E` is a relational expression of type `T` and `P` is a pattern, it is a compile-time error if any of the following hold:

- `E` does not designate a value or does not have a type.
- The pattern `P` is not applicable (XREF NEEDED) to the type `T`.
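
As an informal illustration of the is-pattern operator, the following sketch applies each of the three pattern forms to a value of type `object`. The class `IsPatternSketch` and its method `Describe` are hypothetical names introduced only for this illustration; they are not part of the specification text.

```csharp
using System;

public static class IsPatternSketch
{
    // Hypothetical helper: classifies a value using the three pattern forms.
    public static string Describe(object value)
    {
        if (value is null) return "null";                       // constant pattern
        if (value is int i) return "int " + i;                  // declaration pattern
        if (value is string s) return "string of length " + s.Length;
        if (value is var other) return "other: " + other;       // var pattern, always matches
        return "unreachable";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(42));     // int 42
        Console.WriteLine(Describe("abc"));  // string of length 3
        Console.WriteLine(Describe(null));   // null
        Console.WriteLine(Describe(true));   // other: True
    }
}
```

Each `is` expression yields a `bool`, and a variable introduced by a matching pattern (`i`, `s`, `other`) is usable in the statement that the test guards.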

### 11.12.13 The as operator

The `as` operator is used to explicitly convert a value to a given reference type or nullable value type. Unlike a cast expression ([§11.9.7](expressions.md#1197-cast-expressions)), the `as` operator never throws an exception. Instead, if the indicated conversion is not possible, the resulting value is `null`.
@@ -6322,6 +6336,7 @@ Constant expressions are required in the contexts listed below and this is indic
- `goto case` statements ([§12.10.4](statements.md#12104-the-goto-statement))
- Dimension lengths in an array creation expression ([§11.8.16.5](expressions.md#118165-array-creation-expressions)) that includes an initializer.
- Attributes ([§21](attributes.md#21-attributes))
- In a *constant_pattern* (§constant-pattern-new-clause)

An implicit constant expression conversion ([§10.2.11](conversions.md#10211-implicit-constant-expression-conversions)) permits a constant expression of type `int` to be converted to `sbyte`, `byte`, `short`, `ushort`, `uint`, or `ulong`, provided the value of the constant expression is within the range of the destination type.

28 changes: 22 additions & 6 deletions standard/lexical-structure.md
@@ -65,13 +65,14 @@ The productions for *simple_name* ([§11.8.4](expressions.md#1184-simple-names))
>
> *end example*

If a sequence of tokens can be parsed (in context) as a *simple_name* ([§11.8.4](expressions.md#1184-simple-names)), *member_access* ([§11.8.7](expressions.md#1187-member-access)), or *pointer_member_access* ([§22.6.3](unsafe-code.md#2263-pointer-member-access)) ending with a *type_argument_list* ([§8.4.2](types.md#842-type-arguments)), the token immediately following the closing `>` token is examined. If it is one of
If a sequence of tokens can be parsed (in context) as a *simple_name* ([§11.8.4](expressions.md#1184-simple-names)), *member_access* ([§11.8.7](expressions.md#1187-member-access)), or *pointer_member_access* ([§22.6.3](unsafe-code.md#2263-pointer-member-access)) ending with a *type_argument_list* ([§8.4.2](types.md#842-type-arguments)), the token immediately following the closing `>` token is examined, to see if it is

```csharp
( ) ] : ; , . ? == !=
```
- One of `( ) ] } : ; , . ? == != | ^ && || & [`; or
- One of the relational operators `< > <= >= is as`; or
- A contextual query keyword appearing inside a query expression; or
- In certain contexts, we treat *identifier* as a disambiguating token. Those contexts are where the sequence of tokens being disambiguated is immediately preceded by one of the keywords `is`, `case` or `out`, or arises while parsing the first element of a tuple literal (in which case the tokens are preceded by `(` or `:` and the identifier is followed by a `,`) or a subsequent element of a tuple literal.

then the *type_argument_list* is retained as part of the *simple_name*, *member_access*, or *pointer_member_access* and any other possible parse of the sequence of tokens is discarded. Otherwise, the *type_argument_list* is not considered part of the *simple_name*, *member_access*, or *pointer_member_access*, even if there is no other possible parse of the sequence of tokens.
If the following token is among this list, or an identifier in such a context, then the *type_argument_list* is retained as part of the *simple_name*, *member_access* or *pointer_member_access* and any other possible parse of the sequence of tokens is discarded. Otherwise, the *type_argument_list* is not considered to be part of the *simple_name*, *member_access* or *pointer_member_access*, even if there is no other possible parse of the sequence of tokens. (These rules are not applied when parsing a *type_argument_list* in a *namespace_or_type_name* [§7.8](basic-concepts.md#78-namespace-and-type-names).)

> *Note*: These rules are not applied when parsing a *type_argument_list* in a *namespace_or_type_name* ([§7.8](basic-concepts.md#78-namespace-and-type-names)). *end note*
<!-- markdownlint-disable MD028 -->
@@ -106,10 +107,25 @@ then the *type_argument_list* is retained as part of the *simple_name*, *member_
> x = y is C<T> && z;
> ```
>
> the tokens `C<T>` are interpreted as a *namespace_or_type_name* with a *type_argument_list* due to being on the right-hand side of the `is` operator ([§11.12.1](expressions.md#11121-general)). Because `C<T>` parses as a *namespace_or_type_name*, not a *simple_name*, *member_access*, or *pointer_member_access*, the above rule does not apply, and it is considered to have a *type_argument_list* regardless of the token that follows.
> the tokens `C<T>` are interpreted as a *namespace_or_type_name* with a *type_argument_list* due to the presence of
> the disambiguating token `&&` after the *type_argument_list*.
>
> The expression `(A < B, C > D)` is a tuple with two elements, each a comparison.
>
> The expression `(A<B,C> D, E)` is a tuple with two elements, the first of which is a declaration expression.
>
> The invocation `M(A < B, C > D, E)` has three arguments.
>
> The invocation `M(out A<B,C> D, E)` has two arguments, the first of which is an `out` declaration.
>
> The expression `e is A<B> C` uses a declaration pattern.
>
> The case label `case A<B> C:` uses a declaration pattern.
>
> *end example*
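
A few of the cases listed above can be sketched as compilable code. This is an informal illustration that assumes a compiler implementing the disambiguation rules as stated; the class `DisambiguationSketch` and its members are hypothetical names, and the parameters `A` through `E` are plain `int` values chosen to mirror the identifiers in the example.

```csharp
using System;
using System.Collections.Generic;

public static class DisambiguationSketch
{
    // Hypothetical helper with three parameters, used to show the argument count.
    public static int CountArgs(bool first, bool second, int third) => 3;

    public static (bool, bool) Comparisons(int A, int B, int C, int D)
    {
        // `D` is followed by `)`, not `,`, so this is a tuple of two comparisons.
        return (A < B, C > D);
    }

    public static int Invocation(int A, int B, int C, int D, int E)
    {
        // Three arguments: `A < B`, `C > D`, and `E`.
        return CountArgs(A < B, C > D, E);
    }

    public static bool DeclarationPattern(object e)
    {
        // After `is`, the identifier `items` is a disambiguating token,
        // so `List<int>` keeps its type_argument_list.
        return e is List<int> items && items.Count > 0;
    }

    public static void Main()
    {
        Console.WriteLine(Comparisons(1, 2, 3, 4));                 // (True, False)
        Console.WriteLine(Invocation(1, 2, 3, 4, 5));               // 3
        Console.WriteLine(DeclarationPattern(new List<int> { 1 })); // True
    }
}
```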

A *relational_expression* ([§11.12.1](expressions.md#11121-general)) can have the form "*relational_expression* `is` *type*" or "*relational_expression* `is` *constant_pattern*," either of which might be a valid parse of a qualified identifier. In this case, an attempt is made to bind it as a type (XREF TO 7.8.1 NAMESPACES AND TYPES); however, if that fails, it is bound as an expression, and the result must be a constant.

## 6.3 Lexical analysis

### 6.3.1 General
175 changes: 175 additions & 0 deletions standard/patterns.md
@@ -0,0 +1,175 @@
# §patterns-new-clause Patterns and pattern matching

## §patterns-new-clause-general General

A ***pattern*** is a syntactic form that can be used with the `is` operator ([§11.12.12](expressions.md#111212-the-is-operator)) and in a *switch_statement* ([§12.8.3](statements.md#1283-the-switch-statement)) to express the shape of data against which incoming data is to be compared. A pattern is tested against the *expression* of a switch statement, or against a *relational_expression* that is on the left-hand side of an `is` operator; in either case, the expression being tested is called the ***pattern input value***.

## §patterns-new-clause-forms Pattern Forms

A pattern may have one of the following forms:

```ANTLR
pattern
: declaration_pattern
| constant_pattern
| var_pattern
;
```

A *declaration_pattern* and a *var_pattern* can result in the declaration of a local variable.

Each pattern form defines the set of types for input values that the pattern may be applied to. We say a pattern `P` is *applicable to* a type `T` if `T` is among the types whose values the pattern may match. It is an error if a pattern `P` appears in a program to match a *pattern input value* of type `T` if `P` is not applicable to `T`.

Each pattern form defines the set of values for which the pattern *matches* the value.

Review comment by @gafter (Member, Author), Mar 22, 2023:

Say this is a runtime concept. (Not urgent, or even necessary)


### §declaration-pattern-new-clause Declaration pattern

A *declaration_pattern* is used to test that a value has a given type and, if the test succeeds, provide the value in a variable of that type.

```ANTLR
declaration_pattern
: type simple_designation
;
simple_designation
: single_variable_designation
;
single_variable_designation
: identifier
;
```

The runtime type of the value is tested against the *type* in the pattern. If it is of that runtime type (or some subtype), the pattern *matches* that value. This pattern form never matches a `null` value.

Given a *pattern input value* *e*, if the *simple_designation* is the *identifier* `_`, it denotes a discard (§9.2.8.1), and the value of *e* is not bound to anything. (Although a declared variable with the name `_` may be in scope at that point, that named variable is not seen in this context.) If *simple_designation* is any other identifier, a local variable ([§9.2.8](variables.md#928-local-variables)) of the given type named by the given identifier is introduced. That local variable is assigned the value of the *pattern input value* when the pattern *matches* the value.

Certain combinations of static type of the pattern input value and the given type are considered incompatible and result in a compile-time error. A value of static type `E` is said to be ***pattern compatible*** with the type `T` if there exists an identity conversion, an implicit reference conversion, a boxing conversion, an explicit reference conversion, or an unboxing conversion from `E` to `T`, or if either `E` or `T` is an open type ([§8.4.3](types.md#843-open-and-closed-types)). A declaration pattern naming a type `T` is *applicable to* every type `E` for which `E` is *pattern compatible* with `T`.

> *Note*: The support for open types can be most useful when checking types that may be either struct or class types, and boxing is to be avoided. *end note*
<!-- markdownlint-disable MD028 -->
<!-- markdownlint-enable MD028 -->
> *Example*: The declaration pattern is useful for performing run-time type tests of reference types, and replaces the idiom
>
> ```csharp
> var v = expr as Type;
> if (v != null) { /* code using v */ }
> ```
>
> with the slightly more concise
>
> ```csharp
> if (expr is Type v) { /* code using v */ }
> ```
>
> *end example*

It is an error if *type* is a nullable value type.

> *Example*: The declaration pattern can be used to test values of nullable types: a value of type `Nullable<T>` (or a boxed `T`) matches a type pattern `T2 id` if the value is non-null and `T2` is `T`, or some base type or interface of `T`. For example, in the code fragment
>
> ```csharp
> int? x = 3;
> if (x is int v) { /* code using v */ }
> ```
>
> The condition of the `if` statement is `true` at runtime and the variable `v` holds the value `3` of type `int` inside the block. *end example*
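
The following sketch gathers three further points from this subclause: a discard designation, the rule that a declaration pattern never matches `null`, and a pattern input value whose static type is an open type. The class `DeclarationPatternSketch` and its methods are hypothetical names used only for illustration, and the generic case assumes a compiler that accepts open types here, as the pattern-compatibility rule above describes.

```csharp
using System;

public static class DeclarationPatternSketch
{
    public static string Classify(object input)
    {
        if (input is string _) return "string";                 // `_` is a discard: nothing is bound
        if (input is IComparable c) return "comparable: " + c;  // matches any run-time type implementing IComparable
        return "no match";                                      // reached for null and for other types
    }

    // T is an open type here, so `input is int n` is permitted
    // whether T turns out to be a struct or a class.
    public static bool TryGetInt<T>(T input, out int result)
    {
        if (input is int n)
        {
            result = n;
            return true;
        }
        result = 0;
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(Classify("x"));   // string
        Console.WriteLine(Classify(5));     // comparable: 5
        Console.WriteLine(Classify(null));  // no match
        Console.WriteLine(TryGetInt(7, out int r) + " " + r);  // True 7
    }
}
```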

### §constant-pattern-new-clause Constant pattern

A *constant_pattern* is used to test the value of a pattern input value (§patterns-new-clause) against the given constant value.

```ANTLR
constant_pattern
: constant_expression
;
```

A constant pattern `P` is *applicable to* a type `T` if there is an implicit conversion from the constant expression of `P` to the type `T`.

For a constant pattern `P`, we say its *converted value* is

- if the input expression's type is an integral type or an enum type, the pattern's constant value converted to that type; otherwise
- if the input expression's type is the nullable version of an integral type or an enum type, the pattern's constant value converted to its underlying type; otherwise
- the value of the pattern's constant value.

Given a *pattern input value* *e* and a constant pattern `P` with converted value *v*,

- if *e* has integral type or enum type, or a nullable form of one of those, and *v* has integral type, the pattern `P` *matches* the value *e* if the result of the expression `e == v` is `true`; otherwise
- the pattern `P` *matches* the value *e* if `object.Equals(e, v)` returns `true`.

> *Example*:
>
> ```csharp
> public static decimal GetGroupTicketPrice(int visitorCount)
> {
> switch (visitorCount) {
> case 1: return 12.0m;
> case 2: return 20.0m;
> case 3: return 27.0m;
> case 4: return 32.0m;
> case 0: return 0.0m;
> default: throw new ArgumentException(…);
> }
> }
> ```
>
> *end example*
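
The two matching rules above can be contrasted in a small sketch. When the input has type `object`, the converted value stays an `int` and matching goes through `object.Equals`, so a boxed `long` holding 3 does not match the constant `3`. When the input has type `int?`, the comparison is `e == v`. The class `ConstantPatternSketch` and its methods are hypothetical names introduced for this illustration.

```csharp
using System;

public static class ConstantPatternSketch
{
    // Input type is object: matches when object.Equals(e, 3) is true.
    public static bool IsThree(object e) => e is 3;

    // Input type is int?: matches when e == 3 is true.
    public static bool IsThreeNullable(int? e) => e is 3;

    // The constant null is also a valid constant pattern.
    public static bool IsNull(object e) => e is null;

    public static void Main()
    {
        Console.WriteLine(IsThree(3));             // True
        Console.WriteLine(IsThree(3L));            // False: a boxed long is not equal to the int 3
        Console.WriteLine(IsThreeNullable(3));     // True
        Console.WriteLine(IsThreeNullable(null));  // False
        Console.WriteLine(IsNull(null));           // True
    }
}
```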

### §var-pattern-new-clause Var pattern

A *var_pattern* matches every value. That is, a pattern-matching operation with a *var_pattern* always succeeds.

A *var_pattern* is *applicable to* every type.

```ANTLR
var_pattern
: 'var' designation
;
designation
: simple_designation
;
```

Given a *pattern input value* *e*, if *designation* is the *identifier* `_`, it denotes a discard (§9.2.8.1), and the value of *e* is not bound to anything. (Although a declared variable with that name may be in scope at that point, that named variable is not seen in this context.) If *designation* is any other identifier, at runtime the value of *e* is bound to a newly introduced local variable ([§9.2.8](variables.md#928-local-variables)) of that name whose type is the static type of *e*, and the pattern input value is assigned to that local variable.

It is an error if the name `var` would bind to a type where a *var_pattern* is used.
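
A short sketch of the difference between a var pattern and a declaration pattern: the var pattern matches every value, including `null`, whereas a declaration pattern for `object` does not match `null`. The class `VarPatternSketch` and its methods are hypothetical names used only for this illustration.

```csharp
using System;

public static class VarPatternSketch
{
    public static string Show(object e)
    {
        // `x` has the static type of `e` (object) and is bound even when e is null.
        if (e is var x)
        {
            return x == null ? "matched null" : "matched " + x.GetType().Name;
        }
        return "unreachable";
    }

    // A declaration pattern never matches null, unlike the var pattern.
    public static bool MatchesObject(object e) => e is object o;

    // With the discard designation nothing is bound, but the match still succeeds.
    public static bool MatchesVarDiscard(object e) => e is var _;

    public static void Main()
    {
        Console.WriteLine(Show(5));                  // matched Int32
        Console.WriteLine(Show(null));               // matched null
        Console.WriteLine(MatchesObject(null));      // False
        Console.WriteLine(MatchesVarDiscard(null));  // True
    }
}
```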

## Pattern Subsumption

In a switch statement, it is an error if a case's pattern is *subsumed* by the preceding set of unguarded cases (XREF).
Informally, this means that any input value would have been matched by one of the previous cases.
Here we define when a set of patterns *subsumes* a given pattern.

We say a pattern `P` *would match* a constant `K` if the specification for that pattern's runtime behavior is that `P` matches `K`.

A set of patterns `Q` *subsumes* a pattern `P` if any of the following conditions hold:

- `P` is a constant pattern and any of the patterns in the set `Q` would match `P`'s *converted value*
- `P` is a var pattern and the set of patterns `Q` is *exhaustive* for the type of the pattern input value, and either the pattern input value is not of a nullable type or some pattern in `Q` would match `null`.
- `P` is a declaration pattern with type `T` and the set of patterns `Q` is *exhaustive* for the type `T` (XREF).
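
The conditions above can be illustrated with a switch statement in which the subsumed cases are left commented out, since enabling either of them would be a compile-time error under these rules. The class `SubsumptionSketch` and its method are hypothetical names introduced for this sketch.

```csharp
using System;

public static class SubsumptionSketch
{
    public static string Describe(object input)
    {
        string result = "unmatched";
        switch (input)
        {
            case int n when n > 0:   // guarded case: not part of the subsuming set
                result = "positive int";
                break;
            case int n:              // unguarded declaration pattern for int
                result = "other int";
                break;
            // case 5:               // error if enabled: `int n` would match the converted value 5
            //     result = "five";
            //     break;
            case string s:
                result = "string";
                break;
            case var other:          // matches everything that remains, including null
                result = "something else";
                break;
            // case null:            // error if enabled: the var pattern above would match null
            //     result = "null";
            //     break;
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(3));     // positive int
        Console.WriteLine(Describe(-1));    // other int
        Console.WriteLine(Describe("x"));   // string
        Console.WriteLine(Describe(null));  // something else
    }
}
```

Note that the guarded case `case int n when n > 0` does not make the later `case int n` an error, because only unguarded cases take part in subsumption.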

## Pattern Exhaustiveness

Informally, we say that a set of patterns is exhaustive for a type if, for every possible value of that type other than null, some pattern in the set is applicable.
Here we define when a set of patterns is *exhaustive* for a type.

A set of patterns `Q` is *exhaustive* for a type `T` if any of the following conditions hold:

1. `T` is an integral or enum type, or a nullable version of one of those, and for every possible value of `T`'s underlying type, some pattern in `Q` would match that value; or
2. Some pattern in `Q` is a *var pattern*; or
3. Some pattern in `Q` is a *declaration pattern* for type `D`, and there is an identity conversion, an implicit reference conversion, or a boxing conversion from `T` to `D`.

> *Example*:
>
> ```csharp
> static void M(byte b)
> {
> switch (b) {
> case 0: case 1: case 2: case 3: ... // handle every specific value of byte
> break;
> case byte other: // error: the pattern 'byte other' is subsumed by previous cases because the previous cases are exhaustive for byte
> break;
> }
> }
> ```
>
> *end example*