
Commit d60381f

Resolve ANTLR warnings by defining named lexer tokens (#225)
* Update lexical-structure.md
* Update expressions.md
* Update classes.md
* Removed a redundant sentence from 7.3.1 General
1 parent 5f73b59 commit d60381f


3 files changed, +38 -29 lines changed


standard/classes.md

Lines changed: 1 addition & 1 deletion
@@ -3433,7 +3433,7 @@ binary_operator_declarator

 overloadable_binary_operator
 : '+' | '-' | '*' | '/' | '%' | '&' | '|' | '^' | '<<'
-| Right_shift | '==' | '!=' | '>' | '<' | '>=' | '<='
+| Right_Shift | '==' | '!=' | '>' | '<' | '>=' | '<='
 ;

 conversion_operator_declarator

standard/expressions.md

Lines changed: 1 addition & 1 deletion
@@ -1543,7 +1543,7 @@ A *base_access* consists of the keyword base followed by either a "`.`" token a

 ```ANTLR
 base_access
-: 'base' '.' identifier type_argument_list?
+: 'base' '.' Identifier type_argument_list?
 | 'base' '[' argument_list ']'
 ;
 ```

standard/lexical-structure.md

Lines changed: 36 additions & 27 deletions
@@ -20,7 +20,7 @@ Conforming implementations shall accept Unicode compilation units encoded with t

 ### 7.2.1 General

-This specification presents the syntax of the C# programming language using two grammars. The ***lexical grammar*** ([§7.2.2](lexical-structure.md#722-grammar-notation)) defines how Unicode characters are combined to form line terminators, white space, comments, tokens, and pre-processing directives. The ***syntactic grammar*** ([§7.2.4](lexical-structure.md#724-syntactic-grammar)) defines how the tokens resulting from the lexical grammar are combined to form C# programs.
+This specification presents the syntax of the C# programming language using two grammars. The ***lexical grammar*** ([§7.2.3](lexical-structure.md#723-lexical-grammar)) defines how Unicode characters are combined to form line terminators, white space, comments, tokens, and pre-processing directives. The ***syntactic grammar*** ([§7.2.4](lexical-structure.md#724-syntactic-grammar)) defines how the tokens resulting from the lexical grammar are combined to form C# programs.

 All terminal characters are to be understood as the appropriate Unicode character from the range U+0020 to U+007F, as opposed to any similar-looking characters from other Unicode character ranges.

@@ -34,12 +34,16 @@ The lexical grammar of C# is presented in [§7.3](lexical-structure.md#73-lexica

 Every compilation unit in a C# program shall conform to the *Input* production of the lexical grammar ([§7.3.1](lexical-structure.md#731-general)).

+Using the ANTLR convention, lexer rule names are spelled with an initial uppercase letter.
+
 ### 7.2.4 Syntactic grammar

 The syntactic grammar of C# is presented in the clauses, subclauses, and annexes that follow this subclause. The terminal symbols of the syntactic grammar are the tokens defined by the lexical grammar, and the syntactic grammar specifies how tokens are combined to form C# programs.

 Every compilation unit in a C# program shall conform to the *Compilation_Unit* production ([§14.2](namespaces.md#142-compilation-units)) of the syntactic grammar.

+Using the ANTLR convention, syntactic rule names are spelled with an initial lowercase letter.
+
 ### 7.2.5 Grammar ambiguities

 The productions for *simple_name* ([§12.7.3](expressions.md#1273-simple-names)) and *member_access* ([§12.7.5](expressions.md#1275-member-access)) can give rise to ambiguities in the grammar for expressions.
@@ -84,6 +88,19 @@ then the *type_argument_list* is retained as part of the *simple_name*, *member_

 ### 7.3.1 General

+For convenience, the lexical grammar defines and references the following named lexer tokens:
+
+```ANTLR
+DEFAULT : 'default' ;
+NULL : 'null' ;
+TRUE : 'true' ;
+FALSE : 'false' ;
+ASTERISK : '*' ;
+SLASH : '/' ;
+```
+
+Although these are lexer rules, these names are spelled in all-uppercase letters to distinguish them from ordinary lexer rule names.
+
 The *Input* production defines the lexical structure of a C# compilation unit.

 ```ANTLR
@@ -120,13 +137,9 @@ When several lexical grammar productions match a sequence of characters in a com
 Line terminators divide the characters of a C# compilation unit into lines.

 ```ANTLR
-New_Line
-: '<Carriage return character (U+000D)>'
-| '<Line feed character (U+000A)>'
+New_Line
+: New_Line_Character
 | '<Carriage return character (U+000D) followed by line feed character (U+000A)>'
-| '<Next line character (U+0085)>'
-| '<Line separator character (U+2028)>'
-| '<Paragraph separator character (U+2029)>'
 ;
 ```
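
To illustrate the revised `New_Line` production, here is a minimal Python sketch that splits text on the same terminators. It assumes that `New_Line_Character`, which is defined elsewhere in the grammar and not shown in this hunk, covers the five single characters named in the removed alternatives. The names `NEW_LINE` and `split_lines` are hypothetical. The two-character CR LF sequence is listed first so that it counts as one terminator, because Python regex alternation is ordered rather than longest-match.

```python
import re

# Assumption: New_Line_Character covers CR, LF, NEL (U+0085),
# LS (U+2028) and PS (U+2029), as in the alternatives removed above.
# CR LF is tried first so the pair is consumed as a single terminator.
NEW_LINE = re.compile("\r\n|[\r\n\u0085\u2028\u2029]")


def split_lines(text: str) -> list[str]:
    """Split text into lines at every New_Line match (hypothetical helper)."""
    return NEW_LINE.split(text)


if __name__ == "__main__":
    sample = "first\r\nsecond\nthird\u2028fourth"
    print(split_lines(sample))
```
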

@@ -196,20 +209,16 @@ New_Line_Character
 ;

 Delimited_Comment
-: '/*' Delimited_Comment_Section* Asterisk+ '/'
+: '/*' Delimited_Comment_Section* ASTERISK+ '/'
 ;

 Delimited_Comment_Section
-: '/'
-| Asterisk* Not_Slash_Or_Asterisk
-;
-
-Asterisk
-: '*'
+: SLASH
+| ASTERISK* Not_Slash_Or_Asterisk
 ;

 Not_Slash_Or_Asterisk
-: '<Any Unicode character except / or *>'
+: '<Any Unicode character except SLASH or ASTERISK>'
 ;
 ```

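
The revised `Delimited_Comment` rules can be approximated with a regular expression. The following Python sketch is for illustration only and is not part of the specification; `DELIMITED_COMMENT` and `is_delimited_comment` are hypothetical names. Each part of the pattern mirrors one grammar element: `/` for `SLASH`, `\**[^/*]` for `ASTERISK* Not_Slash_Or_Asterisk`, and `\*+/` for the closing `ASTERISK+ '/'`.

```python
import re

# '/*' Delimited_Comment_Section* ASTERISK+ '/'
# where Delimited_Comment_Section is SLASH | ASTERISK* Not_Slash_Or_Asterisk.
DELIMITED_COMMENT = re.compile(r"/\*(?:/|\**[^/*])*\*+/")


def is_delimited_comment(text: str) -> bool:
    """Return True if the whole string is exactly one delimited comment."""
    return DELIMITED_COMMENT.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["/* ok */", "/***/", "/*/", "/* a */ b */"]:
        print(repr(candidate), is_delimited_comment(candidate))
```

Because a section can never end with an asterisk that is directly followed by a slash, the first `*/` always closes the comment, so delimited comments do not nest.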
@@ -420,17 +429,17 @@ A ***keyword*** is an identifier-like sequence of characters that is reserved, a
 Keyword
 : 'abstract' | 'as' | 'base' | 'bool' | 'break'
 | 'byte' | 'case' | 'catch' | 'char' | 'checked'
-| 'class' | 'const' | 'continue' | 'decimal' | 'default'
+| 'class' | 'const' | 'continue' | 'decimal' | DEFAULT
 | 'delegate' | 'do' | 'double' | 'else' | 'enum'
-| 'event' | 'explicit' | 'extern' | 'false' | 'finally'
+| 'event' | 'explicit' | 'extern' | FALSE | 'finally'
 | 'fixed' | 'float' | 'for' | 'foreach' | 'goto'
 | 'if' | 'implicit' | 'in' | 'int' | 'interface'
 | 'internal' | 'is' | 'lock' | 'long' | 'namespace'
-| 'new' | 'null' | 'object' | 'operator' | 'out'
+| 'new' | NULL | 'object' | 'operator' | 'out'
 | 'override' | 'params' | 'private' | 'protected' | 'public'
 | 'readonly' | 'ref' | 'return' | 'sbyte' | 'sealed'
 | 'short' | 'sizeof' | 'stackalloc' | 'static' | 'string'
-| 'struct' | 'switch' | 'this' | 'throw' | 'true'
+| 'struct' | 'switch' | 'this' | 'throw' | TRUE
 | 'try' | 'typeof' | 'uint' | 'ulong' | 'unchecked'
 | 'unsafe' | 'ushort' | 'using' | 'virtual' | 'void'
 | 'volatile' | 'while'
@@ -483,8 +492,8 @@ There are two Boolean literal values: `true` and `false`.

 ```ANTLR
 Boolean_Literal
-: 'true'
-| 'false'
+: TRUE
+| FALSE
 ;
 ```

@@ -673,7 +682,7 @@ Regular_String_Literal_Character
 ;

 Single_Regular_String_Literal_Character
-: '<Any character except \" (U+0022), \\ (U+005C), and New_Line_Character>'
+: '<Any character except " (U+0022), \\ (U+005C), and New_Line_Character>'
 ;

 Verbatim_String_Literal
@@ -736,7 +745,7 @@ Each string literal does not necessarily result in a new string instance. When t

 ```ANTLR
 Null_Literal
-: 'null'
+: NULL
 ;
 ```

@@ -753,7 +762,7 @@ Punctuators are for grouping and separating.
 ```ANTLR
 Operator_Or_Punctuator
 : '{' | '}' | '[' | ']' | '(' | ')' | '.' | ',' | ':' | ';'
-| '+' | '-' | '*' | '/' | '%' | '&' | '|' | '^' | '!' | '~'
+| '+' | '-' | ASTERISK | SLASH | '%' | '&' | '|' | '^' | '!' | '~'
 | '=' | '<' | '>' | '?' | '??' | '::' | '++' | '--' | '&&' | '||'
 | '->' | '==' | '!=' | '<=' | '>=' | '+=' | '-=' | '*=' | '/=' | '%='
 | '&=' | '|=' | '^=' | '<<' | '<<=' | '=>'
@@ -885,8 +894,8 @@ Pp_Unary_Expression
 ;

 Pp_Primary_Expression
-: 'true'
-| 'false'
+: TRUE
+| FALSE
 | Conditional_Symbol
 | '(' Whitespace? Pp_Expression Whitespace? ')'
 ;
@@ -1161,7 +1170,7 @@ Pp_Line
 Line_Indicator
 : Decimal_Digit+ Whitespace Compilation_Unit_Name
 | Decimal_Digit+
-| 'default'
+| DEFAULT
 | 'hidden'
 ;
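
The three spelling conventions this commit documents (initial lowercase for syntactic rules, initial uppercase for lexer rules, all-uppercase for the named lexer tokens) can be summarized in a short Python sketch. This is an illustrative helper rather than anything ANTLR or the specification provides; the function name `classify_rule_name` and the returned labels are hypothetical.

```python
def classify_rule_name(name: str) -> str:
    """Classify a grammar rule name by its spelling (hypothetical helper)."""
    if not name or not name[0].isalpha():
        raise ValueError(f"not a rule name: {name!r}")
    if name[0].islower():
        # ANTLR convention: parser (syntactic) rules start lowercase.
        return "syntactic rule"
    letters = [ch for ch in name if ch.isalpha()]
    if all(ch.isupper() for ch in letters):
        # Convention added in 7.3.1: named lexer tokens are all-uppercase.
        return "named lexer token"
    # ANTLR convention: lexer rules start uppercase.
    return "lexer rule"


if __name__ == "__main__":
    for rule in ["base_access", "Identifier", "Right_Shift", "DEFAULT"]:
        print(rule, "->", classify_rule_name(rule))
```

Under this reading, the renames in this commit (`identifier` to `Identifier`, `Right_shift` to `Right_Shift`, `Asterisk` to `ASTERISK`) keep each name in its intended class.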
