@@ -592,10 +592,12 @@ or interrupted by ignored characters.

Most tokens in Rust follow rules similar to the C family.

- Most tokens (including identifiers, whitespace, keywords, operators and
- structural symbols) are drawn from the ASCII-compatible range of
- Unicode. String and character literals, however, may include the full range of
- Unicode characters.
+ Most tokens (including whitespace, keywords, operators and structural symbols)
+ are drawn from the ASCII-compatible range of Unicode. Identifiers are drawn
+ from Unicode characters specified by the @code{XID_start} and
+ @code{XID_continue} rules given by UAX #31@footnote{Unicode Standard Annex
+ #31: Unicode Identifier and Pattern Syntax}. String and character literals may
+ include the full range of Unicode characters.

@emph{TODO: formalize this section much more}.

@@ -638,18 +640,22 @@ token or a syntactic extension token. Multi-line comments may be nested.
@c * Ref.Lex.Ident:: Identifier tokens.
@cindex Identifier token

- Identifiers follow the pattern of C identifiers: they begin with a
- @emph{letter} or @emph{underscore}, and continue with any combination of
- @emph{letters}, @emph{decimal digits} and underscores, and must not be equal
- to any keyword or reserved token. @xref{Ref.Lex.Key}. @xref{Ref.Lex.Res}.
+ Identifiers follow the rules given by Unicode Standard Annex #31, in the form
+ closed under NFKC normalization, @emph{excluding} those tokens that are
+ otherwise defined as keywords or reserved
+ tokens. @xref{Ref.Lex.Key}. @xref{Ref.Lex.Res}.

- A @emph{letter} is a Unicode character in the ranges U+0061-U+007A and
- U+0041-U+005A (@code{'a'}-@code{'z'} and @code{'A'}-@code{'Z'}).
+ That is: an identifier starts with any character having derived property
+ @code{XID_Start} and continues with zero or more characters having derived
+ property @code{XID_Continue}; and such an identifier is NFKC-normalized during
+ lexing, such that all subsequent comparison of identifiers is performed on the
+ NFKC-normalized forms.

- An @dfn{underscore} is the character U+005F ('_').
+ @emph{TODO: define relationship between Unicode and Rust versions}.

- A @dfn{decimal digit} is a character in the range U+0030-U+0039
- (@code{'0'}-@code{'9'}).
+ @footnote{This identifier syntax is a superset of the identifier syntaxes of C
+ and Java, and is modeled on Python PEP #3131, which formed the definition of
+ identifiers in Python 3.0 and later.}

@node Ref.Lex.Key
@subsection Ref.Lex.Key
@@ -1984,22 +1990,22 @@ module system).
An example of a @code{tag} item and its use:
@example
tag animal @{
- dog();
- cat();
+ dog;
+ cat;
@}

- let animal a = dog();
- a = cat();
+ let animal a = dog;
+ a = cat;
@end example

An example of a @emph{recursive} @code{tag} item and its use:
@example
tag list[T] @{
- nil();
+ nil;
cons(T, @@ list[T]);
@}

- let list[int] a = cons(7, cons(13, nil()));
+ let list[int] a = cons(7, cons(13, nil));
@end example


@@ -3395,9 +3401,9 @@ control enters the block.
An example of a pattern @code{alt} statement:

@example
- type list[X] = tag(nil(), cons(X, @@ list[X]));
+ type list[X] = tag(nil, cons(X, @@ list[X]));

- let list[int] x = cons(10, cons(11, nil()));
+ let list[int] x = cons(10, cons(11, nil));

alt (x) @{
case (cons(a, cons(b, _))) @{
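
The identifier rule added in the second hunk (start with an `XID_Start` character, continue with `XID_Continue` characters, then NFKC-normalize) can be illustrated with a rough Python sketch. This is not the Rust lexer. It assumes that Python's `str.isidentifier()` is an adequate stand-in for the two Unicode properties: Python follows PEP 3131, which the added footnote cites, but it also lets `_` start an identifier, so that case is excluded by hand. The helper names `is_xid_start`, `is_xid_continue` and `lex_identifier` are hypothetical, and the exclusion of keywords and reserved tokens is left out.

```python
import unicodedata
from typing import Optional


def is_xid_start(ch: str) -> bool:
    # Assumption: a single character that Python accepts as a whole
    # identifier has XID_Start, except '_' which Python allows specially.
    return ch != "_" and ch.isidentifier()


def is_xid_continue(ch: str) -> bool:
    # Assumption: a character has XID_Continue if appending it to a
    # known-good start character still yields a valid identifier.
    return ("a" + ch).isidentifier()


def lex_identifier(text: str) -> Optional[str]:
    """Return the NFKC-normalized identifier, or None if text does not
    match XID_Start XID_Continue* as worded in the diff."""
    if not text or not is_xid_start(text[0]):
        return None
    if not all(is_xid_continue(ch) for ch in text[1:]):
        return None
    # Normalizing here means later comparisons see only NFKC forms.
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for candidate in ["foo", "9lives", "_x", "\uff41" + "xy", "cafe\u0301"]:
        print(repr(candidate), "->", repr(lex_identifier(candidate)))
```

Read literally, the new wording rejects a leading underscore, because `_` has `XID_Continue` but not `XID_Start`; the sketch follows that literal reading. It also shows why normalization matters for comparison: a full-width `ａ` followed by `xy` and the plain ASCII `axy` become the same identifier after NFKC.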