
Commit 2fd38e1

Create Whitespace grammar productions
I created productions for `END_OF_LINE`, `IGNORABLE_CODE_POINT`, and `HORIZONTAL_WHITESPACE` because that is how the Unicode standard is written, and in preparation for rust-lang#1974, which will make use of `HORIZONTAL_WHITESPACE`.
1 parent 0adfec3 commit 2fd38e1

2 files changed (+54, -19 lines)

src/input-format.md

Lines changed: 0 additions & 6 deletions
````diff
@@ -6,12 +6,6 @@ r[input.syntax]
 @root CHAR -> <a Unicode scalar value>
 
 NUL -> U+0000
-
-TAB -> U+0009
-
-LF -> U+000A
-
-CR -> U+000D
 ```
 
 r[input.intro]
````

src/whitespace.md

Lines changed: 54 additions & 13 deletions
````diff
@@ -1,21 +1,62 @@
 r[lex.whitespace]
 # Whitespace
 
+r[whitespace.syntax]
+```grammar,lexer
+@root WHITESPACE ->
+END_OF_LINE
+| IGNORABLE_CODE_POINT
+| HORIZONTAL_WHITESPACE
+
+TAB -> HORIZONTAL_TAB
+
+LF -> LINE_FEED
+
+CR -> CARRIAGE_RETURN
+
+END_OF_LINE ->
+LINE_FEED
+| VERTICAL_TAB
+| FORM_FEED
+| CARRIAGE_RETURN
+| NEXT_LINE
+| LINE_SEPARATOR
+| PARAGRAPH_SEPARATOR
+
+LINE_FEED -> U+000A
+
+VERTICAL_TAB -> U+000B
+
+FORM_FEED -> U+000C
+
+CARRIAGE_RETURN -> U+000D
+
+NEXT_LINE -> U+0085
+
+LINE_SEPARATOR -> U+2028
+
+PARAGRAPH_SEPARATOR -> U+2029
+
+IGNORABLE_CODE_POINT ->
+LEFT_TO_RIGHT_MARK
+| RIGHT_TO_LEFT_MARK
+
+LEFT_TO_RIGHT_MARK -> U+200E
+
+RIGHT_TO_LEFT_MARK -> U+200F
+
+HORIZONTAL_WHITESPACE ->
+HORIZONTAL_TAB
+| SPACE
+
+HORIZONTAL_TAB -> U+0009
+
+SPACE -> U+0020
+```
+
 r[lex.whitespace.intro]
 Whitespace is any non-empty string containing only characters that have the
-[`Pattern_White_Space`] Unicode property, namely:
-
-- `U+0009` (horizontal tab, `'\t'`)
-- `U+000A` (line feed, `'\n'`)
-- `U+000B` (vertical tab)
-- `U+000C` (form feed)
-- `U+000D` (carriage return, `'\r'`)
-- `U+0020` (space, `' '`)
-- `U+0085` (next line)
-- `U+200E` (left-to-right mark)
-- `U+200F` (right-to-left mark)
-- `U+2028` (line separator)
-- `U+2029` (paragraph separator)
+[`Pattern_White_Space`] Unicode property.
 
 r[lex.whitespace.token-sep]
 Rust is a "free-form" language, meaning that all forms of whitespace serve only
````
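The new productions split the eleven `Pattern_White_Space` code points that the removed list spelled out into three groups. The following is a minimal Rust sketch, assuming a hand-written lexer that classifies one `char` at a time. The names `WhitespaceKind`, `classify`, and `is_whitespace_token` are hypothetical and are not taken from rustc or the Reference tooling. The sketch checks that the union of `END_OF_LINE`, `IGNORABLE_CODE_POINT`, and `HORIZONTAL_WHITESPACE` accepts exactly the code points from the old list and no other Unicode scalar value.

```rust
/// Which top-level alternative of `WHITESPACE` a character matches.
/// (Hypothetical type used only for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WhitespaceKind {
    EndOfLine,
    IgnorableCodePoint,
    HorizontalWhitespace,
}

/// Maps a character to the production that accepts it, or `None` if the
/// character is not part of `WHITESPACE`.
fn classify(c: char) -> Option<WhitespaceKind> {
    match c {
        // END_OF_LINE: LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR
        '\u{000A}' | '\u{000B}' | '\u{000C}' | '\u{000D}' | '\u{0085}' | '\u{2028}'
        | '\u{2029}' => Some(WhitespaceKind::EndOfLine),
        // IGNORABLE_CODE_POINT: LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK
        '\u{200E}' | '\u{200F}' => Some(WhitespaceKind::IgnorableCodePoint),
        // HORIZONTAL_WHITESPACE: HORIZONTAL TAB, SPACE
        '\u{0009}' | '\u{0020}' => Some(WhitespaceKind::HorizontalWhitespace),
        _ => None,
    }
}

/// A whitespace token: a non-empty string whose characters all match `WHITESPACE`.
fn is_whitespace_token(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| classify(c).is_some())
}

/// The eleven code points listed in the prose that this commit removes.
const PATTERN_WHITE_SPACE: [char; 11] = [
    '\u{0009}', '\u{000A}', '\u{000B}', '\u{000C}', '\u{000D}', '\u{0020}',
    '\u{0085}', '\u{200E}', '\u{200F}', '\u{2028}', '\u{2029}',
];

fn main() {
    // Every code point from the old list is accepted by some production...
    for &c in PATTERN_WHITE_SPACE.iter() {
        assert!(classify(c).is_some(), "U+{:04X} not covered", c as u32);
    }
    // ...and no other Unicode scalar value is (surrogates are skipped by from_u32).
    let extra = (0u32..=char::MAX as u32)
        .filter_map(char::from_u32)
        .filter(|c| classify(*c).is_some() && !PATTERN_WHITE_SPACE.contains(c))
        .count();
    assert_eq!(extra, 0);

    assert_eq!(classify('\t'), Some(WhitespaceKind::HorizontalWhitespace));
    assert_eq!(classify('\r'), Some(WhitespaceKind::EndOfLine));
    assert!(is_whitespace_token(" \t\r\n"));
    assert!(!is_whitespace_token(""));
    assert!(!is_whitespace_token(" x "));
    println!("ok");
}
```

Because the three match arms are disjoint, each code point belongs to exactly one of the three productions, so a later rule can refer to `HORIZONTAL_WHITESPACE` on its own without also admitting line terminators or the bidi marks.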
