
Commit c615e23

Create Whitespace grammar productions
I created productions for `END_OF_LINE`, `IGNORABLE_CODE_POINT`, and `HORIZONTAL_WHITESPACE`, since that is how the Unicode standard groups these characters, and in preparation for rust-lang#1974, which will make use of `HORIZONTAL_WHITESPACE`.
1 parent bcb96fb commit c615e23


2 files changed (+52, -17 lines)


src/notation.md

Lines changed: 0 additions & 4 deletions
````diff
@@ -57,10 +57,6 @@ r[input.syntax]
 NUL -> U+0000
 
 TAB -> U+0009
-
-LF -> U+000A
-
-CR -> U+000D
 ```
 
 [binary operators]: expressions/operator-expr.md#arithmetic-and-logical-binary-operators
````

src/whitespace.md

Lines changed: 52 additions & 13 deletions
````diff
@@ -1,21 +1,60 @@
 r[lex.whitespace]
 # Whitespace
 
+r[whitespace.syntax]
+```grammar,lexer
+@root WHITESPACE ->
+      END_OF_LINE
+    | IGNORABLE_CODE_POINT
+    | HORIZONTAL_WHITESPACE
+
+LF -> LINE_FEED
+
+CR -> CARRIAGE_RETURN
+
+END_OF_LINE ->
+      LINE_FEED
+    | VERTICAL_TAB
+    | FORM_FEED
+    | CARRIAGE_RETURN
+    | NEXT_LINE
+    | LINE_SEPARATOR
+    | PARAGRAPH_SEPARATOR
+
+LINE_FEED -> U+000A
+
+VERTICAL_TAB -> U+000B
+
+FORM_FEED -> U+000C
+
+CARRIAGE_RETURN -> U+000D
+
+NEXT_LINE -> U+0085
+
+LINE_SEPARATOR -> U+2028
+
+PARAGRAPH_SEPARATOR -> U+2029
+
+IGNORABLE_CODE_POINT ->
+      LEFT_TO_RIGHT_MARK
+    | RIGHT_TO_LEFT_MARK
+
+LEFT_TO_RIGHT_MARK -> U+200E
+
+RIGHT_TO_LEFT_MARK -> U+200F
+
+HORIZONTAL_WHITESPACE ->
+      HORIZONTAL_TAB
+    | SPACE
+
+HORIZONTAL_TAB -> U+0009
+
+SPACE -> U+0020
+```
+
 r[lex.whitespace.intro]
 Whitespace is any non-empty string containing only characters that have the
-[`Pattern_White_Space`] Unicode property, namely:
-
-- `U+0009` (horizontal tab, `'\t'`)
-- `U+000A` (line feed, `'\n'`)
-- `U+000B` (vertical tab)
-- `U+000C` (form feed)
-- `U+000D` (carriage return, `'\r'`)
-- `U+0020` (space, `' '`)
-- `U+0085` (next line)
-- `U+200E` (left-to-right mark)
-- `U+200F` (right-to-left mark)
-- `U+2028` (line separator)
-- `U+2029` (paragraph separator)
+[`Pattern_White_Space`] Unicode property.
 
 r[lex.whitespace.token-sep]
 Rust is a "free-form" language, meaning that all forms of whitespace serve only
````
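As an illustration only, the three groups introduced by the new `WHITESPACE` production can be mirrored by a small Rust classifier. This is a hypothetical sketch, not code from rustc or the Reference: `WhitespaceKind`, `classify`, and `is_whitespace_token` are illustrative names. It also suggests why the grammar spells out the code points explicitly: the standard library's `char::is_whitespace` follows the broader `White_Space` property, which includes U+00A0 but excludes U+200E and U+200F.

```rust
/// Hypothetical enum mirroring the three alternatives of the
/// `WHITESPACE` production. The names are illustrative, not rustc's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WhitespaceKind {
    EndOfLine,
    IgnorableCodePoint,
    HorizontalWhitespace,
}

/// Returns the `WHITESPACE` alternative that `c` matches, or `None` if `c`
/// is not one of the eleven `Pattern_White_Space` code points.
pub fn classify(c: char) -> Option<WhitespaceKind> {
    match c {
        // END_OF_LINE: LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR
        '\u{000A}' | '\u{000B}' | '\u{000C}' | '\u{000D}'
        | '\u{0085}' | '\u{2028}' | '\u{2029}' => Some(WhitespaceKind::EndOfLine),
        // IGNORABLE_CODE_POINT: LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK
        '\u{200E}' | '\u{200F}' => Some(WhitespaceKind::IgnorableCodePoint),
        // HORIZONTAL_WHITESPACE: HORIZONTAL TAB, SPACE
        '\u{0009}' | '\u{0020}' => Some(WhitespaceKind::HorizontalWhitespace),
        _ => None,
    }
}

/// Hypothetical helper for the prose rule: a whitespace token is a
/// non-empty string made only of characters that `classify` accepts.
pub fn is_whitespace_token(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| classify(c).is_some())
}
```

For example, `classify('\n')` returns `Some(WhitespaceKind::EndOfLine)`, while `classify('\u{00A0}')` returns `None` even though `'\u{00A0}'.is_whitespace()` is `true`.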
