doc: Document lexer skipping

Markus Westerlind 2020-03-02 15:44:04 +01:00
parent ee2f7060e9
commit 4447616283
2 changed files with 34 additions and 5 deletions

@ -2,6 +2,21 @@ use std::str::FromStr;
grammar;
match {
"+",
"-",
"*",
"/",
"(",
")",
r"[0-9]+",
// Skip whitespace and comments
r"\s*" => { },
r"//[^\n\r]*[\n\r]*" => { }, // `// comment`
r"/\*([^\*]*\*+[^\*/])*([^\*]*\*+|[^\*])*\*/" => { }, // `/* comment */`
}
pub Expr: i32 = {
<l:Expr> "+" <r:Factor> => l + r,
<l:Expr> "-" <r:Factor> => l - r,

@ -209,13 +209,13 @@ match {
} else {
r"\w+",
_
}
```
Here the match contains two levels; each level can have more than one
item in it. The top-level contains only `r"[0-9]+"`, which means that this
regular expression is given highest priority. The next level contains
`r"\w+"`, so that will match afterwards.
The final `_` indicates that other string literals and regular
expressions that appear elsewhere in the grammar (e.g., `"("` or
@ -240,7 +240,7 @@ fn calculator2b() {
let result = calculator2b::TermParser::new().parse("(foo33)").unwrap();
assert_eq!(result, "Id(foo33)");
// This one will fail:
let result = calculator2b::TermParser::new().parse("(22)").unwrap();
@ -262,7 +262,7 @@ match {
} else {
r"\w+",
_
}
```
This raises the interesting question of what the precedence is **within**
@ -280,7 +280,7 @@ There is one final twist before we reach the
can also use `match` declarations to give names to regular
expressions, so that we don't have to type them directly in our
grammar. For example, maybe instead of writing `r"\w+"`, we would
prefer to write `ID`. We could do that by modifying the match declaration like
so:
```
@ -321,6 +321,20 @@ match {
And now any reference in your grammar to `"BEGIN"` will actually match
any capitalization.
#### Customizing skipping between tokens
If we want to support comments, we need to skip more than just whitespace in our lexer.
To this end, `ignore patterns` can be specified: `match` entries of the form `pattern => { }`, whose matches the lexer discards.
```
match {
r"\s*" => { }, // The default whitespace skipping is disabled if an `ignore pattern` is specified
r"//[^\n\r]*[\n\r]*" => { }, // Skip `// comments`
r"/\*([^\*]*\*+[^\*/])*([^\*]*\*+|[^\*])*\*/" => { }, // Skip `/* comments */`
}
```
[lexer tutorial]: index.md
[calculator2b]: ../../calculator/src/calculator2b.lalrpop
[calculator3]: ../../calculator/src/calculator3.lalrpop
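
To make the effect of the three ignore patterns above concrete, here is a minimal hand-written sketch in plain Rust of what skipping between tokens amounts to. It is not LALRPOP's generated lexer and uses none of its API: `skip_ignored` and `tokenize` are hypothetical helpers, the token rules (runs of digits, otherwise single characters) are a simplification of the calculator grammar, and block comments are assumed not to nest.

```rust
/// Returns the rest of `input` after discarding everything the ignore
/// patterns would skip: whitespace, `// line comments`, and
/// `/* block comments */`. Hypothetical helper, for illustration only.
fn skip_ignored(input: &str) -> &str {
    let mut rest = input;
    loop {
        // Corresponds to the `\s*` pattern.
        let trimmed = rest.trim_start();
        if let Some(after) = trimmed.strip_prefix("//") {
            // Line comment: drop everything up to the end of the line.
            rest = match after.find(|c: char| c == '\n' || c == '\r') {
                Some(i) => &after[i..],
                None => "",
            };
        } else if let Some(after) = trimmed.strip_prefix("/*") {
            // Block comment: drop everything through the first `*/`.
            match after.find("*/") {
                Some(i) => rest = &after[i + 2..],
                // Unterminated comment: nothing to skip, leave it in place.
                None => return trimmed,
            }
        } else {
            return trimmed;
        }
    }
}

/// Splits `input` into tokens, skipping ignored text between them.
/// A token is a run of ASCII digits or, failing that, a single character.
fn tokenize(input: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut rest = skip_ignored(input);
    while !rest.is_empty() {
        // ASCII digits are one byte each, so the count is a byte length.
        let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
        let len = if digits > 0 {
            digits
        } else {
            rest.chars().next().map_or(1, |c| c.len_utf8())
        };
        tokens.push(&rest[..len]);
        rest = skip_ignored(&rest[len..]);
    }
    tokens
}
```

Calling `tokenize("1 + /* two */ 2 // done")` yields only the tokens `1`, `+`, and `2`; the comments and whitespace never reach the parser, which is the behavior the ignore patterns ask the generated lexer for.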