Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
creating:expressions [2025/04/11 17:58] – Explain operator parsing logic ahelwer | creating:expressions [2025/05/13 15:58] (current) – Fixed lookahead in set literal parsing ahelwer | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== | + | ======= Parsing |
- | This tutorial page covers the next three chapters in //Crafting Interpreters//: | + | This tutorial page covers the next two chapters in //Crafting Interpreters//: |
- [[https:// | - [[https:// | ||
- [[https:// | - [[https:// | ||
- | - [[https:// | ||
Same as the book, we //could// build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring! | Same as the book, we //could// build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring! | ||
Line 15: | Line 14: | ||
First read the section of the book, then read the corresponding commentary & modifications given by this tutorial. | First read the section of the book, then read the corresponding commentary & modifications given by this tutorial. | ||
- | ===== Chapter 5: Representing Code ===== | + | ====== Chapter 5: Representing Code ====== |
[[https:// | [[https:// | ||
However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced. | However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced. | ||
- | ==== Section 5.1: Context-Free Grammars ==== | + | ===== Section 5.1: Context-Free Grammars |
Everything before [[https:// | Everything before [[https:// | ||
Line 38: | Line 37: | ||
ternary | ternary | ||
variadic | variadic | ||
- | operator | + | operator |
</ | </ | ||
There are a few interesting differences. | There are a few interesting differences. | ||
Line 51: | Line 50: | ||
The only difference between '' | The only difference between '' | ||
This is the perspective of a language implementer. | This is the perspective of a language implementer. | ||
- | Later on we will extend the definition of '' | ||
- | ==== Section 5.2: Implementing Syntax Trees ==== | + | Note that we //could// also include the conjunction ''/ |
+ | |||
+ | ===== Section 5.2: Implementing Syntax Trees ===== | ||
While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree! | While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree! | ||
Line 59: | Line 59: | ||
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
- | package | + | package tool; |
import java.io.IOException; | import java.io.IOException; | ||
Line 94: | Line 94: | ||
PrintWriter writer = new PrintWriter(path, | PrintWriter writer = new PrintWriter(path, | ||
- | writer.println(" | + | writer.println(" |
writer.println(); | writer.println(); | ||
writer.println(" | writer.println(" | ||
Line 144: | Line 144: | ||
- | ==== Section 5.3: Working with Trees ==== | + | ===== Section 5.3: Working with Trees ===== |
[[https:// | [[https:// | ||
Line 201: | Line 201: | ||
Our generated syntax tree node types now support the visitor pattern! | Our generated syntax tree node types now support the visitor pattern! | ||
- | ==== Section 5.4: A (Not Very) Pretty Printer ==== | + | ===== Section 5.4: A (Not Very) Pretty Printer |
[[https:// | [[https:// | ||
Line 207: | Line 207: | ||
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
- | package | + | package tla; |
class AstPrinter implements Expr.Visitor< | class AstPrinter implements Expr.Visitor< | ||
Line 273: | Line 273: | ||
It isn't necessary to define a '' | It isn't necessary to define a '' | ||
- | ===== Chapter 6: Parsing Expressions ===== | + | ====== Chapter 6: Parsing Expressions |
In [[https:// | In [[https:// | ||
- | ==== Chapter | + | ===== Section |
First, as in [[https:// | First, as in [[https:// | ||
Line 299: | Line 299: | ||
Worry not, it can still be made terse and understandable! | Worry not, it can still be made terse and understandable! | ||
- | ==== Chapter | + | ===== Section |
[[https:// | [[https:// | ||
Recursive descent can seem a bit magical even if you're used to reasoning about recursive functions! | Recursive descent can seem a bit magical even if you're used to reasoning about recursive functions! | ||
Unfortunately TLA⁺ requires us to push this magic even further. | Unfortunately TLA⁺ requires us to push this magic even further. | ||
- | Worry not, we'll take it step by step. | + | We'll take it step by step. |
- | We start with the same basic '' | + | We start with the same basic '' |
+ | We also add an import for '' | ||
- | <code java [highlight_lines_extra=" | + | <code java [highlight_lines_extra=" |
- | package | + | package tla; |
import java.util.List; | import java.util.List; | ||
+ | import java.util.ArrayList; | ||
- | import static | + | import static tla.TokenType.*; |
class Parser { | class Parser { | ||
Line 337: | Line 339: | ||
private Expr operatorExpressionPrec15() { ... } | private Expr operatorExpressionPrec15() { ... } | ||
</ | </ | ||
- | where each '' | + | where each '' |
Life is too short for this. | Life is too short for this. | ||
Instead, we'll adopt a technique alluded to in the text: | Instead, we'll adopt a technique alluded to in the text: | ||
Line 343: | Line 345: | ||
Here's the skeleton of our operator parsing function; the trick is to make the precedence a parameter to the function instead of a component of the name. | Here's the skeleton of our operator parsing function; the trick is to make the precedence a parameter to the function instead of a component of the name. | ||
- | Add this code after the '' | + | Add this code after the '' |
<code java> | <code java> | ||
private Expr expression() { | private Expr expression() { | ||
Line 358: | Line 360: | ||
</ | </ | ||
- | Before filling out '' | + | Before filling out '' |
<code java> | <code java> | ||
Line 396: | Line 398: | ||
Now we have to define a table of operators with their details. | Now we have to define a table of operators with their details. | ||
- | For this, create | + | First use the handy Java 17 [[https:// |
+ | Put it at the top of the '' | ||
- | <code java> | + | <code java [highlight_lines_extra=" |
- | package com.craftinginterpreters.tla; | + | class Parser { |
+ | private static enum Fix { PREFIX, INFIX, POSTFIX } | ||
+ | private static record Operator(Fix fix, TokenType token, | ||
+ | boolean assoc, int lowPrec, int highPrec) {} | ||
- | enum Fix { | + | private |
- | PREFIX, INFIX, POSTFIX | + | |
- | } | + | |
- | + | ||
- | class Operator { | + | |
- | | + | |
- | final TokenType token; | + | |
- | final boolean assoc; | + | |
- | final int lowPrec; | + | |
- | final int highPrec; | + | |
- | + | ||
- | public Operator(Fix fix, TokenType token, boolean assoc, | + | |
- | int lowPrec, int highPrec) { | + | |
- | this.fix = fix; | + | |
- | this.token = token; | + | |
- | this.assoc = assoc; | + | |
- | this.lowPrec = lowPrec; | + | |
- | this.highPrec = highPrec; | + | |
- | } | + | |
- | } | + | |
</ | </ | ||
- | |||
- | For convenience, | ||
- | |||
- | <code java [highlight_lines_extra=" | ||
- | package com.craftinginterpreters.tla; | ||
- | |||
- | import java.util.List; | ||
- | import java.util.ArrayList; | ||
- | |||
- | import static com.craftinginterpreters.tla.TokenType.*; | ||
- | import static com.craftinginterpreters.tla.Fix.*; | ||
- | |||
- | class Parser { | ||
- | </ | ||
- | |||
You can find operator attributes on page 271 of // | You can find operator attributes on page 271 of // | ||
We use a small subset of these operators. | We use a small subset of these operators. | ||
- | Record their attributes in a table in '' | + | Record their attributes in a table in '' |
<code java> | <code java> | ||
private static final Operator[] operators = new Operator[] { | private static final Operator[] operators = new Operator[] { | ||
- | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
- | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
- | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.POSTFIX, PRIME, |
- | new Operator(INFIX, | + | |
- | new Operator(POSTFIX, | + | |
}; | }; | ||
</ | </ | ||
Line 467: | Line 437: | ||
While this design decision is unusual, it is [[https:// | While this design decision is unusual, it is [[https:// | ||
- | We need one more helper method before we can start working on '' | + | We need one more helper method before we can start working on '' |
- | Add this code above '' | + | Add this code above '' |
<code java> | <code java> | ||
private Operator matchOp(Fix fix, int prec) { | private Operator matchOp(Fix fix, int prec) { | ||
Line 482: | Line 452: | ||
Okay, we're all set for the main event! | Okay, we're all set for the main event! | ||
- | Here is how we modify | + | Here is how we modify '' |
You can see a strong resemblance to the infix operator parsing code given in the book (new lines highlighted): | You can see a strong resemblance to the infix operator parsing code given in the book (new lines highlighted): | ||
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
Line 489: | Line 459: | ||
Operator op; | Operator op; | ||
- | + | ||
Expr expr = operatorExpression(prec + 1); | Expr expr = operatorExpression(prec + 1); | ||
- | while ((op = matchOp(INFIX, | + | while ((op = matchOp(Fix.INFIX, prec)) != null) { |
Token operator = previous(); | Token operator = previous(); | ||
Expr right = operatorExpression(op.highPrec + 1); | Expr right = operatorExpression(op.highPrec + 1); | ||
expr = new Expr.Binary(expr, | expr = new Expr.Binary(expr, | ||
} | } | ||
- | + | ||
return expr; | return expr; | ||
} | } | ||
Line 502: | Line 472: | ||
We use Java's combined conditional-assignment atop the '' | We use Java's combined conditional-assignment atop the '' | ||
- | The interior of the '' | + | The interior of the '' |
- | It takes a bit of thinking to understand why this works for parsing expressions according to the precedence we want, but it does. | + | It takes a bit of thinking to understand why this works for parsing expressions according to the precedence |
Take a minute to ponder it. | Take a minute to ponder it. | ||
If you still don't get it, wait until we have a fully-functional expression parser then play around with this line to see how it changes the behavior. | If you still don't get it, wait until we have a fully-functional expression parser then play around with this line to see how it changes the behavior. | ||
- | This code implements precedence ranges, but assumes all infix operators are associative. | + | Our code implements precedence ranges, but assumes all infix operators are associative. |
- | We need to modify the loop to break if the infix operator is not associative: | + | We need to modify the loop to return immediately |
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
- | while ((op = matchOp(INFIX, | + | while ((op = matchOp(Fix.INFIX, prec)) != null) { |
Token operator = previous(); | Token operator = previous(); | ||
Expr right = operatorExpression(op.highPrec + 1); | Expr right = operatorExpression(op.highPrec + 1); | ||
expr = new Expr.Binary(expr, | expr = new Expr.Binary(expr, | ||
- | if (!op.assoc) | + | if (!op.assoc) |
} | } | ||
</ | </ | ||
Line 521: | Line 491: | ||
That was the most complicated case, so let's move on to prefix operators. | That was the most complicated case, so let's move on to prefix operators. | ||
Again our code resembles the prefix operator parsing logic given in the book. | Again our code resembles the prefix operator parsing logic given in the book. | ||
- | Add this snippet near the top of the '' | + | Add this snippet near the top of '' |
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
Line 528: | Line 498: | ||
Operator op; | Operator op; | ||
- | if ((op = matchOp(PREFIX, | + | if ((op = matchOp(Fix.PREFIX, prec)) != null) { |
Token opToken = previous(); | Token opToken = previous(); | ||
Expr expr = operatorExpression( | Expr expr = operatorExpression( | ||
- | | + | |
return new Expr.Unary(opToken, | return new Expr.Unary(opToken, | ||
} | } | ||
Line 541: | Line 511: | ||
To understand this expression, it is helpful to consider an example. | To understand this expression, it is helpful to consider an example. | ||
The negative prefix operator '' | The negative prefix operator '' | ||
- | In order to do that after consuming the first '' | + | In order to do that after consuming the first '' |
Then we will consume the second '' | Then we will consume the second '' | ||
In contrast, the prefix operator '' | In contrast, the prefix operator '' | ||
Line 549: | Line 519: | ||
All that remains is to parse postfix operators. | All that remains is to parse postfix operators. | ||
These are not covered in the book, but they pretty much resemble infix operators without matching an expression on the right-hand side of the operator. | These are not covered in the book, but they pretty much resemble infix operators without matching an expression on the right-hand side of the operator. | ||
- | Add the highlighted code below the '' | + | Add the highlighted code below the '' |
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
} | } | ||
- | while ((op = matchOp(POSTFIX, | + | while ((op = matchOp(Fix.POSTFIX, prec)) != null) { |
Token opToken = previous(); | Token opToken = previous(); | ||
expr = new Expr.Unary(opToken, | expr = new Expr.Unary(opToken, | ||
Line 565: | Line 535: | ||
And that completes our operator parsing logic! | And that completes our operator parsing logic! | ||
+ | All that remains is our '' | ||
+ | It is fairly similar to that given in the book, with some omissions since our minimal TLA⁺ subset does not contain strings or null values. | ||
+ | Add this code after '' | ||
+ | |||
+ | <code java> | ||
+ | private Expr primary() { | ||
+ | if (match(FALSE)) return new Expr.Literal(false); | ||
+ | if (match(TRUE)) return new Expr.Literal(true); | ||
+ | |||
+ | if (match(NUMBER)) { | ||
+ | return new Expr.Literal(previous().literal); | ||
+ | } | ||
+ | |||
+ | if (match(LEFT_PAREN)) { | ||
+ | Expr expr = expression(); | ||
+ | consume(RIGHT_PAREN, | ||
+ | return new Expr.Grouping(expr); | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | We have two final additions to '' | ||
+ | Here's how we parse '' | ||
+ | |||
+ | <code java [highlight_lines_extra=" | ||
+ | return new Expr.Grouping(expr); | ||
+ | } | ||
+ | |||
+ | if (match(IF)) { | ||
+ | Token operator = previous(); | ||
+ | Expr condition = expression(); | ||
+ | consume(THEN, | ||
+ | Expr yes = expression(); | ||
+ | consume(ELSE, | ||
+ | Expr no = expression(); | ||
+ | return new Expr.Ternary(operator, | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | And here's how we parse the set construction operator. | ||
+ | Again add the highlighted code to the bottom of '' | ||
+ | |||
+ | <code java [highlight_lines_extra=" | ||
+ | return new Expr.Ternary(operator, | ||
+ | } | ||
+ | |||
+ | if (match(LEFT_BRACE)) { | ||
+ | Token operator = previous(); | ||
+ | List< | ||
+ | if (!check(RIGHT_BRACE)) { | ||
+ | do { | ||
+ | elements.add(expression()); | ||
+ | } while (match(COMMA)); | ||
+ | } | ||
+ | consume(RIGHT_BRACE, | ||
+ | return new Expr.Variadic(operator, | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | This will handle expressions like '' | ||
+ | |||
+ | The '' | ||
+ | |||
+ | ===== Section 6.3: Syntax Errors ===== | ||
+ | |||
+ | We don't need to modify most of the error reporting code given in [[https:// | ||
+ | Here it is; add the '' | ||
+ | |||
+ | <code java> | ||
+ | private Token consume(TokenType type, String message) { | ||
+ | if (check(type)) return advance(); | ||
+ | |||
+ | throw error(peek(), | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Then add the '' | ||
+ | |||
+ | <code java> | ||
+ | private ParseError error(Token token, String message) { | ||
+ | Lox.error(token, | ||
+ | return new ParseError(); | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | In the '' | ||
+ | |||
+ | <code java> | ||
+ | static void error(Token token, String message) { | ||
+ | if (token.type == TokenType.EOF) { | ||
+ | report(token.line, | ||
+ | } else { | ||
+ | report(token.line, | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Add the '' | ||
+ | |||
+ | <code java [highlight_lines_extra=" | ||
+ | class Parser { | ||
+ | private static enum Fix { PREFIX, INFIX, POSTFIX } | ||
+ | private static record Operator(Fix fix, TokenType token, | ||
+ | boolean assoc, int lowPrec, int highPrec) {} | ||
+ | private static class ParseError extends RuntimeException {} | ||
+ | |||
+ | private final List< | ||
+ | </ | ||
+ | |||
+ | One piece of code we //do// need to modify is the '' | ||
+ | TLA⁺ doesn' | ||
+ | |||
+ | <code java> | ||
+ | private void synchronize() { | ||
+ | advance(); | ||
+ | |||
+ | while(!isAtEnd()) { | ||
+ | if (previous().type == EQUAL_EQUAL) return; | ||
+ | |||
+ | advance(); | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Fully-featured TLA⁺ parsers generally look ahead to identify the start of a new operator definition (not trivial!) and insert a synthetic zero-width token for the parser to consume. | ||
+ | For an example, see SANY's very complicated '' | ||
+ | We won't be doing that here, but it's a good technique to know about; looking ahead then inserting synthetic tokens to disambiguate syntax is a common method when parsing complicated & ambiguous languages like TLA⁺. | ||
+ | |||
+ | ===== Section 6.4: Wiring up the Parser ===== | ||
+ | |||
+ | In [[https:// | ||
+ | In the '' | ||
+ | |||
+ | <code java [highlight_lines_extra=" | ||
+ | List< | ||
+ | |||
+ | Parser parser = new Parser(tokens); | ||
+ | Expr expression = parser.parse(); | ||
+ | |||
+ | // Stop if there was a syntax error. | ||
+ | if (hadError) return; | ||
+ | |||
+ | System.out.println(new AstPrinter().print(expression)); | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Fire up the interpreter and parse a TLA⁺ expression! | ||
+ | < | ||
+ | > {1 + 2, IF TRUE THEN 3 ELSE 4, {}} | ||
+ | ({ (+ 1 2) (IF true 3 4) ({)) | ||
+ | </ | ||
+ | If you got out of sync, you can find a snapshot of the expected state of the code in [[https:// | ||
+ | Next tutorial: [[creating: | ||
+ | |||
+ | ====== Challenges ====== | ||
+ | |||
+ | Here are some optional challenges to flesh out your TLA⁺ interpreter, | ||
+ | |||
+ | [[creating: | ||