Differences
This shows you the differences between two versions of the page.
| Both sides previous revision Previous revision Next revision | Previous revision | ||
| creating:expressions [2025/04/11 17:58] – Explain operator parsing logic ahelwer | creating:expressions [2025/05/13 15:58] (current) – Fixed lookahead in set literal parsing ahelwer | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== | + | ======= Parsing |
| - | This tutorial page covers the next three chapters in //Crafting Interpreters//: | + | This tutorial page covers the next two chapters in //Crafting Interpreters//: |
| - [[https:// | - [[https:// | ||
| - [[https:// | - [[https:// | ||
| - | - [[https:// | ||
| Same as the book, we //could// build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring! | Same as the book, we //could// build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring! | ||
| Line 15: | Line 14: | ||
| First read the section of the book, then read the corresponding commentary & modifications given by this tutorial. | First read the section of the book, then read the corresponding commentary & modifications given by this tutorial. | ||
| - | ===== Chapter 5: Representing Code ===== | + | ====== Chapter 5: Representing Code ====== |
| [[https:// | [[https:// | ||
| However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced. | However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced. | ||
| - | ==== Section 5.1: Context-Free Grammars ==== | + | ===== Section 5.1: Context-Free Grammars |
| Everything before [[https:// | Everything before [[https:// | ||
| Line 38: | Line 37: | ||
| ternary | ternary | ||
| variadic | variadic | ||
| - | operator | + | operator |
| </ | </ | ||
| There are a few interesting differences. | There are a few interesting differences. | ||
| Line 51: | Line 50: | ||
| The only difference between '' | The only difference between '' | ||
| This is the perspective of a language implementer. | This is the perspective of a language implementer. | ||
| - | Later on we will extend the definition of '' | ||
| - | ==== Section 5.2: Implementing Syntax Trees ==== | + | Note that we //could// also include the conjunction ''/ |
| + | |||
| + | ===== Section 5.2: Implementing Syntax Trees ===== | ||
| While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree! | While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree! | ||
| Line 59: | Line 59: | ||
| <code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
| - | package | + | package tool; |
| import java.io.IOException; | import java.io.IOException; | ||
| Line 94: | Line 94: | ||
| PrintWriter writer = new PrintWriter(path, | PrintWriter writer = new PrintWriter(path, | ||
| - | writer.println(" | + | writer.println(" |
| writer.println(); | writer.println(); | ||
| writer.println(" | writer.println(" | ||
| Line 144: | Line 144: | ||
| - | ==== Section 5.3: Working with Trees ==== | + | ===== Section 5.3: Working with Trees ===== |
| [[https:// | [[https:// | ||
| Line 201: | Line 201: | ||
| Our generated syntax tree node types now support the visitor pattern! | Our generated syntax tree node types now support the visitor pattern! | ||
| - | ==== Section 5.4: A (Not Very) Pretty Printer ==== | + | ===== Section 5.4: A (Not Very) Pretty Printer |
| [[https:// | [[https:// | ||
| Line 207: | Line 207: | ||
| <code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
| - | package | + | package tla; |
| class AstPrinter implements Expr.Visitor< | class AstPrinter implements Expr.Visitor< | ||
| Line 273: | Line 273: | ||
| It isn't necessary to define a '' | It isn't necessary to define a '' | ||
| - | ===== Chapter 6: Parsing Expressions ===== | + | ====== Chapter 6: Parsing Expressions |
| In [[https:// | In [[https:// | ||
| - | ==== Chapter | + | ===== Section |
| First, as in [[https:// | First, as in [[https:// | ||
| Line 299: | Line 299: | ||
| Worry not, it can still be made terse and understandable! | Worry not, it can still be made terse and understandable! | ||
| - | ==== Chapter | + | ===== Section |
| [[https:// | [[https:// | ||
| Recursive descent can seem a bit magical even if you're used to reasoning about recursive functions! | Recursive descent can seem a bit magical even if you're used to reasoning about recursive functions! | ||
| Unfortunately TLA⁺ requires us to push this magic even further. | Unfortunately TLA⁺ requires us to push this magic even further. | ||
| - | Worry not, we'll take it step by step. | + | We'll take it step by step. |
| - | We start with the same basic '' | + | We start with the same basic '' |
| + | We also add an import for '' | ||
| - | <code java [highlight_lines_extra=" | + | <code java [highlight_lines_extra=" |
| - | package | + | package tla; |
| import java.util.List; | import java.util.List; | ||
| + | import java.util.ArrayList; | ||
| - | import static | + | import static tla.TokenType.*; |
| class Parser { | class Parser { | ||
| Line 337: | Line 339: | ||
| private Expr operatorExpressionPrec15() { ... } | private Expr operatorExpressionPrec15() { ... } | ||
| </ | </ | ||
| - | where each '' | + | where each '' |
| Life is too short for this. | Life is too short for this. | ||
| Instead, we'll adopt a technique alluded to in the text: | Instead, we'll adopt a technique alluded to in the text: | ||
| Line 343: | Line 345: | ||
| Here's the skeleton of our operator parsing function; the trick is to make the precedence a parameter to the function instead of a component of the name. | Here's the skeleton of our operator parsing function; the trick is to make the precedence a parameter to the function instead of a component of the name. | ||
| - | Add this code after the '' | + | Add this code after the '' |
| <code java> | <code java> | ||
| private Expr expression() { | private Expr expression() { | ||
| Line 358: | Line 360: | ||
| </ | </ | ||
| - | Before filling out '' | + | Before filling out '' |
| <code java> | <code java> | ||
| Line 396: | Line 398: | ||
| Now we have to define a table of operators with their details. | Now we have to define a table of operators with their details. | ||
| - | For this, create | + | First use the handy Java 17 [[https:// |
| + | Put it at the top of the '' | ||
| - | <code java> | + | <code java [highlight_lines_extra=" |
| - | package com.craftinginterpreters.tla; | + | class Parser { |
| + | private static enum Fix { PREFIX, INFIX, POSTFIX } | ||
| + | private static record Operator(Fix fix, TokenType token, | ||
| + | boolean assoc, int lowPrec, int highPrec) {} | ||
| - | enum Fix { | + | private |
| - | PREFIX, INFIX, POSTFIX | + | |
| - | } | + | |
| - | + | ||
| - | class Operator { | + | |
| - | | + | |
| - | final TokenType token; | + | |
| - | final boolean assoc; | + | |
| - | final int lowPrec; | + | |
| - | final int highPrec; | + | |
| - | + | ||
| - | public Operator(Fix fix, TokenType token, boolean assoc, | + | |
| - | int lowPrec, int highPrec) { | + | |
| - | this.fix = fix; | + | |
| - | this.token = token; | + | |
| - | this.assoc = assoc; | + | |
| - | this.lowPrec = lowPrec; | + | |
| - | this.highPrec = highPrec; | + | |
| - | } | + | |
| - | } | + | |
| </ | </ | ||
| - | |||
| - | For convenience, | ||
| - | |||
| - | <code java [highlight_lines_extra=" | ||
| - | package com.craftinginterpreters.tla; | ||
| - | |||
| - | import java.util.List; | ||
| - | import java.util.ArrayList; | ||
| - | |||
| - | import static com.craftinginterpreters.tla.TokenType.*; | ||
| - | import static com.craftinginterpreters.tla.Fix.*; | ||
| - | |||
| - | class Parser { | ||
| - | </ | ||
| - | |||
| You can find operator attributes on page 271 of // | You can find operator attributes on page 271 of // | ||
| We use a small subset of these operators. | We use a small subset of these operators. | ||
| - | Record their attributes in a table in '' | + | Record their attributes in a table in '' |
| <code java> | <code java> | ||
| private static final Operator[] operators = new Operator[] { | private static final Operator[] operators = new Operator[] { | ||
| - | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
| - | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
| - | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
| - | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
| - | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
| - | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
| - | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
| - | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
| - | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
| - | new Operator(INFIX, | + | new Operator(Fix.POSTFIX, PRIME, |
| - | new Operator(INFIX, | + | |
| - | new Operator(POSTFIX, | + | |
| }; | }; | ||
| </ | </ | ||
| Line 467: | Line 437: | ||
| While this design decision is unusual, it is [[https:// | While this design decision is unusual, it is [[https:// | ||
| - | We need one more helper method before we can start working on '' | + | We need one more helper method before we can start working on '' |
| - | Add this code above '' | + | Add this code above '' |
| <code java> | <code java> | ||
| private Operator matchOp(Fix fix, int prec) { | private Operator matchOp(Fix fix, int prec) { | ||
| Line 482: | Line 452: | ||
| Okay, we're all set for the main event! | Okay, we're all set for the main event! | ||
| - | Here is how we modify | + | Here is how we modify '' |
| You can see a strong resemblance to the infix operator parsing code given in the book (new lines highlighted): | You can see a strong resemblance to the infix operator parsing code given in the book (new lines highlighted): | ||
| <code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
| Line 489: | Line 459: | ||
| Operator op; | Operator op; | ||
| - | + | ||
| Expr expr = operatorExpression(prec + 1); | Expr expr = operatorExpression(prec + 1); | ||
| - | while ((op = matchOp(INFIX, | + | while ((op = matchOp(Fix.INFIX, prec)) != null) { |
| Token operator = previous(); | Token operator = previous(); | ||
| Expr right = operatorExpression(op.highPrec + 1); | Expr right = operatorExpression(op.highPrec + 1); | ||
| expr = new Expr.Binary(expr, | expr = new Expr.Binary(expr, | ||
| } | } | ||
| - | + | ||
| return expr; | return expr; | ||
| } | } | ||
| Line 502: | Line 472: | ||
| We use Java's combined conditional-assignment atop the '' | We use Java's combined conditional-assignment atop the '' | ||
| - | The interior of the '' | + | The interior of the '' |
| - | It takes a bit of thinking to understand why this works for parsing expressions according to the precedence we want, but it does. | + | It takes a bit of thinking to understand why this works for parsing expressions according to the precedence |
| Take a minute to ponder it. | Take a minute to ponder it. | ||
| If you still don't get it, wait until we have a fully-functional expression parser then play around with this line to see how it changes the behavior. | If you still don't get it, wait until we have a fully-functional expression parser then play around with this line to see how it changes the behavior. | ||
| - | This code implements precedence ranges, but assumes all infix operators are associative. | + | Our code implements precedence ranges, but assumes all infix operators are associative. |
| - | We need to modify the loop to break if the infix operator is not associative: | + | We need to modify the loop to return immediately |
| <code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
| - | while ((op = matchOp(INFIX, | + | while ((op = matchOp(Fix.INFIX, prec)) != null) { |
| Token operator = previous(); | Token operator = previous(); | ||
| Expr right = operatorExpression(op.highPrec + 1); | Expr right = operatorExpression(op.highPrec + 1); | ||
| expr = new Expr.Binary(expr, | expr = new Expr.Binary(expr, | ||
| - | if (!op.assoc) | + | if (!op.assoc) |
| } | } | ||
| </ | </ | ||
| Line 521: | Line 491: | ||
| That was the most complicated case, so let's move on to prefix operators. | That was the most complicated case, so let's move on to prefix operators. | ||
| Again our code resembles the prefix operator parsing logic given in the book. | Again our code resembles the prefix operator parsing logic given in the book. | ||
| - | Add this snippet near the top of the '' | + | Add this snippet near the top of '' |
| <code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
| Line 528: | Line 498: | ||
| Operator op; | Operator op; | ||
| - | if ((op = matchOp(PREFIX, | + | if ((op = matchOp(Fix.PREFIX, prec)) != null) { |
| Token opToken = previous(); | Token opToken = previous(); | ||
| Expr expr = operatorExpression( | Expr expr = operatorExpression( | ||
| - | | + | |
| return new Expr.Unary(opToken, | return new Expr.Unary(opToken, | ||
| } | } | ||
| Line 541: | Line 511: | ||
| To understand this expression, it is helpful to consider an example. | To understand this expression, it is helpful to consider an example. | ||
| The negative prefix operator '' | The negative prefix operator '' | ||
| - | In order to do that after consuming the first '' | + | In order to do that after consuming the first '' |
| Then we will consume the second '' | Then we will consume the second '' | ||
| In contrast, the prefix operator '' | In contrast, the prefix operator '' | ||
| Line 549: | Line 519: | ||
| All that remains is to parse postfix operators. | All that remains is to parse postfix operators. | ||
| These are not covered in the book, but they pretty much resemble infix operators without matching an expression on the right-hand side of the operator. | These are not covered in the book, but they pretty much resemble infix operators without matching an expression on the right-hand side of the operator. | ||
| - | Add the highlighted code below the '' | + | Add the highlighted code below the '' |
| <code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
| } | } | ||
| - | while ((op = matchOp(POSTFIX, | + | while ((op = matchOp(Fix.POSTFIX, prec)) != null) { |
| Token opToken = previous(); | Token opToken = previous(); | ||
| expr = new Expr.Unary(opToken, | expr = new Expr.Unary(opToken, | ||
| Line 565: | Line 535: | ||
| And that completes our operator parsing logic! | And that completes our operator parsing logic! | ||
| + | All that remains is our '' | ||
| + | It is fairly similar to that given in the book, with some omissions since our minimal TLA⁺ subset does not contain strings or null values. | ||
| + | Add this code after '' | ||
| + | |||
| + | <code java> | ||
| + | private Expr primary() { | ||
| + | if (match(FALSE)) return new Expr.Literal(false); | ||
| + | if (match(TRUE)) return new Expr.Literal(true); | ||
| + | |||
| + | if (match(NUMBER)) { | ||
| + | return new Expr.Literal(previous().literal); | ||
| + | } | ||
| + | |||
| + | if (match(LEFT_PAREN)) { | ||
| + | Expr expr = expression(); | ||
| + | consume(RIGHT_PAREN, | ||
| + | return new Expr.Grouping(expr); | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | We have two final additions to '' | ||
| + | Here's how we parse '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | return new Expr.Grouping(expr); | ||
| + | } | ||
| + | |||
| + | if (match(IF)) { | ||
| + | Token operator = previous(); | ||
| + | Expr condition = expression(); | ||
| + | consume(THEN, | ||
| + | Expr yes = expression(); | ||
| + | consume(ELSE, | ||
| + | Expr no = expression(); | ||
| + | return new Expr.Ternary(operator, | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | And here's how we parse the set construction operator. | ||
| + | Again add the highlighted code to the bottom of '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | return new Expr.Ternary(operator, | ||
| + | } | ||
| + | |||
| + | if (match(LEFT_BRACE)) { | ||
| + | Token operator = previous(); | ||
| + | List< | ||
| + | if (!check(RIGHT_BRACE)) { | ||
| + | do { | ||
| + | elements.add(expression()); | ||
| + | } while (match(COMMA)); | ||
| + | } | ||
| + | consume(RIGHT_BRACE, | ||
| + | return new Expr.Variadic(operator, | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | This will handle expressions like '' | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ===== Section 6.3: Syntax Errors ===== | ||
| + | |||
| + | We don't need to modify most of the error reporting code given in [[https:// | ||
| + | Here it is; add the '' | ||
| + | |||
| + | <code java> | ||
| + | private Token consume(TokenType type, String message) { | ||
| + | if (check(type)) return advance(); | ||
| + | |||
| + | throw error(peek(), | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Then add the '' | ||
| + | |||
| + | <code java> | ||
| + | private ParseError error(Token token, String message) { | ||
| + | Lox.error(token, | ||
| + | return new ParseError(); | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | In the '' | ||
| + | |||
| + | <code java> | ||
| + | static void error(Token token, String message) { | ||
| + | if (token.type == TokenType.EOF) { | ||
| + | report(token.line, | ||
| + | } else { | ||
| + | report(token.line, | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Add the '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | class Parser { | ||
| + | private static enum Fix { PREFIX, INFIX, POSTFIX } | ||
| + | private static record Operator(Fix fix, TokenType token, | ||
| + | boolean assoc, int lowPrec, int highPrec) {} | ||
| + | private static class ParseError extends RuntimeException {} | ||
| + | |||
| + | private final List< | ||
| + | </ | ||
| + | |||
| + | One piece of code we //do// need to modify is the '' | ||
| + | TLA⁺ doesn' | ||
| + | |||
| + | <code java> | ||
| + | private void synchronize() { | ||
| + | advance(); | ||
| + | |||
| + | while(!isAtEnd()) { | ||
| + | if (previous().type == EQUAL_EQUAL) return; | ||
| + | |||
| + | advance(); | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Fully-featured TLA⁺ parsers generally look ahead to identify the start of a new operator definition (not trivial!) and insert a synthetic zero-width token for the parser to consume. | ||
| + | For an example, see SANY's very complicated '' | ||
| + | We won't be doing that here, but it's a good technique to know about; looking ahead then inserting synthetic tokens to disambiguate syntax is a common method when parsing complicated & ambiguous languages like TLA⁺. | ||
| + | |||
| + | ===== Section 6.4: Wiring up the Parser ===== | ||
| + | |||
| + | In [[https:// | ||
| + | In the '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | List< | ||
| + | |||
| + | Parser parser = new Parser(tokens); | ||
| + | Expr expression = parser.parse(); | ||
| + | |||
| + | // Stop if there was a syntax error. | ||
| + | if (hadError) return; | ||
| + | |||
| + | System.out.println(new AstPrinter().print(expression)); | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Fire up the interpreter and parse a TLA⁺ expression! | ||
| + | < | ||
| + | > {1 + 2, IF TRUE THEN 3 ELSE 4, {}} | ||
| + | ({ (+ 1 2) (IF true 3 4) ({)) | ||
| + | </ | ||
| + | If you got out of sync, you can find a snapshot of the expected state of the code in [[https:// | ||
| + | Next tutorial: [[creating: | ||
| + | |||
| + | ====== Challenges ====== | ||
| + | |||
| + | Here are some optional challenges to flesh out your TLA⁺ interpreter, | ||
| + | |||
| + | [[creating: | ||