Differences
This shows you the differences between two versions of the page.
| Both sides previous revision Previous revision Next revision | Previous revision | ||
| creating:expressions [2025/04/07 00:23] – Wrote tutorial mirroring chapter 5 ahelwer | creating:expressions [2025/05/13 15:58] (current) – Fixed lookahead in set literal parsing ahelwer | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== | + | ======= Parsing |
| - | This tutorial page covers the next three chapters in //Crafting Interpreters//: | + | This tutorial page covers the next two chapters in //Crafting Interpreters//: |
| - [[https:// | - [[https:// | ||
| - [[https:// | - [[https:// | ||
| - | - [[https:// | ||
| Same as the book, we //could// build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring! | Same as the book, we //could// build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring! | ||
| Line 15: | Line 14: | ||
| First read the section of the book, then read the corresponding commentary & modifications given by this tutorial. | First read the section of the book, then read the corresponding commentary & modifications given by this tutorial. | ||
| - | ===== Chapter 5: Representing Code ===== | + | ====== Chapter 5: Representing Code ====== |
| [[https:// | [[https:// | ||
| However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced. | However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced. | ||
| - | ==== Section 5.1: Context-Free Grammars ==== | + | ===== Section 5.1: Context-Free Grammars |
| Everything before [[https:// | Everything before [[https:// | ||
| The ambiguous first-draft grammar for Lox expressions can be adapted to TLA⁺: | The ambiguous first-draft grammar for Lox expressions can be adapted to TLA⁺: | ||
| - | < | + | < |
| expression | expression | ||
| | unary | | unary | ||
| | binary | | binary | ||
| | ternary | | ternary | ||
| - | | + | |
| | grouping ; | | grouping ; | ||
| Line 37: | Line 36: | ||
| binary | binary | ||
| ternary | ternary | ||
| - | set | + | variadic |
| - | operator | + | operator |
| </ | </ | ||
| There are a few interesting differences. | There are a few interesting differences. | ||
| The '' | The '' | ||
| The '' | The '' | ||
| - | There is also the '' | ||
| The operators are changed to the set of operators defined in our TLA⁺ implementation. | The operators are changed to the set of operators defined in our TLA⁺ implementation. | ||
| These are all the expressions we can use without introducing the concept of identifiers referring to something else. | These are all the expressions we can use without introducing the concept of identifiers referring to something else. | ||
| - | ==== Section 5.2: Implementing Syntax Trees ==== | + | There is also the '' |
| + | This one is so named because it's where we'll put operators accepting varying numbers of parameters. | ||
| + | It's kind of weird to think of the finite set literal '' | ||
| + | The only difference between '' | ||
| + | This is the perspective of a language implementer. | ||
| + | |||
| + | Note that we //could// also include the conjunction ''/ | ||
| + | |||
| + | ===== Section 5.2: Implementing Syntax Trees ===== | ||
| While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree! | While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree! | ||
| Line 53: | Line 59: | ||
| <code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
| - | package | + | package tool; |
| import java.io.IOException; | import java.io.IOException; | ||
| Line 73: | Line 79: | ||
| " | " | ||
| " | " | ||
| - | "Set | + | "Variadic |
| )); | )); | ||
| } | } | ||
| Line 88: | Line 94: | ||
| PrintWriter writer = new PrintWriter(path, | PrintWriter writer = new PrintWriter(path, | ||
| - | writer.println(" | + | writer.println(" |
| writer.println(); | writer.println(); | ||
| writer.println(" | writer.println(" | ||
| Line 138: | Line 144: | ||
| - | ==== Section 5.3: Working with Trees ==== | + | ===== Section 5.3: Working with Trees ===== |
| [[https:// | [[https:// | ||
| Line 195: | Line 201: | ||
| Our generated syntax tree node types now support the visitor pattern! | Our generated syntax tree node types now support the visitor pattern! | ||
| - | ==== Section 5.4: A (Not Very) Pretty Printer ==== | + | ===== Section 5.4: A (Not Very) Pretty Printer |
| [[https:// | [[https:// | ||
| Line 201: | Line 207: | ||
| <code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
| - | package | + | package tla; |
| class AstPrinter implements Expr.Visitor< | class AstPrinter implements Expr.Visitor< | ||
| Line 242: | Line 248: | ||
| @Override | @Override | ||
| - | public string | + | public string |
| - | return parenthesize(" | + | return parenthesize(expr.operator.lexeme, |
| + | | ||
| } | } | ||
| </ | </ | ||
| Line 265: | Line 272: | ||
| It isn't necessary to define a '' | It isn't necessary to define a '' | ||
| + | |||
| + | ====== Chapter 6: Parsing Expressions ====== | ||
| + | |||
| + | In [[https:// | ||
| + | |||
| + | ===== Section 6.1: Ambiguity and the Parsing Game ===== | ||
| + | |||
| + | First, as in [[https:// | ||
| + | The way that precedence works in TLA⁺ is different from Lox, and indeed different from most other languages! | ||
| + | One side-quote from this section of the book reads: | ||
| + | |||
| + | >While not common these days, some languages specify that certain pairs of operators have no relative precedence. That makes it a syntax error to mix those operators in an expression without using explicit grouping. | ||
| + | |||
| + | > | ||
| + | |||
| + | TLA⁺ has both of these features! | ||
| + | Instead of operators occupying a slot in some hierarchy of precedence, each operator has a precedence //range//. | ||
| + | When the precedence ranges of two operators overlap it is a parse error. | ||
| + | For example, the '' | ||
| + | < | ||
| + | ENABLED x' | ||
| + | </ | ||
| + | Users must add parentheses to indicate their desired grouping in the expression. | ||
| + | Similarly, some operators like '' | ||
| + | Both of these factors combine to make our operator parsing code quite a bit different from that given in the book. | ||
| + | Worry not, it can still be made terse and understandable! | ||
| + | |||
| + | ===== Section 6.2: Recursive Descent Parsing ===== | ||
| + | |||
| + | [[https:// | ||
| + | Recursive descent can seem a bit magical even if you're used to reasoning about recursive functions! | ||
| + | Unfortunately TLA⁺ requires us to push this magic even further. | ||
| + | We'll take it step by step. | ||
| + | |||
| + | We start with the same basic '' | ||
| + | We also add an import for '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | package tla; | ||
| + | |||
| + | import java.util.List; | ||
| + | import java.util.ArrayList; | ||
| + | |||
| + | import static tla.TokenType.*; | ||
| + | |||
| + | class Parser { | ||
| + | private final List< | ||
| + | private int current = 0; | ||
| + | |||
| + | Parser(List< | ||
| + | this.tokens = tokens; | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Now we hit our first major difference. | ||
| + | In Lox, precedence is given by a small hierarchy of named rules like '' | ||
| + | TLA⁺ is more complicated than that. | ||
| + | The full language has around 100 operators spanning precedences 1 to 15! | ||
| + | If we wanted to match the book's style we'd have to write a tower of recursive functions like: | ||
| + | <code java> | ||
| + | private Expr operatorExpressionPrec1() { ... } | ||
| + | private Expr operatorExpressionPrec2() { ... } | ||
| + | private Expr operatorExpressionPrec3() { ... } | ||
| + | ... | ||
| + | private Expr operatorExpressionPrec15() { ... } | ||
| + | </ | ||
| + | where each '' | ||
| + | Life is too short for this. | ||
| + | Instead, we'll adopt a technique alluded to in the text: | ||
| + | >If you wanted to do some clever Java 8, you could create a helper method for parsing a left-associative series of binary operators given a list of token types, and an operand method handle to simplify this redundant code. | ||
| + | |||
| + | Here's the skeleton of our operator parsing function; the trick is to make the precedence a parameter to the function instead of a component of the name. | ||
| + | Add this code after the '' | ||
| + | <code java> | ||
| + | private Expr expression() { | ||
| + | return operatorExpression(1); | ||
| + | } | ||
| + | |||
| + | private Expr operatorExpression(int prec) { | ||
| + | if (prec == 16) return primary(); | ||
| + | |||
| + | Expr expr = operatorExpression(prec + 1); | ||
| + | |||
| + | return expr; | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Before filling out '' | ||
| + | |||
| + | <code java> | ||
| + | private boolean match(TokenType... types) { | ||
| + | for (TokenType type : types) { | ||
| + | if (check(type)) { | ||
| + | advance(); | ||
| + | return true; | ||
| + | } | ||
| + | } | ||
| + | |||
| + | return false; | ||
| + | } | ||
| + | |||
| + | private boolean check(TokenType type) { | ||
| + | if (isAtEnd()) return false; | ||
| + | return peek().type == type; | ||
| + | } | ||
| + | |||
| + | private Token advance() { | ||
| + | if (!isAtEnd()) current++; | ||
| + | return previous(); | ||
| + | } | ||
| + | |||
| + | private boolean isAtEnd() { | ||
| + | return peek().type == EOF; | ||
| + | } | ||
| + | |||
| + | private Token peek() { | ||
| + | return tokens.get(current); | ||
| + | } | ||
| + | |||
| + | private Token previous() { | ||
| + | return tokens.get(current - 1); | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Now we have to define a table of operators with their details. | ||
| + | First use the handy Java 17 [[https:// | ||
| + | Put it at the top of the '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | class Parser { | ||
| + | private static enum Fix { PREFIX, INFIX, POSTFIX } | ||
| + | private static record Operator(Fix fix, TokenType token, | ||
| + | boolean assoc, int lowPrec, int highPrec) {} | ||
| + | |||
| + | private final List< | ||
| + | </ | ||
| + | |||
| + | You can find operator attributes on page 271 of // | ||
| + | We use a small subset of these operators. | ||
| + | Record their attributes in a table in '' | ||
| + | |||
| + | <code java> | ||
| + | private static final Operator[] operators = new Operator[] { | ||
| + | new Operator(Fix.PREFIX, | ||
| + | new Operator(Fix.PREFIX, | ||
| + | new Operator(Fix.PREFIX, | ||
| + | new Operator(Fix.INFIX, | ||
| + | new Operator(Fix.INFIX, | ||
| + | new Operator(Fix.INFIX, | ||
| + | new Operator(Fix.INFIX, | ||
| + | new Operator(Fix.INFIX, | ||
| + | new Operator(Fix.INFIX, | ||
| + | new Operator(Fix.POSTFIX, | ||
| + | }; | ||
| + | </ | ||
| + | |||
| + | Here's something a bit odd. | ||
| + | In TLA⁺, the infix minus operator (subtraction) has higher precedence at 11-11 than the infix plus operator (addition) at 10-10! | ||
| + | In grade school you probably learned acronyms like PEMDAS or BEDMAS to remember the order of operations in arithmetic. | ||
| + | Really, you learned parsing rules! | ||
| + | Now when writing our own parsing algorithm we come to understand that the order of these operations is not inscribed in the bedrock of mathematical truth but instead is a simple convention of mathematical notation. | ||
| + | TLA⁺ subverts this by parsing the expression '' | ||
| + | While this design decision is unusual, it is [[https:// | ||
| + | |||
| + | We need one more helper method before we can start working on '' | ||
| + | Add this code above '' | ||
| + | <code java> | ||
| + | private Operator matchOp(Fix fix, int prec) { | ||
| + | for (Operator op : operators) { | ||
| + | if (op.fix == fix && op.lowPrec == prec) { | ||
| + | if (match(op.token)) return op; | ||
| + | } | ||
| + | } | ||
| + | |||
| + | return null; | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Okay, we're all set for the main event! | ||
| + | Here is how we modify '' | ||
| + | You can see a strong resemblance to the infix operator parsing code given in the book (new lines highlighted): | ||
| + | <code java [highlight_lines_extra=" | ||
| + | private Expr operatorExpression(int prec) { | ||
| + | if (prec == 16) return primary(); | ||
| + | |||
| + | Operator op; | ||
| + | |||
| + | Expr expr = operatorExpression(prec + 1); | ||
| + | while ((op = matchOp(Fix.INFIX, | ||
| + | Token operator = previous(); | ||
| + | Expr right = operatorExpression(op.highPrec + 1); | ||
| + | expr = new Expr.Binary(expr, | ||
| + | } | ||
| + | |||
| + | return expr; | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | We use Java's combined conditional-assignment atop the '' | ||
| + | The interior of the '' | ||
| + | It takes a bit of thinking to understand why this works for parsing expressions according to the precedence rules we want, but it does. | ||
| + | Take a minute to ponder it. | ||
| + | If you still don't get it, wait until we have a fully-functional expression parser then play around with this line to see how it changes the behavior. | ||
| + | |||
| + | Our code implements precedence ranges, but assumes all infix operators are associative. | ||
| + | We need to modify the loop to return immediately if the infix operator is not associative: | ||
| + | <code java [highlight_lines_extra=" | ||
| + | while ((op = matchOp(Fix.INFIX, | ||
| + | Token operator = previous(); | ||
| + | Expr right = operatorExpression(op.highPrec + 1); | ||
| + | expr = new Expr.Binary(expr, | ||
| + | if (!op.assoc) return expr; | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | And that's how we parse infix operators! | ||
| + | That was the most complicated case, so let's move on to prefix operators. | ||
| + | Again our code resembles the prefix operator parsing logic given in the book. | ||
| + | Add this snippet near the top of '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | private Expr operatorExpression(int prec) { | ||
| + | if (prec == 16) return primary(); | ||
| + | |||
| + | Operator op; | ||
| + | if ((op = matchOp(Fix.PREFIX, | ||
| + | Token opToken = previous(); | ||
| + | Expr expr = operatorExpression( | ||
| + | op.assoc ? op.lowPrec : op.highPrec + 1); | ||
| + | return new Expr.Unary(opToken, | ||
| + | } | ||
| + | |||
| + | Expr expr = operatorExpression(prec + 1); | ||
| + | </ | ||
| + | |||
| + | Of note is the expression '' | ||
| + | To understand this expression, it is helpful to consider an example. | ||
| + | The negative prefix operator '' | ||
| + | In order to do that after consuming the first '' | ||
| + | Then we will consume the second '' | ||
| + | In contrast, the prefix operator '' | ||
| + | In that case we want to reject the expression '' | ||
| + | Thus the second '' | ||
| + | |||
| + | All that remains is to parse postfix operators. | ||
| + | These are not covered in the book, but they pretty much resemble infix operators without matching an expression on the right-hand side of the operator. | ||
| + | Add the highlighted code below the '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | } | ||
| + | |||
| + | while ((op = matchOp(Fix.POSTFIX, | ||
| + | Token opToken = previous(); | ||
| + | expr = new Expr.Unary(opToken, | ||
| + | if (!op.assoc) break; | ||
| + | } | ||
| + | |||
| + | return expr | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | And that completes our operator parsing logic! | ||
| + | All that remains is our '' | ||
| + | It is fairly similar to that given in the book, with some omissions since our minimal TLA⁺ subset does not contain strings or null values. | ||
| + | Add this code after '' | ||
| + | |||
| + | <code java> | ||
| + | private Expr primary() { | ||
| + | if (match(FALSE)) return new Expr.Literal(false); | ||
| + | if (match(TRUE)) return new Expr.Literal(true); | ||
| + | |||
| + | if (match(NUMBER)) { | ||
| + | return new Expr.Literal(previous().literal); | ||
| + | } | ||
| + | |||
| + | if (match(LEFT_PAREN)) { | ||
| + | Expr expr = expression(); | ||
| + | consume(RIGHT_PAREN, | ||
| + | return new Expr.Grouping(expr); | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | We have two final additions to '' | ||
| + | Here's how we parse '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | return new Expr.Grouping(expr); | ||
| + | } | ||
| + | |||
| + | if (match(IF)) { | ||
| + | Token operator = previous(); | ||
| + | Expr condition = expression(); | ||
| + | consume(THEN, | ||
| + | Expr yes = expression(); | ||
| + | consume(ELSE, | ||
| + | Expr no = expression(); | ||
| + | return new Expr.Ternary(operator, | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | And here's how we parse the set construction operator. | ||
| + | Again add the highlighted code to the bottom of '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | return new Expr.Ternary(operator, | ||
| + | } | ||
| + | |||
| + | if (match(LEFT_BRACE)) { | ||
| + | Token operator = previous(); | ||
| + | List< | ||
| + | if (!check(RIGHT_BRACE)) { | ||
| + | do { | ||
| + | elements.add(expression()); | ||
| + | } while (match(COMMA)); | ||
| + | } | ||
| + | consume(RIGHT_BRACE, | ||
| + | return new Expr.Variadic(operator, | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | This will handle expressions like '' | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ===== Section 6.3: Syntax Errors ===== | ||
| + | |||
| + | We don't need to modify most of the error reporting code given in [[https:// | ||
| + | Here it is; add the '' | ||
| + | |||
| + | <code java> | ||
| + | private Token consume(TokenType type, String message) { | ||
| + | if (check(type)) return advance(); | ||
| + | |||
| + | throw error(peek(), | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Then add the '' | ||
| + | |||
| + | <code java> | ||
| + | private ParseError error(Token token, String message) { | ||
| + | Lox.error(token, | ||
| + | return new ParseError(); | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | In the '' | ||
| + | |||
| + | <code java> | ||
| + | static void error(Token token, String message) { | ||
| + | if (token.type == TokenType.EOF) { | ||
| + | report(token.line, | ||
| + | } else { | ||
| + | report(token.line, | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Add the '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | class Parser { | ||
| + | private static enum Fix { PREFIX, INFIX, POSTFIX } | ||
| + | private static record Operator(Fix fix, TokenType token, | ||
| + | boolean assoc, int lowPrec, int highPrec) {} | ||
| + | private static class ParseError extends RuntimeException {} | ||
| + | |||
| + | private final List< | ||
| + | </ | ||
| + | |||
| + | One piece of code we //do// need to modify is the '' | ||
| + | TLA⁺ doesn' | ||
| + | |||
| + | <code java> | ||
| + | private void synchronize() { | ||
| + | advance(); | ||
| + | |||
| + | while(!isAtEnd()) { | ||
| + | if (previous().type == EQUAL_EQUAL) return; | ||
| + | |||
| + | advance(); | ||
| + | } | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Fully-featured TLA⁺ parsers generally look ahead to identify the start of a new operator definition (not trivial!) and insert a synthetic zero-width token for the parser to consume. | ||
| + | For an example, see SANY's very complicated '' | ||
| + | We won't be doing that here, but it's a good technique to know about; looking ahead then inserting synthetic tokens to disambiguate syntax is a common method when parsing complicated & ambiguous languages like TLA⁺. | ||
| + | |||
| + | ===== Section 6.4: Wiring up the Parser ===== | ||
| + | |||
| + | In [[https:// | ||
| + | In the '' | ||
| + | |||
| + | <code java [highlight_lines_extra=" | ||
| + | List< | ||
| + | |||
| + | Parser parser = new Parser(tokens); | ||
| + | Expr expression = parser.parse(); | ||
| + | |||
| + | // Stop if there was a syntax error. | ||
| + | if (hadError) return; | ||
| + | |||
| + | System.out.println(new AstPrinter().print(expression)); | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | Fire up the interpreter and parse a TLA⁺ expression! | ||
| + | < | ||
| + | > {1 + 2, IF TRUE THEN 3 ELSE 4, {}} | ||
| + | ({ (+ 1 2) (IF true 3 4) ({)) | ||
| + | </ | ||
| + | If you got out of sync, you can find a snapshot of the expected state of the code in [[https:// | ||
| + | Next tutorial: [[creating: | ||
| + | |||
| + | ====== Challenges ====== | ||
| + | |||
| + | Here are some optional challenges to flesh out your TLA⁺ interpreter, | ||
| + | |||
| + | [[creating: | ||