Differences

This shows you the differences between two versions of the page.

--- creating:expressions [2025/04/11 17:58] – Explain operator parsing logic ahelwer
+++ creating:expressions [2025/05/13 15:58] (current) – Fixed lookahead in set literal parsing ahelwer
@@ Line 1: / Line 1: @@
-====== Handling Constant TLA⁺ Expressions ======
+======= Parsing Constant TLA⁺ Expressions =======
-This tutorial page covers the next three chapters in //Crafting Interpreters//:
+This tutorial page covers the next two chapters in //Crafting Interpreters//:
   - [[https://craftinginterpreters.com/representing-code.html|Chapter 5: Representing Code]]
   - [[https://craftinginterpreters.com/parsing-expressions.html|Chapter 6: Parsing Expressions]]
-  - [[https://craftinginterpreters.com/evaluating-expressions.html|Chapter 7: Evaluating Expressions]]
 Same as the book, we //could// build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring!
@@ Line 15: / Line 14: @@
 First read the section of the book, then read the corresponding commentary & modifications given by this tutorial.
-===== Chapter 5: Representing Code =====
+====== Chapter 5: Representing Code ======
 [[https://craftinginterpreters.com/representing-code.html|Chapter 5]] focuses more on concepts than code, so this section will not have many TLA⁺-specific modifications.
 However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced.
-==== Section 5.1: Context-Free Grammars ====
+===== Section 5.1: Context-Free Grammars =====
 Everything before [[https://craftinginterpreters.com/representing-code.html#a-grammar-for-lox-expressions|Section 5.1.3: A Grammar for Lox expressions]] applies equally to TLA⁺.
@@ Line 38: / Line 37: @@
 ternary        → "IF" expression "THEN" expression "ELSE" expression;
 variadic       → "{" ( expression ( "," expression )* )? "}"
-operator       → "=" | "+" | "-" | ".." | "/\" | "\/" | "<"  | "\in" ;
+operator       → "=" | "+" | "-" | ".." | "<"  | "\in" ;
 </code>
 There are a few interesting differences.
@@ Line 51: / Line 50: @@
 The only difference between ''{1, 2, 3}'' and an operator like ''constructSet(1, 2, 3)'' is syntactic sugar.
 This is the perspective of a language implementer.
-Later on we will extend the definition of ''variadic'' to include vertically-aligned conjunction & disjunction lists.
-==== Section 5.2: Implementing Syntax Trees ====
+Note that we //could// also include the conjunction ''/\'' and disjunction ''\/'' infix operators here but they have odd parsing interactions with vertically-aligned conjunction & disjunction lists, so we will handle them in a later chapter dedicated to that topic.
+===== Section 5.2: Implementing Syntax Trees =====
 While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree!
@@ Line 59: / Line 59: @@
 <code java [highlight_lines_extra="19,20,21"]>
-package com.craftinginterpreters.tool;
+package tool;
 import java.io.IOException;
@@ Line 94: / Line 94: @@
     PrintWriter writer = new PrintWriter(path, "UTF-8");
-    writer.println("package com.craftinginterpreters.tla;");
+    writer.println("package tla;");
     writer.println();
     writer.println("import java.util.List;");
@@ Line 144: / Line 144: @@
-==== Section 5.3: Working with Trees ====
+===== Section 5.3: Working with Trees =====
 [[https://craftinginterpreters.com/representing-code.html#working-with-trees|Section 5.3]] introduces the Visitor pattern.
@@ Line 201: / Line 201: @@
 Our generated syntax tree node types now support the visitor pattern!
-==== Section 5.4: A (Not Very) Pretty Printer ====
+===== Section 5.4: A (Not Very) Pretty Printer =====
 [[https://craftinginterpreters.com/representing-code.html#a-not-very-pretty-printer|Section 5.4]] provides an implementation of a visitor called ''AstPrinter'' that prints out the parse tree.
@@ Line 207: / Line 207: @@
 <code java [highlight_lines_extra="1"]>
-package com.craftinginterpreters.tla;
+package tla;
 class AstPrinter implements Expr.Visitor<String> {
@@ Line 273: / Line 273: @@
 It isn't necessary to define a ''main'' method in the ''AstPrinter'' class, but if you'd like to try it out you are free to copy the one given in the book.
-===== Chapter 6: Parsing Expressions =====
+====== Chapter 6: Parsing Expressions ======
 In [[https://craftinginterpreters.com/parsing-expressions.html|Chapter 6]], we finally build a parse tree out of our tokens!
-==== Chapter 6.1: Ambiguity and the Parsing Game ====
+===== Section 6.1: Ambiguity and the Parsing Game =====
 First, as in [[https://craftinginterpreters.com/parsing-expressions.html#ambiguity-and-the-parsing-game|section 6.1 of the book]], we have to disambiguate our grammar.
@@ Line 299: / Line 299: @@
 Worry not, it can still be made terse and understandable!
-==== Chapter 6.2: Recursive Descent Parsing ====
+===== Section 6.2: Recursive Descent Parsing =====
 [[https://craftinginterpreters.com/parsing-expressions.html#recursive-descent-parsing|Section 6.2]] is where we start writing our parser in the recursive descent style.
 Recursive descent can seem a bit magical even if you're used to reasoning about recursive functions!
 Unfortunately TLA⁺ requires us to push this magic even further.
-Worry not, we'll take it step by step.
+We'll take it step by step.
-We start with the same basic ''Parser.java'' file as in the book, only renaming ''lox'' to ''tla'' in the package name as usual:
+We start with the same basic ''Parser.java'' file as in the book, renaming ''lox'' to ''tla'' in the package name as usual.
+We also add an import for ''ArrayList'', which we will later make use of when parsing variadic operators:
-<code java [highlight_lines_extra="1,5"]>
+<code java [highlight_lines_extra="1,4,6"]>
-package com.craftinginterpreters.tla;
+package tla;
 import java.util.List;
+import java.util.ArrayList;
-import static com.craftinginterpreters.tla.TokenType.*;
+import static tla.TokenType.*;
 class Parser {
@@ Line 337: / Line 339: @@
 private Expr operatorExpressionPrec15() { ... }
 </code>
-where each ''operatorExpressionPrecN'' function parses all the prefix, infix, and postfix operators of precedence ''N'', and calls ''operatorExpressionPrecN+1''.
+where each ''operatorExpressionPrecN()'' function parses all the prefix, infix, and postfix operators of precedence ''N'', and calls ''operatorExpressionPrecN+1()''.
 Life is too short for this.
 Instead, we'll adopt a technique alluded to in the text:
@@ Line 343: / Line 345: @@
 Here's the skeleton of our operator parsing function; the trick is to make the precedence a parameter to the function instead of a component of the name.
-Add this code after the ''Parser'' constructor:
+Add this code after the ''Parser()'' constructor:
 <code java>
   private Expr expression() {
@@ Line 358: / Line 360: @@
 </code>
-Before filling out ''operatorExpression'', we'll add the helper methods; these form an incredibly well-designed parsing API and are unchanged from the book:
+Before filling out ''operatorExpression()'', we'll add the helper methods; these form an incredibly well-designed parsing API and are unchanged from the book:
 <code java>
@@ Line 396: / Line 398: @@
 Now we have to define a table of operators with their details.
-For this, create a new file ''Operator.java'' containing a class recording each operator's fix type, token type, associativity, and precedence range:
+First use the handy Java 17 [[https://openjdk.org/jeps/395|records]] feature to quickly define a new ''Operator'' dataclass; this will hold the attributes of the operators we want to parse.
+Put it at the top of the ''Parser'' class:
-<code java>
+<code java [highlight_lines_extra="2,3,4"]>
-package com.craftinginterpreters.tla;
+class Parser {
+  private static enum Fix { PREFIX, INFIX, POSTFIX }
+  private static record Operator(Fix fix, TokenType token,
+      boolean assoc, int lowPrec, int highPrec) {}
-enum Fix {
+  private final List<Token> tokens;
-  PREFIX, INFIX, POSTFIX
-}
-class Operator {
-  final Fix fix;
-  final TokenType token;
-  final boolean assoc;
-  final int lowPrec;
-  final int highPrec;
-  public Operator(Fix fix, TokenType token, boolean assoc,
-                  int lowPrec, int highPrec) {
-    this.fix = fix;
-    this.token = token;
-    this.assoc = assoc;
-    this.lowPrec = lowPrec;
-    this.highPrec = highPrec;
-  }
-}
 </code>
-For convenience, import the ''Fix'' enum values in ''Parser.java'' so they can be referenced directly:
-<code java [highlight_lines_extra="7"]>
-package com.craftinginterpreters.tla;
-import java.util.List;
-import java.util.ArrayList;
-import static com.craftinginterpreters.tla.TokenType.*;
-import static com.craftinginterpreters.tla.Fix.*;
-class Parser {
-</code>
 You can find operator attributes on page 271 of //[[https://lamport.azurewebsites.net/tla/book.html|Specifying Systems]]// by Leslie Lamport, or [[https://github.com/tlaplus/tlaplus/blob/13e5a39b5368a6da4906b8ed1c2c1114d2e7de15/tlatools/org.lamport.tlatools/src/tla2sany/parser/Operators.java#L130-L234|this TLA⁺ tools source file]].
 We use a small subset of these operators.
-Record their attributes in a table in ''Parser.java'', below the ''operatorExpression'' method:
+Record their attributes in a table in ''Parser.java'', below ''operatorExpression()'':
 <code java>
   private static final Operator[] operators = new Operator[] {
-    new Operator(PREFIX,  NEGATION,   true,   4,  4 ),
+    new Operator(Fix.PREFIX,  NOT,        true,   4,  4 ),
-    new Operator(PREFIX,  ENABLED,    false,  4,  15),
+    new Operator(Fix.PREFIX,  ENABLED,    false,  4,  15),
-    new Operator(PREFIX,  MINUS,      true,   12, 12),
+    new Operator(Fix.PREFIX,  MINUS,      true,   12, 12),
-    new Operator(INFIX,   AND,        true,   3,  3 ),
+    new Operator(Fix.INFIX,   IN,         false,  5,  5 ),
-    new Operator(INFIX,   OR,         true,   3,  3 ),
+    new Operator(Fix.INFIX,   EQUAL,      false,  5,  5 ),
-    new Operator(INFIX,   IN,         false,  5,  5 ),
+    new Operator(Fix.INFIX,   LESS_THAN,  false,  5,  5 ),
-    new Operator(INFIX,   EQUAL,      false,  5,  5 ),
+    new Operator(Fix.INFIX,   DOT_DOT,    false,  9,  9 ),
-    new Operator(INFIX,   LESS_THAN,  false,  5,  5 ),
+    new Operator(Fix.INFIX,   PLUS,       true,   10, 10),
-    new Operator(INFIX,   DOT_DOT,    false,  9,  9 ),
+    new Operator(Fix.INFIX,   MINUS,      true,   11, 11),
-    new Operator(INFIX,   PLUS,       true,   10, 10),
+    new Operator(Fix.POSTFIX, PRIME,      false,  15, 15),
-    new Operator(INFIX,   MINUS,      true,   11, 11),
-    new Operator(POSTFIX, PRIME,      false,  15, 15),
   };
 </code>
@@ Line 467: / Line 437: @@
 While this design decision is unusual, it is [[https://groups.google.com/g/tlaplus/c/Bt1fl8GvizE/m/ohYTfQWiCgAJ|unlikely to cause any problems]].
-We need one more helper method before we can start working on ''operatorExpression'': a superpowered ''match'' method specific to operators, which will try to match any operators of a given fix type & (low) precedence.
+We need one more helper method before we can start working on ''operatorExpression()'': a superpowered ''match()'' method specific to operators, which will try to match any operators of a given fix type & (low) precedence.
-Add this code above ''match'':
+Add this code above ''match()'':
 <code java>
   private Operator matchOp(Fix fix, int prec) {
@@ Line 482: / Line 452: @@
 Okay, we're all set for the main event!
-Here is how we modify our ''operatorExpression'' method to parse infix operators.
+Here is how we modify ''operatorExpression()'' to parse infix operators.
 You can see a strong resemblance to the infix operator parsing code given in the book (new lines highlighted):
 <code java [highlight_lines_extra="4,7,8,9,10,11"]>
@@ Line 489: / Line 459: @@
     Operator op;
     Expr expr = operatorExpression(prec + 1);
-    while ((op = matchOp(INFIX, prec)) != null) {
+    while ((op = matchOp(Fix.INFIX, prec)) != null) {
       Token operator = previous();
       Expr right = operatorExpression(op.highPrec + 1);
       expr = new Expr.Binary(expr, operator, right);
     }
     return expr;
   }
@@ Line 502: / Line 472: @@
 We use Java's combined conditional-assignment atop the ''while'' loop to make our code more terse, both checking whether any operators were matched and getting details of the matched operator if so.
-The interior of the ''while'' loop is largely identical to infix parsing logic from the book, except for when we recurse to a higher precedence level: we go directly past the upper bound of the precedence range.
+The interior of the ''while'' loop is largely identical to infix parsing logic from the book, except for when we recurse to a higher precedence level: in the book the code recurses one level above the matched operator's precedence, but here we go one level above the //upper bound// of the matched operator's precedence.
-It takes a bit of thinking to understand why this works for parsing expressions according to the precedence we want, but it does.
+It takes a bit of thinking to understand why this works for parsing expressions according to the precedence rules we want, but it does.
 Take a minute to ponder it.
 If you still don't get it, wait until we have a fully-functional expression parser then play around with this line to see how it changes the behavior.
-This code implements precedence ranges, but assumes all infix operators are associative.
+Our code implements precedence ranges, but assumes all infix operators are associative.
-We need to modify the loop to break if the infix operator is not associative:
+We need to modify the loop to return immediately if the infix operator is not associative:
 <code java [highlight_lines_extra="5"]>
-    while ((op = matchOp(INFIX, prec)) != null) {
+    while ((op = matchOp(Fix.INFIX, prec)) != null) {
       Token operator = previous();
       Expr right = operatorExpression(op.highPrec + 1);
       expr = new Expr.Binary(expr, operator, right);
-      if (!op.assoc) break;
+      if (!op.assoc) return expr;
     }
 </code>
@@ Line 521: / Line 491: @@
 That was the most complicated case, so let's move on to prefix operators.
 Again our code resembles the prefix operator parsing logic given in the book.
-Add this snippet near the top of the ''operatorExpression'' method:
+Add this snippet near the top of ''operatorExpression()'':
 <code java [highlight_lines_extra="5,6,7,8,9,10"]>
@@ Line 528: / Line 498: @@
     Operator op;
-    if ((op = matchOp(PREFIX, prec)) != null) {
+    if ((op = matchOp(Fix.PREFIX, prec)) != null) {
       Token opToken = previous();
       Expr expr = operatorExpression(
-                  op.assoc ? op.lowPrec : op.highPrec + 1);
+                    op.assoc ? op.lowPrec : op.highPrec + 1);
       return new Expr.Unary(opToken, expr);
     }
@@ Line 541: / Line 511: @@
 To understand this expression, it is helpful to consider an example.
 The negative prefix operator ''-1'' is associative, so we should be able to parse the expression ''--1'' as ''-(-1)''.
-In order to do that after consuming the first ''-'' we have to recurse into ''operatorExpression'' //at the same precedence level//.
+In order to do that after consuming the first ''-'' we have to recurse into ''operatorExpression()'' //at the same precedence level//.
 Then we will consume the second ''-'', then again recurse and ultimately consume ''1'' in our yet-to-be-defined ''primary'' method.
 In contrast, the prefix operator ''ENABLED'' is not associative and has range 4-15.
@@ Line 549: / Line 519: @@
 All that remains is to parse postfix operators.
 These are not covered in the book, but they pretty much resemble infix operators without matching an expression on the right-hand side of the operator.
-Add the highlighted code below the ''INFIX'' logic in ''operatorExpression'':
+Add the highlighted code below the ''INFIX'' logic in ''operatorExpression()'':
 <code java [highlight_lines_extra="3,4,5,6,7"]>
     }
-    while ((op = matchOp(POSTFIX, prec)) != null) {
+    while ((op = matchOp(Fix.POSTFIX, prec)) != null) {
       Token opToken = previous();
       expr = new Expr.Unary(opToken, expr);
@@ Line 565: / Line 535: @@
 And that completes our operator parsing logic!
+All that remains is our ''primary()'' method.
+It is fairly similar to that given in the book, with some omissions since our minimal TLA⁺ subset does not contain strings or null values.
+Add this code after ''operatorExpression()'':
+<code java>
+  private Expr primary() {
+    if (match(FALSE)) return new Expr.Literal(false);
+    if (match(TRUE)) return new Expr.Literal(true);
+    if (match(NUMBER)) {
+      return new Expr.Literal(previous().literal);
+    }
+    if (match(LEFT_PAREN)) {
+      Expr expr = expression();
+      consume(RIGHT_PAREN, "Expect ')' after expression.");
+      return new Expr.Grouping(expr);
+    }
+  }
+</code>
+We have two final additions to ''primary()'', the ''IF''/''THEN''/''ELSE'' operator and the set construction operator.
+Here's how we parse ''IF''/''THEN''/''ELSE''; add the highlighted code to the bottom of ''primary()'':
+<code java [highlight_lines_extra="4,5,6,7,8,9,10,11,12"]>
+      return new Expr.Grouping(expr);
+    }
+    if (match(IF)) {
+      Token operator = previous();
+      Expr condition = expression();
+      consume(THEN, "'THEN' required after 'IF' expression.");
+      Expr yes = expression();
+      consume(ELSE, "'ELSE' required after 'THEN' expression.");
+      Expr no = expression();
+      return new Expr.Ternary(operator, condition, yes, no);
+    }
+  }
+</code>
+And here's how we parse the set construction operator.
+Again add the highlighted code to the bottom of ''primary()'':
+<code java [highlight_lines_extra="4,5,6,7,8,9,10,11,12,13,14"]>
+      return new Expr.Ternary(operator, condition, yes, no);
+    }
+    if (match(LEFT_BRACE)) {
+      Token operator = previous();
+      List<Expr> elements = new ArrayList<>();
+      if (!check(RIGHT_BRACE)) {
+        do {
+          elements.add(expression());
+        } while (match(COMMA));
+      }
+      consume(RIGHT_BRACE, "'}' is required to terminate finite set literal.");
+      return new Expr.Variadic(operator, elements);
+    }
+  }
+</code>
+This will handle expressions like ''{1,2,3}'' and (crucially) ''{}''.
+The ''consume()'' operator is defined in the next section, along with error reporting generally.
+===== Section 6.3: Syntax Errors =====
+We don't need to modify most of the error reporting code given in [[https://craftinginterpreters.com/parsing-expressions.html#syntax-errors|Section 6.3]].
+Here it is; add the ''consume()'' method after ''match()'' in the ''Parser'' class:
+<code java>
+  private Token consume(TokenType type, String message) {
+    if (check(type)) return advance();
+    throw error(peek(), message);
+  }
+</code>
+Then add the ''error()'' method after ''previous()'':
+<code java>
+  private ParseError error(Token token, String message) {
+    Lox.error(token, message);
+    return new ParseError();
+  }
+</code>
+In the ''TlaPlus'' class, add the ''error()'' method after ''report()'':
+<code java>
+  static void error(Token token, String message) {
+    if (token.type == TokenType.EOF) {
+      report(token.line, " at end", message);
+    } else {
+      report(token.line, " at '" + token.lexeme + "'", message);
+    }
+  }
+</code>
+Add the ''ParseError'' class definition at the top of the ''Parser'' class:
+<code java [highlight_lines_extra="5"]>
+class Parser {
+  private static enum Fix { PREFIX, INFIX, POSTFIX }
+  private static record Operator(Fix fix, TokenType token,
+      boolean assoc, int lowPrec, int highPrec) {}
+  private static class ParseError extends RuntimeException {}
+  private final List<Token> tokens;
+</code>
+One piece of code we //do// need to modify is the ''synchronization()'' method.
+TLA⁺ doesn't have statements separated by semicolons, so our best choice of synchronization point is the ''EQUAL_EQUAL'' symbol denoting an operator definition; add this in the ''Parser'' class after ''error()'':
+<code java>
+  private void synchronize() {
+    advance();
+    while(!isAtEnd()) {
+      if (previous().type == EQUAL_EQUAL) return;
+      advance();
+    }
+  }
+</code>
+Fully-featured TLA⁺ parsers generally look ahead to identify the start of a new operator definition (not trivial!) and insert a synthetic zero-width token for the parser to consume.
+For an example, see SANY's very complicated ''[[https://github.com/tlaplus/tlaplus/blob/13e5a39b5368a6da4906b8ed1c2c1114d2e7de15/tlatools/org.lamport.tlatools/javacc/tla%2B.jj#L229-L392|belchDEF()]]'' method.
+We won't be doing that here, but it's a good technique to know about; looking ahead then inserting synthetic tokens to disambiguate syntax is a common method when parsing complicated & ambiguous languages like TLA⁺.
+===== Section 6.4: Wiring up the Parser =====
+In [[https://craftinginterpreters.com/parsing-expressions.html#wiring-up-the-parser|section 6.4]] we finally arrive at the point where you can parse TLA⁺ expressions in your REPL.
+In the ''run()'' method of the ''TlaPlus'' class, delete the code printing out the scanned tokens (or not, it can be informative to keep around) and replace it with this:
+<code java [highlight_lines_extra="3,4,6,7,9"]>
+    List<Token> tokens = scanner.scanTokens();
+    Parser parser = new Parser(tokens);
+    Expr expression = parser.parse();
+    // Stop if there was a syntax error.
+    if (hadError) return;
+    System.out.println(new AstPrinter().print(expression));
+  }
+</code>
+Fire up the interpreter and parse a TLA⁺ expression!
+<code>
+> {1 + 2, IF TRUE THEN 3 ELSE 4, {}}
+({ (+ 1 2) (IF true 3 4) ({))
+</code>
+If you got out of sync, you can find a snapshot of the expected state of the code in [[https://github.com/tlaplus/devkit/tree/main/3-expressions|this repo directory]].
+Next tutorial: [[creating:evaluation|Evaluating Constant TLA⁺ Expressions]]!
+====== Challenges ======
+Here are some optional challenges to flesh out your TLA⁺ interpreter, roughly ranked from simplest to most difficult.
+[[creating:scanning|< Previous Page]] | [[creating:start#table_of_contents|Table of Contents]] | [[creating:evaluation|Next Page >]]