Differences

This shows you the differences between two versions of the page.

--- creating:expressions [2025/04/07 00:23] – Wrote tutorial mirroring chapter 5 ahelwer
+++ creating:expressions [2025/05/13 15:58] (current) – Fixed lookahead in set literal parsing ahelwer
@@ Line 1: / Line 1: @@
-====== Handling Constant TLA⁺ Expressions ======
+======= Parsing Constant TLA⁺ Expressions =======
-This tutorial page covers the next three chapters in //Crafting Interpreters//:
+This tutorial page covers the next two chapters in //Crafting Interpreters//:
   - [[https://craftinginterpreters.com/representing-code.html|Chapter 5: Representing Code]]
   - [[https://craftinginterpreters.com/parsing-expressions.html|Chapter 6: Parsing Expressions]]
-  - [[https://craftinginterpreters.com/evaluating-expressions.html|Chapter 7: Evaluating Expressions]]
 Same as the book, we //could// build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring!
@@ Line 15: / Line 14: @@
 First read the section of the book, then read the corresponding commentary & modifications given by this tutorial.
-===== Chapter 5: Representing Code =====
+====== Chapter 5: Representing Code ======
 [[https://craftinginterpreters.com/representing-code.html|Chapter 5]] focuses more on concepts than code, so this section will not have many TLA⁺-specific modifications.
 However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced.
-==== Section 5.1: Context-Free Grammars ====
+===== Section 5.1: Context-Free Grammars =====
 Everything before [[https://craftinginterpreters.com/representing-code.html#a-grammar-for-lox-expressions|Section 5.1.3: A Grammar for Lox expressions]] applies equally to TLA⁺.
 The ambiguous first-draft grammar for Lox expressions can be adapted to TLA⁺:
-<code>
+<code eiffel>
 expression     → literal
                | unary
                | binary
                | ternary
-               | set
+               | variadic
                | grouping ;
@@ Line 37: / Line 36: @@
 binary         → expression operator expression ;
 ternary        → "IF" expression "THEN" expression "ELSE" expression;
-set            → "{" ( expression ( "," expression )* )? "}"
+variadic       → "{" ( expression ( "," expression )* )? "}"
-operator       → "=" | "+" | "-" | ".." | "/\" | "\/" | "<"  | "\in" ;
+operator       → "=" | "+" | "-" | ".." | "<"  | "\in" ;
 </code>
 There are a few interesting differences.
 The ''unary'' rule now captures both prefix //and// suffix operators, which both only accept a single parameter.
 The ''ternary'' rule matches the ''IF''/''THEN''/''ELSE'' operator, with the three parameters being the predicate, the true branch, and the false branch.
-There is also the ''set'' rule matching finite set literals like ''{1, 2, 3}'' or the empty set ''{}''.
 The operators are changed to the set of operators defined in our TLA⁺ implementation.
 These are all the expressions we can use without introducing the concept of identifiers referring to something else.
-==== Section 5.2: Implementing Syntax Trees ====
+There is also the ''variadic'' rule matching finite set literals like ''{1, 2, 3}'' or the empty set ''{}''.
+This one is so named because it's where we'll put operators accepting varying numbers of parameters.
+It's kind of weird to think of the finite set literal ''{1, 2, 3}'' as an operator, but it is!
+The only difference between ''{1, 2, 3}'' and an operator like ''constructSet(1, 2, 3)'' is syntactic sugar.
+This is the perspective of a language implementer.
+Note that we //could// also include the conjunction ''/\'' and disjunction ''\/'' infix operators here, but they have odd parsing interactions with vertically-aligned conjunction & disjunction lists, so we will handle them in a later chapter dedicated to that topic.
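To make the "set literal is just an operator" perspective concrete, here is a minimal sketch using Java records (Java 16 or later). The ''Node'', ''Num'', and ''Variadic'' names are hypothetical stand-ins invented for this illustration, not the generated ''Expr'' classes used later in the tutorial; the point is only that ''{1, 2, 3}'' and ''constructSet(1, 2, 3)'' produce the same shape of tree.

```java
import java.util.List;

class VariadicSketch {
  // Hypothetical miniature syntax tree, not the tutorial's generated Expr classes.
  interface Node {}

  record Num(int value) implements Node {}

  // An operator applied to any number of parameters; a set literal uses "{".
  record Variadic(String operator, List<Node> parameters) implements Node {}

  // Prints a node in the Lisp-like style of the book's AstPrinter.
  static String show(Node node) {
    if (node instanceof Num n) return Integer.toString(n.value());
    Variadic v = (Variadic) node;
    StringBuilder builder = new StringBuilder("(").append(v.operator());
    for (Node parameter : v.parameters()) {
      builder.append(" ").append(show(parameter));
    }
    return builder.append(")").toString();
  }
}
```

Under these assumptions, a ''Variadic'' built with operator ''{'' and parameters 1, 2, 3 prints as ''({ 1 2 3)'', while the same parameters with operator ''constructSet'' print as ''(constructSet 1 2 3)''; only the operator name differs.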
+===== Section 5.2: Implementing Syntax Trees =====
 While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree!
@@ Line 53: / Line 59: @@
 <code java [highlight_lines_extra="19,20,21"]>
-package com.craftinginterpreters.tool;
+package tool;
 import java.io.IOException;
@@ Line 73: / Line 79: @@
       "Unary    : Token operator, Expr expr",
       "Ternary  : Token operator, Expr first, Expr second, Expr third",
-      "Set      : List<Expr> elements"
+      "Variadic : Token operator, List<Expr> parameters"
     ));
   }
@@ Line 88: / Line 94: @@
     PrintWriter writer = new PrintWriter(path, "UTF-8");
-    writer.println("package com.craftinginterpreters.tla;");
+    writer.println("package tla;");
     writer.println();
     writer.println("import java.util.List;");
@@ Line 138: / Line 144: @@
-==== Section 5.3: Working with Trees ====
+===== Section 5.3: Working with Trees =====
 [[https://craftinginterpreters.com/representing-code.html#working-with-trees|Section 5.3]] introduces the Visitor pattern.
@@ Line 195: / Line 201: @@
 Our generated syntax tree node types now support the visitor pattern!
-==== Section 5.4: A (Not Very) Pretty Printer ====
+===== Section 5.4: A (Not Very) Pretty Printer =====
 [[https://craftinginterpreters.com/representing-code.html#a-not-very-pretty-printer|Section 5.4]] provides an implementation of a visitor called ''AstPrinter'' that prints out the parse tree.
@@ Line 201: / Line 207: @@
 <code java [highlight_lines_extra="1"]>
-package com.craftinginterpreters.tla;
+package tla;
 class AstPrinter implements Expr.Visitor<String> {
@@ Line 242: / Line 248: @@
   @Override
-  public string visitSetExpr(Expr.Set expr) {
+  public String visitVariadicExpr(Expr.Variadic expr) {
-    return parenthesize("Set", expr.elements.toArray(Expr[]::new));
+    return parenthesize(expr.operator.lexeme,
+                        expr.parameters.toArray(Expr[]::new));
   }
 </code>
@@ Line 265: / Line 272: @@
 It isn't necessary to define a ''main'' method in the ''AstPrinter'' class, but if you'd like to try it out you are free to copy the one given in the book.
+====== Chapter 6: Parsing Expressions ======
+In [[https://craftinginterpreters.com/parsing-expressions.html|Chapter 6]], we finally build a parse tree out of our tokens!
+===== Section 6.1: Ambiguity and the Parsing Game =====
+First, as in [[https://craftinginterpreters.com/parsing-expressions.html#ambiguity-and-the-parsing-game|section 6.1 of the book]], we have to disambiguate our grammar.
+The way that precedence works in TLA⁺ is different from Lox, and indeed different from most other languages!
+One aside from this section of the book reads:
+>While not common these days, some languages specify that certain pairs of operators have no relative precedence. That makes it a syntax error to mix those operators in an expression without using explicit grouping.
+>Likewise, some operators are //non-associative//. That means it’s an error to use that operator more than once in a sequence. For example, Perl’s range operator isn’t associative, so a .. b is OK, but a .. b .. c is an error.
+TLA⁺ has both of these features!
+Instead of operators occupying a slot in some hierarchy of precedence, each operator has a precedence //range//.
+When the precedence ranges of two operators overlap it is a parse error.
+For example, the ''ENABLED'' prefix operator has precedence range 4-15, and the prime operator has precedence range 15-15; thus, the following expression has a precedence conflict and should not parse:
+<code>
+ENABLED x'
+</code>
+Users must add parentheses to indicate their desired grouping in the expression.
+Similarly, some operators like ''='' are not associative so ''a = b = c'' should be a parse error.
+Both of these factors combine to make our operator parsing code quite a bit different from that given in the book.
+Worry not, it can still be made terse and understandable!
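As a small illustration of the overlap rule, the sketch below models a precedence range as a closed interval and checks whether two ranges conflict. The ''Range'' record and its ''overlaps()'' method are hypothetical names made up for this example; the parser built in this tutorial never performs such a check explicitly, and instead gets the same effect from the way it recurses.

```java
class PrecedenceRanges {
  // A closed precedence range [low, high], as listed in the TLA+ operator table.
  record Range(int low, int high) {
    // Two closed ranges overlap when neither lies entirely below the other.
    boolean overlaps(Range other) {
      return low <= other.high && other.low <= high;
    }
  }
}
```

With the ranges quoted above, ''new Range(4, 15)'' for ''ENABLED'' overlaps ''new Range(15, 15)'' for prime, so ''ENABLED x''' is a conflict; ranges 10-10 and 11-11 do not overlap, so those two operators can be mixed freely.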
+===== Section 6.2: Recursive Descent Parsing =====
+[[https://craftinginterpreters.com/parsing-expressions.html#recursive-descent-parsing|Section 6.2]] is where we start writing our parser in the recursive descent style.
+Recursive descent can seem a bit magical even if you're used to reasoning about recursive functions!
+Unfortunately TLA⁺ requires us to push this magic even further.
+We'll take it step by step.
+We start with the same basic ''Parser.java'' file as in the book, renaming ''lox'' to ''tla'' in the package name as usual.
+We also add an import for ''ArrayList'', which we will later make use of when parsing variadic operators:
+<code java [highlight_lines_extra="1,4,6"]>
+package tla;
+import java.util.List;
+import java.util.ArrayList;
+import static tla.TokenType.*;
+class Parser {
+  private final List<Token> tokens;
+  private int current = 0;
+  Parser(List<Token> tokens) {
+    this.tokens = tokens;
+  }
+}
+</code>
+Now we hit our first major difference.
+In Lox, precedence is given by a small hierarchy of named rules like ''equality'', ''comparison'', ''term'', etc.
+TLA⁺ is more complicated than that.
+The full language has around 100 operators spanning precedences 1 to 15!
+If we wanted to match the book's style we'd have to write a tower of recursive functions like:
+<code java>
+private Expr operatorExpressionPrec1() { ... }
+private Expr operatorExpressionPrec2() { ... }
+private Expr operatorExpressionPrec3() { ... }
+...
+private Expr operatorExpressionPrec15() { ... }
+</code>
+where each ''operatorExpressionPrecN()'' function parses all the prefix, infix, and postfix operators of precedence ''N'', and calls ''operatorExpressionPrecN+1()''.
+Life is too short for this.
+Instead, we'll adopt a technique alluded to in the text:
+>If you wanted to do some clever Java 8, you could create a helper method for parsing a left-associative series of binary operators given a list of token types, and an operand method handle to simplify this redundant code.
+Here's the skeleton of our operator parsing function; the trick is to make the precedence a parameter to the function instead of a component of the name.
+Add this code after the ''Parser()'' constructor:
+<code java>
+  private Expr expression() {
+    return operatorExpression(1);
+  }
+  private Expr operatorExpression(int prec) {
+    if (prec == 16) return primary();
+    Expr expr = operatorExpression(prec + 1);
+    return expr;
+  }
+</code>
+Before filling out ''operatorExpression()'', we'll add the helper methods; these form an incredibly well-designed parsing API and are unchanged from the book:
+<code java>
+  private boolean match(TokenType... types) {
+    for (TokenType type : types) {
+      if (check(type)) {
+        advance();
+        return true;
+      }
+    }
+    return false;
+  }
+  private boolean check(TokenType type) {
+    if (isAtEnd()) return false;
+    return peek().type == type;
+  }
+  private Token advance() {
+    if (!isAtEnd()) current++;
+    return previous();
+  }
+  private boolean isAtEnd() {
+    return peek().type == EOF;
+  }
+  private Token peek() {
+    return tokens.get(current);
+  }
+  private Token previous() {
+    return tokens.get(current - 1);
+  }
+</code>
+Now we have to define a table of operators with their details.
+First use Java's handy [[https://openjdk.org/jeps/395|records]] feature (finalized in Java 16, so available in Java 17) to quickly define a new ''Operator'' data class; this will hold the attributes of the operators we want to parse.
+Put it at the top of the ''Parser'' class:
+<code java [highlight_lines_extra="2,3,4"]>
+class Parser {
+  private static enum Fix { PREFIX, INFIX, POSTFIX }
+  private static record Operator(Fix fix, TokenType token,
+      boolean assoc, int lowPrec, int highPrec) {}
+  private final List<Token> tokens;
+</code>
+You can find operator attributes on page 271 of //[[https://lamport.azurewebsites.net/tla/book.html|Specifying Systems]]// by Leslie Lamport, or [[https://github.com/tlaplus/tlaplus/blob/13e5a39b5368a6da4906b8ed1c2c1114d2e7de15/tlatools/org.lamport.tlatools/src/tla2sany/parser/Operators.java#L130-L234|this TLA⁺ tools source file]].
+We use a small subset of these operators.
+Record their attributes in a table in ''Parser.java'', below ''operatorExpression()'':
+<code java>
+  private static final Operator[] operators = new Operator[] {
+    new Operator(Fix.PREFIX,  NOT,        true,   4,  4 ),
+    new Operator(Fix.PREFIX,  ENABLED,    false,  4,  15),
+    new Operator(Fix.PREFIX,  MINUS,      true,   12, 12),
+    new Operator(Fix.INFIX,   IN,         false,  5,  5 ),
+    new Operator(Fix.INFIX,   EQUAL,      false,  5,  5 ),
+    new Operator(Fix.INFIX,   LESS_THAN,  false,  5,  5 ),
+    new Operator(Fix.INFIX,   DOT_DOT,    false,  9,  9 ),
+    new Operator(Fix.INFIX,   PLUS,       true,   10, 10),
+    new Operator(Fix.INFIX,   MINUS,      true,   11, 11),
+    new Operator(Fix.POSTFIX, PRIME,      false,  15, 15),
+  };
+</code>
+Here's something a bit odd.
+In TLA⁺, the infix minus operator (subtraction) has higher precedence at 11-11 than the infix plus operator (addition) at 10-10!
+In grade school you probably learned acronyms like PEMDAS or BEDMAS to remember the order of operations in arithmetic.
+Really, you learned parsing rules!
+Now when writing our own parsing algorithm we come to understand that the order of these operations is not inscribed in the bedrock of mathematical truth but instead is a simple convention of mathematical notation.
+TLA⁺ subverts this by parsing the expression ''a + b - c'' as ''a + (b - c)'' instead of the PEMDAS-style ''(a + b) - c''.
+While this design decision is unusual, it is [[https://groups.google.com/g/tlaplus/c/Bt1fl8GvizE/m/ohYTfQWiCgAJ|unlikely to cause any problems]].
+We need one more helper method before we can start working on ''operatorExpression()'': a superpowered ''match()'' method specific to operators, which will try to match any operators of a given fix type & (low) precedence.
+Add this code above ''match()'':
+<code java>
+  private Operator matchOp(Fix fix, int prec) {
+    for (Operator op : operators) {
+      if (op.fix == fix && op.lowPrec == prec) {
+        if (match(op.token)) return op;
+      }
+    }
+    return null;
+  }
+</code>
+Okay, we're all set for the main event!
+Here is how we modify ''operatorExpression()'' to parse infix operators.
+You can see a strong resemblance to the infix operator parsing code given in the book (new lines highlighted):
+<code java [highlight_lines_extra="4,7,8,9,10,11"]>
+  private Expr operatorExpression(int prec) {
+    if (prec == 16) return primary();
+    Operator op;
+    Expr expr = operatorExpression(prec + 1);
+    while ((op = matchOp(Fix.INFIX, prec)) != null) {
+      Token operator = previous();
+      Expr right = operatorExpression(op.highPrec + 1);
+      expr = new Expr.Binary(expr, operator, right);
+    }
+    return expr;
+  }
+</code>
+We use Java's combined conditional-assignment atop the ''while'' loop to make our code more terse, both checking whether any operators were matched and getting details of the matched operator if so.
+The interior of the ''while'' loop is largely identical to infix parsing logic from the book, except for when we recurse to a higher precedence level: in the book the code recurses one level above the matched operator's precedence, but here we go one level above the //upper bound// of the matched operator's precedence.
+It takes a bit of thinking to understand why this works for parsing expressions according to the precedence rules we want, but it does.
+Take a minute to ponder it.
+If you still don't get it, wait until we have a fully-functional expression parser then play around with this line to see how it changes the behavior.
+Our code implements precedence ranges, but assumes all infix operators are associative.
+We need to modify the loop to return immediately if the infix operator is not associative:
+<code java [highlight_lines_extra="5"]>
+    while ((op = matchOp(Fix.INFIX, prec)) != null) {
+      Token operator = previous();
+      Expr right = operatorExpression(op.highPrec + 1);
+      expr = new Expr.Binary(expr, operator, right);
+      if (!op.assoc) return expr;
+    }
+</code>
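If you would like to experiment before the full parser is wired up, here is a self-contained sketch of the same infix logic. It is a simplification under stated assumptions: tokens are plain whitespace-separated strings, every non-operator token is treated as an operand, trees are built as strings, and the ''InfixSketch'' class with its ''parse()'' helper is a hypothetical name that is not part of the tutorial's code. The operator attributes are copied from the table above.

```java
import java.util.List;

class InfixSketch {
  record Op(String symbol, boolean assoc, int lowPrec, int highPrec) {}

  // Subset of the tutorial's operator table, infix operators only.
  static final List<Op> OPS = List.of(
      new Op("=", false, 5, 5),
      new Op("+", true, 10, 10),
      new Op("-", true, 11, 11));

  private final List<String> tokens;
  private int current = 0;

  InfixSketch(List<String> tokens) { this.tokens = tokens; }

  // Consumes and returns an infix operator whose low precedence is prec, else null.
  private Op matchOp(int prec) {
    if (current >= tokens.size()) return null;
    for (Op op : OPS) {
      if (op.lowPrec() == prec && op.symbol().equals(tokens.get(current))) {
        current++;
        return op;
      }
    }
    return null;
  }

  private String operatorExpression(int prec) {
    if (prec == 16) return tokens.get(current++); // operand; no error handling here
    String expr = operatorExpression(prec + 1);
    Op op;
    while ((op = matchOp(prec)) != null) {
      // Recurse one level above the operator's upper precedence bound.
      String right = operatorExpression(op.highPrec() + 1);
      expr = "(" + op.symbol() + " " + expr + " " + right + ")";
      if (!op.assoc()) return expr;
    }
    return expr;
  }

  // Parses space-separated tokens; reports any tokens left unconsumed.
  static String parse(String source) {
    InfixSketch p = new InfixSketch(List.of(source.split(" ")));
    String tree = p.operatorExpression(1);
    List<String> rest = p.tokens.subList(p.current, p.tokens.size());
    return rest.isEmpty() ? tree : tree + " | unparsed: " + String.join(" ", rest);
  }
}
```

In this sketch ''parse("a + b - c")'' gives ''(+ a (- b c))'' and ''parse("a + b + c")'' gives ''(+ (+ a b) c)''. For ''parse("a = b = c")'' the loop returns after the first ''='', so the result is ''(= a b) | unparsed: = c''; the second ''='' is never consumed, which is the hook for rejecting the expression.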
+And that's how we parse infix operators!
+That was the most complicated case, so let's move on to prefix operators.
+Again our code resembles the prefix operator parsing logic given in the book.
+Add this snippet near the top of ''operatorExpression()'':
+<code java [highlight_lines_extra="5,6,7,8,9,10"]>
+  private Expr operatorExpression(int prec) {
+    if (prec == 16) return primary();
+    Operator op;
+    if ((op = matchOp(Fix.PREFIX, prec)) != null) {
+      Token opToken = previous();
+      Expr expr = operatorExpression(
+                    op.assoc ? op.lowPrec : op.highPrec + 1);
+      return new Expr.Unary(opToken, expr);
+    }
+    Expr expr = operatorExpression(prec + 1);
+</code>
+Of note is the expression ''op.assoc ? op.lowPrec : op.highPrec + 1'' controlling the precedence level at which we recurse.
+To understand this expression, it is helpful to consider an example.
+The negation prefix operator ''-'' (as in ''-1'') is associative, so we should be able to parse the expression ''--1'' as ''-(-1)''.
+In order to do that after consuming the first ''-'' we have to recurse into ''operatorExpression()'' //at the same precedence level//.
+Then we will consume the second ''-'', then again recurse and ultimately consume ''1'' in our yet-to-be-defined ''primary'' method.
+In contrast, the prefix operator ''ENABLED'' is not associative and has range 4-15.
+In that case we want to reject the expression ''ENABLED ENABLED TRUE'', so we recurse at a level higher than ''ENABLED'''s upper precedence bound.
+Thus the second ''ENABLED'' will not be matched and we will report a parse error using methods defined in section 6.3.
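The two cases can be tried out in isolation with the sketch below. As with any toy version, it rests on assumptions: tokens are space-separated strings, trees are strings, only the two prefix operators discussed here are included, and the ''PrefixSketch'' class and the ''IllegalStateException'' it throws are hypothetical stand-ins for the tutorial's ''Parser'' and ''ParseError''.

```java
import java.util.List;

class PrefixSketch {
  record Op(String symbol, boolean assoc, int lowPrec, int highPrec) {}

  // The two prefix operators discussed in the text, with the tutorial's attributes.
  static final List<Op> PREFIX_OPS = List.of(
      new Op("ENABLED", false, 4, 15),
      new Op("-", true, 12, 12));

  private final List<String> tokens;
  private int current = 0;

  PrefixSketch(List<String> tokens) { this.tokens = tokens; }

  // Consumes and returns a prefix operator whose low precedence is prec, else null.
  private Op matchOp(int prec) {
    if (current >= tokens.size()) return null;
    for (Op op : PREFIX_OPS) {
      if (op.lowPrec() == prec && op.symbol().equals(tokens.get(current))) {
        current++;
        return op;
      }
    }
    return null;
  }

  private String operatorExpression(int prec) {
    if (prec == 16) return primary();
    Op op = matchOp(prec);
    if (op != null) {
      // Associative: stay at the same level. Otherwise: go above the upper bound.
      String operand = operatorExpression(
          op.assoc() ? op.lowPrec() : op.highPrec() + 1);
      return "(" + op.symbol() + " " + operand + ")";
    }
    return operatorExpression(prec + 1);
  }

  // An operand is any token that is not itself a prefix operator.
  private String primary() {
    String token = tokens.get(current);
    if (PREFIX_OPS.stream().anyMatch(o -> o.symbol().equals(token))) {
      throw new IllegalStateException("Expect expression at '" + token + "'.");
    }
    current++;
    return token;
  }

  static String parse(String source) {
    return new PrefixSketch(List.of(source.split(" "))).operatorExpression(1);
  }
}
```

Here ''parse("- - 1")'' yields ''(- (- 1))'', while ''parse("ENABLED ENABLED TRUE")'' throws because the second ''ENABLED'' is reached at level 16, where only operands are accepted.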
+All that remains is to parse postfix operators.
+These are not covered in the book, but they pretty much resemble infix operators without matching an expression on the right-hand side of the operator.
+Add the highlighted code below the ''INFIX'' logic in ''operatorExpression()'':
+<code java [highlight_lines_extra="3,4,5,6,7"]>
+    }
+    while ((op = matchOp(Fix.POSTFIX, prec)) != null) {
+      Token opToken = previous();
+      expr = new Expr.Unary(opToken, expr);
+      if (!op.assoc) break;
+    }
+    return expr;
+  }
+</code>
+And that completes our operator parsing logic!
+All that remains is our ''primary()'' method.
+It is fairly similar to that given in the book, with some omissions since our minimal TLA⁺ subset does not contain strings or null values.
+Add this code after ''operatorExpression()'':
+<code java>
+  private Expr primary() {
+    if (match(FALSE)) return new Expr.Literal(false);
+    if (match(TRUE)) return new Expr.Literal(true);
+    if (match(NUMBER)) {
+      return new Expr.Literal(previous().literal);
+    }
+    if (match(LEFT_PAREN)) {
+      Expr expr = expression();
+      consume(RIGHT_PAREN, "Expect ')' after expression.");
+      return new Expr.Grouping(expr);
+    }
+  }
+</code>
+We have two final additions to ''primary()'', the ''IF''/''THEN''/''ELSE'' operator and the set construction operator.
+Here's how we parse ''IF''/''THEN''/''ELSE''; add the highlighted code to the bottom of ''primary()'':
+<code java [highlight_lines_extra="4,5,6,7,8,9,10,11,12"]>
+      return new Expr.Grouping(expr);
+    }
+    if (match(IF)) {
+      Token operator = previous();
+      Expr condition = expression();
+      consume(THEN, "'THEN' required after 'IF' expression.");
+      Expr yes = expression();
+      consume(ELSE, "'ELSE' required after 'THEN' expression.");
+      Expr no = expression();
+      return new Expr.Ternary(operator, condition, yes, no);
+    }
+  }
+</code>
+And here's how we parse the set construction operator.
+Again add the highlighted code to the bottom of ''primary()'':
+<code java [highlight_lines_extra="4,5,6,7,8,9,10,11,12,13,14"]>
+      return new Expr.Ternary(operator, condition, yes, no);
+    }
+    if (match(LEFT_BRACE)) {
+      Token operator = previous();
+      List<Expr> elements = new ArrayList<>();
+      if (!check(RIGHT_BRACE)) {
+        do {
+          elements.add(expression());
+        } while (match(COMMA));
+      }
+      consume(RIGHT_BRACE, "'}' is required to terminate finite set literal.");
+      return new Expr.Variadic(operator, elements);
+    }
+  }
+</code>
+This will handle expressions like ''{1,2,3}'' and (crucially) ''{}''.
+The ''consume()'' method is defined in the next section, along with error reporting generally.
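The check-then-''do''/''while'' shape is worth seeing on its own, since it is what lets both the empty set and a comma-separated list go through the same code path. The following is a minimal sketch under simplifying assumptions: tokens are plain strings, each element is a single token standing in for a full ''expression()'' call, and ''SetLiteralSketch'' with its string-based ''check()'' and ''match()'' helpers is a hypothetical class, not the tutorial's ''Parser''.

```java
import java.util.ArrayList;
import java.util.List;

class SetLiteralSketch {
  private final List<String> tokens;
  private int current = 0;

  SetLiteralSketch(List<String> tokens) { this.tokens = tokens; }

  // True if the next token equals the given one; never consumes.
  private boolean check(String token) {
    return current < tokens.size() && tokens.get(current).equals(token);
  }

  // Consumes the next token if it equals the given one.
  private boolean match(String token) {
    if (!check(token)) return false;
    current++;
    return true;
  }

  // Returns the element tokens of a finite set literal.
  List<String> setLiteral() {
    if (!match("{")) throw new IllegalStateException("Expect '{'.");
    List<String> elements = new ArrayList<>();
    // Looking ahead for '}' before the loop is what allows {} to parse.
    if (!check("}")) {
      do {
        if (current >= tokens.size()) {
          throw new IllegalStateException("Expect expression.");
        }
        elements.add(tokens.get(current++)); // stands in for expression()
      } while (match(","));
    }
    if (!match("}")) {
      throw new IllegalStateException(
          "'}' is required to terminate finite set literal.");
    }
    return elements;
  }

  static List<String> parse(String... tokens) {
    return new SetLiteralSketch(List.of(tokens)).setLiteral();
  }
}
```

Without the ''check("}")'' guard, the ''do''/''while'' loop would try to read an element from ''{}'' before ever looking for the closing brace.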
+===== Section 6.3: Syntax Errors =====
+We don't need to modify most of the error reporting code given in [[https://craftinginterpreters.com/parsing-expressions.html#syntax-errors|Section 6.3]].
+Here it is; add the ''consume()'' method after ''match()'' in the ''Parser'' class:
+<code java>
+  private Token consume(TokenType type, String message) {
+    if (check(type)) return advance();
+    throw error(peek(), message);
+  }
+</code>
+Then add the ''error()'' method after ''previous()'':
+<code java>
+  private ParseError error(Token token, String message) {
+    TlaPlus.error(token, message);
+    return new ParseError();
+  }
+</code>
+With ''error()'' in place, finish ''primary()'' as the book does by ending it with ''throw error(peek(), "Expect expression.");''; without a final ''throw'' or ''return'' the method does not compile.
+In the ''TlaPlus'' class, add the ''error()'' method after ''report()'':
+<code java>
+  static void error(Token token, String message) {
+    if (token.type == TokenType.EOF) {
+      report(token.line, " at end", message);
+    } else {
+      report(token.line, " at '" + token.lexeme + "'", message);
+    }
+  }
+</code>
+Add the ''ParseError'' class definition at the top of the ''Parser'' class:
+<code java [highlight_lines_extra="5"]>
+class Parser {
+  private static enum Fix { PREFIX, INFIX, POSTFIX }
+  private static record Operator(Fix fix, TokenType token,
+      boolean assoc, int lowPrec, int highPrec) {}
+  private static class ParseError extends RuntimeException {}
+  private final List<Token> tokens;
+</code>
+One piece of code we //do// need to modify is the ''synchronize()'' method.
+TLA⁺ doesn't have statements separated by semicolons, so our best choice of synchronization point is the ''EQUAL_EQUAL'' symbol denoting an operator definition; add this in the ''Parser'' class after ''error()'':
+<code java>
+  private void synchronize() {
+    advance();
+    while (!isAtEnd()) {
+      if (previous().type == EQUAL_EQUAL) return;
+      advance();
+    }
+  }
+</code>
+Fully-featured TLA⁺ parsers generally look ahead to identify the start of a new operator definition (not trivial!) and insert a synthetic zero-width token for the parser to consume.
+For an example, see SANY's very complicated ''[[https://github.com/tlaplus/tlaplus/blob/13e5a39b5368a6da4906b8ed1c2c1114d2e7de15/tlatools/org.lamport.tlatools/javacc/tla%2B.jj#L229-L392|belchDEF()]]'' method.
+We won't be doing that here, but it's a good technique to know about; looking ahead then inserting synthetic tokens to disambiguate syntax is a common method when parsing complicated & ambiguous languages like TLA⁺.
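To see where the simple ''EQUAL_EQUAL'' strategy leaves the parser, here is a small sketch that works on string tokens and an explicit index rather than the ''Parser'' fields. The ''SynchronizeSketch'' class is hypothetical, and unlike the real token list this one has no trailing ''EOF'' token, so the end of the list plays that role.

```java
import java.util.List;

class SynchronizeSketch {
  // Given the index of the token where the error was reported, returns the
  // index at which parsing would resume: just past the next "==", or
  // tokens.size() if no "==" follows.
  static int synchronize(List<String> tokens, int current) {
    if (current < tokens.size()) current++; // advance() past the offending token
    while (current < tokens.size()) {
      // Mirrors previous().type == EQUAL_EQUAL in the tutorial's method.
      if (tokens.get(current - 1).equals("==")) return current;
      current++;
    }
    return current;
  }
}
```

For the tokens ''x == 1 + + y == 2'' with the error reported at the second ''+'' (index 4), the sketch returns 7, the index of ''2''. Note that the name ''y'' being defined is skipped along the way, which is one reason fully-featured parsers use the lookahead approach described above.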
+===== Section 6.4: Wiring up the Parser =====
+In [[https://craftinginterpreters.com/parsing-expressions.html#wiring-up-the-parser|section 6.4]] we finally arrive at the point where you can parse TLA⁺ expressions in your REPL.
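+The code below calls the parser's ''parse()'' method, which the book defines in this section as a thin wrapper that returns ''expression()'' and returns ''null'' if a ''ParseError'' is caught; copy it into the ''Parser'' class unchanged.
+In the ''run()'' method of the ''TlaPlus'' class, delete the code printing out the scanned tokens (or not, it can be informative to keep around) and replace it with this: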
+<code java [highlight_lines_extra="3,4,6,7,9"]>
+    List<Token> tokens = scanner.scanTokens();
+    Parser parser = new Parser(tokens);
+    Expr expression = parser.parse();
+    // Stop if there was a syntax error.
+    if (hadError) return;
+    System.out.println(new AstPrinter().print(expression));
+  }
+</code>
+Fire up the interpreter and parse a TLA⁺ expression!
+<code>
+> {1 + 2, IF TRUE THEN 3 ELSE 4, {}}
+({ (+ 1 2) (IF true 3 4) ({))
+</code>
+If you got out of sync, you can find a snapshot of the expected state of the code in [[https://github.com/tlaplus/devkit/tree/main/3-expressions|this repo directory]].
+Next tutorial: [[creating:evaluation|Evaluating Constant TLA⁺ Expressions]]!
+====== Challenges ======
+Here are some optional challenges to flesh out your TLA⁺ interpreter, roughly ranked from simplest to most difficult.
+[[creating:scanning|< Previous Page]] | [[creating:start#table_of_contents|Table of Contents]] | [[creating:evaluation|Next Page >]]