


Handling Constant TLA⁺ Expressions

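This tutorial page covers the next three chapters in Crafting Interpreters: Chapter 5 (Representing Code), Chapter 6 (Parsing Expressions), and Chapter 7 (Evaluating Expressions).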

As in the book, we could build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring! Instead we focus on a simple vertical slice of the language: expressions. And not just any expressions, but constant expressions: expressions that do not contain variables or identifiers that we would have to resolve. Just primitive literal values stuck together, not much more advanced than a simple calculator app. This will give us a skeleton on which to hang the rest of the language.

Each section in this tutorial corresponds to one or more sections of Crafting Interpreters (henceforth referred to as "the book"). First read the section of the book, then read the corresponding commentary & modifications given by this tutorial.

Chapter 5 focuses more on concepts than code, so this section will not have many TLA⁺-specific modifications. However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced.

Everything before Section 5.1.3: A Grammar for Lox expressions applies equally to TLA⁺. The ambiguous first-draft grammar for Lox expressions can be adapted to TLA⁺:

expression     → literal
               | unary
               | binary
               | ternary
               | variadic
               | grouping ;
 
literal        → NUMBER | "TRUE" | "FALSE" ;
grouping       → "(" expression ")" ;
unary          → (( "~" | "ENABLED" | "-" ) expression ) | ( expression "'" ) ;
binary         → expression operator expression ;
ternary        → "IF" expression "THEN" expression "ELSE" expression ;
variadic       → "{" ( expression ( "," expression )* )? "}" ;
operator       → "=" | "+" | "-" | ".." | "/\" | "\/" | "<"  | "\in" ;

There are a few interesting differences. The unary rule now captures both prefix and postfix operators, both of which accept only a single parameter. The ternary rule matches the IF/THEN/ELSE operator, with the three parameters being the predicate, the true branch, and the false branch. The operators are changed to the set of operators defined in our TLA⁺ implementation. These are all the expressions we can use without introducing the concept of identifiers referring to something else.

There is also the variadic rule matching finite set literals like {1, 2, 3} or the empty set {}. This one is so named because it's where we'll put operators accepting varying numbers of parameters. It's kind of weird to think of the finite set literal {1, 2, 3} as an operator, but it is! The only difference between {1, 2, 3} and an operator like constructSet(1, 2, 3) is syntactic sugar. This is the perspective of a language implementer. Later on we will extend the definition of variadic to include vertically-aligned conjunction & disjunction lists.

While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree! In Section 5.2.2: Metaprogramming the trees, the main differences in the GenerateAst class reflect our modification of the Lox grammar in the previous section:

package com.craftinginterpreters.tool;
 
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;
 
public class GenerateAst {
  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("Usage: generate_ast <output directory>");
      System.exit(64);
    }
    String outputDir = args[0];
    defineAst(outputDir, "Expr", Arrays.asList(
      "Binary   : Expr left, Token operator, Expr right",
      "Grouping : Expr expression",
      "Literal  : Object value",
      "Unary    : Token operator, Expr expr",
      "Ternary  : Token operator, Expr first, Expr second, Expr third",
      "Variadic : Token operator, List<Expr> parameters"
    ));
  }
}

In the defineAst function, we only need to modify the output code so that it uses the package com.craftinginterpreters.tla instead of com.craftinginterpreters.lox; add this method after the main method of GenerateAst:

  private static void defineAst(
      String outputDir, String baseName, List<String> types)
      throws IOException {
    String path = outputDir + "/" + baseName + ".java";
    PrintWriter writer = new PrintWriter(path, "UTF-8");
 
    writer.println("package com.craftinginterpreters.tla;");
    writer.println();
    writer.println("import java.util.List;");
    writer.println();
    writer.println("abstract class " + baseName + " {");
 
 
    // The AST classes.
    for (String type : types) {
      String className = type.split(":")[0].trim();
      String fields = type.split(":")[1].trim();
      defineType(writer, baseName, className, fields);
    }
    writer.println("}");
    writer.close();
  }

Finally, the defineType method - added after the defineAst method - is unchanged:

  private static void defineType(
      PrintWriter writer, String baseName,
      String className, String fieldList) {
    writer.println("  static class " + className + " extends " +
        baseName + " {");
 
    // Constructor.
    writer.println("    " + className + "(" + fieldList + ") {");
 
    // Store parameters in fields.
    String[] fields = fieldList.split(", ");
    for (String field : fields) {
      String name = field.split(" ")[1];
      writer.println("      this." + name + " = " + name + ";");
    }
 
    writer.println("    }");
 
    // Fields.
    writer.println();
    for (String field : fields) {
      writer.println("    final " + field + ";");
    }
 
    writer.println("  }");
  }
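
To see what these generator methods actually emit, here is a standalone sketch that copies the defineType logic shown above into a hypothetical DefineTypeDemo class. Instead of writing to a file, it writes to an in-memory StringWriter. The class name and the generate helper are illustrative assumptions and are not part of the book's code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo class: the defineType body is copied from GenerateAst,
// but output goes to an in-memory StringWriter instead of a .java file.
class DefineTypeDemo {
  static void defineType(
      PrintWriter writer, String baseName,
      String className, String fieldList) {
    writer.println("  static class " + className + " extends " +
        baseName + " {");

    // Constructor.
    writer.println("    " + className + "(" + fieldList + ") {");

    // Store parameters in fields.
    String[] fields = fieldList.split(", ");
    for (String field : fields) {
      String name = field.split(" ")[1];
      writer.println("      this." + name + " = " + name + ";");
    }
    writer.println("    }");

    // Fields.
    writer.println();
    for (String field : fields) {
      writer.println("    final " + field + ";");
    }
    writer.println("  }");
  }

  // Splits a "Name : fields" description the same way defineAst does.
  static String generate(String type) {
    StringWriter buffer = new StringWriter();
    PrintWriter writer = new PrintWriter(buffer);
    String className = type.split(":")[0].trim();
    String fields = type.split(":")[1].trim();
    defineType(writer, "Expr", className, fields);
    writer.flush();
    return buffer.toString();
  }

  public static void main(String[] args) {
    System.out.print(generate("Unary    : Token operator, Expr expr"));
  }
}
```

Running main prints the nested Unary class. Its constructor assigns operator and expr, and two final field declarations follow. This is the shape that every entry in the type list expands to.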

Section 5.3 introduces the Visitor pattern. No TLA⁺-specific differences are necessary when modifying GenerateAst to support it. Insert this line in defineAst():

    writer.println("abstract class " + baseName + " {");
 
    defineVisitor(writer, baseName, types);
 
    // The AST classes.

This calls the defineVisitor function which writes the visitor interface, defined as follows:

  private static void defineVisitor(
      PrintWriter writer, String baseName, List<String> types) {
    writer.println("  interface Visitor<R> {");
 
    for (String type : types) {
      String typeName = type.split(":")[0].trim();
      writer.println("    R visit" + typeName + baseName + "(" +
          typeName + " " + baseName.toLowerCase() + ");");
    }
 
    writer.println("  }");
  }

Again, insert some lines in defineAst() to create the abstract accept method:

      defineType(writer, baseName, className, fields);
    }
 
    // The base accept() method.
    writer.println();
    writer.println("  abstract <R> R accept(Visitor<R> visitor);");
 
    writer.println("}");

Finally, insert some lines in defineType to add a type-specific accept method in each output class:

    writer.println("    }");
 
    // Visitor pattern.
    writer.println();
    writer.println("    @Override");
    writer.println("    <R> R accept(Visitor<R> visitor) {");
    writer.println("      return visitor.visit" +
        className + baseName + "(this);");
    writer.println("    }");
 
    // Fields.

Our generated syntax tree node types now support the visitor pattern!

Section 5.4 provides an implementation of a visitor called AstPrinter that prints out the parse tree. There are a few TLA⁺-specific modifications, starting of course with the package name:

package com.craftinginterpreters.tla;
 
class AstPrinter implements Expr.Visitor<String> {
  String print(Expr expr) {
    return expr.accept(this);
  }
}

We also have some modifications and additions to the visit methods of the AstPrinter class, again reflecting our modified TLA⁺-specific grammar:

  @Override
  public String visitBinaryExpr(Expr.Binary expr) {
    return parenthesize(expr.operator.lexeme,
                        expr.left, expr.right);
  }
 
  @Override
  public String visitGroupingExpr(Expr.Grouping expr) {
    return parenthesize("group", expr.expression);
  }
 
  @Override
  public String visitLiteralExpr(Expr.Literal expr) {
    if (expr.value == null) return "nil";
    return expr.value.toString();
  }
 
  @Override
  public String visitUnaryExpr(Expr.Unary expr) {
    return parenthesize(expr.operator.lexeme, expr.expr);
  }
 
  @Override
  public String visitTernaryExpr(Expr.Ternary expr) {
    return parenthesize(expr.operator.lexeme, expr.first,
                        expr.second, expr.third);
  }
 
  @Override
  public String visitVariadicExpr(Expr.Variadic expr) {
    return parenthesize(expr.operator.lexeme,
                        expr.parameters.toArray(Expr[]::new));
  }

The parenthesize method is unchanged from the book and should be inserted after the visit methods:

  private String parenthesize(String name, Expr... exprs) {
    StringBuilder builder = new StringBuilder();
 
    builder.append("(").append(name);
    for (Expr expr : exprs) {
      builder.append(" ");
      builder.append(expr.accept(this));
    }
    builder.append(")");
 
    return builder.toString();
  }

It isn't necessary to define a main method in the AstPrinter class, but if you'd like to try it out you are free to copy the one given in the book.
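
If you'd rather try the printer before wiring up the generator, here is a self-contained sketch that runs it on a hand-built tree. It makes a few assumptions so that the file compiles on its own:

- Token is a simplified, hypothetical stand-in that holds only a lexeme.
- The Expr classes are a hand-written subset of the generated ones: Literal, Unary, Ternary and Variadic.
- The code uses toArray(new Expr[0]), which also works on Java versions older than 11. The Expr[]::new form requires Java 11.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical stand-in for the scanner's Token class.
class Token {
  final String lexeme;
  Token(String lexeme) { this.lexeme = lexeme; }
}

// Hand-written equivalent of part of the generated Expr.java.
abstract class Expr {
  interface Visitor<R> {
    R visitLiteralExpr(Literal expr);
    R visitUnaryExpr(Unary expr);
    R visitTernaryExpr(Ternary expr);
    R visitVariadicExpr(Variadic expr);
  }

  static class Literal extends Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
    @Override <R> R accept(Visitor<R> v) { return v.visitLiteralExpr(this); }
  }

  static class Unary extends Expr {
    final Token operator;
    final Expr expr;
    Unary(Token operator, Expr expr) { this.operator = operator; this.expr = expr; }
    @Override <R> R accept(Visitor<R> v) { return v.visitUnaryExpr(this); }
  }

  static class Ternary extends Expr {
    final Token operator;
    final Expr first, second, third;
    Ternary(Token operator, Expr first, Expr second, Expr third) {
      this.operator = operator;
      this.first = first; this.second = second; this.third = third;
    }
    @Override <R> R accept(Visitor<R> v) { return v.visitTernaryExpr(this); }
  }

  static class Variadic extends Expr {
    final Token operator;
    final List<Expr> parameters;
    Variadic(Token operator, List<Expr> parameters) {
      this.operator = operator; this.parameters = parameters;
    }
    @Override <R> R accept(Visitor<R> v) { return v.visitVariadicExpr(this); }
  }

  abstract <R> R accept(Visitor<R> visitor);
}

// The AstPrinter visit methods from above, restricted to these node types.
class AstPrinter implements Expr.Visitor<String> {
  String print(Expr expr) { return expr.accept(this); }

  @Override public String visitLiteralExpr(Expr.Literal expr) {
    if (expr.value == null) return "nil";
    return expr.value.toString();
  }
  @Override public String visitUnaryExpr(Expr.Unary expr) {
    return parenthesize(expr.operator.lexeme, expr.expr);
  }
  @Override public String visitTernaryExpr(Expr.Ternary expr) {
    return parenthesize(expr.operator.lexeme, expr.first, expr.second, expr.third);
  }
  @Override public String visitVariadicExpr(Expr.Variadic expr) {
    return parenthesize(expr.operator.lexeme, expr.parameters.toArray(new Expr[0]));
  }

  private String parenthesize(String name, Expr... exprs) {
    StringBuilder builder = new StringBuilder();
    builder.append("(").append(name);
    for (Expr expr : exprs) builder.append(" ").append(expr.accept(this));
    return builder.append(")").toString();
  }
}

class AstPrinterDemo {
  // Hand-built tree for: IF TRUE THEN {1, 2} ELSE -3
  static Expr sample() {
    return new Expr.Ternary(new Token("IF"),
        new Expr.Literal(true),
        new Expr.Variadic(new Token("{"), Arrays.<Expr>asList(
            new Expr.Literal(1), new Expr.Literal(2))),
        new Expr.Unary(new Token("-"), new Expr.Literal(3)));
  }

  public static void main(String[] args) {
    System.out.println(new AstPrinter().print(sample())); // (IF true ({ 1 2) (- 3))
  }
}
```

Note that the boolean literal prints as Java's true rather than TLA⁺'s TRUE, because visitLiteralExpr just calls toString() on the stored value.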

In Chapter 6, we finally build a parse tree out of our tokens!

First, as in Section 6.1 of the book, we have to disambiguate our grammar. The way that precedence works in TLA⁺ is different from Lox, and indeed different from most other languages! One aside from this section of the book reads:

While not common these days, some languages specify that certain pairs of operators have no relative precedence. That makes it a syntax error to mix those operators in an expression without using explicit grouping.
Likewise, some operators are *non-associative*. That means it’s an error to use that operator more than once in a sequence. For example, Perl’s range operator isn’t associative, so a .. b is OK, but a .. b .. c is an error.

TLA⁺ has both of these features! Instead of occupying a single slot in a precedence hierarchy, each operator has a precedence range. Mixing two operators whose precedence ranges overlap without explicit parentheses is a parse error. For example, the ENABLED prefix operator has precedence range 4-15 and the prime operator has precedence range 15-15. The two ranges overlap at 15, so the following expression has a precedence conflict and should not parse:

ENABLED x'

Users must add parentheses to indicate their desired grouping in the expression. Similarly, some operators like = are not associative, so a = b = c should be a parse error. Both of these factors combine to make our operator parsing code quite a bit different from that given in the book. Worry not, it can still be made terse and understandable!
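
The overlap rule amounts to a check on closed integer intervals. The sketch below is a hypothetical helper and not part of the parser we are building. It uses ranges quoted in this tutorial:

- ENABLED (4-15) and prime (15-15) overlap.
- Infix + (10-10) and infix - (11-11) do not.

```java
// Hypothetical helper, not part of the Parser class: models a precedence
// range as a closed interval [low, high].
class PrecedenceRange {
  final String name;
  final int low;
  final int high;

  PrecedenceRange(String name, int low, int high) {
    this.name = name;
    this.low = low;
    this.high = high;
  }

  // Two closed intervals overlap when each one starts no later than the
  // other one ends.
  boolean overlaps(PrecedenceRange other) {
    return low <= other.high && other.low <= high;
  }

  public static void main(String[] args) {
    PrecedenceRange enabled = new PrecedenceRange("ENABLED", 4, 15);
    PrecedenceRange prime = new PrecedenceRange("'", 15, 15);
    PrecedenceRange plus = new PrecedenceRange("+", 10, 10);
    PrecedenceRange minus = new PrecedenceRange("-", 11, 11);
    System.out.println(enabled.overlaps(prime)); // true: ENABLED x' is a conflict
    System.out.println(plus.overlaps(minus));    // false: + and - may be mixed
  }
}
```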

Section 6.2 is where we start writing our parser in the recursive descent style. Recursive descent can seem a bit magical even if you're used to reasoning about recursive functions! Unfortunately, TLA⁺ requires us to push this magic even further, so we'll take it step by step.

We start with the same basic Parser.java file as in the book, only renaming lox to tla in the package name as usual:

package com.craftinginterpreters.tla;
 
import java.util.List;
 
import static com.craftinginterpreters.tla.TokenType.*;
 
class Parser {
  private final List<Token> tokens;
  private int current = 0;
 
  Parser(List<Token> tokens) {
    this.tokens = tokens;
  }
}

Now we hit our first major difference. In Lox, precedence is given by a small hierarchy of named rules like equality, comparison, term, etc. TLA⁺ is more complicated than that. The full language has around 100 operators spanning precedences 1 to 15! If we wanted to match the book's style we'd have to write a tower of recursive functions like:

private Expr operatorExpressionPrec1() { ... }
private Expr operatorExpressionPrec2() { ... }
private Expr operatorExpressionPrec3() { ... }
...
private Expr operatorExpressionPrec15() { ... }

where each operatorExpressionPrecN function parses all the prefix, infix, and postfix operators of precedence N, then calls the function for precedence N+1. Life is too short for this. Instead, we'll adopt a technique alluded to in the text:

If you wanted to do some clever Java 8, you could create a helper method for parsing a left-associative series of binary operators given a list of token types, and an operand method handle to simplify this redundant code.

Here's the skeleton of our operator parsing function; the trick is to make the precedence a parameter to the function instead of a component of the name. Add this code after the Parser constructor:

  private Expr expression() {
    return operatorExpression(1);
  }
 
  private Expr operatorExpression(int prec) {
    if (prec == 16) return primary();
 
    Expr expr = operatorExpression(prec + 1);
 
    return expr;
  }

Before filling out operatorExpression, we'll add the helper methods; these form an incredibly well-designed parsing API and are unchanged from the book:

  private boolean match(TokenType... types) {
    for (TokenType type : types) {
      if (check(type)) {
        advance();
        return true;
      }
    }
 
    return false;
  }
 
  private boolean check(TokenType type) {
    if (isAtEnd()) return false;
    return peek().type == type;
  }
 
  private Token advance() {
    if (!isAtEnd()) current++;
    return previous();
  }
 
  private boolean isAtEnd() {
    return peek().type == EOF;
  }
 
  private Token peek() {
    return tokens.get(current);
  }
 
  private Token previous() {
    return tokens.get(current - 1);
  }

Now we have to define a table of operators with their details. For this, create a new file Operator.java containing a class recording each operator's fix type, token type, associativity, and precedence range:

package com.craftinginterpreters.tla;
 
enum Fix {
  PREFIX, INFIX, POSTFIX
}
 
class Operator {
  final Fix fix;
  final TokenType token;
  final boolean assoc;
  final int lowPrec;
  final int highPrec;
 
  public Operator(Fix fix, TokenType token, boolean assoc,
                  int lowPrec, int highPrec) {
    this.fix = fix;
    this.token = token;
    this.assoc = assoc;
    this.lowPrec = lowPrec;
    this.highPrec = highPrec;
  }
}

For convenience, import the Fix enum values in Parser.java so they can be referenced directly:

package com.craftinginterpreters.tla;
 
import java.util.List;
import java.util.ArrayList;
 
import static com.craftinginterpreters.tla.TokenType.*;
import static com.craftinginterpreters.tla.Fix.*;
 
class Parser {

You can find operator attributes on page 271 of Specifying Systems by Leslie Lamport, or this TLA⁺ tools source file. We use a small subset of these operators. Record their attributes in a table in Parser.java, below the operatorExpression method:

  private static final Operator[] operators = new Operator[] {
    new Operator(PREFIX,  NEGATION,   true,   4,  4 ),
    new Operator(PREFIX,  ENABLED,    false,  4,  15),
    new Operator(PREFIX,  MINUS,      true,   12, 12),
    new Operator(INFIX,   AND,        true,   3,  3 ),
    new Operator(INFIX,   OR,         true,   3,  3 ),
    new Operator(INFIX,   IN,         false,  5,  5 ),
    new Operator(INFIX,   EQUAL,      false,  5,  5 ),
    new Operator(INFIX,   LESS_THAN,  false,  5,  5 ),
    new Operator(INFIX,   DOT_DOT,    false,  9,  9 ),
    new Operator(INFIX,   PLUS,       true,   10, 10),
    new Operator(INFIX,   MINUS,      true,   11, 11),
    new Operator(POSTFIX, PRIME,      false,  15, 15),
  };

Here's something a bit odd. In TLA⁺, the infix minus operator (subtraction) has higher precedence at 11-11 than the infix plus operator (addition) at 10-10! In grade school you probably learned acronyms like PEMDAS or BEDMAS to remember the order of operations in arithmetic. Really, you learned parsing rules! Now when writing our own parsing algorithm we come to understand that the order of these operations is not inscribed in the bedrock of mathematical truth but instead is a simple convention of mathematical notation. TLA⁺ subverts this by parsing the expression a + b - c as a + (b - c) instead of the PEMDAS-style (a + b) - c. While this design decision is unusual, it is unlikely to cause any problems.
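
For integer arithmetic, this particular regrouping does not change the value. Write subtraction as addition of a negation, then apply the associativity of addition:

```latex
a + (b - c) = a + \bigl(b + (-c)\bigr) = (a + b) + (-c) = (a + b) - c
```

In the other order, a - b + c still parses as (a - b) + c, because the higher-precedence - is grouped first.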

We need one more helper method before we can start working on operatorExpression: a superpowered match method specific to operators, which will try to match any operators of a given fix type & (low) precedence. Add this code above match:

  private Operator matchOp(Fix fix, int prec) {
    for (Operator op : operators) {
      if (op.fix == fix && op.lowPrec == prec) {
        if (match(op.token)) return op;
      }
    }
 
    return null;
  }

Okay, we're all set for the main event! Here is how we modify our operatorExpression method to parse infix operators. You can see a strong resemblance to the infix operator parsing code given in the book. The new lines are the Operator op; declaration and the while loop:

  private Expr operatorExpression(int prec) {
    if (prec == 16) return primary();
 
    Operator op;
 
    Expr expr = operatorExpression(prec + 1);
    while ((op = matchOp(INFIX, prec)) != null) {
      Token operator = previous();
      Expr right = operatorExpression(op.highPrec + 1);
      expr = new Expr.Binary(expr, operator, right);
    }
 
    return expr;
  }

We use Java's combined conditional-assignment atop the while loop to make our code more terse, both checking whether any operators were matched and getting details of the matched operator if so. The interior of the while loop is largely identical to infix parsing logic from the book, except for when we recurse to a higher precedence level: we go directly past the upper bound of the precedence range. It takes a bit of thinking to understand why this works for parsing expressions according to the precedence we want, but it does. Take a minute to ponder it. If you still don't get it, wait until we have a fully-functional expression parser then play around with this line to see how it changes the behavior.

This code implements precedence ranges, but assumes all infix operators are associative. We need to modify the loop to break if the infix operator is not associative:

    while ((op = matchOp(INFIX, prec)) != null) {
      Token operator = previous();
      Expr right = operatorExpression(op.highPrec + 1);
      expr = new Expr.Binary(expr, operator, right);
      if (!op.assoc) break;
    }
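
To experiment with the range-skipping recursion before the rest of the parser exists, here is a standalone sketch of just the infix loop, including the non-associativity break. It simplifies things in several ways:

- Tokens are plain strings.
- Every operand is a single token, consumed by a stand-in for primary.
- The InfixSketch class and its parse helper are hypothetical.
- Only =, + and - from the table are included.

Results print as s-expressions in the style of AstPrinter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch of the infix loop only. Tokens are plain
// strings, every operand is a single token, and results are s-expressions.
class InfixSketch {
  static final class Op {
    final String token; final int low; final int high; final boolean assoc;
    Op(String token, int low, int high, boolean assoc) {
      this.token = token; this.low = low; this.high = high; this.assoc = assoc;
    }
  }

  // Subset of the tutorial's infix operators and precedence ranges.
  static final Op[] OPS = {
    new Op("=", 5, 5, false),
    new Op("+", 10, 10, true),
    new Op("-", 11, 11, true),
  };

  private final List<String> tokens;
  private int current = 0;

  private InfixSketch(List<String> tokens) { this.tokens = tokens; }

  // Parses the whole input; throws if any tokens are left unconsumed.
  static String parse(String... input) {
    InfixSketch p = new InfixSketch(Arrays.asList(input));
    String result = p.operatorExpression(1);
    if (p.current != input.length) {
      throw new IllegalStateException("unexpected token: " + input[p.current]);
    }
    return result;
  }

  // Consumes the next token if it is an infix operator with this low precedence.
  private Op matchOp(int prec) {
    if (current >= tokens.size()) return null;
    for (Op op : OPS) {
      if (op.low == prec && op.token.equals(tokens.get(current))) {
        current++;
        return op;
      }
    }
    return null;
  }

  private String operatorExpression(int prec) {
    if (prec == 16) return tokens.get(current++); // stand-in for primary()
    String expr = operatorExpression(prec + 1);
    Op op;
    while ((op = matchOp(prec)) != null) {
      // Recurse past the top of the operator's range for the right operand.
      String right = operatorExpression(op.high + 1);
      expr = "(" + op.token + " " + expr + " " + right + ")";
      if (!op.assoc) break;
    }
    return expr;
  }

  public static void main(String[] args) {
    System.out.println(parse("a", "+", "b", "-", "c")); // (+ a (- b c))
    System.out.println(parse("a", "-", "b", "+", "c")); // (+ (- a b) c)
    System.out.println(parse("a", "=", "b"));           // (= a b)
  }
}
```

By contrast, parse("a", "=", "b", "=", "c") throws. The loop stops after the first non-associative =, and no lower precedence level can consume the second one, so tokens are left over.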

And that's how we parse infix operators! That was the most complicated case, so let's move on to prefix operators. Again, our code resembles the prefix operator parsing logic given in the book. Add this snippet near the top of the operatorExpression method:

  private Expr operatorExpression(int prec) {
    if (prec == 16) return primary();
 
    Operator op;
    if ((op = matchOp(PREFIX, prec)) != null) {
      Token opToken = previous();
      Expr expr = operatorExpression(
                  op.assoc ? op.lowPrec : op.highPrec + 1);
      return new Expr.Unary(opToken, expr);
    }
 
    Expr expr = operatorExpression(prec + 1);

Of note is the expression op.assoc ? op.lowPrec : op.highPrec + 1, which controls the precedence level at which we recurse. To understand this expression, it helps to consider an example. The prefix minus operator - is associative, so we should be able to parse the expression --1 as -(-1). To do that, after consuming the first - we have to recurse into operatorExpression at the same precedence level. Then we consume the second -, recurse again, and ultimately consume 1 in our yet-to-be-defined primary method. In contrast, the prefix operator ENABLED is not associative and has range 4-15. In that case we want to reject the expression ENABLED ENABLED TRUE, so we recurse at a level higher than ENABLED's upper precedence bound. Thus the second ENABLED will not be matched, and we will report a parse error using the methods defined in Section 6.3.

All that remains is to parse postfix operators. These are not covered in the book, but they closely resemble infix operators, just without an expression on the right-hand side of the operator. Add this loop below the infix while loop in operatorExpression:

    }
 
    while ((op = matchOp(POSTFIX, prec)) != null) {
      Token opToken = previous();
      expr = new Expr.Unary(opToken, expr);
      if (!op.assoc) break;
    }
 
    return expr;
  }

And that completes our operator parsing logic!
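
Putting prefix, infix, and postfix handling together, here is a self-contained sketch of the complete operatorExpression logic over string tokens. As before, this is a simplification for experimentation and not the tutorial's Parser class. OperatorSketch, its subset of the operator table, and the primary stand-in are all hypothetical. The primary stand-in rejects any operator token in operand position.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch of the full operatorExpression logic.
// Tokens are plain strings and results are s-expression strings.
class OperatorSketch {
  enum Fix { PREFIX, INFIX, POSTFIX }

  static final class Op {
    final Fix fix; final String token; final boolean assoc;
    final int low; final int high;
    Op(Fix fix, String token, boolean assoc, int low, int high) {
      this.fix = fix; this.token = token; this.assoc = assoc;
      this.low = low; this.high = high;
    }
  }

  // Subset of the tutorial's operator table.
  static final Op[] OPS = {
    new Op(Fix.PREFIX,  "ENABLED", false, 4,  15),
    new Op(Fix.PREFIX,  "-",       true,  12, 12),
    new Op(Fix.INFIX,   "=",       false, 5,  5 ),
    new Op(Fix.INFIX,   "+",       true,  10, 10),
    new Op(Fix.INFIX,   "-",       true,  11, 11),
    new Op(Fix.POSTFIX, "'",       false, 15, 15),
  };

  private final List<String> tokens;
  private int current = 0;

  private OperatorSketch(List<String> tokens) { this.tokens = tokens; }

  // Parses the whole input; throws if anything is left unconsumed.
  static String parse(String... input) {
    OperatorSketch p = new OperatorSketch(Arrays.asList(input));
    String result = p.operatorExpression(1);
    if (p.current != input.length) {
      throw new IllegalStateException("unexpected token: " + input[p.current]);
    }
    return result;
  }

  // Consumes the next token if it is an operator of this fix and low precedence.
  private Op matchOp(Fix fix, int prec) {
    if (current >= tokens.size()) return null;
    for (Op op : OPS) {
      if (op.fix == fix && op.low == prec
          && op.token.equals(tokens.get(current))) {
        current++;
        return op;
      }
    }
    return null;
  }

  // Stand-in for primary(): any single non-operator token is an operand.
  private String primary() {
    if (current >= tokens.size()) throw new IllegalStateException("unexpected end");
    String token = tokens.get(current);
    for (Op op : OPS) {
      if (op.token.equals(token)) throw new IllegalStateException("unexpected token: " + token);
    }
    current++;
    return token;
  }

  private String operatorExpression(int prec) {
    if (prec == 16) return primary();

    Op op;
    if ((op = matchOp(Fix.PREFIX, prec)) != null) {
      // Associative prefix ops may repeat; others skip past their range.
      String operand = operatorExpression(op.assoc ? op.low : op.high + 1);
      return "(" + op.token + " " + operand + ")";
    }

    String expr = operatorExpression(prec + 1);
    while ((op = matchOp(Fix.INFIX, prec)) != null) {
      String right = operatorExpression(op.high + 1);
      expr = "(" + op.token + " " + expr + " " + right + ")";
      if (!op.assoc) break;
    }

    while ((op = matchOp(Fix.POSTFIX, prec)) != null) {
      expr = "(" + op.token + " " + expr + ")";
      if (!op.assoc) break;
    }

    return expr;
  }

  public static void main(String[] args) {
    System.out.println(parse("-", "-", "1"));      // (- (- 1))
    System.out.println(parse("a", "-", "-", "b")); // (- a (- b))
    System.out.println(parse("x", "'"));           // (' x)
  }
}
```

Calls such as parse("ENABLED", "ENABLED", "TRUE") and parse("ENABLED", "x", "'") throw IllegalStateException, mirroring the parse errors described above. The parse("a", "-", "-", "b") case shows matchOp telling infix and prefix - apart by fix type and precedence.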

  • Last modified: 2025/04/11 17:58
  • by ahelwer