This is an old revision of the document!
Handling Constant TLA⁺ Expressions
This tutorial page covers the next two chapters in Crafting Interpreters:
Same as the book, we could build a parser for our entire minimal TLA⁺ language subset before moving on to interpreting it, but that would be boring! Instead we focus on a simple vertical slice of the language: expressions. And not just any expressions, constant expressions - expressions that do not contain variables or identifiers that we would have to resolve. Just primitive literal values stuck together, not too much more advanced than a simple calculator app. This will give us a skeleton on which to hang the rest of the language.
Each section in this tutorial corresponds to one or more sections of Crafting Interpreters (henceforth referred to as "the book"). First read the section of the book, then read the corresponding commentary & modifications given by this tutorial.
Chapter 5: Representing Code
Chapter 5 focuses more on concepts than code, so this section will not have many TLA⁺-specific modifications. However, since this tutorial is intended to be self-contained code-wise, all necessary code is reproduced.
Section 5.1: Context-Free Grammars
Everything before Section 5.1.3: A Grammar for Lox expressions applies equally to TLA⁺. The ambiguous first-draft grammar for Lox expressions can be adapted to TLA⁺:
expression → literal | unary | binary | ternary | variadic | grouping ; literal → NUMBER | "TRUE" | "FALSE" ; grouping → "(" expression ")" ; unary → (( "ENABLED" | "-" ) expression ) | ( expression "'" ) ; binary → expression operator expression ; ternary → "IF" expression "THEN" expression "ELSE" expression; variadic → "{" ( expression ( "," expression )* )? "}" operator → "=" | "+" | "-" | ".." | "/\" | "\/" | "<" | "\in" ;
There are a few interesting differences.
The unary
rule now captures both prefix and suffix operators, which both only accept a single parameter.
The ternary
rule matches the IF
/THEN
/ELSE
operator, with the three parameters being the predicate, the true branch, and the false branch.
The operators are changed to the set of operators defined in our TLA⁺ implementation.
These are all the expressions we can use without introducing the concept of identifiers referring to something else.
There is also the variadic
rule matching finite set literals like {1, 2, 3}
or the empty set {}
.
This one is so named because it's where we'll put operators accepting varying numbers of parameters.
It's kind of weird to think of the finite set literal {1, 2, 3}
as an operator, but it is!
The only difference between {1, 2, 3}
and an operator like constructSet(1, 2, 3)
is syntactic sugar.
This is the perspective of a language implementer.
Later on we will extend the definition of variadic
to include vertically-aligned conjunction & disjunction lists.
Section 5.2: Implementing Syntax Trees
While some programmers might have an aversion to generating code, the approach taken in the book is actually very convenient - as you will discover if you take some time to prototype & play around with different class representations of the parse tree!
In Section 5.2.2: Metaprogramming the trees, the main differences in the GenerateAst
class reflect our modification of the Lox grammar in the previous section:
package tool; import java.io.IOException; import java.io.PrintWriter; import java.util.Arrays; import java.util.List; public class GenerateAst { public static void main(String[] args) throws IOException { if (args.length != 1) { System.err.println("Usage: generate_ast <output directory>"); System.exit(64); } String outputDir = args[0]; defineAst(outputDir, "Expr", Arrays.asList( "Binary : Expr left, Token operator, Expr right", "Grouping : Expr expression", "Literal : Object value", "Unary : Token operator, Expr expr", "Ternary : Token operator, Expr first, Expr second, Expr third", "Variadic : Token operator, List<Expr> parameters" )); } }
In the defineAst
function, we only need to modify the output code so that it uses the package .tla
instead of .lox
; add this after the main
method of GenerateAst
:
private static void defineAst( String outputDir, String baseName, List<String> types) throws IOException { String path = outputDir + "/" + baseName + ".java"; PrintWriter writer = new PrintWriter(path, "UTF-8"); writer.println("package tla;"); writer.println(); writer.println("import java.util.List;"); writer.println(); writer.println("abstract class " + baseName + " {"); // The AST classes. for (String type : types) { String className = type.split(":")[0].trim(); String fields = type.split(":")[1].trim(); defineType(writer, baseName, className, fields); } writer.println("}"); writer.close(); }
Finally, the defineType
method - added after the defineAst
method - is unchanged:
private static void defineType( PrintWriter writer, String baseName, String className, String fieldList) { writer.println(" static class " + className + " extends " + baseName + " {"); // Constructor. writer.println(" " + className + "(" + fieldList + ") {"); // Store parameters in fields. String[] fields = fieldList.split(", "); for (String field : fields) { String name = field.split(" ")[1]; writer.println(" this." + name + " = " + name + ";"); } writer.println(" }"); // Fields. writer.println(); for (String field : fields) { writer.println(" final " + field + ";"); } writer.println(" }"); }
Section 5.3: Working with Trees
Section 5.3 introduces the Visitor pattern.
No TLA⁺-specific differences are necessary when modifying GenerateAst
to support it.
Insert this line in defineAst()
:
writer.println("abstract class " + baseName + " {"); defineVisitor(writer, baseName, types); // The AST classes.
This calls the defineVisitor
function which writes the visitor interface, defined as follows:
private static void defineVisitor( PrintWriter writer, String baseName, List<String> types) { writer.println(" interface Visitor<R> {"); for (String type : types) { String typeName = type.split(":")[0].trim(); writer.println(" R visit" + typeName + baseName + "(" + typeName + " " + baseName.toLowerCase() + ");"); } writer.println(" }"); }
Again insert some lines in defineAst()
to create the abstract accept
method:
defineType(writer, baseName, className, fields); } // The base accept() method. writer.println(); writer.println(" abstract <R> R accept(Visitor<R> visitor);"); writer.println("}");
Finally, insert some lines in defineType
to add a types-specific accept
method in each output class:
writer.println(" }"); // Visitor pattern. writer.println(); writer.println(" @Override"); writer.println(" <R> R accept(Visitor<R> visitor) {"); writer.println(" return visitor.visit" + className + baseName + "(this);"); writer.println(" }"); // Fields.
Our generated syntax tree node types now support the visitor pattern!
Section 5.4: A (Not Very) Pretty Printer
Section 5.4 provides an implementation of a visitor called AstPrinter
that prints out the parse tree.
There are a few TLA⁺-specific modifications, starting of course with the package name:
package tla; class AstPrinter implements Expr.Visitor<String> { String print(Expr expr) { return expr.accept(this); } }
We also have some modifications and additions to the visit
methods of the AstPrinter
class, again reflecting our modified TLA⁺-specific grammar:
@Override public String visitBinaryExpr(Expr.Binary expr) { return parenthesize(expr.operator.lexeme, expr.left, expr.right); } @Override public String visitGroupingExpr(Expr.Grouping expr) { return parenthesize("group", expr.expression); } @Override public String visitLiteralExpr(Expr.Literal expr) { if (expr.value == null) return "nil"; return expr.value.toString(); } @Override public String visitUnaryExpr(Expr.Unary expr) { return parenthesize(expr.operator.lexeme, expr.expr); } @Override public string visitTernaryExpr(Expr.Ternary expr) { return parenthesize(expr.operator.lexeme, expr.first, expr.second, expr.third); } @Override public string visitVariadicExpr(Expr.Variadic expr) { return parenthesize(expr.operator.lexeme, expr.parameters.toArray(Expr[]::new)); }
The parenthesize
method is unchanged from the book and should be inserted after the visit
methods:
private String parenthesize(String name, Expr... exprs) { StringBuilder builder = new StringBuilder(); builder.append("(").append(name); for (Expr expr : exprs) { builder.append(" "); builder.append(expr.accept(this)); } builder.append(")"); return builder.toString(); }
It isn't necessary to define a main
method in the AstPrinter
class, but if you'd like to try it out you are free to copy the one given in the book.
Chapter 6: Parsing Expressions
In Chapter 6, we finally build a parse tree out of our tokens!
Section 6.1: Ambiguity and the Parsing Game
First, as in section 6.1 of the book, we have to disambiguate our grammar. The way that precedence works in TLA⁺ is different from Lox, and indeed different from most other languages! One side-quote from this section of the book reads:
While not common these days, some languages specify that certain pairs of operators have no relative precedence. That makes it a syntax error to mix those operators in an expression without using explicit grouping.
Likewise, some operators are *non-associative*. That means it’s an error to use that operator more than once in a sequence. For example, Perl’s range operator isn’t associative, so a .. b is OK, but a .. b .. c is an error.
TLA⁺ has both of these features!
Instead of operators occupying a slot in some hierarchy of precedence, each operator has a precedence range.
When the precedence ranges of two operators overlap it is a parse error.
For example, the ENABLED
prefix operator has precedence range 4-15, and the prime operator has precedence range 15-15; thus, the following expression has a precedence conflict and should not parse:
ENABLED x'
Users must add parentheses to indicate their desired grouping in the expression.
Similarly, some operators like =
are not associative so a = b = c
should be a parse error.
Both of these factors combine to make our operator parsing code quite a bit different from that given in the book.
Worry not, it can still be made terse and understandable!
Section 6.2: Recursive Descent Parsing
Section 6.2 is where we start writing our parser in the recursive descent style. Recursive descent can seem a bit magical even if you're used to reasoning about recursive functions! Unfortunately TLA⁺ requires us to push this magic even further. We'll take it step by step.
We start with the same basic Parser.java
file as in the book, renaming lox
to tla
in the package name as usual.
We also add an import for ArrayList
, which we will later make use of when parsing variadic operators:
package tla; import java.util.List; import java.util.ArrayList; import static tla.TokenType.*; class Parser { private final List<Token> tokens; private int current = 0; Parser(List<Token> tokens) { this.tokens = tokens; } }
Now we hit our first major difference.
In Lox, precedence is given by a small hierarchy of named rules like equality
, comparison
, term
, etc.
TLA⁺ is more complicated than that.
The full language has around 100 operators spanning precedences 1 to 15!
If we wanted to match the book's style we'd have to write a tower of recursive functions like:
private Expr operatorExpressionPrec1() { ... } private Expr operatorExpressionPrec2() { ... } private Expr operatorExpressionPrec3() { ... } ... private Expr operatorExpressionPrec15() { ... }
where each operatorExpressionPrecN()
function parses all the prefix, infix, and postfix operators of precedence N
, and calls operatorExpressionPrecN+1()
.
Life is too short for this.
Instead, we'll adopt a technique alluded to in the text:
If you wanted to do some clever Java 8, you could create a helper method for parsing a left-associative series of binary operators given a list of token types, and an operand method handle to simplify this redundant code.
Here's the skeleton of our operator parsing function; the trick is to make the precedence a parameter to the function instead of a component of the name.
Add this code after the Parser()
constructor:
private Expr expression() { return operatorExpression(1); } private Expr operatorExpression(int prec) { if (prec == 16) return primary(); Expr expr = operatorExpression(prec + 1); return expr; }
Before filling out operatorExpression()
, we'll add the helper methods; these form an incredibly well-designed parsing API and are unchanged from the book:
private boolean match(TokenType... types) { for (TokenType type : types) { if (check(type)) { advance(); return true; } } return false; } private boolean check(TokenType type) { if (isAtEnd()) return false; return peek().type == type; } private Token advance() { if (!isAtEnd()) current++; return previous(); } private boolean isAtEnd() { return peek().type == EOF; } private Token peek() { return tokens.get(current); } private Token previous() { return tokens.get(current - 1); }
Now we have to define a table of operators with their details.
For this, create a new file Operator.java
containing a class recording operator fix type, token type, associativity, and precedence range:
package tla; enum Fix { PREFIX, INFIX, POSTFIX } class Operator { final Fix fix; final TokenType token; final boolean assoc; final int lowPrec; final int highPrec; public Operator(Fix fix, TokenType token, boolean assoc, int lowPrec, int highPrec) { this.fix = fix; this.token = token; this.assoc = assoc; this.lowPrec = lowPrec; this.highPrec = highPrec; } }
For convenience, import the Fix
enum values in Parser.java
so they can be referenced directly:
package tla; import java.util.List; import java.util.ArrayList; import static tla.TokenType.*; import static tla.Fix.*; class Parser {
You can find operator attributes on page 271 of Specifying Systems by Leslie Lamport, or this TLA⁺ tools source file.
We use a small subset of these operators.
Record their attributes in a table in Parser.java
, below operatorExpression()
:
private static final Operator[] operators = new Operator[] { new Operator(PREFIX, NOT, true, 4, 4 ), new Operator(PREFIX, ENABLED, false, 4, 15), new Operator(PREFIX, MINUS, true, 12, 12), new Operator(INFIX, AND, true, 3, 3 ), new Operator(INFIX, OR, true, 3, 3 ), new Operator(INFIX, IN, false, 5, 5 ), new Operator(INFIX, EQUAL, false, 5, 5 ), new Operator(INFIX, LESS_THAN, false, 5, 5 ), new Operator(INFIX, DOT_DOT, false, 9, 9 ), new Operator(INFIX, PLUS, true, 10, 10), new Operator(INFIX, MINUS, true, 11, 11), new Operator(POSTFIX, PRIME, false, 15, 15), };
Here's something a bit odd.
In TLA⁺, the infix minus operator (subtraction) has higher precedence at 11-11 than the infix plus operator (addition) at 10-10!
In grade school you probably learned acronyms like PEMDAS or BEDMAS to remember the order of operations in arithmetic.
Really, you learned parsing rules!
Now when writing our own parsing algorithm we come to understand that the order of these operations is not inscribed in the bedrock of mathematical truth but instead is a simple convention of mathematical notation.
TLA⁺ subverts this by parsing the expression a + b - c
as a + (b - c)
instead of the PEMDAS-style (a + b) - c
.
While this design decision is unusual, it is unlikely to cause any problems.
We need one more helper method before we can start working on operatorExpression()
: a superpowered match()
method specific to operators, which will try to match any operators of a given fix type & (low) precedence.
Add this code above match()
:
private Operator matchOp(Fix fix, int prec) { for (Operator op : operators) { if (op.fix == fix && op.lowPrec == prec) { if (match(op.token)) return op; } } return null; }
Okay, we're all set for the main event!
Here is how we modify operatorExpression()
to parse infix operators.
You can see a strong resemblance to the infix operator parsing code given in the book (new lines highlighted):
private Expr operatorExpression(int prec) { if (prec == 16) return primary(); Operator op; Expr expr = operatorExpression(prec + 1); while ((op = matchOp(INFIX, prec)) != null) { Token operator = previous(); Expr right = operatorExpression(op.highPrec + 1); expr = new Expr.Binary(expr, operator, right); } return expr; }
We use Java's combined conditional-assignment atop the while
loop to make our code more terse, both checking whether any operators were matched and getting details of the matched operator if so.
The interior of the while
loop is largely identical to infix parsing logic from the book, except for when we recurse to a higher precedence level: in the book the code recurses one level above the matched operator's precedence, but here we go one level above the upper bound of the matched operator's precedence.
It takes a bit of thinking to understand why this works for parsing expressions according to the precedence rules we want, but it does.
Take a minute to ponder it.
If you still don't get it, wait until we have a fully-functional expression parser then play around with this line to see how it changes the behavior.
Our code implements precedence ranges, but assumes all infix operators are associative. We need to modify the loop to return immediately if the infix operator is not associative:
while ((op = matchOp(INFIX, prec)) != null) { Token operator = previous(); Expr right = operatorExpression(op.highPrec + 1); expr = new Expr.Binary(expr, operator, right); if (!op.assoc) return expr; }
And that's how we parse infix operators!
That was the most complicated case, so let's move on to prefix operators.
Again our code resembles the prefix operator parsing logic given in the book.
Add this snippet near the top of operatorExpression()
:
private Expr operatorExpression(int prec) { if (prec == 16) return primary(); Operator op; if ((op = matchOp(PREFIX, prec)) != null) { Token opToken = previous(); Expr expr = operatorExpression( op.assoc ? op.lowPrec : op.highPrec + 1); return new Expr.Unary(opToken, expr); } Expr expr = operatorExpression(prec + 1);
Of note is the expression op.assoc ? op.lowPrec : op.highPrec + 1
controlling the precedence level at which we recurse.
To understand this expression, it is helpful to consider an example.
The negative prefix operator -1
is associative, so we should be able to parse the expression --1
as -(-1)
.
In order to do that after consuming the first -
we have to recurse into operatorExpression()
at the same precedence level.
Then we will consume the second -
, then again recurse and ultimately consume 1
in our yet-to-be-defined primary
method.
In contrast, the prefix operator ENABLED
is not associative and has range 4-15.
In that case we want to reject the expression ENABLED ENABLED TRUE
, so we recurse at a level higher than ENABLED
's upper precedence bound.
Thus the second ENABLED
will not be matched and we will report a parse error using methods defined in chapter section 6.3.
All that remains is to parse postfix operators.
These are not covered in the book, but they pretty much resemble infix operators without matching an expression on the right-hand side of the operator.
Add the highlighted code below the INFIX
logic in operatorExpression()
:
} while ((op = matchOp(POSTFIX, prec)) != null) { Token opToken = previous(); expr = new Expr.Unary(opToken, expr); if (!op.assoc) break; } return expr }
And that completes our operator parsing logic!
All that remains is our primary()
method.
It is fairly similar to that given in the book, with some omissions since our minimal TLA⁺ subset does not contain strings or null values.
Add this code after operatorExpression()
:
private Expr primary() { if (match(FALSE)) return new Expr.Literal(false); if (match(TRUE)) return new Expr.Literal(true); if (match(NUMBER)) { return new Expr.Literal(previous().literal); } if (match(LEFT_PAREN)) { Expr expr = expression(); consume(RIGHT_PAREN, "Expect ')' after expression."); return new Expr.Grouping(expr); } }
We have two final additions to primary()
, the IF
/THEN
/ELSE
operator and the set construction operator.
Here's how we parse IF
/THEN
/ELSE
; add the highlighted code to the bottom of primary()
:
return new Expr.Grouping(expr); } if (match(IF)) { Token operator = previous(); Expr condition = expression(); consume(THEN, "'THEN' required after 'IF' expression."); Expr yes = expression(); consume(ELSE, "'ELSE' required after 'THEN' expression."); Expr no = expression(); return new Expr.Ternary(operator, condition, yes, no); } }
And here's how we parse the set construction operator.
Again add the highlighted code to the bottom of primary()
:
return new Expr.Ternary(operator, condition, yes, no); } if (match(LEFT_BRACE)) { Token operator = previous(); List<Expr> elements = new ArrayList<Expr>(); if (RIGHT_BRACE != peek().type) { do { elements.add(expression()); } while (match(COMMA)); } consume(RIGHT_BRACE, "'}' is required to terminate finite set literal."); return new Expr.Variadic(operator, elements); } }
This will handle expressions like {1,2,3}
and (crucially) {}
.
The consume()
operator is defined in the next section, along with error reporting generally.
Section 6.3: Syntax Errors
We don't need to modify most of the error reporting code given in Section 6.3.
Here it is; add the consume()
method after match()
in the Parser
class:
private Token consume(TokenType type, String message) { if (check(type)) return advance(); throw error(peek(), message); }
Then add the error()
method after previous()
:
private ParseError error(Token token, String message) { Lox.error(token, message); return new ParseError(); }
In the TlaPlus
class, add the error()
method after report()
:
static void error(Token token, String message) { if (token.type == TokenType.EOF) { report(token.line, " at end", message); } else { report(token.line, " at '" + token.lexeme + "'", message); } }
Add the ParseError
class definition at the top of the Parser
class:
class Parser { private static class ParseError extends RuntimeException {} private final List<Token> tokens;
One piece of code we do need to modify is the synchronization()
method.
TLA⁺ doesn't have statements separated by semicolons, so our best choice of synchronization point is the EQUAL_EQUAL
symbol denoting an operator definition; add this in the Parser
class after error()
:
private void synchronize() { advance(); while(!isAtEnd()) { if (previous().type == EQUAL_EQUAL) return; advance(); } }
Fully-featured TLA⁺ parsers generally look ahead to identify the start of a new operator definition (not trivial!) and insert a synthetic zero-width token for the parser to consume.
For an example, see SANY's very complicated belchDEF()
method.
We won't be doing that here, but it's a good technique to know about; looking ahead then inserting synthetic tokens to disambiguate syntax is a common method when parsing complicated & ambiguous languages like TLA⁺.
Section 6.4: Wiring up the Parser
In section 6.4 we finally arrive at the point where you can parse TLA⁺ expressions in your REPL.
In the run()
method of the TlaPlus
class, delete the code printing out the scanned tokens (or not, it can be informative to keep around) and replace it with this:
List<Token> tokens = scanner.scanTokens(); Parser parser = new Parser(tokens); Expr expression = parser.parse(); // Stop if there was a syntax error. if (hadError) return; System.out.println(new AstPrinter().print(expression)); }
Fire up the interpreter and parse a TLA⁺ expression!
> {1 + 2, IF TRUE THEN 3 ELSE 4, {}} ({ (+ 1 2) (IF true 3 4) ({))
If you got out of sync, you can find a snapshot of the expected state of the code in this repo directory. Next tutorial: Evaluating Constant TLA⁺ Expressions!
Challenges
Here are some optional challenges to flesh out your TLA⁺ interpreter, roughly ranked from simplest to most difficult.