Differences

This shows you the differences between two versions of the page.

--- creating:jlists [2025/04/27 21:44] – Finished how to handle infix op ambiguity ahelwer
+++ creating:jlists [2025/05/21 18:21] (current) – Disjunction does not short-circuit ahelwer
@@ Line 39: / Line 39: @@
     if (match(AND, OR)) {
       Token op = previous();
-      List<Expr> juncts = new ArrayList<Expr>();
+      List<Expr> juncts = new ArrayList<>();
       do {
         juncts.add(expression());
@@ Line 92: / Line 92: @@
 <code java [highlight_lines_extra="5,6"]>
 private static final Operator[] operators = new Operator[] {
-    new Operator(PREFIX,  NOT,        true,   4,  4 ),
+    new Operator(Fix.PREFIX,  NOT,        true,   4,  4 ),
-    new Operator(PREFIX,  ENABLED,    false,  4,  15),
+    new Operator(Fix.PREFIX,  ENABLED,    false,  4,  15),
-    new Operator(PREFIX,  MINUS,      true,   12, 12),
+    new Operator(Fix.PREFIX,  MINUS,      true,   12, 12),
-    new Operator(INFIX,   AND,        true,   3,  3 ),
+    new Operator(Fix.INFIX,   AND,        true,   3,  3 ),
-    new Operator(INFIX,   OR,         true,   3,  3 ),
+    new Operator(Fix.INFIX,   OR,         true,   3,  3 ),
-    new Operator(INFIX,   IN,         false,  5,  5 ),
+    new Operator(Fix.INFIX,   IN,         false,  5,  5 ),
 </code>
@@ Line 115: / Line 115: @@
 ====== Beyond Context-Free ======
-The answer to our parsing problem is deceptively simple: before consuming a ''/\'' or ''\/'' token as an infix operator in ''operatorExpression()'', first we check whether that token could actually be the start of another junct in a jlist that is currently being parsed.
+The answer to our parsing problem is deceptively simple: before matching a token in ''check()'', we first check to ensure it starts on a column to the right of the current jlist's column.
-If so, we prioritize treating the ''/\'' or ''\/'' token as the start of a new junct instead of an infix operator symbol.
+If the token instead starts to the left or equal to the current jlist's column, the match fails and the token is not consumed - even if the token is what is expected by the caller to ''check()''!
+If we aren't currently parsing a jlist then ''check()'' functions as normal.
 For readers who have taken a computer science class in formal languages, alarm bells should be going off - changing the parse behavior depending on the current context is a //big// change!
@@ Line 130: / Line 131: @@
 Ultimately the shape of this state is a stack of nested jlists.
-Each entry in the stack records the type of jlist (conjunction or disjunction) and the column of its vertical alignment.
+Each entry in the stack records the column of a jlist's vertical alignment.
-When we start parsing a new jlist, we push its details to this stack.
+When we start parsing a new jlist, we push its column to this stack.
-When we finish parsing a jlist, we pop it from the stack.
+When we finish parsing a jlist, we pop from the stack.
-The top of this stack is the "current" jlist context, and it modifies how some of our helper methods work.
+The top of this stack is the "current" jlist context, and it modifies how ''check()'' works.
-Some of the logic here gets a bit finicky, so let's encapsulate it within a class.
-Create a new file called ''JListContext.java''; import [[https://docs.oracle.com/javase/8/docs/api/java/util/Deque.html|deque]]-related classes so we have a stack, and use the handy Java 17 [[https://openjdk.org/jeps/395|records]] feature to cut down on boilerplate and define a quick ''JListInfo'' class for our stack to hold:
-<code java>
-package tla;
-import java.util.ArrayDeque;
-import java.util.Deque;
-class JListContext {
-  private record JListInfo(TokenType type, int column) { }
-  private final Deque<JListInfo> stack = new ArrayDeque<>();
-}
-</code>
-Add an instance of ''JListContext'' as a new class variable in the ''Parser'' class:
+We'll be using Java's ''[[https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html|ArrayDeque]]'' class as a stack.
+It only needs to hold the jlist column.
+Define a new class variable near the top of the ''Parser'' class:
 <code java [highlight_lines_extra="4"]>
   private final List<Token> tokens;
   private int current = 0;
   private final boolean replMode;
-  private final JListContext jlists = new JListContext();
+  private final ArrayDeque<Integer> jlists = new ArrayDeque<>();
 </code>
-In the ''JListContext'' class, add some helpers to create a new jlist or terminate the current one:
+Import ''ArrayDeque'' at the top of ''Parser.java'':
+<code java [highlight_lines_extra="5"]>
+package tla;
-<code java>
+import java.util.List;
-  public void startNew(Token op) {
+import java.util.ArrayList;
-    stack.push(new JListInfo(op.type, op.column));
+import java.util.ArrayDeque;
-  }
-  public void terminateCurrent() {
+import static tla.TokenType.*;
-    stack.pop();
-  }
+class Parser {
 </code>
-Back in the ''Parser'' class, augment our jlist parsing logic with appropriate calls to these methods:
+Now augment our jlist parsing logic in ''primary()'' to push to & pop from the ''jlists'' stack when beginning and ending a jlist:
 <code java [highlight_lines_extra="3,8"]>
     if (match(AND, OR)) {
       Token op = previous();
-      jlists.startNew(op);
+      jlists.push(op.column);
-      List<Expr> juncts = new ArrayList<Expr>();
+      List<Expr> juncts = new ArrayList<>();
       do {
         juncts.add(expression());
       } while (matchBullet(op.type, op.column));
-      jlists.terminateCurrent();
+      jlists.pop();
       return new Expr.Variadic(op, juncts);
     }
 </code>
-Now we write a critical method.
+Then add this critical line to the ''check()'' helper, to always return false if we are inside a jlist and the next token is not to the right of the jlist's column:
-In ''JListContext'', define ''isNewBullet()'' to return true if the given token could act as the start of a new junct in the current jlist:
+<code java [highlight_lines_extra="3"]>
-<code haskell>
+  private boolean check(TokenType type) {
-  public boolean isNewBullet(Token op) {
+    if (isAtEnd()) return false;
-    JListInfo current = this.stack.peekFirst();
+    if (!jlists.isEmpty() && peek().column <= jlists.peek()) return false;
-    return current != null
+    return peek().type == type;
-        && current.type == op.type
-        && current.column == op.column;
-  }
-</code>
-''isNewBullet()'' first retrieves the top element of our stack, representing the current jlist.
-If we are not currently parsing a jlist, this element will be ''null'' and the method returns false.
-Otherwise, it checks the current jlist attributes against the given token.
-If the token type & column both match, the method returns true.
-Where should call this powerful method?
-There are actually a few possibilities, but I think it makes the most sense to check ''isNewBullet()'' in the ''matchOp()'' method of the ''Parser'' class:
-<code java [highlight_lines_extra="2"]>
-  private Operator matchOp(Fix fix, int prec) {
-    if (jlists.isNewBullet(peek())) return null;
-    for (Operator op : operators) {
-      if (op.fix == fix && op.lowPrec == prec) {
-        if (match(op.token)) return op;
-      }
-    }
-    return null;
   }
 </code>
@@ Line 218: / Line 184: @@
 This works!
 We can parse jlists again!
-It's a bit tricky to figure out what we actually did, so let's take a close look at our infix operator parsing code in ''operatorExpression()'':
+It's a bit tricky to figure out how this changes things in practice, so let's take a close look at our infix operator parsing code in ''operatorExpression()'':
 <code java>
     Expr expr = operatorExpression(prec + 1);
-    while ((op = matchOp(INFIX, prec)) != null) {
+    while ((op = matchOp(Fix.INFIX, prec)) != null) {
       Token operator = previous();
       Expr right = operatorExpression(op.highPrec + 1);
@@ Line 233: / Line 199: @@
   /\ 1
   /\ 2
-  /\ 3
 </code>
 The parser will:
@@ Line 240: / Line 205: @@
   - find ''1'' as the value of ''expr'' at the top of the infix op parsing loop
   - call ''matchOp()'' with ''/\'' as the next token to be looked at
-  - in ''matchOp()'', call ''jlists.isNewBullet()'', and immediately return null
+  - in ''matchOp()'', ultimately call ''check()'' which returns false because of our new logic
-  - eventually return ''1'' to the jlist loop as a complete expression
+  - never enter the infix op parsing loop and return ''1'' to the jlist loop as a complete expression
   - in the jlist loop, call ''matchBullet()'' to successfully consume ''/\''
-If the call to ''jlists.isNewBullet()'' were not present, ''matchOp()'' would have matched the next ''/\'' as just another infix operator token.
+If the line we added in ''check()'' were not present, ''matchOp()'' would have matched the next ''/\'' as just another infix operator token.
 But we pre-empted it!
-So now ''/\'' and ''\/'' tokens will only be parsed as infix operator tokens when they would not be the start of a new junct in a currently-active jlist.
+So now ''/\'' and ''\/'' tokens will only be parsed as infix operator tokens when they are to the right of the current jlist.
+Note that our ''matchBullet()'' helper was intentionally written not to use ''check()'' so that it is actually capable of matching & consuming the ''/\'' or ''\/'' token.
-====== Termination ======
-We're close, but we're not //quite// there yet.
+There is one other benefit we've unlocked.
-Consider the following TLA⁺ snippet:
+It isn't enough to parse valid jlists, we must also reject invalid ones!
+The TLA⁺ language specification requires that the parser reject attempts to defeat vertical alignment encapsulation by abusing delimiters like ''(''/'')'' or ''{''/''}''.
+What this means is that inputs like the following should fail to parse:
 <code haskell>
 op ==
   /\ 1
-  /\ \/ 2
+  /\ (2
+)
   /\ 3
 </code>
-Our code parses this as though it's:
+Indeed, our parser will detect an error here.
-<code haskell>
+The parentheses parsing logic will call ''consume()'' attempting to match the closing '')'', but within ''consume()'' the call to ''check()'' will always fail since '')'' is not to the right of the current jlist column.
-op ==
+This gives rise to a parse error.
-  /\ 1
-  /\ \/ (2 /\ 3)
+====== Error Recovery ======
+Talk of parsing errors nicely segues us onto the topic of error recovery.
+Recall that on error, we call ''synchronize()'' in the ''Parser'' class and desperately churn through tokens looking for the start of an operator definition.
+Jlists complicate this a bit!
+What happens if an error occurs while parsing a jlist and we enter ''synchronize()'' carrying around a bunch of state in the ''jlists'' stack?
+Well, nonsensical things happen.
+To fix this we just wipe out our jlist stack at the top of ''synchronize()'':
+<code java [highlight_lines_extra="2"]>
+  private void synchronize() {
+    jlists.clear();
+    advance();
+    while (!isAtEnd()) {
 </code>
-We thought we had defeated the tyranny of infix op ambiguity, but we were wrong!
-What happened?
-Well, our call to ''jlists.isNewBullet()'' checks the ''/\'' in ''/\ 3'' against the //top-most active jlist//, which is a ''\/'' at a different column!
-The enclosing ''/\'' jlist is not checked.
-You might be tempted to fix this by modifying ''isNewBullet()'' to check tokens against //all// the jlists in the stack, but actually this problem requires a deeper approach.
-Each junct's expression should be entirely contained to the right of its enclosing jlist's vertical alignment.
+Done.
-Put another way, if you put a token on the next line to the left of the ''/\'' or ''\/'', that should terminate the jlist.
+You've successfully parsed vertically-aligned conjunction & disjunction lists in TLA⁺!
-That enables you to do neat things like:
+This puts you in rarified air.
-<code haskell>
+Only a handful of people in the world possess this knowledge, and now you are among them.
-op ==
-  /\ A
+====== Evaluation ======
-  /\ B
-  = C
+Now that we can parse jlists, let's interpret them.
+Similar to the logical operators covered in the book, conjunction lists short-circuit.
+That means conjuncts are evaluated in order and, if a single conjunct is false, evaluation immediately stops and returns false.
+In an odd contrast, disjunction lists do //not// short-circuit; this is because they are used to express nondeterminism, as you will learn several chapters from now.
+Add conjunction list evaluation logic to ''visitVariadicExpr()'' in the ''Interpreter'' class, below the set constructor logic:
+<code java [highlight_lines_extra="2,3,4,5,6,7,8"]>
+        return set;
+      case AND:
+        for (Expr conjunct : expr.parameters) {
+          Object result = evaluate(conjunct);
+          checkBooleanOperand(expr.operator, result);
+          if (!(boolean)result) return false;
+        }
+        return true;
+      default:
+        // Unreachable.
+        return null
 </code>
-which should be parsed as ''(A /\ B) = C''.
-The ''='' sign is not to the right of the jlist's vertical alignment, so it terminates that jlist.
+Then add the disjunction list logic right below that:
-To check this, we need another helper method in ''JListContext'' class called ''isAboveCurrent()'':
+<code java [highlight_lines_extra="2,3,4,5,6,7,8,9"]>
+        return true;
+      case OR:
+        boolean result = false;
+        for (Expr disjunct : expr.parameters) {
+          Object junctResult = evaluate(disjunct);
+          checkBooleanOperand(expr.operator, junctResult);
+          result |= (Boolean)junctResult;
+        }
+        return result;
+      default:
+        // Unreachable.
+        return null;
+</code>
+Remember we also parsed the ''/\'' and ''\/'' //infix// operators, so we need to evaluate those as well.
+This is a bit annoying!
+They should function in the exact same way as their respective jlists, but now we have to copy a duplicate of our evaluation logic to ''visitBinaryExpr()''.
+So, we'll perform a trick which also has its parallel in chapter 9 of the book: [[https://craftinginterpreters.com/control-flow.html#desugaring|desugaring]]!
+We will flatten infix ''/\'' and ''\/'' operators so they actually form a jlist.
+In the ''Parser'' class, modify the infix operator parsing loop in ''operatorExpression()'' by replacing the ''expr ='' line with a call to a new helper, ''flattenInfix()'':
+<code java [highlight_lines_extra="5"]>
+    Expr expr = operatorExpression(prec + 1);
+    while ((op = matchOp(Fix.INFIX, prec)) != null) {
+      Token operator = previous();
+      Expr right = operatorExpression(op.highPrec + 1);
+      expr = flattenInfix(expr, operator, right);
+      if (!op.assoc) return expr;
+    }
+</code>
+Define ''flattenInfix()'' above ''matchBullet()'' in the ''Parser'' class:
 <code java>
-  public boolean isAboveCurrent(Token tok) {
+  private Expr flattenInfix(Expr left, Token op, Expr right) {
-    JListInfo current = this.stack.peekFirst();
+    if (op.type == AND) {
-    return current == null || current.column < tok.column;
+    } else if (op.type == OR) {
+    } else {
+      return new Expr.Binary(left, op, right);
+    }
   }
 </code>
+The helper returns a regular binary expression except when the operator is ''AND'' or ''OR''; in those cases, we return two-parameter instances of ''Expr.Variadic'' instead:
-We then add a call to ''isAboveCurrent()'' in a very fundamental helper method of our ''Parser'' class, ''check()'':
+<code java [highlight_lines_extra="3,4,5,6,8,9,10,11"]>
-<code java [highlight_lines_extra="2"]>
+  private Expr flattenInfix(Expr left, Token op, Expr right) {
-  private boolean check(TokenType type) {
+    if (op.type == AND) {
-    if (!jlists.isAboveCurrent(peek())) return false;
+      List<Expr> conjuncts = new ArrayList<>();
-    if (isAtEnd()) return false;
+      conjuncts.add(left);
-    return peek().type == type;
+      conjuncts.add(right);
+      return new Expr.Variadic(op, conjuncts);
+    } else if (op.type == OR) {
+      List<Expr> disjuncts = new ArrayList<>();
+      disjuncts.add(left);
+      disjuncts.add(right);
+      return new Expr.Variadic(op, disjuncts);
+    } else {
+      return new Expr.Binary(left, op, right);
+    }
   }
 </code>
-What does this do?
+====== Further Desugaring ======
-Well, within a given call to ''expression()'', it essentially forbids matching any tokens that are not to the right of the current jlist!
-If an expression is being parsed within the context of a jlist, and it encounters a token on the next line that is not to the right of the current jlist alignment, control flow is ultimately kicked back up to the jlist loop which then calls ''jlists.terminateCurrent()''.
-Parsing can then resume as expected.
-There is one other benefit we've unlocked.
+This is all right, but it could be even better!
-It isn't enough to parse valid jlists, we must also reject invalid ones!
+An expression like ''1 /\ 2 /\ 3 /\ 4'' will be translated to:
-The TLA⁺ language specification requires that the parser reject attempts to defeat vertical alignment encapsulation by abusing delimiters like ''(''/'')'' or ''{''/''}''.
-What this means is that inputs like the following should fail to parse:
 <code haskell>
-op ==
+/\ /\ /\ 1
-  /\ 1
+      /\ 2
-  /\ (2
+   /\ 3
-)
+/\ 4
-  /\ 3
 </code>
-Indeed, our parser will detect an error here.
+Which is quite a lot of nesting!
-The parentheses parsing logic will call ''consume()'' attempting to match the closing '')'', but within ''consume()'' the call to ''check()'' will always fail due to our call to ''jlists.aboveCurrent()''.
+It would be nice if it were instead translated to a single flat jlist with four conjuncts.
-This gives rise to a parse error.
+This rewrite is safe because conjunction & disjunction are associative.
+So, define a new ''flattenJLists()'' helper which accepts a list of juncts, then builds a new jlist by flattening any nested jlists inside the juncts if they're the same type as the containing jlist.
+Here's what it looks like:
+<code java>
+  private Expr flattenJLists(Token op, List<Expr> juncts) {
+    List<Expr> flattened = new ArrayList<>();
+    for (Expr junct : juncts) {
+      Expr.Variadic vjunct;
+      if ((vjunct = asVariadicOp(op, junct)) != null) {
+        flattened.addAll(vjunct.parameters);
+      } else {
+        flattened.add(junct);
+      }
+    }
-====== Error Recovery ======
+    return new Expr.Variadic(op, flattened);
+  }
+</code>
-Talk of parsing errors nicely segues us onto the topic of error recovery.
+This uses Java's conditional-assign syntax along with the ''asVariadicOp()'' helper, which returns an ''Expr.Variadic'' instance if the given expression is a jlist of the given type:
+<code java>
+  private Expr.Variadic asVariadicOp(Token op, Expr expr) {
+    if (expr instanceof Expr.Variadic) {
+      Expr.Variadic vExpr = (Expr.Variadic)expr;
+      if (vExpr.operator.type == op.type) return vExpr;
+    }
+    return null;
+  }
+</code>
+Replace the calls to ''new Expr.Variadic()'' in ''flattenInfix()'' with calls to ''flattenJLists()'' instead:
+<code java [highlight_lines_extra="6,11"]>
+  private Expr flattenInfix(Expr left, Token op, Expr right) {
+    if (op.type == AND) {
+      List<Expr> conjuncts = new ArrayList<>();
+      conjuncts.add(left);
+      conjuncts.add(right);
+      return flattenJLists(op, conjuncts);
+    } else if (op.type == OR) {
+      List<Expr> disjuncts = new ArrayList<>();
+      disjuncts.add(left);
+      disjuncts.add(right);
+      return flattenJLists(op, disjuncts);
+    } else {
+      return new Expr.Binary(left, op, right);
+    }
+  }
+</code>
+Do the same in the jlist parsing loop in ''primary()'':
+<code java [highlight_lines_extra="9"]>
+    if (match(AND, OR)) {
+      Token op = previous();
+      jlists.push(op.column);
+      List<Expr> juncts = new ArrayList<>();
+      do {
+        juncts.add(expression());
+      } while (matchBullet(op.type, op.column));
+      jlists.pop();
+      return flattenJLists(op, juncts);
+    }
+</code>
+Infix ''/\'' and ''\/'' operators will now be interpreted by our jlist evaluation code, and jlists will be as flat as they possibly can be.
+So concludes this lynchpin tutorial on conjunction & disjunction lists.
+Good job making it to the end!
+If your code got out of sync during this tutorial, you can find its expected state [[https://github.com/tlaplus/devkit/tree/main/6-jlists|here]].
+Continue on the [[creating:operators|next page]], where we learn to parse operators with parameters!
 ====== Challenges ======
@@ Line 330: / Line 425: @@
   - It's tempting to summarize this chapter as us solving the jlist parsing problem by making jlists have higher precedence than infix operators, but that is not quite the case. Think carefully about what precedence means; is there a difference between what might be called //lexical// precedence - where one interpretation of a token takes higher precedence than another - and parsing precedence? Did we make use of that here? What are some ways that parsers can deal with the problem of the same token having multiple possible meanings?
   - Jlists are not the only context-sensitive language construct in TLA⁺. Nested proof steps are another. Take some time to read the [[https://lamport.azurewebsites.net/tla/tla2-guide.pdf|TLA⁺ 2 language spec (pdf)]] and think about what makes proof syntax context-sensitive and how you might parse it if you had to.
+  - Write unit tests for your jlist parsing code. Think of every weird jlist case you can. Look at [[https://github.com/tlaplus/rfcs/blob/2a772d9dd11acec5d7dedf30abfab91a49de48b8/language_standard/tests/tlaplus_syntax/conjlist.txt|this standard test file]] for inspiration.
   - If you are familiar with the [[https://youtu.be/IycOPFmEQk8|pumping lemma for context-free languages]], use it to prove that TLA⁺ with jlists is not context-free.
   - Most courses in formal languages skip directly from context-free grammars to Turing machines, but this misses a number of automata of intermediate power. See whether it is possible to use [[https://cs.stackexchange.com/a/170282/110483|context-free grammars with storage]] to concisely express a formal grammar for TLA⁺ that includes jlists. The viability of this is unknown; share your results with [[https://groups.google.com/g/tlaplus|the mailing list]] if accomplished!