
Conjunction & Disjunction Lists

Chapter 9 of Crafting Interpreters covers control flow - if, while, for, and all that. TLA⁺ doesn't really have that, because TLA⁺ specifications aren't imperative programs. However, TLA⁺ does have something similar. When specifying a system's actions, we commonly need to do two things: write preconditions or "guards" controlling whether an action can be taken, and specify the set of possible next-state transitions. For the former we use conjunction (aka logical "and"), and for the latter we use disjunction (aka logical "or"). These operators are so ubiquitous in TLA⁺ that a special syntax was developed to ease their use: vertically-aligned conjunction & disjunction lists, henceforth called "jlists" for brevity. Here is what they look like:

op ==
  /\ A
  /\ B
  /\ \/ C
     \/ D

Here, /\ is conjunction and \/ is disjunction. If a set of conjuncts or disjuncts ("juncts") is vertically aligned (each junct starts at the same column), those juncts are grouped together. So, we want to parse the above example as (/\ A B (\/ C D)), using our Expr.Variadic type. This is a very nice language feature, but parsing these lists properly is the greatest challenge we've yet faced. It is only a slight exaggeration to say the purpose of this entire tutorial series is to teach you how to parse these jlists. Let's begin!
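To make that target tree concrete, here is a small self-contained sketch of its shape. The Token, Name, and Variadic types below are simplified stand-ins for the tutorial's real Token and Expr classes, not the actual interpreter code:

```java
import java.util.List;

// Simplified stand-ins for the tutorial's Token and Expr types.
class JListTree {
  record Token(String lexeme) { }
  interface Expr { }
  record Name(String id) implements Expr { }                     // stands in for A, B, C, D
  record Variadic(Token op, List<Expr> juncts) implements Expr { }

  public static void main(String[] args) {
    // The example jlist parses to (/\ A B (\/ C D)):
    Expr tree = new Variadic(new Token("/\\"), List.of(
        new Name("A"),
        new Name("B"),
        new Variadic(new Token("\\/"), List.of(new Name("C"), new Name("D")))));
    System.out.println(tree);
  }
}
```

Note that the nested \/ jlist is just another junct of the outer /\ jlist, which is why a single variadic node type suffices for both.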

A First Attempt

Before we increase the complexity of our parser, let's convince ourselves it is necessary. We will attempt to parse jlists using familiar recursive descent techniques, then see where it goes wrong. At first this seems like it should be possible! Our jlists resemble the set constructor {1, 2, 3} syntax in that they're started with a unique token (\/ or /\), then feature an unbounded number of expressions separated by delimiters - here, \/ or /\. There isn't really a terminating token but that does not seem to be an obstacle. So, try adding this to the primary() method of your Parser class, below the set constructor logic; it uses the same do/while method to parse the juncts as the set constructor parsing logic uses to parse the element expressions:

      return new Expr.Variadic(operator, elements);
    }
 
    if (match(AND, OR)) {
      Token op = previous();
      List<Expr> juncts = new ArrayList<Expr>();
      do {
        juncts.add(expression());
      } while (matchBullet(op.type, op.column));
      return new Expr.Variadic(op, juncts);
    }
 
    throw error(peek(), "Expect expression.");
  }

Remember our special column field we added to the Token class way back in the scanning chapter? Now we put it to use in a new helper function for the Parser class, matchBullet(); think "bullet" as in "bullet points":

  private boolean matchBullet(TokenType op, int column) {
    if (peek().type == op && peek().column == column) {
      advance();
      return true;
    }
 
    return false;
  }

matchBullet() consumes a /\ or \/ token only if it is of the expected type and column. In this way our jlist parsing logic will only add another expression if it finds another vertically-aligned /\ or \/ token on the next line. This all... kind of works? You can parse a surprisingly broad set of jlists with just our simple logic! Try it out; here are some jlists that can now be parsed:

op ==
  /\ 1
  /\ 2
op ==
  \/ 1
  \/ 2
op ==
  /\ 1
  /\ \/ 2
     \/ 3

Make them as nested and wild as you like, our code will handle them! So are we done? Unfortunately not. Here's the proverbial wrench: TLA⁺ also allows \/ and /\ as infix operators. We skipped defining them in the expressions chapter where we learned how to parse operators of different precedences; let's add them to the operators table in our Token class now:

private static final Operator[] operators = new Operator[] {
    new Operator(PREFIX,  NOT,        true,   4,  4 ),
    new Operator(PREFIX,  ENABLED,    false,  4,  15),
    new Operator(PREFIX,  MINUS,      true,   12, 12),
    new Operator(INFIX,   AND,        true,   3,  3 ),
    new Operator(INFIX,   OR,         true,   3,  3 ),
    new Operator(INFIX,   IN,         false,  5,  5 ),

Just like that, our jlist parsing code no longer works. matchBullet() never has a chance to match a /\ or \/ token, because they're eaten up by operatorExpression() first. So, a jlist like:

op ==
  /\ 1
  /\ 2
  /\ 3

is parsed as a jlist with a single conjunct, the expression 1 /\ 2 /\ 3. This is awful! How can we fix it?

Beyond Context-Free

The answer to our parsing problem is deceptively simple: before consuming a /\ or \/ token as an infix operator in operatorExpression(), first we check whether that token could actually be the start of another junct in a jlist that is currently being parsed. If so, we prioritize treating the /\ or \/ token as the start of a new junct instead of an infix operator symbol.

For readers who have taken a computer science class in formal languages, alarm bells should be going off - changing the parse behavior depending on the current context is a big change! In theoretical terms, our parser is currently context-free: it doesn't matter what an expression() is being nested inside, it will be parsed the same no matter what. The context - or surrounding code being parsed - does not change how expression() behaves. However, if we want to parse jlists correctly we will need to break this limitation and move to the more powerful context-sensitive class of formal languages.

What does this change mean in practical terms? First, it means we cannot easily write the formal TLA⁺ grammar in Backus-Naur Form (BNF) as we learned to do in Chapter 5: Representing Code. Although grammar notation does exist for context-sensitive languages, it is unintuitive and more of a hindrance to language implementers than a help. Thus formal TLA⁺ grammars exclude jlists and use BNF to define the non-jlist parts of the language, then use plain language to describe how jlists work. Second - returning to our parser implementation here - it means that our Parser class carries some additional mutable state in a class variable, and its methods change their behavior based on that state.

Ultimately the shape of this state is a stack of nested jlists. Each entry in the stack records the type of jlist (conjunction or disjunction) and the column of its vertical alignment. When we start parsing a new jlist, we push its details to this stack. When we finish parsing a jlist, we pop it from the stack. The top of this stack is the "current" jlist context, and it modifies how some of our helper methods work.

Some of the logic here gets a bit finicky, so let's encapsulate it within a class. Create a new file called JListContext.java; import deque-related classes so we have a stack, and use the handy Java 17 records feature to cut down on boilerplate and define a quick JListInfo class for our stack to hold:

package tla;
 
import java.util.ArrayDeque;
import java.util.Deque;
 
class JListContext {
 
  private record JListInfo(TokenType type, int column) { }
 
  private final Deque<JListInfo> stack = new ArrayDeque<>();
}

Add an instance of JListContext as a new class variable in the Parser class:

  private final List<Token> tokens;
  private int current = 0;
  private final boolean replMode;
  private final JListContext jlists = new JListContext();

In the JListContext class, add some helpers to create a new jlist or terminate the current one:

  public void startNew(Token op) {
    stack.push(new JListInfo(op.type, op.column));
  }
 
  public void terminateCurrent() {
    stack.pop();
  }

Back in the Parser class, augment our jlist parsing logic with appropriate calls to these methods:

    if (match(AND, OR)) {
      Token op = previous();
      jlists.startNew(op);
      List<Expr> juncts = new ArrayList<Expr>();
      do {
        juncts.add(expression());
      } while (matchBullet(op.type, op.column));
      jlists.terminateCurrent();
      return new Expr.Variadic(op, juncts);
    }

Now we write a critical method. In JListContext, define isNewBullet() to return true if the given token could act as the start of a new junct in the current jlist:

  public boolean isNewBullet(Token op) {
    JListInfo current = this.stack.peekFirst();
    return current != null
        && current.type() == op.type
        && current.column() == op.column;
  }

isNewBullet() first retrieves the top element of our stack, representing the current jlist. If we are not currently parsing a jlist, this element will be null and the method returns false. Otherwise, it checks the current jlist attributes against the given token. If the token type & column both match, the method returns true.
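To see this stack bookkeeping in isolation, here is a minimal, self-contained sketch of the same logic. JListDemo and its nested Token and TokenType are simplified stand-ins for the tutorial's real classes, not part of the interpreter itself:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-ins for the tutorial's Token, TokenType, and JListContext.
class JListDemo {
  enum TokenType { AND, OR }
  record Token(TokenType type, int column) { }
  record JListInfo(TokenType type, int column) { }

  private final Deque<JListInfo> stack = new ArrayDeque<>();

  void startNew(Token op) { stack.push(new JListInfo(op.type(), op.column())); }
  void terminateCurrent() { stack.pop(); }

  boolean isNewBullet(Token op) {
    JListInfo current = stack.peekFirst();
    return current != null
        && current.type() == op.type()
        && current.column() == op.column();
  }

  public static void main(String[] args) {
    JListDemo jlists = new JListDemo();
    jlists.startNew(new Token(TokenType.AND, 3));  // outer /\ jlist at column 3
    jlists.startNew(new Token(TokenType.OR, 6));   // nested \/ jlist at column 6

    // Only a \/ at column 6 matches the current (top-of-stack) jlist.
    System.out.println(jlists.isNewBullet(new Token(TokenType.OR, 6)));   // true
    System.out.println(jlists.isNewBullet(new Token(TokenType.AND, 3)));  // false

    jlists.terminateCurrent();  // pop the nested \/ jlist
    System.out.println(jlists.isNewBullet(new Token(TokenType.AND, 3)));  // true again
  }
}
```

Note that only the top of the stack is consulted: while the nested \/ jlist is active, an aligned /\ bullet from the enclosing jlist is not recognized as a new bullet. That asymmetry is exactly what the Termination section below has to deal with.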

Where should we call this powerful method? There are actually a few possibilities, but I think it makes the most sense to check isNewBullet() in the matchOp() method of the Parser class:

  private Operator matchOp(Fix fix, int prec) {
    if (jlists.isNewBullet(peek())) return null;
    for (Operator op : operators) {
      if (op.fix == fix && op.lowPrec == prec) {
        if (match(op.token)) return op;
      }
    }
 
    return null;
  }

This works! We can parse jlists again! It's a bit tricky to figure out what we actually did, so let's take a close look at our infix operator parsing code in operatorExpression():

    Expr expr = operatorExpression(prec + 1);
    while ((op = matchOp(INFIX, prec)) != null) {
      Token operator = previous();
      Expr right = operatorExpression(op.highPrec + 1);
      expr = new Expr.Binary(expr, operator, right);
      if (!op.assoc) return expr;
    }

Consider what happens when trying to parse this TLA⁺ snippet:

op ==
  /\ 1
  /\ 2
  /\ 3

The parser will:

  1. look for an expression following op ==
  2. find /\, push a new jlist to the stack, then look for an expression
  3. find 1 as the value of expr at the top of the infix op parsing loop
  4. call matchOp() with /\ as the next token to be looked at
  5. in matchOp(), call jlists.isNewBullet(), and immediately return null
  6. eventually return 1 to the jlist loop as a complete expression
  7. in the jlist loop, call matchBullet() to successfully consume /\

If the call to jlists.isNewBullet() were not present, matchOp() would have matched the next /\ as just another infix operator token. But we pre-empted it! So now /\ and \/ tokens will only be parsed as infix operator tokens when they would not be the start of a new junct in a currently-active jlist.

Termination

We're close, but we're not quite there yet. Consider the following TLA⁺ snippet:

op ==
  /\ 1
  /\ \/ 2
  /\ 3

Our code parses this as though it's:

op ==
  /\ 1
  /\ \/ (2 /\ 3)

We thought we had defeated the tyranny of infix op ambiguity, but we were wrong! What happened? Well, our call to jlists.isNewBullet() checks the /\ in /\ 3 against the top-most active jlist, which is a \/ at a different column! The enclosing /\ jlist is not checked. You might be tempted to fix this by modifying isNewBullet() to check tokens against all the jlists in the stack, but actually this problem requires a deeper approach.

Each junct's expression should be entirely contained to the right of its enclosing jlist's vertical alignment. Put another way, if you put a token on the next line to the left of the /\ or \/, that should terminate the jlist. That enables you to do neat things like:

op ==
  /\ A
  /\ B
  = C

which should be parsed as (A /\ B) = C. The = sign is not to the right of the jlist's vertical alignment, so it terminates that jlist. To check this, we need another helper method in JListContext class called isAboveCurrent():

  public boolean isAboveCurrent(Token tok) {
    JListInfo current = this.stack.peekFirst();
    return current == null || current.column() < tok.column;
  }
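To see the alignment boundary in isolation, here is a minimal, self-contained sketch; AlignmentDemo and its nested types are simplified stand-ins for the tutorial's real classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-ins for the tutorial's Token, TokenType, and JListContext.
class AlignmentDemo {
  enum TokenType { AND, OR, EQUALS }
  record Token(TokenType type, int column) { }
  record JListInfo(TokenType type, int column) { }

  private final Deque<JListInfo> stack = new ArrayDeque<>();

  void startNew(Token op) { stack.push(new JListInfo(op.type(), op.column())); }

  boolean isAboveCurrent(Token tok) {
    JListInfo current = stack.peekFirst();
    return current == null || current.column() < tok.column();
  }

  public static void main(String[] args) {
    AlignmentDemo jlists = new AlignmentDemo();
    // A /\ jlist whose bullets start at column 3.
    jlists.startNew(new Token(TokenType.AND, 3));

    // A token strictly to the right of column 3 stays inside the current junct...
    System.out.println(jlists.isAboveCurrent(new Token(TokenType.EQUALS, 6)));  // true
    // ...but a token at (or left of) the alignment column terminates the jlist.
    System.out.println(jlists.isAboveCurrent(new Token(TokenType.EQUALS, 3)));  // false
  }
}
```

The strict inequality matters: a token in exactly the bullet column is not "above" the jlist, so only matchBullet(), not ordinary expression parsing, may consume it.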

We then add a call to isAboveCurrent() in a very fundamental helper method of our Parser class, check():

  private boolean check(TokenType type) {
    if (!jlists.isAboveCurrent(peek())) return false;
    if (isAtEnd()) return false;
    return peek().type == type;
  }

What does this do? Well, within a given call to expression(), it essentially forbids matching any tokens that are not to the right of the current jlist! If an expression is being parsed within the context of a jlist, and it encounters a token on the next line that is not to the right of the current jlist alignment, control flow is ultimately kicked back up to the jlist loop which then calls jlists.terminateCurrent(). Parsing can then resume as expected.

There is one other benefit we've unlocked. It isn't enough to parse valid jlists, we must also reject invalid ones! The TLA⁺ language specification requires that the parser reject attempts to defeat vertical alignment encapsulation by abusing delimiters like (/) or {/}. What this means is that inputs like the following should fail to parse:

op ==
  /\ 1
  /\ (2
)
  /\ 3

Indeed, our parser will detect an error here. The parentheses parsing logic will call consume() attempting to match the closing ), but within consume() the call to check() will always fail due to our call to jlists.isAboveCurrent(). This gives rise to a parse error.

Error Recovery

Talk of parse errors segues nicely into the topic of error recovery.

Challenges

Here are a number of optional challenges, in roughly increasing levels of difficulty.

  1. Python uses indentation to determine statement membership in a code block. Does this make Python context-sensitive?
  2. It's tempting to summarize this chapter as us solving the jlist parsing problem by making jlists have higher precedence than infix operators, but that is not quite the case. Think carefully about what precedence means; is there a difference between what might be called lexical precedence - where one interpretation of a token takes higher precedence than another - and parsing precedence? Did we make use of that here? What are some ways that parsers can deal with the problem of the same token having multiple possible meanings?
  3. Jlists are not the only context-sensitive language construct in TLA⁺. Nested proof steps are another. Take some time to read the TLA⁺ 2 language spec (pdf) and think about what makes proof syntax context-sensitive and how you might parse it if you had to.
  4. If you are familiar with the pumping lemma for context-free languages, use it to prove that TLA⁺ with jlists is not context-free.
  5. Most courses in formal languages skip directly from context-free grammars to Turing machines, but this misses a number of automata of intermediate power. See whether it is possible to use context-free grammars with storage to concisely express a formal grammar for TLA⁺ that includes jlists. The viability of this is unknown; share your results with the mailing list if accomplished!

< Previous Page | Table of Contents | Next Page >

  • Last modified: 2025/04/27 21:44
  • by ahelwer