Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
creating:jlists [2025/04/27 19:38] – First parse attempt, explain context-sensitivity ahelwer | creating:jlists [2025/05/21 18:21] (current) – Disjunction does not short-circuit ahelwer | ||
---|---|---|---|
Line 14: | Line 14: | ||
/\ B | /\ B | ||
/\ \/ C | /\ \/ C | ||
- | \/ D | + | \/ D |
</ | </ | ||
Line 39: | Line 39: | ||
if (match(AND, OR)) { | if (match(AND, OR)) { | ||
Token op = previous(); | Token op = previous(); | ||
- | List< | + | List< |
do { | do { | ||
juncts.add(expression()); | juncts.add(expression()); | ||
Line 51: | Line 51: | ||
Remember our special '' | Remember our special '' | ||
- | Now we put it to use in a new helper function for the '' | + | Now we put it to use in a new helper function for the '' |
<code java> | <code java> | ||
private boolean matchBullet(TokenType op, int column) { | private boolean matchBullet(TokenType op, int column) { | ||
Line 59: | Line 58: | ||
return true; | return true; | ||
} | } | ||
- | | + | |
return false; | return false; | ||
} | } | ||
</ | </ | ||
- | The '' | + | '' |
In this way our jlist parsing logic will only add another expression if it finds another vertically-aligned ''/ | In this way our jlist parsing logic will only add another expression if it finds another vertically-aligned ''/ | ||
- | This method... kind of works? | + | This all... kind of works? |
You can parse a surprisingly broad set of jlists with just our simple logic! | You can parse a surprisingly broad set of jlists with just our simple logic! | ||
Try it out; here are some jlists that can now be parsed: | Try it out; here are some jlists that can now be parsed: | ||
- | < | + | < |
op == | op == | ||
/\ 1 | /\ 1 | ||
/\ 2 | /\ 2 | ||
</ | </ | ||
- | < | + | < |
op == | op == | ||
\/ 1 | \/ 1 | ||
\/ 2 | \/ 2 | ||
</ | </ | ||
- | < | + | < |
op == | op == | ||
/\ 1 | /\ 1 | ||
Line 93: | Line 92: | ||
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
private static final Operator[] operators = new Operator[] { | private static final Operator[] operators = new Operator[] { | ||
- | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
- | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
- | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
</ | </ | ||
Just like that, our jlist parsing code no longer works. | Just like that, our jlist parsing code no longer works. | ||
- | The '' | + | '' |
So, a jlist like: | So, a jlist like: | ||
- | < | + | < |
op == | op == | ||
/\ 1 | /\ 1 | ||
Line 112: | Line 111: | ||
is parsed as a jlist with a single conjunct, the expression '' | is parsed as a jlist with a single conjunct, the expression '' | ||
This is awful! | This is awful! | ||
- | How can this be fixed? | + | How can we fix it? |
====== Beyond Context-Free ====== | ====== Beyond Context-Free ====== | ||
- | The answer to our parsing problem is deceptively simple: before | + | The answer to our parsing problem is deceptively simple: before |
+ | If the token instead starts to the left or equal to the current jlist's column, the match fails and the token is not consumed - even if the token is what is expected by the caller to '' | ||
+ | If we aren't currently parsing | ||
For readers who have taken a computer science class in formal languages, alarm bells should be going off - changing the parse behavior depending on the current context is a //big// change! | For readers who have taken a computer science class in formal languages, alarm bells should be going off - changing the parse behavior depending on the current context is a //big// change! | ||
In theoretical terms, our parser is currently // | In theoretical terms, our parser is currently // | ||
Line 126: | Line 128: | ||
Although grammar notation [[https:// | Although grammar notation [[https:// | ||
Thus formal TLA⁺ grammars exclude jlists and use BNF to define the non-jlist parts of the language, then use plain language to describe how jlists work. | Thus formal TLA⁺ grammars exclude jlists and use BNF to define the non-jlist parts of the language, then use plain language to describe how jlists work. | ||
- | Second - returning to our parser implementation here - it means that our '' | + | Second - returning to our parser implementation here - it means that our '' |
Ultimately the shape of this state is a stack of nested jlists. | Ultimately the shape of this state is a stack of nested jlists. | ||
- | Each entry in the stack records the type of jlist (conjunction or disjunction) and the column of its vertical alignment. | + | Each entry in the stack records the column |
- | When we start parsing a new jlist, we push its details | + | When we start parsing a new jlist, we push its column |
- | When we finish parsing a jlist, we pop it from the stack. | + | When we finish parsing a jlist, we pop from the stack. |
- | The top of this stack is the " | + | The top of this stack is the " |
- | Some of the logic here gets a bit finicky, so let' | + | We'll be using Java's '' |
- | Create a new file called | + | It only needs to hold the jlist column. |
- | <code java> | + | Define a new class variable near the top of the '' |
+ | <code java [highlight_lines_extra=" | ||
+ | private final List< | ||
+ | private int current = 0; | ||
+ | private final boolean replMode; | ||
+ | private final ArrayDeque< | ||
+ | </ | ||
+ | |||
+ | Import '' | ||
+ | <code java [highlight_lines_extra=" | ||
package tla; | package tla; | ||
+ | import java.util.List; | ||
+ | import java.util.ArrayList; | ||
import java.util.ArrayDeque; | import java.util.ArrayDeque; | ||
- | import java.util.Deque; | ||
import static tla.TokenType.*; | import static tla.TokenType.*; | ||
- | class JListContext | + | class Parser |
- | Deque< | + | |
- | } | + | |
</ | </ | ||
- | We're going to use the handy Java 17 [[https://openjdk.org/jeps/ | + | Now augment our jlist parsing logic in '' |
- | <code java [highlight_lines_extra=" | + | <code java [highlight_lines_extra=" |
- | class JListContext { | + | if (match(AND, OR)) { |
- | private | + | Token op = previous(); |
- | | + | jlists.push(op.column); |
- | | + | List< |
+ | do { | ||
+ | juncts.add(expression()); | ||
+ | } while (matchBullet(op.type, | ||
+ | jlists.pop(); | ||
+ | return new Expr.Variadic(op, | ||
+ | } | ||
+ | </code> | ||
+ | |||
+ | Then add this critical line to the '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | private | ||
+ | | ||
+ | | ||
+ | return peek().type == type; | ||
} | } | ||
+ | </ | ||
- | private record JListInfo(JListType type, int column) { } | + | This works! |
+ | We can parse jlists again! | ||
+ | It's a bit tricky to figure out how this changes things in practice, so let's take a close look at our infix operator parsing code in '' | ||
+ | <code java> | ||
+ | Expr expr = operatorExpression(prec + 1); | ||
+ | while ((op = matchOp(Fix.INFIX, prec)) != null) { | ||
+ | Token operator = previous(); | ||
+ | Expr right = operatorExpression(op.highPrec + 1); | ||
+ | expr = new Expr.Binary(expr, | ||
+ | if (!op.assoc) return expr; | ||
+ | | ||
+ | </ | ||
+ | Consider what happens when trying to parse this TLA⁺ snippet: | ||
+ | <code haskell> | ||
+ | op == | ||
+ | /\ 1 | ||
+ | /\ 2 | ||
+ | </ | ||
+ | The parser will: | ||
+ | - look for an expression following '' | ||
+ | - find ''/ | ||
+ | - find '' | ||
+ | - call '' | ||
+ | - in '' | ||
+ | - never enter the infix op parsing loop and return '' | ||
+ | - in the jlist loop, call '' | ||
- | private final Deque<JListInfo> stack = new ArrayDeque< | + | If the line we added in '' |
- | } | + | But we pre-empted it! |
+ | So now ''/ | ||
+ | Note that our '' | ||
+ | |||
+ | There is one other benefit we've unlocked. | ||
+ | It isn't enough to parse valid jlists, we must also reject invalid ones! | ||
+ | The TLA⁺ language specification requires that the parser reject attempts to defeat vertical alignment encapsulation by abusing delimiters like '' | ||
+ | What this means is that inputs like the following should fail to parse: | ||
+ | <code haskell> | ||
+ | op == | ||
+ | /\ 1 | ||
+ | /\ (2 | ||
+ | ) | ||
+ | /\ 3 | ||
</ | </ | ||
+ | Indeed, our parser will detect an error here. | ||
+ | The parentheses parsing logic will call '' | ||
+ | This gives rise to a parse error. | ||
+ | |||
+ | ====== Error Recovery ====== | ||
+ | |||
+ | Talk of parsing errors nicely segues us onto the topic of error recovery. | ||
+ | Recall that on error, we call '' | ||
+ | Jlists complicate this a bit! | ||
+ | What happens if an error occurs while parsing a jlist and we enter '' | ||
+ | Well, nonsensical things happen. | ||
+ | To fix this we just wipe out our jlist stack at the top of '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | private void synchronize() { | ||
+ | jlists.clear(); | ||
+ | advance(); | ||
+ | |||
+ | while (!isAtEnd()) { | ||
+ | </ | ||
+ | |||
+ | Done. | ||
+ | You've successfully parsed vertically-aligned conjunction & disjunction lists in TLA⁺! | ||
+ | This puts you in rarified air. | ||
+ | Only a handful of people in the world possess this knowledge, and now you are among them. | ||
+ | |||
+ | ====== Evaluation ====== | ||
+ | |||
+ | Now that we can parse jlists, let's interpret them. | ||
+ | Similar to the logical operators covered in the book, conjunction lists short-circuit. | ||
+ | That means conjuncts are evaluated in order and, if a single conjunct is false, evaluation immediately stops and returns false. | ||
+ | In an odd contrast, disjunction lists do //not// short-circuit; | ||
+ | |||
+ | Add conjunction list evaluation logic to '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | return set; | ||
+ | case AND: | ||
+ | for (Expr conjunct : expr.parameters) { | ||
+ | Object result = evaluate(conjunct); | ||
+ | checkBooleanOperand(expr.operator, | ||
+ | if (!(boolean)result) return false; | ||
+ | } | ||
+ | return true; | ||
+ | default: | ||
+ | // Unreachable. | ||
+ | return null | ||
+ | </ | ||
+ | |||
+ | Then add the disjunction list logic right below that: | ||
+ | <code java [highlight_lines_extra=" | ||
+ | return true; | ||
+ | case OR: | ||
+ | boolean result = false; | ||
+ | for (Expr disjunct : expr.parameters) { | ||
+ | Object junctResult = evaluate(disjunct); | ||
+ | checkBooleanOperand(expr.operator, | ||
+ | result |= (Boolean)junctResult; | ||
+ | } | ||
+ | return result; | ||
+ | default: | ||
+ | // Unreachable. | ||
+ | return null; | ||
+ | </ | ||
+ | |||
+ | Remember we also parsed the ''/ | ||
+ | This is a bit annoying! | ||
+ | They should function in the exact same way as their respective jlists, but now we have to copy a duplicate of our evaluation logic to '' | ||
+ | So, we'll perform a trick which also has its parallel in chapter 9 of the book: [[https:// | ||
+ | We will flatten infix ''/ | ||
+ | |||
+ | In the '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | Expr expr = operatorExpression(prec + 1); | ||
+ | while ((op = matchOp(Fix.INFIX, | ||
+ | Token operator = previous(); | ||
+ | Expr right = operatorExpression(op.highPrec + 1); | ||
+ | expr = flattenInfix(expr, | ||
+ | if (!op.assoc) return expr; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Define '' | ||
+ | <code java> | ||
+ | private Expr flattenInfix(Expr left, Token op, Expr right) { | ||
+ | if (op.type == AND) { | ||
+ | | ||
+ | } else if (op.type == OR) { | ||
+ | | ||
+ | } else { | ||
+ | return new Expr.Binary(left, | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | The helper returns a regular binary expression except when the operator is '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | private Expr flattenInfix(Expr left, Token op, Expr right) { | ||
+ | if (op.type == AND) { | ||
+ | List< | ||
+ | conjuncts.add(left); | ||
+ | conjuncts.add(right); | ||
+ | return new Expr.Variadic(op, | ||
+ | } else if (op.type == OR) { | ||
+ | List< | ||
+ | disjuncts.add(left); | ||
+ | disjuncts.add(right); | ||
+ | return new Expr.Variadic(op, | ||
+ | } else { | ||
+ | return new Expr.Binary(left, | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | ====== Further Desugaring ====== | ||
+ | |||
+ | This is all right, but it could be even better! | ||
+ | An expression like '' | ||
+ | <code haskell> | ||
+ | /\ /\ /\ 1 | ||
+ | /\ 2 | ||
+ | /\ 3 | ||
+ | /\ 4 | ||
+ | </ | ||
+ | Which is quite a lot of nesting! | ||
+ | It would be nice if it were instead translated to a single flat jlist with four conjuncts. | ||
+ | This rewrite is safe because conjunction & disjunction are associative. | ||
+ | So, define a new '' | ||
+ | Here's what it looks like: | ||
+ | <code java> | ||
+ | private Expr flattenJLists(Token op, List< | ||
+ | List< | ||
+ | for (Expr junct : juncts) { | ||
+ | Expr.Variadic vjunct; | ||
+ | if ((vjunct = asVariadicOp(op, | ||
+ | flattened.addAll(vjunct.parameters); | ||
+ | } else { | ||
+ | flattened.add(junct); | ||
+ | } | ||
+ | } | ||
+ | |||
+ | return new Expr.Variadic(op, | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | This uses Java's conditional-assign syntax along with the '' | ||
+ | <code java> | ||
+ | private Expr.Variadic asVariadicOp(Token op, Expr expr) { | ||
+ | if (expr instanceof Expr.Variadic) { | ||
+ | Expr.Variadic vExpr = (Expr.Variadic)expr; | ||
+ | if (vExpr.operator.type == op.type) return vExpr; | ||
+ | } | ||
+ | |||
+ | return null; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Replace the calls to '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | private Expr flattenInfix(Expr left, Token op, Expr right) { | ||
+ | if (op.type == AND) { | ||
+ | List< | ||
+ | conjuncts.add(left); | ||
+ | conjuncts.add(right); | ||
+ | return flattenJLists(op, | ||
+ | } else if (op.type == OR) { | ||
+ | List< | ||
+ | disjuncts.add(left); | ||
+ | disjuncts.add(right); | ||
+ | return flattenJLists(op, | ||
+ | } else { | ||
+ | return new Expr.Binary(left, | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Do the same in the jlist parsing loop in '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | if (match(AND, OR)) { | ||
+ | Token op = previous(); | ||
+ | jlists.push(op.column); | ||
+ | List< | ||
+ | do { | ||
+ | juncts.add(expression()); | ||
+ | } while (matchBullet(op.type, | ||
+ | jlists.pop(); | ||
+ | return flattenJLists(op, | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Infix ''/ | ||
+ | So concludes this lynchpin tutorial on conjunction & disjunction lists. | ||
+ | Good job making it to the end! | ||
+ | If your code got out of sync during this tutorial, you can find its expected state [[https:// | ||
+ | Continue on the [[creating: | ||
+ | |||
+ | ====== Challenges ====== | ||
+ | |||
+ | Here are a number of optional challenges, in roughly increasing levels of difficulty. | ||
- | Challenges: | + | - Python uses indentation to determine statement membership in a code block. Does this make Python context-sensitive? |
- | - If you are familiar with the [[https://en.wikipedia.org/wiki/ | + | - It's tempting to summarize this chapter as us solving the jlist parsing problem by making jlists have higher precedence than infix operators, but that is not quite the case. Think carefully about what precedence means; is there a difference between what might be called //lexical// precedence - where one interpretation of a token takes higher precedence than another - and parsing precedence? Did we make use of that here? What are some ways that parsers can deal with the problem of the same token having multiple possible meanings? |
+ | - Jlists are not the only context-sensitive language construct in TLA⁺. Nested proof steps are another. Take some time to read the [[https:// | ||
+ | - Write unit tests for your jlist parsing code. Think of every weird jlist case you can. Look at [[https:// | ||
+ | - If you are familiar with the [[https://youtu.be/IycOPFmEQk8|pumping lemma for context-free languages]], | ||
+ | - Most courses in formal languages skip directly from context-free grammars to Turing machines, but this misses a number of automata of intermediate power. See whether it is possible to use [[https:// | ||
[[creating: | [[creating: | ||