Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
creating:jlists [2025/04/27 21:44] – Finished how to handle infix op ambiguity ahelwer | creating:jlists [2025/05/21 18:21] (current) – Disjunction does not short-circuit ahelwer | ||
---|---|---|---|
Line 39: | Line 39: | ||
if (match(AND, OR)) { | if (match(AND, OR)) { | ||
Token op = previous(); | Token op = previous(); | ||
- | List< | + | List< |
do { | do { | ||
juncts.add(expression()); | juncts.add(expression()); | ||
Line 92: | Line 92: | ||
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
private static final Operator[] operators = new Operator[] { | private static final Operator[] operators = new Operator[] { | ||
- | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
- | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
- | new Operator(PREFIX, | + | new Operator(Fix.PREFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
- | new Operator(INFIX, | + | new Operator(Fix.INFIX, |
</ | </ | ||
Line 115: | Line 115: | ||
====== Beyond Context-Free ====== | ====== Beyond Context-Free ====== | ||
- | The answer to our parsing problem is deceptively simple: before | + | The answer to our parsing problem is deceptively simple: before |
- | If so, we prioritize treating | + | If the token instead starts to the left or equal to the current jlist' |
+ | If we aren't currently parsing a jlist then '' | ||
For readers who have taken a computer science class in formal languages, alarm bells should be going off - changing the parse behavior depending on the current context is a //big// change! | For readers who have taken a computer science class in formal languages, alarm bells should be going off - changing the parse behavior depending on the current context is a //big// change! | ||
Line 130: | Line 131: | ||
Ultimately the shape of this state is a stack of nested jlists. | Ultimately the shape of this state is a stack of nested jlists. | ||
- | Each entry in the stack records the type of jlist (conjunction or disjunction) and the column of its vertical alignment. | + | Each entry in the stack records the column |
- | When we start parsing a new jlist, we push its details | + | When we start parsing a new jlist, we push its column |
- | When we finish parsing a jlist, we pop it from the stack. | + | When we finish parsing a jlist, we pop from the stack. |
- | The top of this stack is the " | + | The top of this stack is the " |
- | + | ||
- | Some of the logic here gets a bit finicky, so let's encapsulate it within a class. | + | |
- | Create a new file called | + | |
- | <code java> | + | |
- | package tla; | + | |
- | + | ||
- | import java.util.ArrayDeque; | + | |
- | import java.util.Deque; | + | |
- | + | ||
- | class JListContext { | + | |
- | + | ||
- | private record JListInfo(TokenType type, int column) { } | + | |
- | + | ||
- | private final Deque< | + | |
- | } | + | |
- | </ | + | |
- | + | ||
- | Add an instance of '' | + | |
+ | We'll be using Java's '' | ||
+ | It only needs to hold the jlist column. | ||
+ | Define a new class variable near the top of the '' | ||
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
private final List< | private final List< | ||
private int current = 0; | private int current = 0; | ||
private final boolean replMode; | private final boolean replMode; | ||
- | private final JListContext | + | private final ArrayDeque< |
</ | </ | ||
- | In the '' | + | Import |
+ | <code java [highlight_lines_extra=" | ||
+ | package tla; | ||
- | < | + | import |
- | public void startNew(Token op) { | + | import java.util.ArrayList; |
- | stack.push(new JListInfo(op.type, op.column)); | + | import java.util.ArrayDeque; |
- | } | + | |
- | public void terminateCurrent() { | + | import static tla.TokenType.*; |
- | stack.pop(); | + | |
- | } | + | class Parser { |
</ | </ | ||
- | Back in the '' | + | Now augment our jlist parsing logic in '' |
<code java [highlight_lines_extra=" | <code java [highlight_lines_extra=" | ||
if (match(AND, OR)) { | if (match(AND, OR)) { | ||
Token op = previous(); | Token op = previous(); | ||
- | jlists.startNew(op); | + | jlists.push(op.column); |
- | List< | + | List< |
do { | do { | ||
juncts.add(expression()); | juncts.add(expression()); | ||
} while (matchBullet(op.type, | } while (matchBullet(op.type, | ||
- | jlists.terminateCurrent(); | + | jlists.pop(); |
return new Expr.Variadic(op, | return new Expr.Variadic(op, | ||
} | } | ||
</ | </ | ||
- | Now we write a critical | + | Then add this critical |
- | In '' | + | <code java [highlight_lines_extra=" |
- | <code haskell> | + | private |
- | public boolean isNewBullet(Token op) { | + | if (isAtEnd()) return |
- | JListInfo current = this.stack.peekFirst(); | + | if (!jlists.isEmpty() |
- | return current != null | + | return |
- | && current.type == op.type | + | |
- | && current.column == op.column; | + | |
- | } | + | |
- | </ | + | |
- | '' | + | |
- | If we are not currently parsing | + | |
- | Otherwise, it checks the current jlist attributes against the given token. | + | |
- | If the token type & column both match, the method returns true. | + | |
- | + | ||
- | Where should call this powerful method? | + | |
- | There are actually a few possibilities, | + | |
- | <code java [highlight_lines_extra=" | + | |
- | private | + | |
- | if (jlists.isNewBullet(peek())) return | + | |
- | | + | |
- | | + | |
- | if (match(op.token)) return | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return | + | |
} | } | ||
</ | </ | ||
Line 218: | Line 184: | ||
This works! | This works! | ||
We can parse jlists again! | We can parse jlists again! | ||
- | It's a bit tricky to figure out what we actually did, so let's take a close look at our infix operator parsing code in '' | + | It's a bit tricky to figure out how this changes things in practice, so let's take a close look at our infix operator parsing code in '' |
<code java> | <code java> | ||
Expr expr = operatorExpression(prec + 1); | Expr expr = operatorExpression(prec + 1); | ||
- | while ((op = matchOp(INFIX, | + | while ((op = matchOp(Fix.INFIX, prec)) != null) { |
Token operator = previous(); | Token operator = previous(); | ||
Expr right = operatorExpression(op.highPrec + 1); | Expr right = operatorExpression(op.highPrec + 1); | ||
Line 233: | Line 199: | ||
/\ 1 | /\ 1 | ||
/\ 2 | /\ 2 | ||
- | /\ 3 | ||
</ | </ | ||
The parser will: | The parser will: | ||
Line 240: | Line 205: | ||
- find '' | - find '' | ||
- call '' | - call '' | ||
- | - in '' | + | - in '' |
- | - eventually | + | - never enter the infix op parsing loop and return '' |
- in the jlist loop, call '' | - in the jlist loop, call '' | ||
- | If the call to '' | + | If the line we added in '' |
But we pre-empted it! | But we pre-empted it! | ||
- | So now ''/ | + | So now ''/ |
- | + | Note that our '' | |
- | ====== Termination ====== | + | |
- | We're close, but we're not //quite// there yet. | + | There is one other benefit we've unlocked. |
- | Consider | + | It isn't enough to parse valid jlists, we must also reject invalid ones! |
+ | The TLA⁺ language specification requires that the parser reject attempts to defeat vertical alignment encapsulation by abusing delimiters like '' | ||
+ | What this means is that inputs like the following | ||
<code haskell> | <code haskell> | ||
op == | op == | ||
/\ 1 | /\ 1 | ||
- | /\ \/ 2 | + | /\ (2 |
+ | ) | ||
/\ 3 | /\ 3 | ||
</ | </ | ||
- | Our code parses | + | Indeed, our parser will detect an error here. |
- | < | + | The parentheses parsing logic will call '' |
- | op == | + | This gives rise to a parse error. |
- | | + | |
- | /\ \/ (2 /\ 3) | + | ====== Error Recovery ====== |
+ | |||
+ | Talk of parsing errors nicely segues us onto the topic of error recovery. | ||
+ | Recall that on error, we call '' | ||
+ | Jlists complicate | ||
+ | What happens if an error occurs while parsing a jlist and we enter '' | ||
+ | Well, nonsensical things happen. | ||
+ | To fix this we just wipe out our jlist stack at the top of '' | ||
+ | < | ||
+ | | ||
+ | | ||
+ | advance(); | ||
+ | |||
+ | while (!isAtEnd()) { | ||
</ | </ | ||
- | We thought we had defeated the tyranny of infix op ambiguity, but we were wrong! | ||
- | What happened? | ||
- | Well, our call to '' | ||
- | The enclosing ''/ | ||
- | You might be tempted to fix this by modifying '' | ||
- | Each junct's expression should be entirely contained to the right of its enclosing jlist' | + | Done. |
- | Put another way, if you put a token on the next line to the left of the '' | + | You've successfully parsed vertically-aligned conjunction & disjunction lists in TLA⁺! |
- | That enables you to do neat things like: | + | This puts you in rarified air. |
- | < | + | Only a handful |
- | op == | + | |
- | /\ A | + | ====== Evaluation ====== |
- | /\ B | + | |
- | = C | + | Now that we can parse jlists, let' |
+ | Similar to the logical operators covered in the book, conjunction lists short-circuit. | ||
+ | That means conjuncts are evaluated in order and, if a single conjunct is false, evaluation immediately stops and returns false. | ||
+ | In an odd contrast, disjunction lists do //not// short-circuit; | ||
+ | |||
+ | Add conjunction list evaluation logic to '' | ||
+ | < | ||
+ | | ||
+ | case AND: | ||
+ | for (Expr conjunct : expr.parameters) { | ||
+ | Object result | ||
+ | | ||
+ | if (!(boolean)result) return false; | ||
+ | } | ||
+ | return true; | ||
+ | default: | ||
+ | | ||
+ | | ||
</ | </ | ||
- | which should be parsed | + | |
- | The '' | + | Then add the disjunction list logic right below that: |
- | To check this, we need another helper method | + | <code java [highlight_lines_extra=" |
+ | return true; | ||
+ | case OR: | ||
+ | boolean result = false; | ||
+ | for (Expr disjunct : expr.parameters) { | ||
+ | Object junctResult = evaluate(disjunct); | ||
+ | checkBooleanOperand(expr.operator, | ||
+ | result |= (Boolean)junctResult; | ||
+ | } | ||
+ | return result; | ||
+ | default: | ||
+ | // Unreachable. | ||
+ | return null; | ||
+ | </ | ||
+ | |||
+ | Remember we also parsed | ||
+ | This is a bit annoying! | ||
+ | They should function in the exact same way as their respective jlists, but now we have to copy a duplicate | ||
+ | So, we'll perform a trick which also has its parallel in chapter 9 of the book: [[https:// | ||
+ | We will flatten infix ''/ | ||
+ | |||
+ | In the '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | Expr expr = operatorExpression(prec + 1); | ||
+ | while ((op = matchOp(Fix.INFIX, | ||
+ | Token operator = previous(); | ||
+ | Expr right = operatorExpression(op.highPrec + 1); | ||
+ | expr = flattenInfix(expr, | ||
+ | if (!op.assoc) return expr; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Define '' | ||
<code java> | <code java> | ||
- | | + | |
- | | + | |
- | | + | |
+ | | ||
+ | |||
+ | } else { | ||
+ | return new Expr.Binary(left, | ||
+ | } | ||
} | } | ||
</ | </ | ||
- | + | The helper returns | |
- | We then add a call to '' | + | <code java [highlight_lines_extra=" |
- | <code java [highlight_lines_extra=" | + | private |
- | private | + | if (op.type == AND) { |
- | if (!jlists.isAboveCurrent(peek())) return | + | List< |
- | if (isAtEnd()) return | + | conjuncts.add(left); |
- | return | + | conjuncts.add(right); |
+ | | ||
+ | | ||
+ | List< | ||
+ | disjuncts.add(left); | ||
+ | disjuncts.add(right); | ||
+ | | ||
+ | | ||
+ | | ||
+ | } | ||
} | } | ||
</ | </ | ||
- | What does this do? | + | ====== Further Desugaring ====== |
- | Well, within a given call to '' | + | |
- | If an expression is being parsed within the context of a jlist, and it encounters a token on the next line that is not to the right of the current jlist alignment, control flow is ultimately kicked back up to the jlist loop which then calls '' | + | |
- | Parsing can then resume as expected. | + | |
- | There is one other benefit we've unlocked. | + | This is all right, but it could be even better! |
- | It isn't enough to parse valid jlists, we must also reject invalid ones! | + | An expression |
- | The TLA⁺ language specification requires that the parser reject attempts to defeat vertical alignment encapsulation by abusing delimiters | + | |
- | What this means is that inputs like the following should fail to parse: | + | |
<code haskell> | <code haskell> | ||
- | op == | + | /\ /\ /\ 1 |
- | | + | /\ 2 |
- | /\ (2 | + | |
- | ) | + | /\ 4 |
- | | + | |
</ | </ | ||
- | Indeed, our parser will detect an error here. | + | Which is quite a lot of nesting! |
- | The parentheses parsing logic will call '' | + | It would be nice if it were instead translated to a single flat jlist with four conjuncts. |
- | This gives rise to a parse error. | + | This rewrite is safe because conjunction & disjunction are associative. |
+ | So, define a new '' | ||
+ | Here's what it looks like: | ||
+ | <code java> | ||
+ | private Expr flattenJLists(Token op, List< | ||
+ | List< | ||
+ | for (Expr junct : juncts) { | ||
+ | Expr.Variadic vjunct; | ||
+ | if ((vjunct = asVariadicOp(op, | ||
+ | flattened.addAll(vjunct.parameters); | ||
+ | } else { | ||
+ | flattened.add(junct); | ||
+ | } | ||
+ | } | ||
- | ====== Error Recovery ====== | + | return new Expr.Variadic(op, |
+ | } | ||
+ | </ | ||
- | Talk of parsing errors nicely segues us onto the topic of error recovery. | + | This uses Java's conditional-assign syntax along with the '' |
+ | <code java> | ||
+ | private Expr.Variadic asVariadicOp(Token op, Expr expr) { | ||
+ | if (expr instanceof Expr.Variadic) { | ||
+ | Expr.Variadic vExpr = (Expr.Variadic)expr; | ||
+ | if (vExpr.operator.type == op.type) return vExpr; | ||
+ | } | ||
+ | |||
+ | return null; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Replace the calls to '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | private Expr flattenInfix(Expr left, Token op, Expr right) { | ||
+ | if (op.type == AND) { | ||
+ | List< | ||
+ | conjuncts.add(left); | ||
+ | conjuncts.add(right); | ||
+ | return flattenJLists(op, | ||
+ | } else if (op.type == OR) { | ||
+ | List< | ||
+ | disjuncts.add(left); | ||
+ | disjuncts.add(right); | ||
+ | return flattenJLists(op, | ||
+ | } else { | ||
+ | return new Expr.Binary(left, | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Do the same in the jlist parsing loop in '' | ||
+ | <code java [highlight_lines_extra=" | ||
+ | if (match(AND, OR)) { | ||
+ | Token op = previous(); | ||
+ | jlists.push(op.column); | ||
+ | List< | ||
+ | do { | ||
+ | juncts.add(expression()); | ||
+ | } while (matchBullet(op.type, | ||
+ | jlists.pop(); | ||
+ | return flattenJLists(op, | ||
+ | } | ||
+ | </ | ||
+ | Infix ''/ | ||
+ | So concludes this lynchpin tutorial on conjunction & disjunction lists. | ||
+ | Good job making it to the end! | ||
+ | If your code got out of sync during this tutorial, you can find its expected state [[https:// | ||
+ | Continue on the [[creating: | ||
====== Challenges ====== | ====== Challenges ====== | ||
Line 330: | Line 425: | ||
- It's tempting to summarize this chapter as us solving the jlist parsing problem by making jlists have higher precedence than infix operators, but that is not quite the case. Think carefully about what precedence means; is there a difference between what might be called //lexical// precedence - where one interpretation of a token takes higher precedence than another - and parsing precedence? Did we make use of that here? What are some ways that parsers can deal with the problem of the same token having multiple possible meanings? | - It's tempting to summarize this chapter as us solving the jlist parsing problem by making jlists have higher precedence than infix operators, but that is not quite the case. Think carefully about what precedence means; is there a difference between what might be called //lexical// precedence - where one interpretation of a token takes higher precedence than another - and parsing precedence? Did we make use of that here? What are some ways that parsers can deal with the problem of the same token having multiple possible meanings? | ||
- Jlists are not the only context-sensitive language construct in TLA⁺. Nested proof steps are another. Take some time to read the [[https:// | - Jlists are not the only context-sensitive language construct in TLA⁺. Nested proof steps are another. Take some time to read the [[https:// | ||
+ | - Write unit tests for your jlist parsing code. Think of every weird jlist case you can. Look at [[https:// | ||
- If you are familiar with the [[https:// | - If you are familiar with the [[https:// | ||
- Most courses in formal languages skip directly from context-free grammars to Turing machines, but this misses a number of automata of intermediate power. See whether it is possible to use [[https:// | - Most courses in formal languages skip directly from context-free grammars to Turing machines, but this misses a number of automata of intermediate power. See whether it is possible to use [[https:// |