Model-Checking Safety Properties

This has been a long journey. We learned scanning, then parsing, then how to interpret basic constant expressions. Statements were added next, then we mastered the difficult art of parsing conjunction & disjunction lists. Finally, we learned how to apply operators and move between states. Everything is in place; it is time to write a TLA⁺ model-checker.

We will write a checker for TLA⁺ safety properties. Formally, safety properties are properties that can be disproven by a counterexample state trace of finite length. This is a bit abstract; in concrete terms, safety properties are expressions that evaluate to either true or false in different system states. Ideally, a safety property is true in every system state! Safety properties are often informally described as "nothing bad happens" properties, also called invariants. For example, when developing a multi-core CPU memory caching system, a good safety property would be that no two CPU caches ever disagree about the value stored at a given memory address. However, our systems often surprise us and find their way to corners of the state space where a safety property does not hold. When that happens, the best way to debug the problem is to see a path from some initial state to the state violating the invariant. This path is what is meant by a state trace counterexample of finite length. It isn't enough to just produce a state violating the invariant; we must also show how the system got there! That is the task of this chapter.

Inquisitive readers might wonder what sort of properties require counterexamples of infinite length. Those are what we call liveness properties, or "something good eventually happens" properties. A counterexample to something good eventually happening is a state trace ending in an infinite loop, showing it is possible for the system to run forever without ever satisfying the liveness property. Liveness properties (and how to check them) are a fascinating topic, but beyond the scope of this chapter.

Initializing the Model

Our overall approach to checking safety properties has the following structure:

Parse a TLA⁺ specification, identifying Init, Next, and the invariants.
Find the spec's initial state(s) satisfying Init.
Repeatedly use Next to find all states in the system following from the initial state(s).
Check to ensure the invariants hold in every discovered state.
If a state violates an invariant, construct a state trace leading from some initial state to the violating state.

For this purpose, create a new file ModelChecker.java and pre-emptively import all the datastructures we'll need. Also define a constructor accepting an Interpreter instance and a list of parsed Stmt objects comprising the TLA⁺ spec:

package tla;
 
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
 
class ModelChecker {
 
  private final Interpreter interpreter;
 
  ModelChecker(Interpreter interpreter, List<Stmt> spec) {
    this.interpreter = interpreter;
  }
}

The real TLA⁺ tools allow users to identify the init, next and invariant predicates in a model configuration file. We skip all that and simply require our specs to have operator definitions named Init, Next, and a few options for invariants. Iterate through all the spec statements to find them:

  private final Interpreter interpreter;
  private Stmt.OpDef init = null;
  private Stmt.OpDef next = null;
  private List<Stmt.OpDef> invariants = new ArrayList<>();
 
  ModelChecker(Interpreter interpreter, List<Stmt> spec) {
    this.interpreter = interpreter;
    for (Stmt unit : spec) {
      if (unit instanceof Stmt.OpDef) {
        Stmt.OpDef op = (Stmt.OpDef)unit;
        switch (op.name.lexeme) {
          case "Init"   : init = op; break;
          case "Next"   : next = op; break;
          case "Inv"    : invariants.add(op); break;
          case "TypeOK" : invariants.add(op); break;
          case "Safety" : invariants.add(op); break;
        }
      }
    }

We should be nice to the user and alert them if their spec is missing one of these definitions or the definition is not of the proper form. Add a validate() helper to the ModelChecker class:

  private static void validate(Stmt.OpDef op, String name) {
    if (op == null) {
      throw new IllegalArgumentException(
          "Spec requires '" + name + "' operator.");
    }
 
    if (!op.params.isEmpty()) {
      throw new IllegalArgumentException(
          "Spec '" + name + "' operator cannot take parameters.");
    }
  }

Then add calls to validate() at the bottom of our ModelChecker constructor:

    validate(init, "Init");
    validate(next, "Next");
    for (Stmt.OpDef inv : invariants) validate(inv, inv.name.lexeme);
  }

As written it is possible for users to define no invariants at all, which is fine. Step one complete!

Exploring the State Space

Now for the main event - everything has led up to this! Create a new checkSafety() method in the ModelChecker class:

  StateTrace checkSafety() {
 
    return null;
  }

This method indicates success (the invariants holds in every state) by returning null. If the invariants do not hold in every state, the method returns a state trace leading to the failing state. Define the StateTrace record near the top of the ModelChecker class:

class ModelChecker {
 
  record Step(String action, Map<String, Object> state) { }
  record StateTrace(String failingInvariant, List<Step> trace) { }
 
  private final Interpreter interpreter;

Let's talk about how we are going to explore the state space. There are many, many methods of doing this. We'll use breadth-first search (BFS), which explores the state space like a ripple in a pond moving outward from the initial state(s). BFS has the nice property that whenever we first encounter a state, we have also found the shortest path to that state from the initial state(s). It works as follows:

Push all initial states onto a queue of pending states, and add them to a set of visited states.
While the pending state queue is not empty, pop the head state from the queue for examination.
Find all next states reachable in one step from the state under examination.
For each next state, if it has not already been visited then push it onto the pending state queue and add it to the set of visited states.

Not too bad! Only two datastructures, a queue to hold the pending states and a set to record all the states we've visited. Assuming there are a finite number of states, BFS will eventually examine all states then terminate. If there are an infinite number of states - an easy-to-accomplish feat in TLA⁺ - then BFS will never terminate, but that is a user problem.

There is one more thing we need to keep track of so we can reconstruct state traces: each state's predecessor, the state from which we first stepped to it. This combines nicely with the visited set; instead of a set of states, we use a map from states to their predecessors. If a state is in the predecessor map, it must have been visited. The predecessor of all initial states is null.

With all that out of the way, here's how we write the preamble to the main BFS loop in checkSafety():

  StateTrace checkSafety() {
    Deque<Map<String, Object>> pendingStates = new ArrayDeque<>();
    Map<Map<String, Object>, Map<String, Object>> predecessors = new HashMap<>();
    for (Map<String, Object> initialState : interpreter.getNextStates(init.name, init.body)) {
      predecessors.put(initialState, null);
      pendingStates.add(initialState);
    }
 
    while (!pendingStates.isEmpty()) {
 
    }
 
    return null;
  }

Now all the initial states have been pushed onto the pending states queue and marked visited. Here is the rest of the BFS algorithm, a loop that runs in checkSafety() until the pending state queue is empty:

    while (!pendingStates.isEmpty()) {
      Map<String, Object> current = pendingStates.remove();
      interpreter.goToState(current);
 
      for (Map<String, Object> next : interpreter.getNextStates(next.name, next.body)) {
        if (!predecessors.containsKey(next)) {
          predecessors.put(next, current);
          pendingStates.add(next);
        }
      }
    }

This will correctly explore the state space in breadth-first order. But to what end?

Checking the Invariants

Here we ensure the invariants hold in every discovered state. There is a minor design decision to be made: do we check the invariants upon first encountering a state, or later after popping it from the pending state queue? The latter turns out to be simpler because no special logic is necessary for the initial states, so that is what we do; add the highlighted code to checkSafety():

    while (!pendingStates.isEmpty()) {
      Map<String, Object> current = pendingStates.remove();
      interpreter.goToState(current);
      for (Stmt.OpDef invariant : invariants) {
        if (!(boolean)invariant.body.accept(interpreter)) {
          return reconstructStateTrace(predecessors, current, invariant);
        }
      }

If an invariant fails then we return a state trace! How do we construct that trace? Here is how to implement the reconstructStateTrace() helper for the ModelChecker class; first, use the predecessors graph to walk backward from the failing state to an initial state, then reverse the list of states:

  List<Map<String, Object>> reconstructStateTrace(
      Map<Map<String, Object>, Map<String, Object>> predecessors,
      Map<String, Object> state,
      Stmt.OpDef invariant
  ) {
    List<Map<String, Object>> trace = new ArrayList<>();
    Map<String, Object> predecessor = state;
    do {
      trace.add(predecessor);
      predecessor = predecessors.get(predecessor);
    } while (predecessor != null);
    trace =  trace.reversed();
 
  }

Then, convert the list of states into a StateTrace object, augmenting the list of states with information about the failed invariant and the action taken at each step:

    trace = trace.reversed();
 
    List<Step> steps = new ArrayList<>();
    Stmt.OpDef action = init;
    for (Map<String, Object> nextState : trace) {
      steps.add(new Step(action.name.lexeme, nextState));
      action = next;
    }
 
    return new StateTrace(invariant.name.lexeme, steps);
  }

Done! We have successfully implemented a TLA⁺ model-checker for safety properties!

Hooking Up the Front-End

Now we modify TlaPlus.java to run the model-checker when given a TLA⁺ source file. In past chapters we've focused on running TLA⁺ in a REPL where users can evaluate expressions and transition between states. Full TLA⁺ source files cannot be run as programs; all we can do is model-check them.

The simplest place to call the model-checker is at the bottom of the run() method in the TlaPlus class; add this code:

    if (!replMode) {
      ModelChecker mc = new ModelChecker(interpreter, statements);
      ModelChecker.StateTrace trace = mc.checkSafety();
      System.out.println(
          trace == null
          ? "Invariants hold on state space."
          : "Invariants do not hold; trace:\n" + trace
      );
    }
  }

Now we can model-check TLA⁺ specs! What's a good first example? The traditional "hello world" model-checking problem is the water jugs puzzle from the movie Die Hard 3, which you can watch here. I rewrote the Generalized Die Hard puzzle spec in our minimal TLA⁺ language subset; here it is:

Jug == 1 .. 2
Capacity == [
  j \in Jug |->
      IF j = 1 THEN 3
      ELSE IF j = 2 THEN 5
      ELSE 0
]
Goal == 4
 
VARIABLE contents
 
Min(m, n) == IF m < n THEN m ELSE n
 
FillJug(target) ==
  contents' = [
    j \in Jug |->
      IF j = target THEN Capacity[j] ELSE contents[j]
  ]
 
EmptyJug(target) ==
  contents' = [
    j \in Jug |->
      IF j = target THEN 0 ELSE contents[j]
  ]
 
JugToJug(from, to) ==
  /\ ~(from = to)
  /\ contents' = [
      j \in Jug |->
        IF j = from THEN
          contents[j] - Min(contents[from], Capacity[to] - contents[to])
        ELSE IF j = to THEN
          contents[j] + Min(contents[from], Capacity[to] - contents[to])
        ELSE
          contents[j]
    ]
 
TypeOK == \A j \in Jug : contents[j] \in 0 .. Capacity[j]
 
Init == contents = [j \in Jug |-> 0]
 
Next ==
  \E j \in Jug :
    \/ FillJug(j)
    \/ EmptyJug(j)
    \/ \E k \in Jug : JugToJug(j, k)
 
Inv == \A j \in Jug : ~(contents[j] = Goal)

You can vary the problem definition by modifying the definition of Jug and the Capacity map. As-is, it checks the classic Die Hard jugs puzzle with one 3-gallon jug and one 5-gallon jug. We set the invariant to assert that no jug will every contain 4 gallons of water. When run on this spec, our model-checker spits out a state trace showing how to get 4 gallons of water in the 5-gallon jug!

After all the preliminary work, this chapter was surprisingly short. We have now implemented the core of a minimal TLA⁺ model-checker for safety properties. One chapter remains, clarifying how operator parameters interact with the prime operator. It's important to understand, although somewhat optional to implement. Read it here!

Challenges

Here are some optional challenges to flesh out your TLA⁺ model-checker, roughly ranked from simplest to most difficult. You should save a copy of your code before attempting these.

The state trace presentation leaves something to be desired; override the toString() method on the StateTrace and Step records to give a more readable presentation of the state trace.
To save on memory, real finite-state model-checkers like TLC don't store the entire state in the predecessors map; instead, they store 64-bit hashes of each state. It is very important that hash collisions (two separate states hashing to the same value) never occur, or the correctness of the model-checker becomes compromised. Java provides a built-in hashCode() method for each object that returns a 32-bit int hash of its contents. Modify your predecessors map to use this hash code. Is this sufficient? Do you encounter any collisions?
Ignoring the safety issues with using Java's built-in hashCode(), modify the reconstructStateTrace() method to work with a hash-based predecessor map instead of a state-based one. How does it become more difficult?
Currently, our code only specifies Init or Next as the action taken at each step in a state trace. It would be nice if it were more specific and output FillJug(1), EmptyJug(2), or JugToJug(1, 2), for example. Modify the reconstructStateTrace() method to break down the Init and Next actions to make a best effort to identify the sub-action taken. This will involve analyzing existential quantification and disjunction expressions.

< Previous Page | Table of Contents | Next Page >