Create your own TLA⁺ tools

Building your own simplified parsing, model-checking, and even proof-checking tools for TLA⁺ is the best way to prepare yourself for serious work on the existing TLA⁺ tools. This process will give you a strong understanding of how & why the tools are written the way they are. Even though your solution will not match their workings line-for-line, you'll learn to recognize the tricky parts and can compare & contrast your own approach with the existing implementation. You'll also be able to rapidly prototype new capabilities in your own codebase, an experience that will guide you as you interface with the complexity of an industrial-strength implementation. While writing your own TLA⁺ tools may seem like a tall order, it is a proven strategy for smoothing out the otherwise-near-vertical learning curve of contributing to tooling for any formal language. Do not think you have to fully flesh out your implementation before you can begin working on the existing TLA⁺ tools! The purpose of this exercise is to acquire specific knowledge; often you will expand your implementation in tandem with your work on the existing tools, each informing the other. Your implementation will grow with you over time.

This tutorial series strives to build a culture of demystification, where knowledge of TLA⁺ inner workings is not arcane but commonplace. The tools do not use complicated algorithms; their workings can all be understood by any software engineer. Language tooling (for any formal language) is also a somewhat blessed area of software development in that specifications of expected program behavior are relatively complete and unambiguous. Thus, with enough living knowledge (and comprehensive test batteries) we can confidently make changes to code long after its original authors have moved on; the challenges of legacy code and technical debt can be made tractable.

Approaching TLA⁺ from the perspective of an implementer will also deepen your knowledge of how to use TLA⁺ itself. Developing an intuition for what TLC is doing when it model-checks your spec will enable you to write specifications that make better use of its capabilities. And if you reach the edge of those capabilities, you are well placed to extend TLC to handle it!

Overview

This tutorial uses Java. There are two good reasons for this:

  1. At time of writing, the most mature TLA⁺ tools are written in Java, and this tutorial prepares you to work on them.
  2. The high-quality free online textbook Crafting Interpreters by Robert Nystrom uses Java, and this tutorial is a simple TLA⁺-themed wrapper around its concepts. That book states its own reasons for using Java, which you can read.

You will need to install the Java Development Kit appropriate for your system. All of the code can be produced by following this tutorial alone, but if you get out of sync you can find working implementations in this git repository.

The tutorial has you construct the TLA⁺ tools from the bottom up. First you'll learn to parse its syntax, then how to check its semantics, then how to interpret its expressions, and finally how to check the safety of a simple specification. Future expansions will cover liveness checking and, possibly, proof checking. A minimal subset of the TLA⁺ language is used; users are encouraged to extend their tools to handle additional TLA⁺ syntax as desired.
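
As a rough sketch of how those stages might fit together, here is one possible top-level pipeline in Java. Every class, record, and method name below is a hypothetical placeholder rather than the tutorial's actual code, which is built up chapter by chapter.

  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.util.List;

  // A minimal sketch of the bottom-up pipeline; all names here are
  // hypothetical stand-ins, not the tutorial's real classes.
  public class TlaPlusTools {
    record Token(String lexeme) {}  // produced by the scanner
    record Unit(String name) {}     // a unit-level definition from the parser

    static List<Token> scan(String source)       { return List.of(); } // syntax: lexing
    static List<Unit> parse(List<Token> tokens)  { return List.of(); } // syntax: parsing
    static void checkSemantics(List<Unit> units) { }                   // semantic analysis
    static void checkSafety(List<Unit> units)    { }                   // evaluation & model checking

    public static void main(String[] args) throws Exception {
      String source = Files.readString(Path.of(args[0]));
      List<Unit> units = parse(scan(source));
      checkSemantics(units);
      checkSafety(units);
    }
  }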

The Language

TLA⁺ is a large, complicated, and ambiguous language that takes a huge amount of work to parse correctly & completely. That work is assisted by a comprehensive test corpus exercising the language's nooks & crannies, but here we will make life easy on ourselves. This tutorial only uses a minimal subset of TLA⁺ syntax, enough to handle the generalized DieHard spec with some inconvenience. While that may seem limiting, this tutorial focuses on the difficult & interesting parts of bringing TLA⁺ to life instead of more mundane drudgework like handling all hundred-some user-definable operator symbols. You are encouraged to extend this minimal core as you wish; language tooling is best developed incrementally! Slowly filling in the details of this rough language sketch has a satisfying meditative aspect.

Here is what our minimal language subset includes:

Notably, we do not use the familiar ---- MODULE Name ----/==== encapsulation. Files simply consist of a series of unit-level TLA⁺ definitions. We also do not leave room for a separate model configuration file; all CONSTANT values must instead be hardcoded, and well-known names like Init, Next, TypeOK, and Inv identify important definitions.
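
To illustrate how those well-known names could stand in for a model configuration file, here is one possible sketch in Java of a checker looking them up among the parsed unit-level definitions. The Definition record and the method shown are hypothetical placeholders, not the tutorial's actual classes.

  import java.util.Map;

  // A minimal sketch, assuming a hypothetical Definition type produced by the
  // parser; the tutorial's real classes may differ.
  public class WellKnownNames {
    record Definition(String body) {}  // placeholder for a parsed definition

    static void resolve(Map<String, Definition> unitDefs) {
      Definition init = unitDefs.get("Init");
      Definition next = unitDefs.get("Next");
      if (init == null || next == null) {
        throw new IllegalStateException("Spec must define Init and Next");
      }
      // TypeOK and Inv, when present, are treated as invariants to check.
      Definition typeOk = unitDefs.get("TypeOK");
      Definition inv = unitDefs.get("Inv");
      System.out.println("Invariants to check: "
          + (typeOk != null ? "TypeOK " : "") + (inv != null ? "Inv" : ""));
    }
  }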

As outlined above, you are free to add missing features (or even your own invented features!) as you wish. Each chapter ends with a set of challenges that often involve adding these missing features.

Getting Started

Read part I (the first three chapters) of the free online textbook Crafting Interpreters by Robert Nystrom. We will be closely following the material in this book, adapting it to our purposes. The first two chapters are a nice introduction and overview of language implementation generally. Chapter three specifies a toy language called Lox, used as an object of study for the remainder of the book. Our minimal TLA⁺ subset has some similarity to Lox with regard to expressions involving integers and booleans, but also differences: you can skip the section on closures (unless you want to implement higher-order operator parameters yourself) and the section on classes. What's important is that Lox is similar enough for all the fundamentals to still apply!

Code snippets are shown often in this tutorial. When a snippet is similar to code given in the book with the exception of a few lines, the differing lines will be highlighted. For example:

  private char advance() {
    column++;
    return source.charAt(current++);
  }

Highlighting is also used to identify new code being added to a large section of code given previously, with line numbers and surrounding lines included for context. For example:

  1. case '\n':
  2.   column = 0;
  3.   line++;
  4.   break;

Sometimes the code given here is so different from that provided by the book that highlighting is simply omitted to reduce visual noise. A parenthetical will specify the meaning of highlighted code when it is ambiguous.

To begin the tutorial, start with the page Scanning TLA⁺ Tokens.