Differences
This shows you the differences between two versions of the page.
creating:scanning [2025/04/06 19:40] – Change link to next tutorial page ahelwer | creating:scanning [2025/04/27 17:35] (current) – Fixed link to repo ahelwer | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Scanning TLA⁺ Tokens ====== | + | ======= Scanning TLA⁺ Tokens |
This page corresponds to [[https:// | This page corresponds to [[https:// | ||
Line 5: | Line 5: | ||
For each section in the chapter, first read the section in the book and then read the corresponding section tutorial on this page to see how to adapt the concepts to TLA⁺. | For each section in the chapter, first read the section in the book and then read the corresponding section tutorial on this page to see how to adapt the concepts to TLA⁺. | ||
- | ==== Section 4.1: The Interpreter Framework ==== | + | ====== Section 4.1: The Interpreter Framework |
Almost everything in [[https:// | Almost everything in [[https:// | ||
Line 12: | Line 12: | ||
<code java [enable_line_numbers=" | <code java [enable_line_numbers=" | ||
- | package | + | package tla; |
import java.io.BufferedReader; | import java.io.BufferedReader; | ||
Line 80: | Line 80: | ||
</ | </ | ||
- | ==== Section 4.2: Lexemes and Tokens ==== | + | ====== Section 4.2: Lexemes and Tokens |
The '' | The '' | ||
Line 87: | Line 87: | ||
<code java [enable_line_numbers=" | <code java [enable_line_numbers=" | ||
- | package | + | package tla; |
enum TokenType { | enum TokenType { | ||
Line 93: | Line 93: | ||
LEFT_PAREN, RIGHT_PAREN, | LEFT_PAREN, RIGHT_PAREN, | ||
LEFT_BRACKET, | LEFT_BRACKET, | ||
- | MINUS, PLUS, LESS_THAN, | + | MINUS, PLUS, LESS_THAN, |
// Short fixed-length tokens. | // Short fixed-length tokens. | ||
Line 99: | Line 99: | ||
// Literals. | // Literals. | ||
- | IDENTIFIER, | + | IDENTIFIER, |
// Keywords. | // Keywords. | ||
Line 121: | Line 121: | ||
<code java [enable_line_numbers=" | <code java [enable_line_numbers=" | ||
- | package | + | package tla; |
class Token { | class Token { | ||
Line 144: | Line 144: | ||
</ | </ | ||
- | ==== Section 4.4: The Scanner Class ==== | + | ====== Section 4.4: The Scanner Class ====== |
Nothing in section 4.3 requires modification, | Nothing in section 4.3 requires modification, | ||
Line 150: | Line 150: | ||
<code java [enable_line_numbers=" | <code java [enable_line_numbers=" | ||
- | package | + | package tla; |
import java.util.ArrayList; | import java.util.ArrayList; | ||
Line 157: | Line 157: | ||
import java.util.Map; | import java.util.Map; | ||
- | import static | + | import static tla.TokenType.*; |
class Scanner { | class Scanner { | ||
Line 188: | Line 188: | ||
</ | </ | ||
- | ==== Section 4.5: Recognizing Lexemes ==== | + | ====== Section 4.5: Recognizing Lexemes |
[[https:// | [[https:// | ||
Line 209: | Line 209: | ||
case ' | case ' | ||
case '<': | case '<': | ||
- | case ' | + | case ' |
case ' | case ' | ||
default: | default: | ||
Line 285: | Line 285: | ||
</ | </ | ||
- | ==== Section 4.6: Longer Lexemes ==== | + | ====== Section 4.6: Longer Lexemes |
In [[https:// | In [[https:// | ||
Line 364: | Line 364: | ||
private void number() { | private void number() { | ||
while (isDigit(peek())) advance(); | while (isDigit(peek())) advance(); | ||
- | addToken(NAT_NUMBER, | + | addToken(NUMBER, |
Integer.parseInt(source.substring(start, | Integer.parseInt(source.substring(start, | ||
} | } | ||
</ | </ | ||
- | ==== Section 4.7: Reserved Words and Identifiers ==== | + | ====== Section 4.7: Reserved Words and Identifiers |
In [[https:// | In [[https:// | ||
Line 473: | Line 473: | ||
symbols.put(" | symbols.put(" | ||
symbols.put(" | symbols.put(" | ||
- | symbols.put(" | + | symbols.put(" |
- | symbols.put(" | + | symbols.put(" |
symbols.put(" | symbols.put(" | ||
} | } | ||
Line 508: | Line 508: | ||
Isn't it amazing how quickly this is coming together? | Isn't it amazing how quickly this is coming together? | ||
The simplicity of the required code is one of the great wonders of language implementation. | The simplicity of the required code is one of the great wonders of language implementation. | ||
- | If you got lost somewhere along the way, you can find a snapshot of the code on this page [[https:// | + | If you got lost somewhere along the way, you can find a snapshot of the code on this page [[https:// |
Next we learn how to collect our tokens into a parse tree! | Next we learn how to collect our tokens into a parse tree! | ||
- | Continue the tutorial at [[creating: | + | Continue the tutorial at [[creating: |
- | ===== Challenges ===== | + | ====== Challenges |
Here are some optional challenges to flesh out your TLA⁺ scanner, roughly ranked from simplest to most difficult. | Here are some optional challenges to flesh out your TLA⁺ scanner, roughly ranked from simplest to most difficult. | ||
Line 522: | Line 522: | ||
- Similar to nested block comments, add support for extramodular text & nested modules. TLA⁺ files are properly supposed to ignore all text outside of modules, treating it the same as comments. Lexing TLA⁺ tokens should only start after reading ahead and detecting a '' | - Similar to nested block comments, add support for extramodular text & nested modules. TLA⁺ files are properly supposed to ignore all text outside of modules, treating it the same as comments. Lexing TLA⁺ tokens should only start after reading ahead and detecting a '' | ||
- Add Unicode support. Instead of using the '' | - Add Unicode support. Instead of using the '' | ||
+ | |||
+ | [[creating: | ||