creating:scanning

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
creating:scanning [2025/04/06 19:40] – Change link to next tutorial page ahelwercreating:scanning [2025/04/27 17:35] (current) – Fixed link to repo ahelwer
Line 1: Line 1:
-====== Scanning TLA⁺ Tokens ======+======= Scanning TLA⁺ Tokens =======
  
 This page corresponds to [[https://craftinginterpreters.com/scanning.html|chapter 4]] of //Crafting Interpreters// by Robert Nystrom, henceforth referred to as "the book". This page corresponds to [[https://craftinginterpreters.com/scanning.html|chapter 4]] of //Crafting Interpreters// by Robert Nystrom, henceforth referred to as "the book".
Line 5: Line 5:
 For each section in the chapter, first read the section in the book and then read the corresponding section tutorial on this page to see how to adapt the concepts to TLA⁺. For each section in the chapter, first read the section in the book and then read the corresponding section tutorial on this page to see how to adapt the concepts to TLA⁺.
  
-==== Section 4.1: The Interpreter Framework ====+====== Section 4.1: The Interpreter Framework ======
  
 Almost everything in [[https://craftinginterpreters.com/scanning.html#the-interpreter-framework|section 4.1]] can be left unchanged from what's given, although of course it makes sense to change some of the names from "Lox" to "TlaPlus" or similar. Almost everything in [[https://craftinginterpreters.com/scanning.html#the-interpreter-framework|section 4.1]] can be left unchanged from what's given, although of course it makes sense to change some of the names from "Lox" to "TlaPlus" or similar.
Line 12: Line 12:
  
 <code java [enable_line_numbers="true",highlight_lines_extra="1,6,11,27"]> <code java [enable_line_numbers="true",highlight_lines_extra="1,6,11,27"]>
-package com.craftinginterpreters.tla;+package tla;
  
 import java.io.BufferedReader; import java.io.BufferedReader;
Line 80: Line 80:
 </code> </code>
  
-==== Section 4.2: Lexemes and Tokens ====+====== Section 4.2: Lexemes and Tokens ======
  
 The ''TokenType'' class in [[https://craftinginterpreters.com/scanning.html#lexemes-and-tokens|section 4.2]] is our first major departure from the book. The ''TokenType'' class in [[https://craftinginterpreters.com/scanning.html#lexemes-and-tokens|section 4.2]] is our first major departure from the book.
Line 87: Line 87:
  
 <code java [enable_line_numbers="true"]> <code java [enable_line_numbers="true"]>
-package com.craftinginterpreters.tla;+package tla;
  
 enum TokenType { enum TokenType {
Line 93: Line 93:
   LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE,   LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE,
   LEFT_BRACKET, RIGHT_BRACKET, COMMA, COLON,   LEFT_BRACKET, RIGHT_BRACKET, COMMA, COLON,
-  MINUS, PLUS, LESS_THAN, NEGATION, PRIME,+  MINUS, PLUS, LESS_THAN, NOT, PRIME,
  
   // Short fixed-length tokens.   // Short fixed-length tokens.
Line 99: Line 99:
  
   // Literals.   // Literals.
-  IDENTIFIER, NAT_NUMBER, TRUE, FALSE,+  IDENTIFIER, NUMBER, TRUE, FALSE,
  
   // Keywords.   // Keywords.
Line 121: Line 121:
  
 <code java [enable_line_numbers="true",highlight_lines_extra="8,10,15"]> <code java [enable_line_numbers="true",highlight_lines_extra="8,10,15"]>
-package com.craftinginterpreters.tla;+package tla;
  
 class Token { class Token {
Line 144: Line 144:
 </code> </code>
  
-==== Section 4.4: The Scanner Class ====+====== Section 4.4: The Scanner Class ======
  
 Nothing in section 4.3 requires modification, so we can move on to the very important ''Scanner'' class in [[https://craftinginterpreters.com/scanning.html#the-scanner-class|section 4.4]]. Nothing in section 4.3 requires modification, so we can move on to the very important ''Scanner'' class in [[https://craftinginterpreters.com/scanning.html#the-scanner-class|section 4.4]].
Line 150: Line 150:
  
 <code java [enable_line_numbers="true",highlight_lines_extra="1,8,16,29"]> <code java [enable_line_numbers="true",highlight_lines_extra="1,8,16,29"]>
-package com.craftinginterpreters.tla;+package tla;
  
 import java.util.ArrayList; import java.util.ArrayList;
Line 157: Line 157:
 import java.util.Map; import java.util.Map;
  
-import static com.craftinginterpreters.tla.TokenType.*;+import static tla.TokenType.*;
  
 class Scanner { class Scanner {
Line 188: Line 188:
 </code> </code>
  
-==== Section 4.5: Recognizing Lexemes ====+====== Section 4.5: Recognizing Lexemes ======
  
 [[https://craftinginterpreters.com/scanning.html#recognizing-lexemes|Section 4.5]] finally gets around to actually handling text input! [[https://craftinginterpreters.com/scanning.html#recognizing-lexemes|Section 4.5]] finally gets around to actually handling text input!
Line 209: Line 209:
       case '+': addToken(PLUS); break;       case '+': addToken(PLUS); break;
       case '<': addToken(LESS_THAN); break;       case '<': addToken(LESS_THAN); break;
-      case '~': addToken(NEGATION); break;+      case '~': addToken(NOT); break;
       case '\'': addToken(PRIME); break;       case '\'': addToken(PRIME); break;
       default:       default:
Line 285: Line 285:
 </code> </code>
  
-==== Section 4.6: Longer Lexemes ====+====== Section 4.6: Longer Lexemes ======
  
 In [[https://craftinginterpreters.com/scanning.html#longer-lexemes|section 4.6]] we learn how to handle lexemes of longer or unlimited length. In [[https://craftinginterpreters.com/scanning.html#longer-lexemes|section 4.6]] we learn how to handle lexemes of longer or unlimited length.
Line 364: Line 364:
   private void number() {   private void number() {
     while (isDigit(peek())) advance();     while (isDigit(peek())) advance();
-    addToken(NAT_NUMBER,+    addToken(NUMBER,
         Integer.parseInt(source.substring(start, current)));         Integer.parseInt(source.substring(start, current)));
   }   }
 </code> </code>
  
-==== Section 4.7: Reserved Words and Identifiers ====+====== Section 4.7: Reserved Words and Identifiers ======
  
 In [[https://craftinginterpreters.com/scanning.html#reserved-words-and-identifiers|section 4.7]] we learn to lex general identifiers; start by modifying the ''default'' case of the ''scanToken()'' method: In [[https://craftinginterpreters.com/scanning.html#reserved-words-and-identifiers|section 4.7]] we learn to lex general identifiers; start by modifying the ''default'' case of the ''scanToken()'' method:
Line 473: Line 473:
     symbols.put("\\forall",   FOR_ALL);     symbols.put("\\forall",   FOR_ALL);
     symbols.put("\\in",       IN);     symbols.put("\\in",       IN);
-    symbols.put("\\lnot",     NEGATION); +    symbols.put("\\lnot",     NOT); 
-    symbols.put("\\neg",      NEGATION);+    symbols.put("\\neg",      NOT);
     symbols.put("\\lor",      OR);     symbols.put("\\lor",      OR);
   }   }
Line 508: Line 508:
 Isn't it amazing how quickly this is coming together? Isn't it amazing how quickly this is coming together?
 The simplicity of the required code is one of the great wonders of language implementation. The simplicity of the required code is one of the great wonders of language implementation.
-If you got lost somewhere along the way, you can find a snapshot of the code on this page [[https://github.com/tlaplus-community/tlaplus-creator/tree/main/scanning|here]].+If you got lost somewhere along the way, you can find a snapshot of the code on this page [[https://github.com/tlaplus/devkit/tree/main/2-scanning|here]].
 Next we learn how to collect our tokens into a parse tree! Next we learn how to collect our tokens into a parse tree!
-Continue the tutorial at [[creating:expressions|Parsing TLA⁺ Expressions]].+Continue the tutorial at [[creating:expressions|Parsing Constant TLA⁺ Expressions]].
  
-===== Challenges =====+====== Challenges ======
  
 Here are some optional challenges to flesh out your TLA⁺ scanner, roughly ranked from simplest to most difficult. Here are some optional challenges to flesh out your TLA⁺ scanner, roughly ranked from simplest to most difficult.
Line 522: Line 522:
   - Similar to nested block comments, add support for extramodular text & nested modules. TLA⁺ files are properly supposed to ignore all text outside of modules, treating it the same as comments. Lexing TLA⁺ tokens should only start after reading ahead and detecting a ''---- MODULE'' sequence. Then, after detecting termination of the module with ''===='', the scanner should revert to ignoring the text. Supporting nested modules complicates this further, since you'll need to keep track of the module nesting level to know when you can start ignoring text again.   - Similar to nested block comments, add support for extramodular text & nested modules. TLA⁺ files are properly supposed to ignore all text outside of modules, treating it the same as comments. Lexing TLA⁺ tokens should only start after reading ahead and detecting a ''---- MODULE'' sequence. Then, after detecting termination of the module with ''===='', the scanner should revert to ignoring the text. Supporting nested modules complicates this further, since you'll need to keep track of the module nesting level to know when you can start ignoring text again.
   - Add Unicode support. Instead of using the ''char'' type, Java represents Unicode codepoints as an ''int''. So, you'll be iterating over an array of ''int''s instead of the characters of a string. Character literals can still be directly compared against ''int''s; our ''case'' statement should be nearly unchanged. Look at the Java 8 [[https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/lang/CharSequence.html#codePoints()|string.codePoints() method]]. Add support for Unicode symbol variants like ''≜'', ''∈'', ''∧'', ''∨'', ''∃'', and ''∀''. Our code reads files assuming the UTF-8 encoding so that's already sorted.   - Add Unicode support. Instead of using the ''char'' type, Java represents Unicode codepoints as an ''int''. So, you'll be iterating over an array of ''int''s instead of the characters of a string. Character literals can still be directly compared against ''int''s; our ''case'' statement should be nearly unchanged. Look at the Java 8 [[https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/lang/CharSequence.html#codePoints()|string.codePoints() method]]. Add support for Unicode symbol variants like ''≜'', ''∈'', ''∧'', ''∨'', ''∃'', and ''∀''. Our code reads files assuming the UTF-8 encoding so that's already sorted.
 +
 +[[creating:start|< Previous Page]] | [[creating:start#table_of_contents|Table of Contents]] | [[creating:expressions|Next Page >]]
  
  • creating/scanning.1743968424.txt.gz
  • Last modified: 2025/04/06 19:40
  • by ahelwer