Differences
This shows you the differences between two versions of the page.
creating:scanning [2025/03/29 14:17] – Moved syntax page to scanning page ahelwer | creating:scanning [2025/03/29 23:50] (current) – Read files assuming the UTF-8 encoding. ahelwer | ||
---|---|---|---|
Line 7: | Line 7: | ||
==== Section 4.1: The Interpreter Framework ==== | ==== Section 4.1: The Interpreter Framework ==== | ||
- | Everything | + | Almost everything |
- | You should thus have followed along and arrived at something similar | + | We do make one small functional modification: |
+ | Here's our main file, '' | ||
- | <code java [enable_line_numbers=" | + | <code java [enable_line_numbers=" |
package com.craftinginterpreters.tla; | package com.craftinginterpreters.tla; | ||
Line 16: | Line 17: | ||
import java.io.IOException; | import java.io.IOException; | ||
import java.io.InputStreamReader; | import java.io.InputStreamReader; | ||
- | import java.nio.charset.Charset; | + | import java.nio.charset.StandardCharsets; |
import java.nio.file.Files; | import java.nio.file.Files; | ||
import java.nio.file.Paths; | import java.nio.file.Paths; | ||
Line 37: | Line 38: | ||
private static void runFile(String path) throws IOException { | private static void runFile(String path) throws IOException { | ||
byte[] bytes = Files.readAllBytes(Paths.get(path)); | byte[] bytes = Files.readAllBytes(Paths.get(path)); | ||
- | run(new String(bytes, | + | run(new String(bytes, |
// Indicate an error in the exit code. | // Indicate an error in the exit code. | ||
Line 82: | Line 83: | ||
The '' | The '' | ||
- | Instead of Lox tokens, we use the atomic components of our minimal TLA⁺ language subset | + | Instead of Lox tokens, we use the atomic components of our minimal TLA⁺ language subset. |
Adapting the snippet in [[https:// | Adapting the snippet in [[https:// | ||
Line 145: | Line 146: | ||
==== Section 4.4: The Scanner Class ==== | ==== Section 4.4: The Scanner Class ==== | ||
- | We now move on to the very important '' | + | Nothing in section 4.3 requires modification, |
Our first modification to the code given in the book is to track the column in addition to the line, mirroring our addition to the '' | Our first modification to the code given in the book is to track the column in addition to the line, mirroring our addition to the '' | ||
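
A minimal sketch of what tracking the column alongside the line could look like. The class name `PositionTracker`, its fields, and the 0-based column convention are assumptions made for illustration. They are not taken from the tutorial's actual `Scanner` class.

```java
// Hypothetical helper: counts lines and columns while consuming characters.
class PositionTracker {
  private final String source;
  private int current = 0; // index of the next character to consume
  private int line = 1;    // 1-based line number, as in the book
  private int column = 0;  // characters consumed so far on the current line (assumed convention)

  PositionTracker(String source) {
    this.source = source;
  }

  boolean isAtEnd() {
    return current >= source.length();
  }

  // Consume one character, updating the line and column counters.
  char advance() {
    char c = source.charAt(current++);
    if (c == '\n') {
      line++;
      column = 0; // a newline starts a fresh line
    } else {
      column++;
    }
    return c;
  }

  int line() {
    return line;
  }

  int column() {
    return column;
  }

  public static void main(String[] args) {
    PositionTracker tracker = new PositionTracker("x = 1\ny = 2");
    while (!tracker.isAtEnd()) {
      char c = tracker.advance();
      if (c != '\n') {
        System.out.println("'" + c + "' ends at line " + tracker.line()
            + ", column " + tracker.column());
      }
    }
  }
}
```

Resetting the column inside `advance()` keeps the bookkeeping in one place, so any code that consumes characters gets correct positions without extra work.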
Line 356: | Line 357: | ||
We're only handling whole natural numbers, no decimals, so our '' | We're only handling whole natural numbers, no decimals, so our '' | ||
- | <code java> | + | <code java [highlight_lines_extra=" |
private boolean isDigit(char c) { | private boolean isDigit(char c) { | ||
return c >= ' | return c >= ' | ||
Line 459: | Line 460: | ||
</ | </ | ||
- | Then define the symbol map and '' | + | Then define the symbol map and '' |
<code java> | <code java> | ||
Line 466: | Line 467: | ||
static { | static { | ||
symbols = new HashMap<> | symbols = new HashMap<> | ||
- | symbols.put(" | + | symbols.put(" |
symbols.put(" | symbols.put(" | ||
symbols.put(" | symbols.put(" | ||
symbols.put(" | symbols.put(" | ||
symbols.put(" | symbols.put(" | ||
+ | symbols.put(" | ||
+ | symbols.put(" | ||
+ | symbols.put(" | ||
+ | symbols.put(" | ||
} | } | ||
Line 503: | Line 508: | ||
Isn't it amazing how quickly this is coming together? | Isn't it amazing how quickly this is coming together? | ||
The simplicity of the required code is one of the great wonders of language implementation. | The simplicity of the required code is one of the great wonders of language implementation. | ||
+ | If you got lost somewhere along the way, you can find a snapshot of the code on this page [[https:// | ||
+ | Next we learn how to collect our tokens into a parse tree! | ||
+ | Continue the tutorial at [[creating: | ||
+ | |||
+ | ===== Challenges ===== | ||
+ | |||
+ | Here are some optional challenges to flesh out your TLA⁺ scanner, roughly ranked from simplest to most difficult. | ||
+ | You should save a copy of your code before attempting these. | ||
+ | - Our error reporting only prints the line on which an error occurs, even though we now also track the column. Modify the error reporting functions to pass the column through and print it as well.
+ | - Implement token recognition for the '' | ||
+ | - Modify '' | ||
+ | - Add support for nestable block comments like '' | ||
+ | - Similar to nested block comments, add support for extramodular text & nested modules. TLA⁺ tools are supposed to ignore all text outside of modules, treating it the same as comments. Lexing TLA⁺ tokens should only start after reading ahead and detecting a '' |
+ | - Add Unicode support. Instead of using the '' | ||
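
As a starting point for the first challenge, here is one possible sketch of column-aware error reporting. The class name `ErrorReporter`, the `format` helper, and the exact message layout are assumptions for illustration, loosely modeled on the line-only `error` and `report` functions from the book. Your own code will likely keep these methods in the main class instead.

```java
// Hypothetical stand-in for the error reporting functions in the main class.
class ErrorReporter {
  static boolean hadError = false;

  // Build the message text; kept separate from printing so it is easy to check.
  static String format(int line, int column, String where, String message) {
    return "[line " + line + ", column " + column + "] Error" + where + ": " + message;
  }

  // Entry point a scanner could call with the position it tracked.
  static void error(int line, int column, String message) {
    report(line, column, "", message);
  }

  private static void report(int line, int column, String where, String message) {
    System.err.println(format(line, column, where, message));
    hadError = true;
  }

  public static void main(String[] args) {
    error(3, 7, "Unexpected character.");
  }
}
```

The scanner would then pass its current column wherever it already passes the line.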