creating:scanning [2025/04/27 17:35] (current) – Fixed link to repo ahelwer
Isn't it amazing how quickly this is coming together?
The simplicity of the required code is one of the great wonders of language implementation.
If you got lost somewhere along the way, you can find a snapshot of the code as of this page [[https://github.com/tlaplus/devkit/tree/main/2-scanning|here]].
Next we learn how to collect our tokens into a parse tree!
Continue the tutorial at [[creating:expressions|Parsing Constant TLA⁺ Expressions]].

====== Challenges ======
  - Add Unicode support. A Java ''char'' is a single UTF-16 code unit, so Java represents a full Unicode codepoint as an ''int'' instead. That means you'll be iterating over an array of ''int''s rather than the characters of a string. Character literals can still be compared directly against ''int''s, so our ''case'' statement should be nearly unchanged. Look at the [[https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/lang/CharSequence.html#codePoints()|codePoints() method]], available on strings since Java 8. Add support for Unicode symbol variants like ''≜'', ''∈'', ''∧'', ''∨'', ''∃'', and ''∀''. Our code already reads files as UTF-8, so the file encoding is taken care of.
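
As a starting point for the Unicode challenge, here is a minimal stand-alone sketch of iterating over codepoints instead of ''char''s. It is not the tutorial's scanner: the class name ''CodePointScanSketch'', the method ''scanSymbols'', and the token names are all hypothetical, and everything that is not one of the six symbols is simply skipped. The symbols are written as ''\u'' escapes so the example compiles regardless of how the source file itself is encoded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the tutorial's scanner class.
class CodePointScanSketch {

  // Returns (hypothetical) token names for the Unicode symbols in the source.
  static List<String> scanSymbols(String source) {
    // One int per Unicode codepoint, rather than one char per UTF-16 unit.
    int[] codePoints = source.codePoints().toArray();
    List<String> tokens = new ArrayList<>();

    for (int current = 0; current < codePoints.length; current++) {
      int c = codePoints[current];
      // char literals are valid case labels when switching on an int.
      switch (c) {
        case '\u225C': tokens.add("DEFINED_AS"); break; // U+225C, defined-as
        case '\u2208': tokens.add("IN"); break;         // U+2208, element-of
        case '\u2227': tokens.add("AND"); break;        // U+2227, logical and
        case '\u2228': tokens.add("OR"); break;         // U+2228, logical or
        case '\u2203': tokens.add("EXISTS"); break;     // U+2203, exists
        case '\u2200': tokens.add("FOR_ALL"); break;    // U+2200, for all
        default: break; // This sketch ignores every other codepoint.
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Scans the text: x, defined-as, y, logical-and, z.
    System.out.println(scanSymbols("x \u225C y \u2227 z"));
  }
}
```

In a real scanner the ''switch'' would also keep the existing ASCII cases, and helpers such as peeking at the next character would index into the ''int'' array rather than calling ''charAt'' on the string.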

| [[creating:start|< Previous Page]] | [[creating:start#table_of_contents|Table of Contents]] | [[creating:expressions|Next Page >]] |