Literal syntax for strings and characters
Most Forth implementations don't have string or character literals. The reason lies in how Forth source is parsed. Simplicity is one of the key virtues of Forth: the syntax is so simple that building a parser from scratch, even in assembly, is trivial. Forth source code consists of whitespace-separated tokens, each of which represents either a word or a number. The outer interpreter grabs the next token and looks it up in the dictionary. If the token is found, it is treated as a word (and that word is either compiled or executed); otherwise the interpreter tries to convert it to a number (and that number is either pushed onto the data stack or compiled as a literal).
Here is what a typical Forth text interpreter looks like.
\ fig-Forth outer interpreter
: interpret ( -- )
begin -find
if state @ <
if cfa , else cfa execute then
else here number dpl @ 1+
if drop [compile] literal
else [compile] dliteral
then
then ?stack
again ;
Most Forth systems use parsing words to support strings and characters, rather than complicating the outer interpreter with new cases.
Nothing prevents you from defining a new word and naming it ".
: " <parse the string from the input until you find a ">
Here is how you can use it to define a Hello world! string. Note that the space after the opening " is not part of the string: Forth parses whitespace-separated tokens, and " is one of them.
" Hello world!"
In punyforth I chose a different parsing word for this, because I didn't like the leading space at the beginning of each string.
: str: <find the first non white space character, treat as the separator, parse the string until the end separator>
str: "Hello World!"
There is no leading space, but unfortunately you have to put this relatively long parsing word before each string.
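The two parsing words described above can be sketched in standard Forth as follows. This is only an illustration under stated assumptions: it uses the standard words PARSE and PARSE-NAME (Forth 2012) rather than punyforth's own primitives, it works in interpretation state only, and it is not the punyforth implementation.

```forth
\ Sketch in standard Forth, not punyforth. Interpretation state only.

\ The classic approach: " is an ordinary word, so it must be followed by a space.
: " ( "ccc<quote>" -- c-addr u )
  [char] " parse ;            \ take everything up to the closing quote

\ The str: approach: the first non-blank character acts as the delimiter.
\ Assumes at least one non-blank character follows str: on the same line.
: str: ( "<spaces>cTEXTc" -- c-addr u )
  parse-name drop             \ address of the first non-blank character
  dup c@ swap                 \ delimiter under the address
  source drop - 1+ >in !      \ restart parsing just after the delimiter
  parse ;                     \ take everything up to the closing delimiter

" Hello world!" type
str: "Hello World!" type
```

Both words return a string that points into the input buffer, so it is only valid until the next line is read. A real implementation would copy the text somewhere permanent.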
I was not satisfied with either of these solutions, so I decided to add real string literals. I extended the outer interpreter with two hooks, and it now works like this:
- Find the next token
- If it's in the dictionary compile or execute
- Otherwise try to convert it to a number
- If it's not a number call hook1 when we're interpreting or hook2 if we're compiling
Now the code that recognizes a string can be hooked into the outer interpreter. In fact, even the number conversion could be extracted and implemented as a hook.
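The last step of the list above, choosing a hook according to the interpreter state, might look like this in standard Forth. The names interpret-hook, compile-hook, reject and unknown-token are hypothetical and chosen for illustration; punyforth stores its hooks in the variables eundefi and eundefc instead of using DEFER.

```forth
\ Sketch in standard Forth (DEFER/IS); all names here are hypothetical.
defer interpret-hook ( c-addr u -- )  \ unknown token seen while interpreting
defer compile-hook   ( c-addr u -- )  \ unknown token seen while compiling

\ Default behaviour: report the token and raise "undefined word" (-13).
: reject ( c-addr u -- ) type ."  ?" -13 throw ;
' reject is interpret-hook
' reject is compile-hook

\ Called by the outer interpreter when a token is neither a word nor a number.
: unknown-token ( c-addr u -- )
  state @ if compile-hook else interpret-hook then ;
```

With this arrangement, adding a new kind of literal means installing a different word into one of the hooks, and the outer interpreter itself stays unchanged.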
Here is a proof-of-concept implementation.
: _ ( addr len -- ? )
\ recognize char
2dup 2 = swap c@ char: $ = and if drop ['], 1+ c@ , exit then
\ recognize str
over c@ char: " = if nip [str >r str, r> str] exit then
eundef ;
' _ eundefc ! \ hook it into the compiler
: _ ( addr len -- ? )
\ recognize char
2dup 2 = swap c@ char: $ = and if drop 1+ c@ exit then
\ recognize str
over c@ char: " = if nip dp >r str, 0 c, r> exit then
eundef ;
' _ eundefi ! \ hook it into the text interpreter
This allows us to define strings naturally like this:
: demo "Hello World!" type ;
Or characters like this:
$A emit \ prints the character A
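The token tests used by the two hooks can also be read in isolation. The sketch below restates them in standard Forth, assuming the same conventions as the proof of concept (a character literal is a two-character token starting with $, a string literal is a token starting with "). The names char-literal? and string-literal? are hypothetical and do not exist in punyforth.

```forth
\ Sketch in standard Forth; word names are hypothetical.

\ A character literal is exactly two characters long and starts with $.
: char-literal? ( c-addr u -- char true | false )
  2 = if
    dup c@ [char] $ = if 1+ c@ true exit then
  then
  drop false ;

\ A string literal is any non-empty token whose first character is ".
: string-literal? ( c-addr u -- flag )
  0= if drop false exit then
  c@ [char] " = ;

s" $A" char-literal? . .      \ prints -1 65
```

A hook would run these tests in turn and fall back to the undefined-word error when neither matches.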
Attila Magyar