
Literal syntax for strings and characters

Attila Magyar edited this page May 27, 2017 · 30 revisions

Most Forth implementations don't have string or character literals. The reason lies in how Forth source is parsed. Simplicity is one of Forth's key virtues: the syntax is so simple that building a parser from scratch, even in assembly, is trivial. Forth source code consists of whitespace-separated tokens, and each token represents either a word or a number. The outer interpreter grabs the next token and looks it up in the dictionary. If the token is found, it is treated as a word (which is either compiled or executed); otherwise the interpreter tries to convert it to a number (which is either pushed onto the data stack or compiled as a literal).

Here is what a typical Forth text interpreter looks like.

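\ fig-Forth outer interpreter
: interpret ( -- )
  begin -find
    if state @ <                    \ found: compile or execute the word
      if cfa , else cfa execute then
    else here number dpl @ 1+       \ not found: convert the token to a number
      if      [compile] dliteral    \ decimal point seen: double-cell literal
      else drop [compile] literal   \ no decimal point: single-cell literal
      then
    then ?stack
  again ;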

Most Forth systems use parsing words to add support for strings or characters, instead of complicating the outer interpreter with new cases.

Nothing prevents you from defining a new word and naming it ".

: " <parse the string from the input until you find a ">

Here is how you can use it to define a Hello world! string. Note that the leading space is not part of the string: Forth parses whitespace-separated tokens, and " is one of them.

" Hello world!"

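As a rough illustration, such a word can be sketched in standard Forth as follows. This is not punyforth's code: it assumes the standard word parse (Core extension word set) is available, it only works at interpretation time, and the string it returns lives in the input buffer, so it is only valid until the current line has been consumed.

```forth
\ Sketch only: standard Forth, interpretation-time use.
\ PARSE ( char "ccc<char>" -- c-addr u ) is assumed from the Core extension word set.
: " ( "ccc<quote>" -- c-addr u )
  [char] " parse ;     \ return the text up to, but not including, the next "

" Hello world!" type   \ the string starts right after the single separating space
```

A version usable inside colon definitions would also have to copy the characters into the dictionary and compile code that pushes their address and length.
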
In punyforth I chose to use a different parsing word for this because I didn't like the leading space at the beginning of each string.

: str: <find the first non white space character, treat as the separator, parse the string until the end separator>

str: "Hello World!"
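
The idea behind str: can be sketched in standard Forth roughly like this. It is an illustration under assumptions, not punyforth's implementation: skip-blanks is a hypothetical helper, the words source, >in, /string and parse are assumed to be the standard ones, and the result again points into the input buffer, so it works at interpretation time only.

```forth
\ Sketch only: standard Forth, interpretation-time use.
: skip-blanks ( -- )            \ hypothetical helper: advance >IN past blanks
  begin
    source >in @ /string        ( c-addr u )  \ the unparsed rest of the line
    dup 1 < if 2drop exit then  \ end of input: nothing to skip
    drop c@ bl > if exit then   \ stop at the first non-blank character
    1 >in +!                    \ otherwise step over the blank
  again ;

: str: ( "<spaces><sep>ccc<sep>" -- c-addr u )
  skip-blanks
  source >in @ /string          ( c-addr u )
  1 < if 0 exit then            \ nothing left on the line: empty string
  c@                            \ the first non-blank character is the separator
  1 >in +!                      \ step over the opening separator
  parse ;                       \ collect the text up to the closing separator

str: "Hello World!" type
```

Because the separator is whatever character comes first, something like str: |say "hi"| would also work in this sketch.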

There is no leading space, but unfortunately you have to use this relatively long parsing word before each string.

I was not satisfied with either of these solutions, so I decided to add real string literals. I extended the outer interpreter with two hooks, and it now works like this.

  1. Find the next token
  2. If it's in the dictionary, compile or execute it
  3. Otherwise, try to convert it to a number
  4. If it's not a number, call hook1 when we're interpreting or hook2 when we're compiling
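
The four steps above can be sketched for a single token in standard Forth. This is only an illustration of the dispatch order, not punyforth's outer interpreter: hook-interpret and hook-compile are hypothetical names standing in for hook1 and hook2, the helpers token>counted and token>number are made up for this example, and only unsigned numbers in the current base are recognized.

```forth
\ Sketch only, in standard Forth; not punyforth's actual code.
\ hook-interpret / hook-compile are hypothetical stand-ins for hook1 / hook2.
variable hook-interpret
variable hook-compile

: undefined-token ( c-addr u -- ) 2drop -13 throw ;  \ default hook: undefined word
' undefined-token hook-interpret !
' undefined-token hook-compile !

: token>counted ( c-addr u -- c-addr' )  \ copy the token to PAD as a counted string
  dup pad c!  pad 1+ swap move  pad ;

: token>number ( c-addr u -- n true | c-addr u false )  \ unsigned numbers only
  2dup 0 0 2swap >number nip 0=          \ true if every character was converted
  if drop nip nip true else 2drop false then ;

: handle-token ( i*x c-addr u -- j*x )   \ step 1 (finding the token) is the caller's job
  2dup token>counted find ?dup if        \ step 2: the token is in the dictionary
    2swap 2drop                          ( xt n )
    0> state @ 0= or if execute else compile, then exit
  then drop                              ( c-addr u )
  token>number if                        \ step 3: the token is a number
    state @ if postpone literal then exit
  then
  state @ if hook-compile else hook-interpret then @ execute ;  \ step 4: call a hook
```

On a system that supports s" at interpretation time, s" 42" handle-token . should take the number branch and print 42, while an unknown token such as s" $A" handle-token ends up in whatever word is stored in hook-interpret.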

Now the code that recognizes a string can be hooked into the outer interpreter. In fact, even the number conversion could be extracted and implemented as a hook.

This is a proof of concept implementation.

: _ ( addr len -- ? )
    \ recognize char
    2dup 2 = swap c@ char: $ = and if drop ['], 1+ c@ , exit then
    \ recognize str
    over c@ char: " = if nip [str >r str, r> str] exit then
    eundef ;

' _ eundefc ! \ hook it into the compiler

: _ ( addr len -- ? )
    \ recognize char
    2dup 2 = swap c@ char: $ = and if drop 1+ c@ exit then
    \ recognize str
    over c@ char: " = if nip dp >r str, 0 c, r> exit then
    eundef ;

' _ eundefi ! \ hook it into the text interpreter

This allows us to define strings naturally like this:

: demo "Hello World!" type ;

Or characters like this:

$A emit \ prints out an A character