Skip to content

Introduction to Macros

Joshua Grosso edited this page Feb 1, 2018 · 23 revisions

Metaprogramming can be very useful in reducing or removing boilerplate code. However, some languages have less-than-satisfactory solutions while others have none at all. Template Haskell, for example, has several major problems (e.g. it is notoriously unstable between GHC releases, according to the 2017 State of Haskell), and PureScript doesn't have a macro-like system in the first place.

However, by following in Lisp's footsteps and adopting a homoiconic syntax (yay, parentheses!), Axel is equipped with a simple yet powerful metaprogramming system. Unlike in most languages, Axel source code is effectively a textual representation of the corresponding abstract syntax tree (AST) that the transpiler uses to represent the program internally. Macros, then, are just functions that take lists of Axel expressions and return lists of other Axel expressions. The Axel transpiler runs these macros at compile-time, allowing for code that creates code.

NOTE In this tutorial, -- => just means "evaluates to."

For an example of the power that macros provide us with, let's try to implement the LambdaCase GHC language extension as a macro ourselves.

Quoting

First, for a brief introduction to Axel's metaprogramming abilities. Let's use the following Axel expression as our example:

(+ 1 2)

This applies the + function to two arguments, 1 and 2. Internally, Axel represents this not as a function application per se, but rather as an s-expression (the Lisp name for a parenthesized list) containing the symbols +, 1, and 2. The cool part about this, which Axel inherits from Lisp, is that by passing the expression to the quote "special form," we can get that internal representation and manipulate it. For example:

(quote (+ 1 2)) -- => (SExpression [(Symbol "+") (LiteralInt 1) (LiteralInt 2)])

You can quote any expression you'd like:

(quote 1) -- => (LiteralInt 1)
(quote "test") -- => (LiteralString "test")

You can even quote another application of quote:

(quote (quote 1)) -- (SExpression [(Symbol "quote") (LiteralInt 1)])

Once we have the quoted form of an expression, we can manipulate it like any other data structure, like so:

(case (quote (+ 1 2)))
  ((SExpression sexp) (SExpression (reverse sexp))))
-- => (SExpression [(LiteralInt 2) (LiteralInt 1) (Symbol "+")])

Macros

Let's define a function, backwards, to reverse whatever Axel expression we pass in.

(= backwards (-> Expression Expression) -- It will take an Axel expression & return an Axel expression
  ((expression)

Now, an Axel Expression can be one of several cases: LiteralChar Char, LiteralInt Int, LiteralString String, SExpression [Expression], or Symbol String. The only thing we can reverse is an SExpression, so let's return the input unchanged for all other cases.

   (case expression
     ((LiteralChar _) expression)
     ((LiteralInt _) expression)
     ((LiteralString _) expression)
     ((Symbol _) expression)

For the case that we're passed an SExpression, however, let's extract the list of Expressions inside it and reverse it. We'll have to wrap the new list in an SExpression, since our function expects an Expression to be returned.

     ((SExpression expressions) (SExpression (reverse expressions))))))

Now, we can run our backwards function like so:

-- We have to quote `(2 1 +)` because otherwise it would be evaluated (which is not what we want).
-- Rather, we want the internal representation of `(+ 2 1)` to be passed to `backwards` (which requires a value of type `Expression`).
(backwards '(+ 2 1)) -- => (SExpression [(Symbol "+") (LiteralInt 1) (LiteralInt 2)])

When we evaluate backwards, we get back a new Expression. So, we now have the data representing our new code, but how can we actually execute it to get 3?

Macros to the rescue! Currently, backwards is just a plain old Axel function. But, Axel provides an alternative to normal function definition (=) called defmacro, and it lets us define a function that will actually be run at compile-time. Macros take a list of Expressions and return a list of Expressions. Macro calls in an Axel program are replaced by the results they return.

To see this in action, let's turn backwards into a macro. defmacro doesn't require a type signature to be passed, since it automatically uses the type [Expression] -> IO [Expression]. You'll notice that backwards is a partial function, since it wouldn't be able to handle being passed more than one argument. However, this isn't the worry it would usually be if this were a normal function, since errors will be thrown at compile-time and fixed before the program is actually run.

(defmacro backwards
  ((list expression) 

Because backwards will not have any side-effects, we will return whatever our result will be (since we need to return a value of type IO [Expression] rather than just [Expression]).

   (return

Also, we'll wrap the whole thing in a list, because (again) we're looking to return a list of Expressions rather than just a single Expression. From here, our definition is the same as before:

    [(case expression
      ((LiteralChar _) expression)
      ((LiteralInt _) expression)
      ((LiteralString _) expression)
      ((SExpression expressions) (SExpression (reverse expressions)))
      ((Symbol _) expression))])))

Now, if we actually call backwards in our code, the macro call will be "expanded" during compilation. Also, if you remember from when backwards was a function, we had to explicitly quote our input for it not to be evaluated. However, when passing a form to a macro, it will be automatically quoted. This helps macros remain unobtrusive and makes them feel more like first-class syntactic constructs.

To see how our macro will actually be evaluated, let's walk through it step-by-step. If the program we provide to Axel looks like this:

(= main (IO Unit)
  (() (backwards ("Hello, world!" putStrLn))))

backwards will be executed as the program compiles, after which the program will look like this:

(= main (IO Unit)
  (() (putStrLn "Hello, world!")))

And, when the program is executed at the end of this process, the output will be, as expected:

"Hello, world!"

Congratulations, you've created your very first macro!

Quasiquoting

Now, let's try to create a more complicated macro and implement GHC's LambdaCase extension from scratch. We're going to write our macro using what we know now, and then we'll take a look at "quasiquoting" and see how it helps us make our macros more concise. Finally, we'll see it in action by using it to clean up our example.

For some context, take the following Haskell code:

\x -> case x of
        True -> 1
        False -> 2

Notice that x is immediately passed to the case expression and never used again. This happens commonly enough that the LambdaCase extension was designed to help remove the duplication. It allows us to write the above lambda more concisely, like so:

\case
  True -> 1
  False -> 2

So, it's a pretty simple (yet useful) syntactic transformation. If we want Axel to have the same feature, it could look like the following:

-- Before
(\ (x) (case x
         ((True) 1)
         ((False) 2)

-- After
(\case
  ((True) 1)
  ((False) 2)

The rule to implement this feature is pretty simple: If you see (\case <case 1, case 2, ...>), replace that with (\ (<identifier>) (case <identifier> <case 1, case 2, ...>), where <identifier> is some as-of-yet unused identifier.

Let's start our macro definition like usual:

(defmacro \case
  ((cases) undefined))

Now, we'll fill in the basic details, based on the discussion above. Since we want to return a list, we'll use return to lift our result into IO and then wrap our result with the SExpression constructor.

(defmacro \case
  ((cases) (return [(SExpression TODO)])))

We want to fill in TODO with \ (<some identifier> (case <some identifier> <cases>). Since we want to use \ and case as symbols (instead of using them to refer to what they mean at runtime), we'll quote them with the ' operator.

(defmacro \case
  ((cases) (return [(SExpression ['\ (SExpression [TODO])
                                     (SExpression ['case TODO TODO])])])))

Now, we need to somehow get an unused identifier that we can use as the variable in our lambda. Luckily, Axel provides the gensym helper function, which is designed just for this! Because gensym runs in the IO monad (gensym is of type IO Expression because the Symbol it creates must be unique), our code will as well.

(defmacro \case
  ((cases)
   (do
     <- identifier gensym
     (return [(SExpression ['\ (SExpression [identifier])
                               (SExpression ['case identifier TODO])])]))))

And now, for the final TODO: the cases themselves. The cases parameter of our macro is a list of Expressions that we want to splice directly into our output (that is, we're looking for (case <identifier> <case 1> <case 2> <...>) rather than (case <identifier> (<case 1> <case 2> <...>))). So, we can finish our macro's definition like so:

(defmacro \case
  ((cases)
   (do
    <- identifier gensym
    (return [(SExpression ['\ (SExpression [identifier])
                              (SExpression (: 'case (: identifier cases)))])]))))

And, there we have it!

To be honest, that was a bit tedious to write, and it's not the easiest to understand. Manually constructing our SExpressions adds a lot of extra noise to our macro definition. Fortunately, though, there's a better way!

Meet quasiquote, quote's big sibling. It behaves just like quote on normal expressions:

(quasiquote 1) -- => (LiteralInt 1)
(quasiquote (1 2 3)) -- => (SExpression [(LiteralInt 1) (LiteralInt 2) (LiteralInt 3)])

However, quasiquote has some sidekicks that give it superpowers. If you wrap quasiquote's argument with unquote, it will undo the quasiquotation for just that section. For example:

(quasiquote 1) -- => (LiteralInt 1)
(quasiquote (unquote 1)) -- => 1

(quasiquote (1 (LiteralInt 2) 3)) -- => (SExpression [(LiteralInt 1) (SExpression [(Symbol "LiteralInt") (LiteralInt 2)]) (LiteralInt 3)])
(quasiquote (1 (unquote (LiteralInt 2)) 3)) -- => (SExpression [(LiteralInt 1) (LiteralInt 2) (LiteralInt 3)])

There's also unquote-splicing, which not only undoes quasiquotation but also "splices" the result directly into its enclosing s-expression. For example:

(quasiquote (1 (unquote (SExpression [(LiteralInt 2) (LiteralInt 3)])))) -- => (SExpression [(LiteralInt 1) (SExpression [(LiteralInt 2) (LiteralInt 3)])])
(quasiquote (1 (unquote-splicing (SExpression [(LiteralInt 2) (LiteralInt 3)]))) -- => (SExpression [(LiteralInt 1) (LiteralInt 2) (LiteralInt 3)])

In effect, quasiquote acts as a sort of templating system for creating Axel expressions, where you selectively choose what parts should be evaluated at compile time.

One last note: quasiquoting, like quoting, is common enough that it gets special syntax. While quote becomes ', quasiquote becomes \`` (e.g. (quasiquote (1 2 3))is usually written`(1 2 3)). unquotecan be written with a comma (e.g.(unquote 1)is usually written,1), and unquote-splicingcan be written with a comma and then an at-sign (e.g.(unquote-splicing (1 2 3))is usually written,@(1 2 3)`).

Now, we have all the building blocks we need to clean up our macro definition from earlier. Here's the original definition, for reference:

(defmacro \case
  ((cases)
   (do
    <- identifier gensym
    (return [(SExpression ['\ (SExpression [identifier])
                              (SExpression (: 'case (: identifier cases)))])]))))

And here's a version which uses quasiquote, unquote, and unquote-splicing:

(defmacro \case
  ((cases)
   (do
    <- identifier gensym
    (return `(\ (,identifier)
                (case ,identifier
                      ,@cases)))))

Notice how the second representation closely mirrors the final output (much more so than our first try does). Also, try to figure out how we derived the second representation from the first one. While it might be difficult at first, getting the hang of quoting and quasiquoting just takes practice.

TODO Finish tutorial

Clone this wiki locally