Thursday, 24 December, 2015 UTC


Summary

Table of Contents
Introduction
Constants
Tokenizer
Parser
Code Generator
Introduction
This article is an introduction to transpilers. Translating one programming language into another may be called compiling, transpiling, or interpreting depending on the nature of the input and output languages. Usually compiling describes the process of translating a higher level language, like Java, into machine language, while transpiling describes the process of translating one higher level language into another higher level language. In this article I will show you how to translate, or “transpile”, one high level language into another high level language. Translating one language into another seems like a daunting task at first, but when you break it up into smaller, more manageable tasks, you will realize it is not so hard.
Our goal in this article is to understand how to make a program that will transpile a made-up language called “Minilisp” into JavaScript. Minilisp is just a very simple version of Lisp, a high level programming language famous for its use of parentheses.
Here’s an example of Minilisp. The code below will give us the answer to the simple math problem 3 + 4 * 2:
(+ 3 (* 4 2))
Our transpiler will take this code and translate it into a JavaScript program that does the exact same thing:
add(3, multiply(4, 2));
We will also go a little bit beyond this to allow our Minilisp user to compose basic functions into higher level functions. An example of this is combining addition and division to make a function that returns the average of two numbers. That would look like this in Minilisp:
(defn average
  (x y)
  (/ (+ x y) 2)
)
The code above uses the “defn” function call which takes three parameters. The first parameter is the name of the function, which in this case is “average”. The second parameter to defn defines what kind of parameters the function we are defining will accept, and the third parameter is the code that will be executed when we call our new “average” function. If we read over this code in English it would be something like “I am invoking defn to define a new function called average that takes two parameters x and y and returns the value obtained by adding x and y and then dividing the result by two.”
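For reference, here is roughly the kind of JavaScript our transpiler might eventually produce for this definition. The core.add and core.divide names come from a function map we will build later in this article; the tiny “core” runtime object below is just a stand-in so the example runs on its own:

```javascript
// A stand-in "core" runtime library so the transpiled output can run.
// The real mapping of Minilisp operators to core.* functions is defined
// in the constants section later in this article.
var core = {
    add: function(a, b){ return a + b; },
    divide: function(a, b){ return a / b; }
};

// Roughly the JavaScript our transpiler might generate for the defn above:
function average(x, y){ return core.divide(core.add(x, y), 2); }

console.log(average(10, 20)); // -> 15
```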
Now that we are familiar with Minilisp, let’s start working on our Minilisp transpiler. The first step is to take the process of transpiling Minilisp into JavaScript and break it into smaller chunks, each of which performs just one step in the overall process. Below is a diagram that shows how we are going to break up this task. We start with a block of text that contains our Minilisp code, and that code gets passed through a series of intermediate steps before it comes out as functional JavaScript.
( text ) -> ( tokenizer ) -> ( parser ) -> ( code generator ) -> ( JavaScript )
Let’s take a look at the three steps in the middle, because those are what we will focus on:
1) Tokenizer
The tokenizer takes the Minilisp text as input and breaks it into an array of smaller chunks of text called tokens. In the context of computer language translation, a token is just a small unit of text that has some kind of meaning in the language. It’s really a pretty flexible concept, but for us a token is any unit of Minilisp that has some kind of significance, like the function call “defn”, the number “3”, or the open parens “(”. Our tokenizer will break up the original string of Minilisp code into an array of these tokens.
2) Parser
Instead of translating our Minilisp code directly into JavaScript, we first use a parser to generate an abstract representation of the Minilisp program called an abstract syntax tree or AST for short. While JavaScript and Minilisp are languages used to write out programs concretely, an AST is a way to represent a program abstractly. The parser takes the tokens from the tokenizer and creates the AST, which can then be easily converted into JavaScript.
3) Code Generator
The code generator takes the AST as input and generates the JavaScript program that the abstract syntax tree describes. To be specific, it takes a tree data structure as input and outputs text. This sounds complicated but it’s really not. A simple example of an AST is a function with two arguments that is represented as a tree with one root node and two child nodes. Let’s say the root node in this example is a function (+) and the child nodes are numbers (3) and (4). If our code generator got this tree as input we would start at the root and write out our add function “add(”, then iterate over the children, adding each one as an argument to the add function: “add(3, 4”. Once we have finished iterating over the children we know this function is done and we add the closing parenthesis: “add(3, 4)”. That’s how you convert a simple AST into code!
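The walk just described can be sketched in a few lines of JavaScript. This is not the code generator we will build later, just the core idea of recursively turning a tree into text:

```javascript
// A minimal sketch of the walk described above: one function node
// with two number children, turned into the string "add(3, 4)".
var tree = {
    data: { type: 'function', value: 'add' },
    children: [
        { data: { type: 'number', value: '3' }, children: [] },
        { data: { type: 'number', value: '4' }, children: [] }
    ]
};

function emit(node){
    if(node.data.type === 'function'){
        // write the function name, then recurse into the children
        return node.data.value + '(' +
               node.children.map(emit).join(', ') + ')';
    }
    // raw values are written out directly
    return node.data.value;
}

console.log(emit(tree)); // -> add(3, 4)
```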
Constants
In our introduction we talked about the three steps we will take to transpile Minilisp into JavaScript: tokenizing, parsing, and code generating. Before we do that, there is some simple setup we need to do that will make our job easier later on.
Constants are extremely important for applications in general, but especially for parsers. One of the fundamental things a language needs is a vocabulary of symbols and rules for combining them. So we will write a small library of constants that provides a standard way to refer to those symbols throughout the rest of the code. The constants file is also a place where I personally like to put helper functions that determine whether or not a token is part of a certain class of characters, but some people might want to break those functions out into a separate file; it’s up to you. Let’s first take a look at my constants file for Minilisp, and then we can examine what we have:
var _ = require('lodash');

module.exports = {
    openParens: '(',
    closeParens: ')',
    plus: '+',
    minus: '-',
    times: '*',
    divide: '/',
    quote: '"',
    coreFunctions: ['print', 'defn'],
    functionMap: {
        '+': 'core.add',
        '-': 'core.subtract',
        '*': 'core.multiply',
        '/': 'core.divide',
        'print': 'core.print'
    },
    isALetter: _.partial(_.contains,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'),
    isANumber: _.partial(_.contains,'0123456789'),
    isAToken: _.partial(_.contains,'()*+-/')
}
This file exports a single object that contains references to a variety of values. The key value pairs at the top, starting with openParens and ending with quote, are simple constants that refer to the basic characters that make up Minilisp. Next comes “coreFunctions”, an array of all the core function names in Minilisp. The next property is “functionMap”, which will be used by the code generator to convert Minilisp function operators into JavaScript functions.
If you are unfamiliar with partial application, the three functions at the end might look a little funny. Passing _.contains and a string into _.partial will return a function that takes one argument and checks whether or not that argument is contained in the string we passed in. It’s a convenient way to create functions that check if a character is part of a certain class of characters. Here is an example for clarity:
var isANumber = _.partial(_.contains, '1234567890');
The call to _.partial above will return a function that looks something like this:
function(arg){
 return _.contains('1234567890', arg);
}
So every time we invoke isANumber with an argument, that argument will be passed into _.contains along with the string we passed into the original _.partial function:
isANumber('1') //=> true because _.contains('0123456789', '1') is true
isANumber('b') //=> false because _.contains('0123456789', 'b') is not true
This constants file is a nice repository of all the information we need about what kinds of characters make up Minilisp. When we look here we see all the basic characters as well as functional operators and core function names, and we have a standard way of referencing them in the rest of our code.
Tokenizer
Now that we’ve defined the basic units that make up Minilisp we can tackle the first part of our transpiler: the tokenizer. The tokenizer takes a string of Minilisp code and turns it into an array of token objects. But first we have to do a little bit of set up so that our tokenizer code will be clean and readable.
Each part of our program has one or more helper classes that take responsibility for small tasks and allow us to keep our main functions clean and focused. Before we make a tokenizer we will make a helper class called “TokenStream” that will manage the raw text input. This class will help us advance through the text and tell us when we have reached the end. Take a look at the code for that class below and note how it allows us to write cleaner code in the main tokenizer module. Questions to ask yourself in this section are: would I use a helper class at all? If so, would I put more or less logic in the helper class? Instead of a main module and a helper class, is there a way to make a totally modular tokenizer?
The TokenStream class is initialized with the text that we want to convert into tokens. It maintains a kind of counter called “index” that we can increment to move one character at a time through the text. It also has a boolean property called “done” which is set to true in the advance method if we have reached the end of the text.
var TokenStream = function(text){
    this.text = text; // input text
    this.index = 0;   // our current place in the stream 
    this.done = false;// whether or not we've reached the end
}

//this method advances the index which moves us to the next character in the input text
TokenStream.prototype.advance = function(){
    this.index++;
    if(this.index === this.text.length){
        this.done = true;
    }
}

//this method returns the character at the current index of our input text
TokenStream.prototype.currentToken = function(){
    return this.text[this.index];
}

// returns the done property
TokenStream.prototype.isDone = function(){
    return this.done;
}

// allows us to peek ahead at the next character
TokenStream.prototype.nextToken = function(){
    return this.text[this.index+1];
}
Now we can move on to the main module. The tokenizer module simply checks each character of the text stream, adds a token of the appropriate type to a results array based on the current character, and then moves on to the next character until it reaches the end of the stream. This code is fairly straightforward so I encourage you to read through it line by line using the comments to guide your understanding.
var constants = require('./constants');

// the tokenizer is a function that takes a string of Minilisp text as input
function tokenizer(text){
    // first we create a results array to store the tokens
    var result = [];
    // then we create an instance of our TokenStream helper class
    var tokenStream = new TokenStream(text);

    // when TokenStream reaches the end of the text its done property
    // will be switched to true, so we create a while loop that advances 
    // to the next character as long as the isDone method returns false
    while(!tokenStream.isDone()){

        // if isDone is false, that means we have a character
        // at the current index, so we use the currentToken()
        // method which will return that character to us
        var token = tokenStream.currentToken();

        // if the current character is one of the Minilisp 
        // core symbols we add a token of type "operator" to our results
        if(constants.isAToken(token)){
            result.push({ type: 'operator', value: token });
        }
        // otherwise if the character is a letter
        else if(constants.isALetter(token)){
            // we continue pulling letters off the stream
            // to build a whole word
            while(constants.isALetter(tokenStream.nextToken())){
                tokenStream.advance();
                token += tokenStream.currentToken();
            }
            // then push a token with type "keyword" onto the results
            result.push({ type: 'keyword', value: token });
        }
        // just like a keyword, if we have a number token
        // we want to take the next character until we get the full number
        else if(constants.isANumber(token)){
            // while the next character is a number
            while(constants.isANumber(tokenStream.nextToken())){
                // advance to it and add it to our current token
                tokenStream.advance();
                token += tokenStream.currentToken();
            }
            // finally, push a token of type "number" onto the result
            result.push({ type: 'number', value: token });
        }

        else if(token === constants.quote){
            // if we encounter a quote take letters from the stream until
            // we reach a non-letter character (which should be an endquote)
            while(constants.isALetter(tokenStream.nextToken())){
                tokenStream.advance();
                token += tokenStream.currentToken(); 
            }
            // then we take the endquote off the stream
            tokenStream.advance();
            token += tokenStream.currentToken();
            // and push a "string" type token onto the results
            result.push({ type: 'string', value: token })
        }
        // advance to the next index in the text stream
        tokenStream.advance();
    }
    // return the token array
    return result;
}
If we feed this text into our tokenizer: “(+ 1 3)(func (+ 2 3))”
we get this output:
[ { type: 'operator', value: '(' },
  { type: 'operator', value: '+' },
  { type: 'number', value: '1' },
  { type: 'number', value: '3' },
  { type: 'operator', value: ')' },
  { type: 'operator', value: '(' },
  { type: 'keyword', value: 'func' },
  { type: 'operator', value: '(' },
  { type: 'operator', value: '+' },
  { type: 'number', value: '2' },
  { type: 'number', value: '3' },
  { type: 'operator', value: ')' },
  { type: 'operator', value: ')' } ]
Notice the specificity of the “type” property of the token object. This is the property that the parser will use to determine how to deal with this token. This property could be more or less specific depending on how you want to design your parser. Some tokenizers could leave this property out entirely and just return an array of values. On the other end of the spectrum you could have really specific token types.
Parser
The next step is to write a simple parser that will turn our tokens into an abstract syntax tree. Parsing is a complex topic with many interesting strategies to explore, but here we just need a simple recursive parser to demonstrate the principles at work. We discussed syntax trees a little bit in the introduction, but here is a simple definition that will work for Minilisp: a syntax tree is a representation of a program in tree form such that a function is a node and each parameter is a child node of the function node. Because functions can take other functions as arguments, it’s possible that a child node is another function that itself has child nodes for its arguments.
If that is unclear, here are a couple of simple diagrams that show what a Minilisp statement looks like in tree form:
(+ 3 1)
      +  <- function node
     / \
    3   1  <- raw value nodes

(* 3 (+ 4 2))
      *
     / \
    3   +  <- this addition node is an argument of the multiplication node
       / \
      4   2
Parsers are fairly complex, so we want to have a few simple classes to help us out just like our TokenStream helped us write a nice clean tokenizer. The first class we’ll need is a simple tree data structure. This is the class that we will use to build syntax trees like the ones described above. Mine looks like this:
var Tree = function(){
    this.data = {
        type: null,
        value: null
    }
    this.children = []
}
If we converted the first syntax tree example above into a JavaScript tree like this one, it would look something like this:
{
  data: { type: 'function', value: '+' },
  children: [
    { data: { type: 'number', value: '3' }, children: [] },
    { data: { type: 'number', value: '1' }, children: [] }
  ]
}
I also wrote a couple of getters and setters and a method to insert a child tree. Those methods are very simple but I will include them here anyway for your reference:
Tree.prototype.setType = function(val){
    this.data.type = val;
}
Tree.prototype.setValue = function(val){
    this.data.value = val;
}

Tree.prototype.get = function(attr){
    return this.data[attr];
}

Tree.prototype.insert = function(tree){
    this.children.push(tree);
}
The next class that we need is a little bit more complicated. As I go through my tokens and build an AST, I want to be able to keep track of where I am in the current tree. I also want to be able to store ASTs as I make them, because most programs are actually made up of many small trees. To do this I will use a class called AstResults that stores the results of the parser and has a pointer to my position in the syntax tree at any given point in the process of parsing.
var AstResults = function(){
    this.roots = [];
    this.history = [];
    this.pointer = null;
}
As you can see this class has a property called “roots” which is where we store our ASTs, an array called history which I will explain in a moment, and a property called pointer that points to the current node of the current AST.
Every time we create a new node in our AST we will push that node (which is just a JavaScript object) onto our history array. The result of this is that once we’ve created a few nodes and we are deep in our tree, the history array will contain a reference to each node we have created in the order that we created them. That means that the root node of our AST is at history[0], the last node we added to our tree is at history[history.length-1], and if we want to move back in time to the node we were at before the current one we can go to history[history.length-2].
To add a new node onto our AST we will use the AstResults method called “newTree”. Lets take a look at that method now:
AstResults.prototype.newTree = function(tree){
    if(this.pointer === null){
        // if the pointer is null that means we are starting
        // from scratch and have a new tree so we need to
        // add this node onto our "roots" array
        this.roots.push(tree);

        // we also add the root node to our history array and
        // set the AstResults pointer to this node
        this.history.push(tree);
        this.pointer = tree;
    } else {
        // if the pointer is not null then we know we are 
        // adding a new node as a child of the current one

        // the pointer points to the current node so we insert
        // the tree into that node, which puts this tree in the children
        // of the current node
        this.pointer.insert(tree);

        // then we push the tree onto our history array and reset the
        // pointer to point to this node, which is also a child
        // of the previous node
        this.history.push(tree);
        this.pointer = tree;
    }
}
Now that we have one function to build our tree and maintain history we just need a couple more simple helpers that allow us to move back up our tree:
// this function returns the parent of the current node
// or null if we are already at the top of the tree
AstResults.prototype.previous = function(){
    if(!this.history.length){
        return null;
    } else {
        return this.history[this.history.length-2];
    }
}

// this method moves us back in time, which really just means
// moving us back up the syntax tree
AstResults.prototype.back = function(){
    this.history.pop();
    if(!this.history.length){
        this.pointer = null;
    } else {
        this.pointer = this.history[this.history.length-1];
    }
}
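To make the pointer movement concrete, here is a quick walkthrough of how newTree and back behave. Condensed stand-ins for the Tree and AstResults classes are included so the snippet runs on its own:

```javascript
// Minimal stand-ins for the Tree and AstResults classes above,
// condensed so this walkthrough is self-contained:
function Tree(){ this.data = { type: null, value: null }; this.children = []; }
Tree.prototype.insert = function(tree){ this.children.push(tree); };

function AstResults(){ this.roots = []; this.history = []; this.pointer = null; }
AstResults.prototype.newTree = function(tree){
    if(this.pointer === null){ this.roots.push(tree); }
    else { this.pointer.insert(tree); }
    this.history.push(tree);
    this.pointer = tree;
};
AstResults.prototype.back = function(){
    this.history.pop();
    this.pointer = this.history.length ?
        this.history[this.history.length-1] : null;
};

var ast = new AstResults();
var root = new Tree();
ast.newTree(root);      // first tree: goes into roots, pointer -> root
var child = new Tree();
ast.newTree(child);     // inserted under root, pointer -> child
ast.back();             // pop the child, pointer -> root again

console.log(ast.pointer === root);       // -> true
console.log(root.children[0] === child); // -> true
```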
Now that we have a simple tree data structure and a class for managing an array of trees we can start on our parser. I will go over the parts of the parser one by one. First we want to look at the entry point to our parser. This is the function that takes the tokens as input and initializes the parsing.
function Parser(tokens){
    var ast = new AstResults();

    var parserMap = {
        'operator': processOperators,
        'keyword':  processKeywords,
        'number':   processValue,
        'string':   processValue
    }
    _.each(tokens, function(token){
        var parsingMethod = parserMap[token.type];
        parsingMethod(token, ast);
    })
    // return the array of syntax trees we built
    return ast.roots;
}
First we make an instance of our AstResults helper class. Then we make a map of token types to parser functions. This allows us to break up the work of parsing into a set of specialized functions. Finally we use _.each to iterate over the tokens. For each token we use our map together with the token type to find the appropriate parsing method and pass the token and the AstResults helper instance into the parsing method.
Now we know that to understand our parser we really just need to understand each of the parser methods. First let’s take a look at the processOperators function. If you forgot what “operator” means in the context of Minilisp, go up to the constants section and look at the isAToken function.
function processOperators(token, ast){
    switch(token.value){
        case constants.openParens:
            var tree = new Tree();
            ast.newTree(tree);
            break;
        case constants.closeParens:
            ast.back();
            break;
        case constants.plus:
        case constants.minus:
        case constants.times:
        case constants.divide:
            ast.pointer.setType('function');
            ast.pointer.setValue(token.value);
            break;
    }
}
This function uses a switch block to perform an operation on the AST based on the value of the token. Because there is a relatively small set of operators this function is easy to tackle. The first two cases are the most important. In Lisp syntax, all functions are wrapped in parentheses: an open parenthesis signifies a function call and a close parenthesis signifies the end of a function statement. That means that if we encounter an open parens token we insert a new tree into our syntax tree, and if we encounter a close parens we exit the current tree and jump back up to our parent.
The rest of the operators are just functions. Because each function operator is the second token in a function invocation we know that a function operator must have followed an open parens, and that therefore AstResults is currently pointing to a new tree made specifically for housing data about this function invocation. So when we reach a function operator we just set the current tree type to “function” and set the value to whatever function operator the token contains.
So we know that when we encounter an open parens we create a new tree:
{ data: { type: null, value: null}, children: [] }
And when we get to a function operator we set the data of that tree:
{ data: { type: "function", value: "+" }, children: [] }
But what happens when we get to one of the function arguments, like a number or a string? If you look at the parserMap in the main function you will see that strings and numbers will be processed by processValue:
function processValue(token, ast){
    var tree = new Tree();
    tree.setType('value');
    tree.setValue(token.value);
    ast.pointer.insert(tree);
}
As you can see, the processValue function creates a new tree and directly inserts it as a child of the node currently pointed to by AstResults. Because raw values are always terminal nodes in a syntax tree we don’t need to use AstResults.newTree, which would move our pointer down to the new child node. After we encounter our first raw value our AST looks something like this:
{   
    data: { type: "function", value: "+"}, 
    children: [ { data: { type: "value", value: "3" }, children: [] } ]
}
What would happen if we had another function as an argument instead of a raw value? In that case the newTree method on AstResults would be called in processOperators; it would insert a new tree into the tree we are currently working on and the pointer would move down to point to that tree. The parser would continue to set the function data and arguments of the inner function until we get to the close parens token, and then our ast pointer would jump back up to the parent tree, ready to add more arguments to our top level function.
Now for the tricky part! Remember how we used a category called “keyword” for things that are not core syntax or raw values? We are going to use that to give our Minilisp users the ability to define their own functions. This is how it will work:
Minilisp has a function “defn” that takes three arguments:
1) the name of the user defined function
2) the arguments of the user defined function in parentheses
3) the expression to be evaluated when the function is called
Such that the following function definition
(defn average (x y) (/ (+ x y) 2) )
can be used like this
(average 10 20)
and returns the average of the two arguments.
What you’ll notice is that there are a few things in the “defn” statement, like the x and y, that aren’t operators or values and as a result will be labelled as keywords by the tokenizer. Also, when we call the “average” function the function name does not have quotes around it, so it will be classed as a keyword by the tokenizer. As a result the processKeywords function must be able to deal with all these cases. One of the things we will need to handle these cases is a list of all the non-operator functions (like print and defn) that we can add to whenever we make a new function definition. For that we use this line:
var FUNC_NAMES = constants.coreFunctions.slice();
We slice from the constants value, which copies the array so that we don’t end up mutating our constants. Now let’s take a look at the processKeywords function:
function processKeywords(token, ast){

    if(ast.pointer.get('type') === 'function' &&
       ast.pointer.get('value')=== 'defn'){
       // we're in a defn call, next up is the name of
       // the user defined function:
        var tree = new Tree();
        tree.setType('function_name');
        tree.setValue(token.value);
        FUNC_NAMES.push(token.value)
        ast.pointer.insert(tree);

    } else if(ast.pointer.get('value') === null &&
              !_.contains(FUNC_NAMES, token.value)){
        // we're inside parens, the next token is a keyword
        // but it's not a function, so we must be in the section of a
        // defn that defines the arguments for the custom function
        ast.pointer.setType('arguments'); 
        var tree = new Tree();
        tree.setType('variable');
        tree.setValue(token.value);
        ast.pointer.insert(tree); 

    } else if(_.contains(FUNC_NAMES, token.value)){
        // this keyword is in the FUNC_NAMES array so
        // we set this tree as a function node
        ast.pointer.setType('function');
        ast.pointer.setValue(token.value);

    } else {
        processValue(token, ast);
    }
}
Here is that example defn call again for your reference:
(defn average (x y) (/ (+ x y) 2) )
The first if block checks if we are in a tree with a type of “function” and a value of “defn”. Looking at the example defn call, we can see that the next token is the function name which in this case is the keyword “average”. So we create a new tree with a type of “function_name” and a value “average” and insert it into the children of the current tree. We also push the name “average” into our FUNC_NAMES array so that the next time we encounter this keyword we know to treat it like a regular function.
The next case checks if the value of the current node is still null, and if so checks whether the current token is a known function. This part is tricky: there is only one case in Minilisp where an open parens is not followed by a function of some kind, and that is the second parameter of defn. The second parameter of defn is a set of parens that wraps the arguments for the user defined function. Because this is the only case where the node never receives a value, we can leave the value null and add the keywords as child nodes with type “variable”.
The only two possibilities left for keyword tokens are that the keyword is a function, in which case we set the current tree as a function type just like any other function, or that it is a variable in the body of a function definition, in which case we process it the same way we process other raw values. Notice how there is no special treatment for the function body section of our function definition. This is because the function body is just a normal Minilisp function, and as such it will be inserted into the children of the defn node the same way it would be inserted into the children of any other function call.
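Putting the three branches together, the defn example parses into a tree shaped roughly like this (sketched as a plain JavaScript object; the exact layout follows from the Tree class above):

```javascript
// Approximate AST for: (defn average (x y) (/ (+ x y) 2) )
var defnTree = {
    data: { type: 'function', value: 'defn' },
    children: [
        { data: { type: 'function_name', value: 'average' }, children: [] },
        { data: { type: 'arguments', value: null },
          children: [
            { data: { type: 'variable', value: 'x' }, children: [] },
            { data: { type: 'variable', value: 'y' }, children: [] }
          ] },
        { data: { type: 'function', value: '/' },
          children: [
            { data: { type: 'function', value: '+' },
              children: [
                { data: { type: 'value', value: 'x' }, children: [] },
                { data: { type: 'value', value: 'y' }, children: [] }
              ] },
            { data: { type: 'value', value: '2' }, children: [] }
          ] }
    ]
};

console.log(defnTree.children[1].children.length); // -> 2 (the variables x and y)
```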
These three parsing functions are sufficient to parse our entire Minilisp. As you can see, the trick to writing a parser is to understand the unique conditions that pertain to each meaningful unit of the language. Writing a parser requires logical analysis and an understanding of recursion. Once you have a working prototype of a parser, you should write thorough tests for each function. This will save you lots of bug hunting when you go to modify it later.
Code Generator
With our parser we can take an array of tokens and convert it into an array of syntax trees. The final step is to make a code generator that will take the syntax tree and convert it into working JavaScript. To build our code generator we only need one helper class:
var Controller = function(){
    this.result = '';
}
This helper will store the output string that we will build from the syntax tree. It may seem silly to make such an apparently useless class, but it actually has two advantages: it provides a slightly nicer syntax than using raw strings and also gives you a place to store variables or add helper methods if your code generator gets more complicated.
The code generator for our Minilisp is actually pretty simple. First of all, a program is not a single expression represented as a tree, but many expressions represented as an array of trees. So the entry point into our code generator is a function that takes an array as input and sends each tree in the array into a process that recursively generates the output code.
var generator = function(roots){
    // roots is an array of syntax trees

    // first we make a new instance of our helper
    var controller = new Controller();
    _.each(roots, function(node){
        // each syntax tree gets passed in interpretNode
        // along with the "controller" to store the output
        interpretNode(node, controller);
    })

    // in the end we return the contents of controller
    return controller.result;
}
The first function that each tree enters is interpretNode, a simple function that decides what to do with the tree based on its type. Our language really only has three node types: functions, function definitions, and values. If the tree is a function it goes into the writeFunction function, if it is a function definition it goes into the writeCustomFunction function, and if it is a value we simply add the value to the result string of our current controller.
function interpretNode(node, controller){
    var type = node.get('type');
    var value = node.get('value');
    if(type === 'function'){
        if(value === 'defn'){
            writeCustomFunction(node, controller);
        } else {
            writeFunction(node, controller);
        }
    } else {
        controller.result += node.get('value');
    }
}
Lets take a look at writeFunction first:
function writeFunction(ast, controller){
    // first we get the value of the ast, which is the
    // name of the function
    var value = ast.get('value');
    // we check to see if this function is in our functionMap
    var functionName = constants.functionMap[value];

    if(functionName === undefined){
        // if not then we know it is a user defined function
        // so we use its name directly
        functionName = value;
    }
    // we add the function name and an open parens to our result
    controller.result += functionName + '(';
    _.each(ast.children, function(argument, idx){
        // then we recursively call interpretNode on the children of the function
        interpretNode(argument, controller);
        // and add commas to separate each argument
        if(idx < ast.children.length - 1){
            controller.result += ', ';
        }
    });
    // finally we add the close parens to finish the function invocation
    controller.result += ')';
}
This function receives a tree which is known to be a function tree, so first we extract the value of the tree, which is the name of the function. To get the name of the function in JavaScript we simply look it up in the map of Minilisp function names to JavaScript function names that we put in our constants. Then we check whether the map returned undefined: the lookup only succeeds if the value is one of the Minilisp core functions. If this is a user defined function, it won't exist in our map and we'll just use its name directly.
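The exact contents of the map were set up back in the constants section; a sketch consistent with the output in this article would look like this (the entries shown are assumptions based on the examples, not the full map):

```javascript
// Hypothetical sketch of constants.functionMap, mapping Minilisp core
// function names to the JavaScript runtime functions we emit.
var constants = {
    functionMap: {
        '+': 'add',
        '-': 'subtract',
        '*': 'multiply',
        '/': 'divide'
    }
};

// Core functions resolve through the map:
console.log(constants.functionMap['+']);       // 'add'
// User-defined names come back undefined, so writeFunction
// falls back to using the name directly:
console.log(constants.functionMap['average']); // undefined
```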
Next we use the function name to write the beginning of a function invocation to our output string. So far we have something like this:
add(
and now of course we need to add the arguments. The arguments are also trees, so to render their values we pass them one by one into the interpretNode function, where they start the process over again. We also add a comma after each child node (except the last) to make sure that our arguments are comma separated. After that we write the closing parenthesis and we are done!
Let's think through how this works. If we have a tree that looks like this:
  +
 / \
3   *
   / \
  2   4
First the addition function will enter writeFunction and output this:
add(
Next we iterate over the children. The first child is just a raw value, so we add its value directly to the output string and then add a comma:
add(3,
The next child is a multiplication function, so it will be sent into writeFunction and we will get this:
add(3, multiply(
Inside the writeFunction call for the multiplication node we iterate over the children, both of which are raw values, and then add the closing parens:
add(3, multiply(2, 4)
Now the process returns to the previous writeFunction call. The tree in that function has no more children so we add the closing parens and this tree is complete:
add(3, multiply(2, 4))
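The trace above can be condensed into a self-contained sketch. This version uses plain objects instead of the article's node class and no underscore dependency, so the names differ from the real generator, but the recursion is the same:

```javascript
// Self-contained sketch of the recursion traced above.
var functionMap = { '+': 'add', '*': 'multiply' };

function emit(node) {
    if (node.type !== 'function') {
        // raw values are written directly into the output
        return String(node.value);
    }
    // core functions resolve through the map; anything else is user defined
    var name = functionMap[node.value] || node.value;
    // recurse into the children and join them with commas
    return name + '(' + node.children.map(emit).join(', ') + ')';
}

var tree = {
    type: 'function', value: '+',
    children: [
        { type: 'value', value: 3 },
        { type: 'function', value: '*',
          children: [ { type: 'value', value: 2 },
                      { type: 'value', value: 4 } ] }
    ]
};

console.log(emit(tree)); // add(3, multiply(2, 4))
```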
So functions and values are pretty easy, but what about those user defined functions? Before you go on you might want to stop and think about how you would do this. To help you out, here's a simple reminder of the structure of a function definition tree:
Tree: { [ function name ], [ arguments ], [ function body ] }
And here is a verbose version of a defn call AST:
{
    type: 'function',
    value: 'defn',
    children: [
        {
            type: 'function_name',
            value: 'average'
        },
        {
            type: 'arguments',
            children: [ { value: 'x' }, {value: 'y'}]
        },
        {
            type: 'function',
            value: '/',
            children: [...]
        }
    ]
}
Now let's take a look at the writeCustomFunction code:
function writeCustomFunction(node, controller){

    // extract the children of this AST for clarity and convenience
    var functionName = node.children[0];
    // note: we avoid naming this variable "arguments", which is reserved
    // for the implicit arguments object inside JavaScript functions
    var args = node.children[1];
    var functionBody = node.children[2];

    // use the function name to start a function declaration
    controller.result += 'var '+functionName.get('value')+' = function(';

    // here we simply add the arguments to the function declaration
    // which gives us something like: var myFunction = function(X, Y, Z
    var numArgs = args.children.length;
    _.each(args.children, function(argNode, idx){
        controller.result += argNode.get('value');
        if(idx < numArgs - 1){
            controller.result += ', ';
        }
    });

    // then we close off the arguments and start the function body 
    controller.result += '){\n';

    // we want to take the code generated by the function body portion of this
    // AST and put it directly into the function we are writing, so we create
    // a new controller class to hold the output
    var customController = new Controller();
    // then we put the function body into the interpretNode function which
    // will output JavaScript code
    interpretNode(functionBody, customController);

    // finally we return the result of running the JavaScript produced
    // by the function body being passed into interpretNode
    controller.result += 'return '+customController.result+ ';';
    controller.result += '\n}\n';
}
Using the above code and the AST for the average function definition, we can see that the first thing we would write is:
var average = function(
Next we will iterate over the “arguments” children and write their values into the function declaration to complete the first line:
var average = function(x, y){
All we have left is the third argument of the function definition, which is a syntax tree that represents the body of this function. Our ultimate goal is to create a function that returns the result of that syntax tree, so we create a new result string with our helper class and pass it, along with the function body syntax tree, into the interpretNode function. Doing so will recursively interpret the syntax tree of the function body and place the results in our newly created result string. Then we append that string onto a "return" statement and close off the function. The end result looks something like this:
var average = function(x, y){
return divide(add(x, y), 2);
}
As you can see the output does exactly what we want it to: it defines a function called “average” that takes two arguments and returns the result of adding those arguments and dividing them by two.
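Note that the generated code only runs if the functions it calls actually exist at runtime. A minimal sketch of those runtime helpers (the names match the transpiler's output; their exact implementations are an assumption), together with the generated function, shows the whole thing working end to end:

```javascript
// Hypothetical runtime helpers the generated JavaScript depends on:
function add(a, b) { return a + b; }
function divide(a, b) { return a / b; }

// The code our generator produced for the Minilisp defn:
var average = function(x, y){
return divide(add(x, y), 2);
}

console.log(average(3, 5)); // 4
```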
Now we have accomplished our goal. We have taken a Minilisp input string and broken it into meaningful chunks called “tokens”. Then, we used the tokens to build an abstract syntax tree which is a way of representing a computer program that is not specific to one programming language. Finally, we took the syntax tree and used it to generate some JavaScript code that does the computation described in the AST.
Now our transpiler is complete! I hope this tutorial was helpful and that you continue to explore source code translation.
https://github.com/AngularClass/minilisp
The post Making a Mini-Lisp: Introduction to Transpilers appeared first on AngularClass.