2012-10-28

Compiling with LLVM

LLVM is, among many other things, a reusable compiler backend implemented as a library. In this post, I'll show how the llvmpy Python bindings can be used to write an optimising cross-compiler in a few lines of Python. As a starting point, let's re-use the old parser and interpreter from the Python Parsing Combinators series. Since the grammar only has operators and numbers, we'll need to extend it a bit - there's not much point in spending time compiling something that isn't going to take any input. Let's add named values like so:
identifier = RegEx("[a-zA-Z_][a-zA-Z0-9_]*").apply(Identifier)
simple = number | paren | identifier
Next, we'll need an implementation for this new node in our Abstract Syntax Tree:
class Identifier(object):
    def __init__(self, s):
        self.name=s
    def eval(self, env):
        return env[self.name]
    def __repr__(self):
        return repr(self.name)
As you can see if you compare this eval to the old ones, we now take a dictionary as argument to eval, and that it maps names to numbers. The old eval implementations should be extended to pass along this environment, but Identifier is the only one that has a use for it.

Interpreting is now done thus:

expr.parse("a+1", 0).next()[0].eval({"a":4})
Now, the topic of this post is compilation, not interpretation. Let's start by importing the llvmpy bindings:
from llvm import *
from llvm.core import *
from llvm.ee import *
In LLVM, you have modules (roughly object files) that contain functions, which take arguments and contain a basic block. Basic blocks contain instructions, which can in turn contain things like basic blocks or call functions. Creating a module is simple:
m = Module.new("m")
The function we want to add to the module should correspond to the expression we're compiling. It should return an int, and its arguments should be the identifiers that are used in the expression. We can find the identifiers by adding a method to the AST nodes that return the union of all identifiers in the sub-tree (it's trivial, so I'll spare you the implementation). Let's sort them ASCIIbetically and use that as our arguments. We're going to need a mapping from identifier name to function arguments, so let's build that up as well.
identifiers = sorted(list(ast.identifiers()))
ty_int = Type.int()
ty_func = Type.function(ty_int, [ty_int] * len(identifiers))
f = m.add_function(ty_func, "f")
args = {}
for i in range(len(identifiers)):
    f.args[i].name = identifiers[i]
    args[identifiers[i]] = f.args[i]
Now, function bodies consist of a basic block (i.e. a series of statements). You can add statements to a basic block using a builder.
bb = f.append_basic_block("entry")
builder = Builder.new(bb)
Now we're ready for the interesting parts. For each AST node type, we add a method for compiling to LLVM opcodes. Since LLVM uses Static Single Assignment, each opcode results in the assignment of a variable, and this variable is then never modified. Every method on the builder that appends an instruction to the basic block returns the resulting value, which makes it natural to do the same for our compile method: we return the value that will hold the results of the computation. As arguments, let's take the builder that we use for appending the instructions, the type to use for our numbers, and the argument mapping we built up. Identifiers are simple: we just return the argument that corresponds to the identifier:
class Identifier(object):
    #...
    def compile(self, builder, ty_int, args):
        return args[self.name]
Numbers are also easy - we just return a constant value.
class Number(object):
    #...
    def compile(self, builder, ty_int, args):
        return Constant.int(ty_int, self.n)
Now, how do we do our two remaining AST node types, addition and multiplication? We can do this by first emitting the code for the left hand side of the operation, remembering the value that will hold the results for that. Then we do the same for the right-hand side, and finally, we emit a multiplication instruction that takes the resulting values for the left hand side and the right hand side as its operands.
class Mul(BinOp):
    #...
    def compile(self, builder, ty_int, args):
        a = self.a.compile(builder, ty_int, args)
        b = self.b.compile(builder, ty_int, args)
        return builder.mul(a, b)
Addition is of course the same, except for using builder.add. We can now let the basic block of our function return the value that will hold the result of evaluating our entire AST by doing
builder.ret(ast.compile(builder, ty_int, args))
We can now generate machine code from this, and run it directly:
ee = ExecutionEngine.new(m)
env={
    "a": GenericValue.int(ty_int, 100),
    "b": GenericValue.int(ty_int, 42)
}
args=[]
for param in identifiers:
    args.append(env[param])
retval = ee.run_function(m.get_function_named("f"), args)
This will build up a function in memory, which we can pass some arguments and call. This is really quite amazing. You call some functions to describe how to do something, and then you get a function back that you can execute directly. This can be used for all sorts of interesting things, like partial specialisation of a function at runtime, to take one example. Let's say you have to do a fully unindexed join on two strings in a database. For each value in the first table, you will need to do a string comparison with every single value in the other table. Since none of the strings we're comparing are known at compile time, there's little the compiler can do except hand us its very best version of a general strcmp. With LLVM, we can do better: for each string in one of the tables, generate a specialised string comparison function that compares a string in the other table to that particular one. Such functions can be optimised quite a bit.

The fun doesn't stop here either. In addition to generating functions in memory, we can serialise them to bitcode files using the to_bitcode method of LLVM modules. That generates a .bc file for which we can generate assembler code for some particular architecture, and then use the platform assembler and linker to generate an executable. To do that, however, we first need a main function. Since the function we generated for our expression above takes ordinary integers, we'll have to generate a main function that will take inputs form somewhere (e.g. the command line), convert that to integers, pass them to the function, and then make the result known to the user. To make things easy, let's just call the libc function atoi on argv[1], argv[2],... and use the result of the function as the exit code of our program.

ty_str = Type.pointer(Type.int(8))
ty_main = Type.function(ty_int, [ty_int, Type.pointer(ty_str)])
main = m.add_function(ty_main, "main")
main_bb = main.append_basic_block("entry")
bmain = Builder.new(main_bb)
argv=main.args[1]
int_args=[]
for i in range(1, len(identifiers)+1):
    # atoi(argv[i])
    s = bmain.load(bmain.gep(argv, [Constant.int(ty_int, i)]))
    int_args.append(bmain.call(atoi, [s]))

    bmain.ret(bmain.call(f, int_args))
Here, ty_str is a character pointer, and main is a function from int and pointer to character pointer to int. In its basic block, we add one call to atoi for each identifier that our expression uses. The argument to atoi would in C be written as argv[i], but here, we do it by first getting an element pointer ("gep" is short-hand for Get Element Pointer) to the string we want, and then dereference that. Think of the "load" as the * and the gep as the + in *(argv+i)

We then add a call to our function, and return its result as our exit code. But what does atoi refer to here? We must first tell LLVM that we want to call something from the outside of our program. It will give us back a reference to the function, so that we can build up calls to that something.

ty_atoi = Type.function(ty_int, [ty_str])
atoi = m.add_function(ty_atoi, "atoi")
atoi.linkage = LINKAGE_EXTERNAL
All that is left before we can generate our very own native executables is file handling and calling the right tools. On Mac OS X, it goes a little something like this:
import os

if filename.endswith(".zoom"):
    basename = filename[:-5]
else:
    basename = "a"

bitcode = basename + ".bc"
asm = basename + ".s"
obj = basename + ".o"
executable = basename

f=open(bitcode, "wb")
m.to_bitcode(f)
f.close()

if target=="x86":
    os.system("llc -filetype=obj %s -o %s" % (bitcode, obj))
    os.system("ld -arch x86_64 %s -lc /Developer/SDKs/MacOSX10.6.sdk/usr/lib/crt1.o -o %s" % (obj, executable))
else:
    os.system("llc -mtriple=arm-apple-darwin -filetype=obj %s -o %s" % (bitcode, obj))
    os.system("ld -arch_multiple -arch arm %s -o %s -L/Developer/Platforms/iPhoneOS.platform/DeviceSupport/4.2/Symbols/usr/lib /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS4.3.sdk/usr/lib/crt1.o -lSystem " % (obj, executable))
In order to build for other targets, you can specify -mtriple=xxx on the llc command line. You will of course have to run a matching version ld in order to generate your executable.

So, we can build native, and even cross-compiled, executables, but what about optimisation? Turns out we get that too. The LLVM bitcode we generate for the expression 1+1+1+1+a+a+a+a is

define i32 @f(i32 %a) {
entry:
  %0 = add i32 %a, %a
  %1 = add i32 %a, %0
  %2 = add i32 %a, %1
  %3 = add i32 1, %2
  %4 = add i32 1, %3
  %5 = add i32 1, %4
  %6 = add i32 1, %5
  ret i32 %6
}
whereas the resulting x86 assembler code is
_f:                                     ## @f
Ltmp0:
## BB#0:                                ## %entry
        leal    (%rdi,%rdi), %eax
        addl    %edi, %eax
        leal    4(%rdi,%rax), %eax
        ret
Clearly, llvm has figured out what's wrong and folded the constant additions and lowered the multiplication to a left shift.

So, what's the conclusion from this? That building optimizing cross-compilers is a weekend project for a small DSL, and you can do it all in Python - and if that doesn't zoom, nothing does.