I've just quickly tried out gradle as a build system. I have three observations so far.

1) Netbeans have a pretty good integration

2) Repository configuration is in the build script. That is in my book a pretty big f*ckup, maven got rid of that with maven 2. I know I can write my own build script but a new build system where there is no convention for configuration of things that are specific to the build machine and not the source code... The irony of using Ivy and obscuring everything that is good with Ivy.

3) The run task... It is currently not possibly (said the forum 1 year ago and that's all I could find on this subject) to pass command line arguments to the application. You offer a "run" task but it is unusable.

If I have to write those things myself I might as well stick to bare Ant + Ivy.


Designing OO APIs

I'm trying to find a good pattern to designing an API/Framework that let's clients subclass to inherit convenient behavior.
These are my requirements:

The framework owns the collection of "Things". Like this:
class Framework {
  void addAThing(Thing t)...

A "Thing" is probably a composition of other stuffs. So it must have a Stuff getStuff() method. The framework uses a Things Stuff sometimes.

interface Thing {
   Stuff getStuff() ;

So one implementation of Thing is the simple version that gets stuff from the outside.
class Thing1 implements Thing {
   private final Stuff myStuff ;
   public Thing(final Stuff myStuff) { this.myStuff = checkNotNull(myStuff) ; }
   public Stuff getStuff() { return this.myStuff }

The problem I have is that there can be many different Things, and the actual creation of Stuff is something that I want the subclasses to be able to implement. So I'll help them along with this:

abstract class ParentThing implements Thing {
   private final Stuff myStuff ;
   public ParentThing() { 
        this.myStuff = checkNotNull(createStuff()); 
   public Stuff getStuff() { return this.myStuff }
   protected abstract Stuff createStuff() ;

The problem with that is that I'm relying on subclasses to be very nice about the implementation of createStuff() since the method is called in the constructor. It shouldn't for example register "this" as a callback to some other thread in createStuff since "this"-instance might not have left the constructor when the callback occurs.

So maybe this then:

interface Thing {
    void initialize() ;
    Stuff getStuff() ;

and then I'll fix it so that framework calls initialize before it uses the Thing, like so
class Framework {
  void addAThing(Thing t){ t.initialize(); ... }

But then I can't use final in the ParentThing class anymore, leaving me with another state of Things, "created but not initialized yet". 

So maybe I actually should have 

interface ThingBuilder {
   Thing createAThing() ;

and then

class Framework {
  void addAThing(ThingBuilder t)...

Might not be so bad with closures, but without closures/lambda it gets a bit messy on the client side.

Ideas are welcome. How to design an OO API? 


Debian Squeeze, chrome and firefox woes

Ok so I am guilty of running stable software, I use Debian Squeeze. It is rather conservative in updating versions of things, doing a lot of testing, and for that reason everything "just works", even if I can't run the latest games and so on.

Google chrome has another strategy, they update their version of the browser all the time. Recently I got updated to Version 26.0.1410.43.
Now chrome says that I am running on an unsupported version of my operating system (Debian 6). It has a link to a page that states what versions are supported. That page says that Debian 6 is supported.

So given the recent public relations stunts that Google has pulled (like ditching google reader - anyone remember orkut, still running....) I wasn't to keen on cutting them any slack, I'll just go back to firefox despite the abbysmal WebGL performance.

Firefox isn't included in Debian because of ... principles. Firefox will not license their logo in a Debian friendly manner because it's like trademark and their property and stuffs. Debian will not include software that have restricted licenses and trademarks and stuffs.
Principles, I can actually dig that. I'll just download it myself.

Happy like a little child finding a fresh puddle of mud on a spring day I headed over to mozilla.org to get me some firefox goodness.
Downloaded the version 19, unpack, click binary, wait for things to break (they usually do). But it ran. That's pretty nice of them.
Only it looked really crappy but I didn't have time to look into what that was about.
Until now, I'm home with a cold and between sneezing and other unpleasantness I tried to get Java-applets running. Only to find out that the reason firefox looks like, pardon the language, shite, and that the java plugin doesn't even get recognized is because it is a 32-bit build.
This is the year of our grace 2013.
I went 64-bit before my niece was born and she is in school now. I mean, come on!
The stupid, stupid, stupid, download page automatically detects I'm on linux and starts a download without asking and it picks the wrong arch. Ok, accidents happen, so where is the download link then? There freaking isn't any!
The solution was to google my way through countless questions until finding ftp://ftp.mozilla.org/pub/firefox/releases/
Klicking my way to the 64 bit version 19 beta build. Unpack that, then let firefox auto-update to version 20 and THEN the plugins work.
It looks good now also because it doesn't try to load GTK from the wrong 32-bit libraries that it obviously can't find since this is, as said, a 64-bit clean machine.


Minimum viable web framework

In some projects I've been working on, there's a small part of common code and thinking that one could call a minimum viable web framework (well, technically, it's a library and not a framework, but I digress). There's absolutely nothing new in it, and if you've ever done even one Servlet app, you know it already. I see that as an advantage.

My minimum viable web framework consists of 8 lines of code and one principle. The code (quoted below) is a static method (on some utility class) that takes a path to which to forward a request, the request and response objects, and then a vararg list of alternating names and values to put as request attributes. The principle is: if something produces any HTML, then that's all that thing does (inverted: if you don't produce HTML, you don't produce HTML). Anything that produces HTML (in my cases, JSPs) only reads values prepared for it from the request attributes (and if you have something that produces some output that isn't HTML, then it's by definition a special case and is exempt from this rule). It's OK to have lists and other simple data structures as request attributes, and it's OK to use simple looping constructs (e.g. a JSTL core forEach) in the rendering, but you can't do things like calling methods from there.

public static void forward(String path, HttpServletRequest request, HttpServletResponse response, Object... attributes)
throws ServletException, IOException 
        for(int i=0; i<attributes.length; i+=2) {
                request.setAttribute((String)attributes[i], attributes[i+1]);
        request.getRequestDispatcher(path).forward(request, response);

Together, the code and the principle give a simple way of keeping application code and page formatting separate, and I've been quite happy with it in my projects. There's a lot of things missing that you might expect a framework to provide (e.g. input validation or data access). I see those as nice to have and that the only essential part of a web framework is facilitating the separation of logic and rendering.

So, what do you think - is this web framework both minimal and viable? If it's not minimal - how can we minimise it? If it's not viable, then what's missing? And last but not least: does it zoom?



At my daytime gig I have come into contact with two applications that really zooms.

One is Splunk and I'm sure the developers of that application drink awesome-sauce for breakfast. It is so awesome I won't even write more about it here because I just don't know what words to use. Suffice it to say that it is a log analysis tool that actually beats 'find, xargs, awk, grep' and all of those.

The other application I've came into contact with is OpenTSDB. OpenTSDB is a time series database, that means you put metrics into it that you want to plot over time. OpenTSDB uses hbase as a database. That is the Apache hadoop database.
OpenTSDB has a web front end for plotting the data points that uses gnuplot. It is rather simplistic as a web front but it is mostly bug free and despite a few little quirks it "just works" exactly like you want it to. It does one thing (send a query to OpenTSDB and plot a PNG-image) and it does it good.

We use OpenTSDB to monitor our servers and applications. We put numbers into it like, how much heap has the server, what is the number of busy threads, how many messages was put on the message queue, what was the response time for each web service call.
It is extremely helpful in monitoring and post-mortem analysis of application behaviour. I mean like, really useful. I can correlate exactly the number of packets the load balancer sends to a certain host at 5s intervals with the number of busy threads, the heap size, the cpu load and a load of application specific metrics extracted from the JVMs using JMX.
The JMX-collector, developed in house, is written by some pretty clever guys to be fast but you can also write one, it isn't that hard. Remember to do everything async. and use caching and you're good to go.

I just can not stress enough how incredibly useful OpenTSDB is for not only monitoring what happens now but what happened that sunday when a couple of servers want haywire and didn't respond. Given the precision of the correlation it is very easy to find the relevant log entries from the time stamps.

To put in to perspective how good opentsdb & hbase is. We dump I would say more than 1 metric/second (usually we poll a specific metric each 5 sec or so - depending on what it is).
From each server, and we have > 35 servers.
To a single opentsdb + hbase server.

And it just freaking works. I can actually see exactly what the heap size was for host X on christmas eve.

Just tonight I installed OpenTSDB on my local machine/server, I don't need it at home but just to pay tribute to it.


Tracking down memory corruption by mprotecting your ADTs

In C, it's customary to design your code around Abstract Data Types, that is, modules that consist of a header file that declares the external interface of the module (consisting of an opaque struct and a set of functions operating on that struct), and an implementation file (which has the full declaration of the structure, the definitions of the functions in the header and any helper functions). The header would be something like this:

#include <stddef.h>
#include <stdbool.h>

struct stack;

struct stack* stack_create(size_t size);
bool stack_push(struct stack* stack, int i);
bool stack_pop(struct stack* stack, int* i);
void stack_destroy(struct stack* stack);


The implementation, then, is:

#include "stack.h"

#include <stdlib.h>

struct stack
  int* elements;
  size_t used;
  size_t allocated;

struct stack* stack_create(size_t size)
  struct stack* stack = malloc(sizeof(struct stack));
  stack->elements = malloc(size * sizeof(int));
  stack->used = 0;
  stack->allocated = size;
  return stack;

bool stack_push(struct stack* stack, int i)
  if (stack->used == stack->allocated) {
    goto error_full;
  stack->elements[stack->used++] = i;
  return true;
  return false;

I'll leave the implementation of the rest of the functions to your imagination. Since only the forward declaration of the struct is in the header, no code outside the implementation can access the members of the struct.

Now, assume we have a memory corruption fault somewhere in the rest of the program which when triggered corrupts the elements pointer but doesn't have any other effects. Our program them seems to be working fine until some later time when it suddenly crashes due to an invalid memory access in stack_push. We'd really like to get the program to abort at the point of the original corruption of the elements pointer, but how can we do that?

One way of solving that is to use the fact that since the structure is opaque, there is no way that any code outside our implementation file has any legitimate use of touching any of the memory to which struct stack* points. Since no other code has any business accessing that memory, then maybe we can have the OS help us preventing it from doing that? Enter mprotect.

The mprotect function lets us control what types of access should be permitted to a region of memory. If we access the memory in any other way, the OS is free (and in some cases even required) to abort our program at the spot. If we keep the memory inaccessible at all times except for when we use it inside our implementation functions, then chances are we can catch the memory corruption as it happens. The mprotect man page does say that the memory it protects has to be page aligned, though. How do we do that? Via posix_memalign and getpagesize, like so:

#include "stack.h"
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

struct stack* stack_create(size_t size)
  struct stack* stack;
  posix_memalign((void**)&stack, getpagesize(), sizeof(struct stack));
  posix_memalign((void**)&stack->elements, getpagesize(), size * sizeof(int));
  stack->used = 0;
  stack->allocated = size;
  return stack;

Now we just have to implement and use the protect function mentioned above and its inverse, unprotect:

static void protect(struct stack* stack)
  mprotect(stack->elements, stack->allocated * sizeof(int), PROT_NONE);
  mprotect(stack, sizeof(*stack), PROT_NONE);
static void unprotect(struct stack* stack)
  /* Unprotect in the reverse order, or we crash and burn trying to read stack->elements */
  mprotect(stack, sizeof(*stack), PROT_READ|PROT_WRITE);
  mprotect(stack->elements, stack->allocated * sizeof(int), PROT_READ|PROT_WRITE);
bool stack_push(struct stack* stack, int i)
  if (stack->used == stack->allocated) {
    goto error_full;
  stack->elements[stack->used++] = i;
  return true;
  return false;

Since this ends up modifying bits in the MMU, it may not be suitable to have enabled on performance critical code, so an #ifdef NDEBUG switch that selects an empty implementation of protect and unprotect for non-debug builds could be advisable.

So does protecting your ADTs via mprotect zoom? Well, it does come in handy at times, and there's not much disadvantage to using it, so my verdict is: Zooms!


Compiling with LLVM

LLVM is, among many other things, a reusable compiler backend implemented as a library. In this post, I'll show how the llvmpy Python bindings can be used to write an optimising cross-compiler in a few lines of Python. As a starting point, let's re-use the old parser and interpreter from the Python Parsing Combinators series. Since the grammar only has operators and numbers, we'll need to extend it a bit - there's not much point in spending time compiling something that isn't going to take any input. Let's add named values like so:
identifier = RegEx("[a-zA-Z_][a-zA-Z0-9_]*").apply(Identifier)
simple = number | paren | identifier
Next, we'll need an implementation for this new node in our Abstract Syntax Tree:
class Identifier(object):
    def __init__(self, s):
    def eval(self, env):
        return env[self.name]
    def __repr__(self):
        return repr(self.name)
As you can see if you compare this eval to the old ones, we now take a dictionary as argument to eval, and that it maps names to numbers. The old eval implementations should be extended to pass along this environment, but Identifier is the only one that has a use for it.

Interpreting is now done thus:

expr.parse("a+1", 0).next()[0].eval({"a":4})
Now, the topic of this post is compilation, not interpretation. Let's start by importing the llvmpy bindings:
from llvm import *
from llvm.core import *
from llvm.ee import *
In LLVM, you have modules (roughly object files) that contain functions, which take arguments and contain a basic block. Basic blocks contain instructions, which can in turn contain things like basic blocks or call functions. Creating a module is simple:
m = Module.new("m")
The function we want to add to the module should correspond to the expression we're compiling. It should return an int, and its arguments should be the identifiers that are used in the expression. We can find the identifiers by adding a method to the AST nodes that return the union of all identifiers in the sub-tree (it's trivial, so I'll spare you the implementation). Let's sort them ASCIIbetically and use that as our arguments. We're going to need a mapping from identifier name to function arguments, so let's build that up as well.
identifiers = sorted(list(ast.identifiers()))
ty_int = Type.int()
ty_func = Type.function(ty_int, [ty_int] * len(identifiers))
f = m.add_function(ty_func, "f")
args = {}
for i in range(len(identifiers)):
    f.args[i].name = identifiers[i]
    args[identifiers[i]] = f.args[i]
Now, function bodies consist of a basic block (i.e. a series of statements). You can add statements to a basic block using a builder.
bb = f.append_basic_block("entry")
builder = Builder.new(bb)
Now we're ready for the interesting parts. For each AST node type, we add a method for compiling to LLVM opcodes. Since LLVM uses Static Single Assignment, each opcode results in the assignment of a variable, and this variable is then never modified. Every method on the builder that appends an instruction to the basic block returns the resulting value, which makes it natural to do the same for our compile method: we return the value that will hold the results of the computation. As arguments, let's take the builder that we use for appending the instructions, the type to use for our numbers, and the argument mapping we built up. Identifiers are simple: we just return the argument that corresponds to the identifier:
class Identifier(object):
    def compile(self, builder, ty_int, args):
        return args[self.name]
Numbers are also easy - we just return a constant value.
class Number(object):
    def compile(self, builder, ty_int, args):
        return Constant.int(ty_int, self.n)
Now, how do we do our two remaining AST node types, addition and multiplication? We can do this by first emitting the code for the left hand side of the operation, remembering the value that will hold the results for that. Then we do the same for the right-hand side, and finally, we emit a multiplication instruction that takes the resulting values for the left hand side and the right hand side as its operands.
class Mul(BinOp):
    def compile(self, builder, ty_int, args):
        a = self.a.compile(builder, ty_int, args)
        b = self.b.compile(builder, ty_int, args)
        return builder.mul(a, b)
Addition is of course the same, except for using builder.add. We can now let the basic block of our function return the value that will hold the result of evaluating our entire AST by doing
builder.ret(ast.compile(builder, ty_int, args))
We can now generate machine code from this, and run it directly:
ee = ExecutionEngine.new(m)
    "a": GenericValue.int(ty_int, 100),
    "b": GenericValue.int(ty_int, 42)
for param in identifiers:
retval = ee.run_function(m.get_function_named("f"), args)
This will build up a function in memory, which we can pass some arguments and call. This is really quite amazing. You call some functions to describe how to do something, and then you get a function back that you can execute directly. This can be used for all sorts of interesting things, like partial specialisation of a function at runtime, to take one example. Let's say you have to do a fully unindexed join on two strings in a database. For each value in the first table, you will need to do a string comparison with every single value in the other table. Since none of the strings we're comparing are known at compile time, there's little the compiler can do except hand us its very best version of a general strcmp. With LLVM, we can do better: for each string in one of the tables, generate a specialised string comparison function that compares a string in the other table to that particular one. Such functions can be optimised quite a bit.

The fun doesn't stop here either. In addition to generating functions in memory, we can serialise them to bitcode files using the to_bitcode method of LLVM modules. That generates a .bc file for which we can generate assembler code for some particular architecture, and then use the platform assembler and linker to generate an executable. To do that, however, we first need a main function. Since the function we generated for our expression above takes ordinary integers, we'll have to generate a main function that will take inputs form somewhere (e.g. the command line), convert that to integers, pass them to the function, and then make the result known to the user. To make things easy, let's just call the libc function atoi on argv[1], argv[2],... and use the result of the function as the exit code of our program.

ty_str = Type.pointer(Type.int(8))
ty_main = Type.function(ty_int, [ty_int, Type.pointer(ty_str)])
main = m.add_function(ty_main, "main")
main_bb = main.append_basic_block("entry")
bmain = Builder.new(main_bb)
for i in range(1, len(identifiers)+1):
    # atoi(argv[i])
    s = bmain.load(bmain.gep(argv, [Constant.int(ty_int, i)]))
    int_args.append(bmain.call(atoi, [s]))

    bmain.ret(bmain.call(f, int_args))
Here, ty_str is a character pointer, and main is a function from int and pointer to character pointer to int. In its basic block, we add one call to atoi for each identifier that our expression uses. The argument to atoi would in C be written as argv[i], but here, we do it by first getting an element pointer ("gep" is short-hand for Get Element Pointer) to the string we want, and then dereference that. Think of the "load" as the * and the gep as the + in *(argv+i)

We then add a call to our function, and return its result as our exit code. But what does atoi refer to here? We must first tell LLVM that we want to call something from the outside of our program. It will give us back a reference to the function, so that we can build up calls to that something.

ty_atoi = Type.function(ty_int, [ty_str])
atoi = m.add_function(ty_atoi, "atoi")
atoi.linkage = LINKAGE_EXTERNAL
All that is left before we can generate our very own native executables is file handling and calling the right tools. On Mac OS X, it goes a little something like this:
import os

if filename.endswith(".zoom"):
    basename = filename[:-5]
    basename = "a"

bitcode = basename + ".bc"
asm = basename + ".s"
obj = basename + ".o"
executable = basename

f=open(bitcode, "wb")

if target=="x86":
    os.system("llc -filetype=obj %s -o %s" % (bitcode, obj))
    os.system("ld -arch x86_64 %s -lc /Developer/SDKs/MacOSX10.6.sdk/usr/lib/crt1.o -o %s" % (obj, executable))
    os.system("llc -mtriple=arm-apple-darwin -filetype=obj %s -o %s" % (bitcode, obj))
    os.system("ld -arch_multiple -arch arm %s -o %s -L/Developer/Platforms/iPhoneOS.platform/DeviceSupport/4.2/Symbols/usr/lib /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS4.3.sdk/usr/lib/crt1.o -lSystem " % (obj, executable))
In order to build for other targets, you can specify -mtriple=xxx on the llc command line. You will of course have to run a matching version ld in order to generate your executable.

So, we can build native, and even cross-compiled, executables, but what about optimisation? Turns out we get that too. The LLVM bitcode we generate for the expression 1+1+1+1+a+a+a+a is

define i32 @f(i32 %a) {
  %0 = add i32 %a, %a
  %1 = add i32 %a, %0
  %2 = add i32 %a, %1
  %3 = add i32 1, %2
  %4 = add i32 1, %3
  %5 = add i32 1, %4
  %6 = add i32 1, %5
  ret i32 %6
whereas the resulting x86 assembler code is
_f:                                     ## @f
## BB#0:                                ## %entry
        leal    (%rdi,%rdi), %eax
        addl    %edi, %eax
        leal    4(%rdi,%rax), %eax
Clearly, llvm has figured out what's wrong and folded the constant additions and lowered the multiplication to a left shift.

So, what's the conclusion from this? That building optimizing cross-compilers is a weekend project for a small DSL, and you can do it all in Python - and if that doesn't zoom, nothing does.