2013-04-04

Debian Squeeze, chrome and firefox woes

Ok so I am guilty of running stable software, I use Debian Squeeze. It is rather conservative in updating versions of things, doing a lot of testing, and for that reason everything "just works", even if I can't run the latest games and so on.

Google chrome has another strategy, they update their version of the browser all the time. Recently I got updated to Version 26.0.1410.43.
Now chrome says that I am running on an unsupported version of my operating system (Debian 6). It has a link to a page that states what versions are supported. That page says that Debian 6 is supported.

So given the recent public relations stunts that Google has pulled (like ditching Google Reader - anyone remember Orkut, still running....) I wasn't too keen on cutting them any slack, I'll just go back to firefox despite the abysmal WebGL performance.

Firefox isn't included in Debian because of ... principles. Firefox will not license their logo in a Debian friendly manner because it's like trademark and their property and stuffs. Debian will not include software that has restricted licenses and trademarks and stuffs.
Principles, I can actually dig that. I'll just download it myself.

Happy like a little child finding a fresh puddle of mud on a spring day I headed over to mozilla.org to get me some firefox goodness.
Downloaded the version 19, unpack, click binary, wait for things to break (they usually do). But it ran. That's pretty nice of them.
Only it looked really crappy but I didn't have time to look into what that was about.
Until now, I'm home with a cold and between sneezing and other unpleasantness I tried to get Java-applets running. Only to find out that the reason firefox looks like, pardon the language, shite, and that the java plugin doesn't even get recognized is because it is a 32-bit build.
This is the year of our grace 2013.
I went 64-bit before my niece was born and she is in school now. I mean, come on!
The stupid, stupid, stupid, download page automatically detects I'm on linux and starts a download without asking and it picks the wrong arch. Ok, accidents happen, so where is the download link then? There freaking isn't any!
The solution was to google my way through countless questions until finding ftp://ftp.mozilla.org/pub/firefox/releases/
Clicking my way to the 64-bit version 19 beta build. Unpack that, then let firefox auto-update to version 20 and THEN the plugins work.
It looks good now also because it doesn't try to load GTK from the wrong 32-bit libraries that it obviously can't find since this is, as said, a 64-bit clean machine.

2013-01-30

Minimum viable web framework

In some projects I've been working on, there's a small part of common code and thinking that one could call a minimum viable web framework (well, technically, it's a library and not a framework, but I digress). There's absolutely nothing new in it, and if you've ever done even one Servlet app, you know it already. I see that as an advantage.

My minimum viable web framework consists of 8 lines of code and one principle. The code (quoted below) is a static method (on some utility class) that takes a path to which to forward a request, the request and response objects, and then a vararg list of alternating names and values to put as request attributes. The principle is: if something produces any HTML, then that's all that thing does (inverted: if you don't produce HTML, you don't produce HTML). Anything that produces HTML (in my cases, JSPs) only reads values prepared for it from the request attributes (and if you have something that produces some output that isn't HTML, then it's by definition a special case and is exempt from this rule). It's OK to have lists and other simple data structures as request attributes, and it's OK to use simple looping constructs (e.g. a JSTL core forEach) in the rendering, but you can't do things like calling methods from there.

public static void forward(String path, HttpServletRequest request, HttpServletResponse response, Object... attributes)
throws ServletException, IOException 
{
        for(int i=0; i<attributes.length; i+=2) {
                request.setAttribute((String)attributes[i], attributes[i+1]);
        }
        request.getRequestDispatcher(path).forward(request, response);
}

Together, the code and the principle give a simple way of keeping application code and page formatting separate, and I've been quite happy with it in my projects. There are a lot of things missing that you might expect a framework to provide (e.g. input validation or data access). I see those as nice to have; the only essential part of a web framework is facilitating the separation of logic and rendering.

So, what do you think - is this web framework both minimal and viable? If it's not minimal - how can we minimise it? If it's not viable, then what's missing? And last but not least: does it zoom?

2013-01-22

OpenTSDB

At my daytime gig I have come into contact with two applications that really zoom.

One is Splunk and I'm sure the developers of that application drink awesome-sauce for breakfast. It is so awesome I won't even write more about it here because I just don't know what words to use. Suffice it to say that it is a log analysis tool that actually beats 'find, xargs, awk, grep' and all of those.

The other application I've come into contact with is OpenTSDB. OpenTSDB is a time series database, which means you put metrics into it that you want to plot over time. OpenTSDB uses HBase as a database. That is the Apache Hadoop database.
OpenTSDB has a web front end for plotting the data points that uses gnuplot. It is rather simplistic as a web front end but it is mostly bug free and despite a few little quirks it "just works" exactly like you want it to. It does one thing (send a query to OpenTSDB and plot a PNG image) and it does it well.

We use OpenTSDB to monitor our servers and applications. We put numbers into it like: how much heap does the server have, what is the number of busy threads, how many messages were put on the message queue, what was the response time for each web service call.
It is extremely helpful in monitoring and post-mortem analysis of application behaviour. I mean like, really useful. I can correlate exactly the number of packets the load balancer sends to a certain host at 5s intervals with the number of busy threads, the heap size, the cpu load and a load of application specific metrics extracted from the JVMs using JMX.
The JMX-collector, developed in house, is written by some pretty clever guys to be fast but you can also write one, it isn't that hard. Remember to do everything async. and use caching and you're good to go.
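If you want to try writing a collector yourself, the sending half can be sketched in a few lines of Python. This is not our in-house collector (no async, no caching), just a minimal illustration of OpenTSDB's plain-text "put" protocol. The metric name, the host tag and the localhost address are made up for the example; 4242 is the port a TSD normally listens on, adjust it if yours differs.

```python
import socket
import time


def format_put(metric, value, tags, timestamp=None):
    """Build one line of OpenTSDB's text protocol:
    put <metric> <unix timestamp> <value> <tagk=tagv> [<tagk=tagv> ...]
    """
    if not tags:
        raise ValueError("OpenTSDB wants at least one tag per data point")
    if timestamp is None:
        timestamp = int(time.time())
    # Sort the tags so the same data point always yields the same line.
    tag_str = " ".join("%s=%s" % (k, tags[k]) for k in sorted(tags))
    return "put %s %d %s %s\n" % (metric, timestamp, value, tag_str)


def send_metrics(lines, host="localhost", port=4242):
    """Send already formatted put lines to a TSD over plain TCP."""
    sock = socket.create_connection((host, port))
    try:
        sock.sendall("".join(lines).encode("ascii"))
    finally:
        sock.close()


# Hypothetical metric and tag names, fixed timestamp for repeatability.
line = format_put("jvm.heap.used", 123456789, {"host": "web01"},
                  timestamp=1358812800)
```

Batch up a handful of such lines per polling interval and hand them to send_metrics in one go, rather than opening a connection per data point.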

I just cannot stress enough how incredibly useful OpenTSDB is for not only monitoring what happens now but what happened that Sunday when a couple of servers went haywire and didn't respond. Given the precision of the correlation it is very easy to find the relevant log entries from the time stamps.

To put into perspective how good OpenTSDB and HBase are: we dump, I would say, more than 1 metric/second (usually we poll a specific metric every 5 seconds or so, depending on what it is).
From each server, and we have > 35 servers.
To a single opentsdb + hbase server.

And it just freaking works. I can actually see exactly what the heap size was for host X on Christmas Eve.

Just tonight I installed OpenTSDB on my local machine/server, I don't need it at home but just to pay tribute to it.

2012-11-21

Tracking down memory corruption by mprotecting your ADTs

In C, it's customary to design your code around Abstract Data Types, that is, modules that consist of a header file that declares the external interface of the module (consisting of an opaque struct and a set of functions operating on that struct), and an implementation file (which has the full declaration of the structure, the definitions of the functions in the header and any helper functions). The header would be something like this:
#ifndef INCLUDE_GUARD_STACK_H
#define INCLUDE_GUARD_STACK_H

#include <stddef.h>
#include <stdbool.h>

struct stack;

struct stack* stack_create(size_t size);
bool stack_push(struct stack* stack, int i);
bool stack_pop(struct stack* stack, int* i);
void stack_destroy(struct stack* stack);

#endif

The implementation, then, is:

#include "stack.h"

#include <stdlib.h>

struct stack
{
  int* elements;
  size_t used;
  size_t allocated;
};

struct stack* stack_create(size_t size)
{
  struct stack* stack = malloc(sizeof(struct stack));
  stack->elements = malloc(size * sizeof(int));
  stack->used = 0;
  stack->allocated = size;
  return stack;
}

bool stack_push(struct stack* stack, int i)
{
  if (stack->used == stack->allocated) {
    goto error_full;
  }
  stack->elements[stack->used++] = i;
  return true;
error_full:
  return false;
}

I'll leave the implementation of the rest of the functions to your imagination. Since only the forward declaration of the struct is in the header, no code outside the implementation can access the members of the struct.

Now, assume we have a memory corruption fault somewhere in the rest of the program which when triggered corrupts the elements pointer but doesn't have any other effects. Our program then seems to be working fine until some later time when it suddenly crashes due to an invalid memory access in stack_push. We'd really like to get the program to abort at the point of the original corruption of the elements pointer, but how can we do that?

One way of solving that is to use the fact that since the structure is opaque, there is no way that any code outside our implementation file has any legitimate use of touching any of the memory to which struct stack* points. Since no other code has any business accessing that memory, then maybe we can have the OS help us prevent it from doing that? Enter mprotect.

The mprotect function lets us control what types of access should be permitted to a region of memory. If we access the memory in any other way, the OS is free (and in some cases even required) to abort our program at the spot. If we keep the memory inaccessible at all times except for when we use it inside our implementation functions, then chances are we can catch the memory corruption as it happens. The mprotect man page does say that the memory it protects has to be page aligned, though. How do we do that? Via posix_memalign and getpagesize, like so:

#include "stack.h"
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

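/* mprotect works on whole pages, so round every allocation up to a whole
   number of pages. Otherwise malloc may hand out the rest of our page to
   somebody else, who would then crash when we protect it. */
static size_t page_round(size_t n)
{
  size_t page = (size_t)getpagesize();
  return (n + page - 1) / page * page;
}

struct stack* stack_create(size_t size)
{
  struct stack* stack;
  if (posix_memalign((void**)&stack, getpagesize(), page_round(sizeof(struct stack))) != 0) {
    return NULL;
  }
  if (posix_memalign((void**)&stack->elements, getpagesize(), page_round(size * sizeof(int))) != 0) {
    free(stack);
    return NULL;
  }
  stack->used = 0;
  stack->allocated = size;
  protect(stack);
  return stack;
}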

Now we just have to implement and use the protect function mentioned above and its inverse, unprotect:

static void protect(struct stack* stack)
{
  mprotect(stack->elements, stack->allocated * sizeof(int), PROT_NONE);
  mprotect(stack, sizeof(*stack), PROT_NONE);
}
static void unprotect(struct stack* stack)
{
  /* Unprotect in the reverse order, or we crash and burn trying to read stack->elements */
  mprotect(stack, sizeof(*stack), PROT_READ|PROT_WRITE);
  mprotect(stack->elements, stack->allocated * sizeof(int), PROT_READ|PROT_WRITE);
}
bool stack_push(struct stack* stack, int i)
{
  unprotect(stack);
  if (stack->used == stack->allocated) {
    goto error_full;
  }
  stack->elements[stack->used++] = i;
  protect(stack);
  return true;
error_full:
  protect(stack);
  return false;
}

Since this ends up modifying bits in the MMU, it may not be suitable to have enabled on performance critical code, so an #ifdef NDEBUG switch that selects an empty implementation of protect and unprotect for non-debug builds could be advisable.

So does protecting your ADTs via mprotect zoom? Well, it does come in handy at times, and there's not much disadvantage to using it, so my verdict is: Zooms!

2012-10-28

Compiling with LLVM

LLVM is, among many other things, a reusable compiler backend implemented as a library. In this post, I'll show how the llvmpy Python bindings can be used to write an optimising cross-compiler in a few lines of Python. As a starting point, let's re-use the old parser and interpreter from the Python Parsing Combinators series. Since the grammar only has operators and numbers, we'll need to extend it a bit - there's not much point in spending time compiling something that isn't going to take any input. Let's add named values like so:
identifier = RegEx("[a-zA-Z_][a-zA-Z0-9_]*").apply(Identifier)
simple = number | paren | identifier
Next, we'll need an implementation for this new node in our Abstract Syntax Tree:
class Identifier(object):
    def __init__(self, s):
        self.name=s
    def eval(self, env):
        return env[self.name]
    def __repr__(self):
        return repr(self.name)
As you can see if you compare this eval to the old ones, eval now takes a dictionary as its argument, mapping names to numbers. The old eval implementations should be extended to pass along this environment, but Identifier is the only one that has a use for it.
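For concreteness, here is a minimal sketch of what the extended eval methods might look like. The attribute names n, a and b are the ones the compile methods further down rely on, but the constructors and the Add class are my guess at what the old series defined, not a copy of it, and the tree is built by hand rather than by the parser.

```python
class Number(object):
    def __init__(self, n):
        self.n = int(n)

    def eval(self, env):
        # Constants have no use for the environment.
        return self.n


class Identifier(object):
    def __init__(self, s):
        self.name = s

    def eval(self, env):
        return env[self.name]


class BinOp(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b


class Add(BinOp):
    def eval(self, env):
        # Pass the same environment down both sub-trees.
        return self.a.eval(env) + self.b.eval(env)


class Mul(BinOp):
    def eval(self, env):
        return self.a.eval(env) * self.b.eval(env)


# The tree for "a+1", evaluated with a bound to 4.
tree = Add(Identifier("a"), Number(1))
result = tree.eval({"a": 4})
```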

Interpreting is now done thus:

expr.parse("a+1", 0).next()[0].eval({"a":4})
Now, the topic of this post is compilation, not interpretation. Let's start by importing the llvmpy bindings:
from llvm import *
from llvm.core import *
from llvm.ee import *
In LLVM, you have modules (roughly object files) that contain functions, which take arguments and contain one or more basic blocks. Basic blocks contain instructions, which can in turn refer to other basic blocks (as branch targets) or call functions. Creating a module is simple:
m = Module.new("m")
The function we want to add to the module should correspond to the expression we're compiling. It should return an int, and its arguments should be the identifiers that are used in the expression. We can find the identifiers by adding a method to the AST nodes that returns the union of all identifiers in the sub-tree (it's trivial, so I'll spare you the implementation). Let's sort them ASCIIbetically and use that as our arguments. We're going to need a mapping from identifier name to function arguments, so let's build that up as well.
identifiers = sorted(list(ast.identifiers()))
ty_int = Type.int()
ty_func = Type.function(ty_int, [ty_int] * len(identifiers))
f = m.add_function(ty_func, "f")
args = {}
for i in range(len(identifiers)):
    f.args[i].name = identifiers[i]
    args[identifiers[i]] = f.args[i]
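In case "trivial" is not quite enough to go on, the identifiers method could be sketched roughly like this. The constructors are assumptions on my part (only the attribute names name, n, a and b appear elsewhere in this post), and the tree is built by hand so that the snippet runs without the parser.

```python
class Number(object):
    def __init__(self, n):
        self.n = n

    def identifiers(self):
        # A constant mentions no names.
        return set()


class Identifier(object):
    def __init__(self, s):
        self.name = s

    def identifiers(self):
        return {self.name}


class BinOp(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def identifiers(self):
        # Union of whatever the two sub-trees mention.
        return self.a.identifiers() | self.b.identifiers()


class Add(BinOp):
    pass


class Mul(BinOp):
    pass


# b + a*(1+b) mentions a and b, each reported once.
ast = Add(Identifier("b"), Mul(Identifier("a"), Add(Number(1), Identifier("b"))))
identifiers = sorted(ast.identifiers())
```

Using a set means an identifier that occurs several times still becomes only one function argument.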
Now, function bodies consist of basic blocks (a basic block is a straight-line series of instructions), and our function needs only one. You can add instructions to a basic block using a builder.
bb = f.append_basic_block("entry")
builder = Builder.new(bb)
Now we're ready for the interesting parts. For each AST node type, we add a method for compiling to LLVM opcodes. Since LLVM uses Static Single Assignment, each opcode results in the assignment of a variable, and this variable is then never modified. Every method on the builder that appends an instruction to the basic block returns the resulting value, which makes it natural to do the same for our compile method: we return the value that will hold the results of the computation. As arguments, let's take the builder that we use for appending the instructions, the type to use for our numbers, and the argument mapping we built up. Identifiers are simple: we just return the argument that corresponds to the identifier:
class Identifier(object):
    #...
    def compile(self, builder, ty_int, args):
        return args[self.name]
Numbers are also easy - we just return a constant value.
class Number(object):
    #...
    def compile(self, builder, ty_int, args):
        return Constant.int(ty_int, self.n)
Now, how do we do our two remaining AST node types, addition and multiplication? We can do this by first emitting the code for the left hand side of the operation, remembering the value that will hold the results for that. Then we do the same for the right-hand side, and finally, we emit a multiplication instruction that takes the resulting values for the left hand side and the right hand side as its operands.
class Mul(BinOp):
    #...
    def compile(self, builder, ty_int, args):
        a = self.a.compile(builder, ty_int, args)
        b = self.b.compile(builder, ty_int, args)
        return builder.mul(a, b)
Addition is of course the same, except for using builder.add. We can now let the basic block of our function return the value that will hold the result of evaluating our entire AST by doing
builder.ret(ast.compile(builder, ty_int, args))
We can now generate machine code from this, and run it directly:
ee = ExecutionEngine.new(m)
env={
    "a": GenericValue.int(ty_int, 100),
    "b": GenericValue.int(ty_int, 42)
}
# Use a fresh name here: args is still the identifier-to-argument mapping
call_args=[]
for param in identifiers:
    call_args.append(env[param])
retval = ee.run_function(m.get_function_named("f"), call_args)
This will build up a function in memory, which we can pass some arguments and call. This is really quite amazing. You call some functions to describe how to do something, and then you get a function back that you can execute directly. This can be used for all sorts of interesting things, like partial specialisation of a function at runtime, to take one example. Let's say you have to do a fully unindexed join on two strings in a database. For each value in the first table, you will need to do a string comparison with every single value in the other table. Since none of the strings we're comparing are known at compile time, there's little the compiler can do except hand us its very best version of a general strcmp. With LLVM, we can do better: for each string in one of the tables, generate a specialised string comparison function that compares a string in the other table to that particular one. Such functions can be optimised quite a bit.
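To make the specialisation idea a bit more tangible without dragging in LLVM, here is a rough sketch of the same trick in plain Python: generate the source of a comparison function for one fixed string and compile it at runtime. The name make_equals is made up for this example, and in CPython this buys you no speed at all. It only shows the shape of the technique, with the length and the characters of the fixed string baked into the generated code as constants.

```python
def make_equals(fixed):
    """Generate a function that compares its argument against `fixed` only."""
    lines = [
        "def equals(s):",
        "    if len(s) != %d: return False" % len(fixed),
    ]
    # One unrolled comparison per character, with the character as a constant.
    for i, ch in enumerate(fixed):
        lines.append("    if s[%d] != %r: return False" % (i, ch))
    lines.append("    return True")

    namespace = {}
    exec("\n".join(lines), namespace)  # compile the generated source
    return namespace["equals"]


# One specialised function per string in the first table...
equals_zoom = make_equals("zoom")
# ...applied to every string in the other table.
matches = [s for s in ["zoo", "zoom", "boom", "zoom"] if equals_zoom(s)]
```

With LLVM you would emit the same unrolled comparisons as instructions instead of source text, and let the optimiser loose on them.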

The fun doesn't stop here either. In addition to generating functions in memory, we can serialise them to bitcode files using the to_bitcode method of LLVM modules. That generates a .bc file for which we can generate assembler code for some particular architecture, and then use the platform assembler and linker to generate an executable. To do that, however, we first need a main function. Since the function we generated for our expression above takes ordinary integers, we'll have to generate a main function that will take inputs from somewhere (e.g. the command line), convert that to integers, pass them to the function, and then make the result known to the user. To make things easy, let's just call the libc function atoi on argv[1], argv[2],... and use the result of the function as the exit code of our program.

ty_str = Type.pointer(Type.int(8))
ty_main = Type.function(ty_int, [ty_int, Type.pointer(ty_str)])
main = m.add_function(ty_main, "main")
main_bb = main.append_basic_block("entry")
bmain = Builder.new(main_bb)
argv=main.args[1]
int_args=[]
for i in range(1, len(identifiers)+1):
    # atoi(argv[i])
    s = bmain.load(bmain.gep(argv, [Constant.int(ty_int, i)]))
    int_args.append(bmain.call(atoi, [s]))

# After the loop: call f with the converted arguments and return its result
bmain.ret(bmain.call(f, int_args))
Here, ty_str is a character pointer, and main is a function from int and pointer to character pointer to int. In its basic block, we add one call to atoi for each identifier that our expression uses. The argument to atoi would in C be written as argv[i], but here, we do it by first getting an element pointer ("gep" is short-hand for Get Element Pointer) to the string we want, and then dereference that. Think of the "load" as the * and the gep as the + in *(argv+i)

We then add a call to our function, and return its result as our exit code. But what does atoi refer to here? We must first tell LLVM that we want to call something from the outside of our program. It will give us back a reference to the function, so that we can build up calls to that something.

ty_atoi = Type.function(ty_int, [ty_str])
atoi = m.add_function(ty_atoi, "atoi")
atoi.linkage = LINKAGE_EXTERNAL
All that is left before we can generate our very own native executables is file handling and calling the right tools. On Mac OS X, it goes a little something like this:
import os

if filename.endswith(".zoom"):
    basename = filename[:-5]
else:
    basename = "a"

bitcode = basename + ".bc"
asm = basename + ".s"
obj = basename + ".o"
executable = basename

f=open(bitcode, "wb")
m.to_bitcode(f)
f.close()

if target=="x86":
    os.system("llc -filetype=obj %s -o %s" % (bitcode, obj))
    os.system("ld -arch x86_64 %s -lc /Developer/SDKs/MacOSX10.6.sdk/usr/lib/crt1.o -o %s" % (obj, executable))
else:
    os.system("llc -mtriple=arm-apple-darwin -filetype=obj %s -o %s" % (bitcode, obj))
    os.system("ld -arch_multiple -arch arm %s -o %s -L/Developer/Platforms/iPhoneOS.platform/DeviceSupport/4.2/Symbols/usr/lib /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS4.3.sdk/usr/lib/crt1.o -lSystem " % (obj, executable))
In order to build for other targets, you can specify -mtriple=xxx on the llc command line. You will of course have to run a matching version of ld in order to generate your executable.

So, we can build native, and even cross-compiled, executables, but what about optimisation? Turns out we get that too. The LLVM bitcode we generate for the expression 1+1+1+1+a+a+a+a is

define i32 @f(i32 %a) {
entry:
  %0 = add i32 %a, %a
  %1 = add i32 %a, %0
  %2 = add i32 %a, %1
  %3 = add i32 1, %2
  %4 = add i32 1, %3
  %5 = add i32 1, %4
  %6 = add i32 1, %5
  ret i32 %6
}
whereas the resulting x86 assembler code is
_f:                                     ## @f
Ltmp0:
## BB#0:                                ## %entry
        leal    (%rdi,%rdi), %eax
        addl    %edi, %eax
        leal    4(%rdi,%rax), %eax
        ret
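Clearly, LLVM has figured out what's going on: it has folded the four constant additions into a single 4 and uses leal to do the remaining additions, so the seven adds of the bitcode shrink to three arithmetic instructions. There is no multiply or shift involved; the result is built up as 2a, then 3a, and finally 4a+4.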
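If you don't read x86 assembler for fun, the two listings can be checked against each other by transcribing them by hand. The sketch below is exactly that, a manual transcription into Python and not anything LLVM produced. It follows the bitcode add by add and the assembler instruction by instruction, with 32-bit wrap-around, and compares both with 4a+4.

```python
MASK = 0xFFFFFFFF  # i32 arithmetic wraps around at 32 bits


def f_ir(a):
    """Follow the unoptimised bitcode one add at a time."""
    v = (a + a) & MASK        # %0
    v = (a + v) & MASK        # %1
    v = (a + v) & MASK        # %2
    for _ in range(4):        # %3 to %6 each add the constant 1
        v = (1 + v) & MASK
    return v


def f_asm(a):
    """Follow the three arithmetic instructions of the x86 output."""
    eax = (a + a) & MASK          # leal (%rdi,%rdi), %eax
    eax = (eax + a) & MASK        # addl %edi, %eax
    eax = (4 + a + eax) & MASK    # leal 4(%rdi,%rax), %eax
    return eax


samples = [0, 1, 42, 2**31 - 1, 2**32 - 1]
agree = all(f_ir(a) == f_asm(a) == (4 * a + 4) & MASK for a in samples)
```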

So, what's the conclusion from this? That building optimizing cross-compilers is a weekend project for a small DSL, and you can do it all in Python - and if that doesn't zoom, nothing does.

2012-09-22

Goto Checker

One of the cool new features of the upcoming Clang release is the tooling infrastructure. This Clang tooling infrastructure provides an easy way of writing style checkers, source-to-source rewriters, analyzers and all sorts of other tools that need to understand C and related languages.

Especially nice is the AST matching support that landed recently. With the AST matchers, you can describe patterns of code that you're interested in in a declarative DSL, and get a callback called for all instances of that pattern in the code you're parsing. It's a bit like XPath for C.

As a demonstration of this, I've written a little code checker that checks that code follows a particular error handling style, and warns of any transgressions. The error handling style I chose is the one I prefer in C, which I call the reverse label style. The reverse label style has been made popular through its use in the Linux kernel, and looks like the below example:

int read_file_normal(const char* filename)
{
 FILE* f = fopen(filename, "rt");
 char buf[100];

 if (f == NULL) {
  fprintf(stderr, "Failed to open %s", filename);
  goto error_open;
 }

 if (fread(buf, sizeof(buf), 1, f) == 0) {
  fprintf(stderr, "Failed to read data from %s\n", filename);
  goto error_read;
 }

 if (fclose(f) == EOF) {
  fprintf(stderr, "Failed to close %s\n", filename);
 }

 return 0;

error_read:
 fclose(f);
error_open:
 return -1;
}
For each error case, we have a goto to a unique label. Starting at that label is all cleanup that should be done in the current case. The labels are therefore listed in reverse order to the gotos. A bit more discussion can be found at the staila blog. As rightly pointed out there, it results in readable code that unfortunately easily can rot during maintenance.

Automatic code checking to the rescue. We want to find all function definitions, and in each function definition we want to see all uses of goto and all labels, and verify that we first have one section of gotos but no labels, which is then followed by a section of labels, listed in reverse order of their corresponding gotos in the first section, without any jumps in it. We thus set up a MatchFinder and add our three patterns and bind their match results to names we can use in our callback:

        MatchFinder mf;
        GotoChecker handler;

        mf.addMatcher(functionDecl(isDefinition()).bind("func"), &handler);
        mf.addMatcher(gotoStmt().bind("goto"), &handler);
        mf.addMatcher(labelStmt().bind("label"), &handler);
When we run our tool on some code, our GotoChecker::run method will be called with its MatchResult set to one of these three things. So, we check which one it is, and if it's the start of a new function, we note that we're now in the normal section of the function (as opposed to the error handling section), and we clear our stack of labels:
        const clang::FunctionDecl* func =
                Result.Nodes.getNodeAs<clang::FunctionDecl>("func");
        const clang::GotoStmt* g =
                Result.Nodes.getNodeAs<clang::GotoStmt>("goto");
        const clang::LabelStmt* label =
                Result.Nodes.getNodeAs<clang::LabelStmt>("label");
        if (func) {
                in_error_section = false;
                gotos.erase(gotos.begin(), gotos.end());
        }
If we get a goto, we first check that we're not in the error handling section, and then append its label to our stack of labels.
        if (g) {
                clang::LabelDecl* label = g->getLabel();
                if (in_error_section) {
                        clang::DiagnosticsEngine& d =
                                Result.Context->getDiagnostics();
                        unsigned int id = d.getCustomDiagID(
                                clang::DiagnosticsEngine::Warning,
                                "Found goto to label %0 inside "
                                "error handling section"
                        );
                        d.Report(g->getLocStart(), id) << label;
                }
                gotos.push_back(label);
        }
Here we can also see an example of how to produce diagnostics from a Clang tool. The DiagnosticBuilder returned by DiagnosticsEngine::Report accepts a number of different Clang types, and will format them appropriately. In this case, we're giving it a NamedDecl, so it knows to put it in quotes. The source location we give to Report also means that Clang knows to quote from the source and highlight exactly what we're objecting to.

Now, the final part of the puzzle is to handle labels. It would be really nice if we could verify that there's no way for the code to fall through from the normal section to the error handling section, but that would involve scanning backwards through the statements and checking that the last is either a return, a call to a function known not to return, or a conditional where all branches end with a return or a function known not to return or a conditional where... Let's skip that for now, and just get the name of the label.

       if (label) {
                if (!in_error_section) {
                        // TODO: Check somehow that all paths have returned
                }
                clang::LabelDecl* found = label->getDecl();
                std::string name = found->getNameAsString();
Since we're enforcing that goto is only used for error handling, let's check that the name of the label reflects this:
                if (
                        name.substr(0, 6) != "error_"
                        && name.substr(0, 8) != "cleanup_"
                        && name.substr(0, 4) != "err_"
                        && name != "exit"
                ) {
                        clang::DiagnosticsEngine& d =
                                Result.Context->getDiagnostics();
                        unsigned int id = d.getCustomDiagID(
                                clang::DiagnosticsEngine::Warning,
                                "Illegal label name: %0"
                        );
                        d.Report(label->getLocStart(), id) << found;
                }
This feature could be debated. Maybe it should be optional, and maybe the permitted prefixes should be configurable, but that's for version 2.0.

Now all that is left is to check that the label we found matches the top of the label stack we built up from the gotos in the normal section.

                if (gotos.size() == 0) {
                        return;
                }
                clang::LabelDecl* expected = gotos.back();
                in_error_section = true;

                if (found != expected) {
                        clang::DiagnosticsEngine& d =
                                Result.Context->getDiagnostics();
                        unsigned int id = d.getCustomDiagID(
                                clang::DiagnosticsEngine::Warning,
                                "Error handling sequence mismatch. "
                                "Expected %0, found %1"
                        );
                        d.Report(label->getLocStart(), id) << expected << found;
                }

                gotos.pop_back();
        }
Apart from a handful of boilerplate lines (~4 lines plus some includes and namespace uses), that's it. What I like is that all the parsing and scanning and other things that would normally get in our way just disappear and we can go straight to implementing our checking logic.

The full source can be had at the project page. Note that it requires LLVM and Clang from trunk, but there are good instructions for getting that.

In summary, the Clang tooling infrastructure definitely zooms. I expect to be using this quite a bit from now on, and if you're working with C or its derivatives, I think you should too.

2012-09-09

The Netbeans build system

This is more "venting frustration" than informative. I have been using Netbeans for a while now, at least the re-branded version that is the jMonkeyEngine SDK. However, this is about the Netbeans part of it and has nothing to do with the jME SDK.
I have been thinking for years that I should port Svansprogram to a Rich Client Platform instead of hacking my own. Hacking my own RCP was fun at first but now it is mostly tedious.
So to learn module development on Netbeans I have started out with some small hacks to get a feel for it. Ran the Hello World things and so on.
Eventually came the time to make my very own module, first I let Netbeans create the project, then I put it into my version control and then on to set up my continuous build system.
The build system in netbeans is Ant. Which is all very fine by me, Ant+Ivy is my preference if I can choose whatever I like. It is always good if the IDE understands the real build system, i.e. the headless builds run on the build server. You know, the real build, that produces the installable binaries.
Eclipse has a totally separate build system so you need to keep the IDE and the real build in sync using plugins and other voodoo. So if Netbeans has some build system that I can use in both the IDE and on the CI machine I'm a happy camper. So with a smile on my face I set out to run the Netbeans project on the CI machine.
That is when the horror starts.
The way Netbeans sets up the builds is to have some arcane and weirdly mutated Ant scripts, without documentation, importing things from secret directories. Then it has some plugins in the IDE that know the secret locations to insert values here and there. I have never in my life as a professional coder seen a more opaque and strange build system.
For example, it uses an Ant plugin called CopyLibs to, you know, copy JAR-files it needs for the build. It makes me want to grab the engineer by the throat and foaming at the mouth scream at them that ANT already can copy files or why don't you just use Ivy.
So in order for my CI machine to even be able to get all the required dependencies I have to pollute my Ant installation making it harder to have reproducible builds since now my entire build system is dependent on a specific version of Netbeans.
It does not stop there, to make a plugin for Netbeans I must of course build and link and all that against a specific version of Netbeans. This is achieved by including build scripts from a "Harness" directory that is specific to a version of Netbeans. This is all understandable, but the location of this harness directory is set up in a "private" property file, through one or two Ant properties with closely guarded complex names. The private property file is actually hidden in $HOME.
Yeah you heard me, to make the CI machine build it will have to have access to my $HOME to be able to find the harness directory that is from my _installation_ of Netbeans, not from some central repository of dependencies.
There is some other Ant voodoo that Netbeans hides in a FAQ to instruct people how to download the harness from the web instead of pointing the CI machine to an actual installation of Netbeans. I say it again, an installation, that requires you to run an installer, and click through stuff.
There is more to this story but now I need to go cry in a dark corner. It has taken me several weeks of hobby-coding time to even get this far, and then to realize that netbeans generates images when building so you have to remember to set the JVM as headless or the build fails. And oh yeah, when I did it generated blank images so my menu entries when the module is installed are empty lines.