2012-09-22

Goto Checker

One of the cool new features of the upcoming Clang release is the tooling infrastructure. This Clang tooling infrastructure provides an easy way of writing style checkers, source-to-source rewriters, analyzers and all sorts of other tools that need to understand C and related languages.

Especially nice is the AST matching support that landed recently. With the AST matchers, you can describe patterns of code that you're interested in in a declarative DSL, and get a callback called for all instances of that pattern in the code you're parsing. It's a bit like XPath for C.

As a demonstration of this, I've written a little code checker that checks that code follows a particular error handling style, and warns of any transgressions. The error handling style I chose is the one I prefer in C, which I call the reverse label style. The reverse label style has been made popular though its use in the Linux kernel, and looks like the below example:

int read_file_normal(const char* filename)
{
 FILE* f = fopen(filename, "rt");
 char buf[100];

 if (f == NULL) {
  fprintf(stderr, "Failed to open %s", filename);
  goto error_open;
 }

 if (fread(buf, sizeof(buf), 1, f) == 0) {
  fprintf(stderr, "Failed to read data from %s\n", filename);
  goto error_read;
 }

 if (fclose(f) == EOF) {
  fprintf(stderr, "Failed to close %s\n", filename);
 }

 return 0;

error_read:
 fclose(f);
error_open:
 return -1;
}
For each error case, we have a goto to a unique label. Starting at that label is all cleanup that should be done in the current case. The labels are therefore listed in reverse order to the gotos. A bit more discussion can be found at the staila blog. As rightly pointed out there, it results in readable code that unfortunately easily can rot during maintenance.

Automatic code checking to the rescue. We want to find all function definitions, and in each function definition we want to see all uses of goto and all labels, and verify that we first have one section of gotos but no labels, which is then followed by a section of labels, listed in reverse order of their corresponding gotos in the first section, without any jumps in it. We thus set up a MatchFinder and add our three patterns and bind their match results to names we can use in our callback:

        MatchFinder mf;
        GotoChecker handler;

        mf.addMatcher(functionDecl(isDefinition()).bind("func"), &handler);
        mf.addMatcher(gotoStmt().bind("goto"), &handler);
        mf.addMatcher(labelStmt().bind("label"), &handler);
When we run our tool on some code, our GotoChecker::run method will be called with its MatchResult set to one of these three things. So, we check which one it is, and if it's the start of a new function, we note that we're now in the normal section of the function (as opposed to the error handling section), and we clear our stack of labels:
       const clang::FunctionDecl* func =
                Result.Nodes.getNodeAs("func");
        const clang::GotoStmt* g =
                Result.Nodes.getNodeAs("goto");
        const clang::LabelStmt* label =
                Result.Nodes.getNodeAs("label");
        if (func) {
                in_error_section = false;
                gotos.erase(gotos.begin(), gotos.end());
        }
If we get a goto, we first check that we're not in the error handling section, and then append its label to our stack of labels.
        if (g) {
                clang::LabelDecl* label = g->getLabel();
                if (in_error_section) {
                        clang::DiagnosticsEngine& d =
                                Result.Context->getDiagnostics();
                        unsigned int id = d.getCustomDiagID(
                                clang::DiagnosticsEngine::Warning,
                                "Found goto to label %0 inside "
                                "error hanling section"
                        );
                        d.Report(g->getLocStart(), id) << label;
                }
                gotos.push_back(label);
        }
Here we can also see an example of how to produce diagnostics from a Clang tool. The DiagnosticBuilder returned by DiagnosticsEngine::Report accepts a number of different Clang types, and will format them appropriately. In this case, we're giving it a NamedDecl, so it knows to put it in quotes. The source location we give to Report also means that Clang knows to quote from the source and highlight exactly what we're objecting to.

Now, the final part of the puzzle is to handle labels. It would be really nice if we could verify that there's no way for the code to fall though from the normal section to the error handling section, but that would involve scanning backwards though the statements and checking that the last is either a return, a call to a function known not to return, or a conditional where all branches end with a return or a function knon not to return or a conditional where... Let's skip that for now, and just get the name of the label.

       if (label) {
                if (!in_error_section) {
                        // TODO: Check somehow that all paths have returned
                }
                clang::LabelDecl* found = label->getDecl();
                std::string name = found->getNameAsString();
Since we're enforcing that goto is only used for error handling, let's check that the name of the label reflects this:
                if (
                        name.substr(0, 6) != "error_"
                        && name.substr(0, 8) != "cleanup_"
                        && name.substr(0, 4) != "err_"
                        && name != "exit"
                ) {
                        clang::DiagnosticsEngine& d =
                                Result.Context->getDiagnostics();
                        unsigned int id = d.getCustomDiagID(
                                clang::DiagnosticsEngine::Warning,
                                "Illegal label name: %0"
                        );
                        d.Report(label->getLocStart(), id) << found;
                }
This feature could be debated. Maybe it should be optional, and maybe the permitted prefixes should be configurable, but that's for version 2.0.

Now all that is left is to check that the label we found matches the top of the label stack we built up from the gotos in the normal section.

                if (gotos.size() == 0) {
                        return;
                }
                clang::LabelDecl* expected = gotos.back();
                in_error_section = true;

                if (found != expected) {
                        clang::DiagnosticsEngine& d =
                                Result.Context->getDiagnostics();
                        unsigned int id = d.getCustomDiagID(
                                clang::DiagnosticsEngine::Warning,
                                "Error handling sequence mismatch. "
                                "Expected %0, found %1"
                        );
                        d.Report(label->getLocStart(), id) << expected << found;
                }

                gotos.pop_back();
Apart from a handfull of boilderplate lines (~4 lines plus some includes and namespace uses), that's it. What I like is that all the parsing and scanning and other things that would normally get in our way just disappears and we can go straight to implementing our checking logic.

The full source can be had at the project page. Note that it requires LLVM and Clang from trunk, but there are good instructions for getting that.

In summary, the Clang tooling infrastructure definitly zooms. I expect to be using this quite a bit from now on, and if you're working with C or its derivates, I think you should too.

2012-09-09

The Netbeans build system

This is more of a "venting frustration" than informative. I have been using netbeans for a while now, at least the re-branded version that is jMonkeEngine SDK. However, this is about the netbeans part of it and nothing to do with the jME SDK.
I have been thinking for years that I should port Svansprogram to a Rich Client Platform instead of hacking my own. Hacking my own RCP was fun at first but now it is mostly tedious.
So to learn module development on Netbeans I have started out with some small hacks to get a feel for it. Ran the Hello World things and so on.
Eventually came the time to make my very own module, first I let Netbeans create the project, then I put it into my version control and then on to set up my continuous build system.
The build system in netbeans is Ant. Which is all very fine by me, Ant+Ivy is my preference if I can choose whatever I like. It is always good if the IDE understands the real build system, i.e. the headless builds run on the build server. You know, the real build, that produces the installable binaries.
Eclipse has a totally separate build system so you need to keep the IDE and the real build in synch using plugins and other voodoo. So if Netbeans have some build system that I can use in both the IDE and on the CI machine I'm a happy camper. So with a smile on my face I set out to run the netbeans project on the CI machine.
That is when the horror starts.
The way netbeans set up the builds is to have some archane and weirdly mutated Ant scripts, without documentation, importing things from secret directories. Then have some plugins in the IDE that knows the secret locations to insert values here and there. I have never in my life as a professional coder seen a more opaque and strange build system.
For example, it uses an Ant plugin called CopyLibs to, you know, copy JAR-files it needs for the build. It makes me want to grab the engineer by the throat and foaming at the mouth scream at them that ANT already can copy files or why don't you just use Ivy.
So in order for my CI machine to even be able to get all the required dependencies I have to pollute my Ant installation making it harder to have reproducible builds since now my entire build system is dependent on a specific version of Netbeans.
It does not stop there, to make a plugin for Netbeans I must of course build and link and all that against a specific version of Netbeans. This is acheived by including build scripts from a "Harness" directory that is specific to a version of Netbeans. This is all understandable, but the location of this harness-directory is set up in a "private" property file, through one or two ant properties with closely guarded complex names. The private property file is actually hidden in $HOME.
Yeah you heard me, to make the CI machine build it will have to have access to my $HOME to be able to find the harness directory that is from my _installation_ of netbeans, not from some central repository of dependecies.
There are some other ant-voodoo that Netbeans hides in a FAQ to instruct people how to download the harness from the web instead of pointing the CI machine to an actual installation of netbeans. I say it again, an installation, that requires you to run an installer, and click through stuff.
There is more to this story but now I need to go cry in a dark corner. It has taken me several weeks of hobby-coding time to even get this far, and then to realize that netbeans generates images when building so you have to remember to set the JVM as headless or the build fails. And oh yeah, when I did it generated blank images so my menu entries when the module is installed are empty lines.