2012-09-22

Goto Checker

One of the cool new features of the upcoming Clang release is the tooling infrastructure. This Clang tooling infrastructure provides an easy way of writing style checkers, source-to-source rewriters, analyzers and all sorts of other tools that need to understand C and related languages.

Especially nice is the AST matching support that landed recently. With the AST matchers, you can describe patterns of code that you're interested in in a declarative DSL, and get a callback called for all instances of that pattern in the code you're parsing. It's a bit like XPath for C.

As a demonstration of this, I've written a little code checker that checks that code follows a particular error handling style, and warns of any transgressions. The error handling style I chose is the one I prefer in C, which I call the reverse label style. The reverse label style has been made popular though its use in the Linux kernel, and looks like the below example:

int read_file_normal(const char* filename)
{
 FILE* f = fopen(filename, "rt");
 char buf[100];

 if (f == NULL) {
  fprintf(stderr, "Failed to open %s", filename);
  goto error_open;
 }

 if (fread(buf, sizeof(buf), 1, f) == 0) {
  fprintf(stderr, "Failed to read data from %s\n", filename);
  goto error_read;
 }

 if (fclose(f) == EOF) {
  fprintf(stderr, "Failed to close %s\n", filename);
 }

 return 0;

error_read:
 fclose(f);
error_open:
 return -1;
}
For each error case, we have a goto to a unique label. Starting at that label is all cleanup that should be done in the current case. The labels are therefore listed in reverse order to the gotos. A bit more discussion can be found at the staila blog. As rightly pointed out there, it results in readable code that unfortunately easily can rot during maintenance.

Automatic code checking to the rescue. We want to find all function definitions, and in each function definition we want to see all uses of goto and all labels, and verify that we first have one section of gotos but no labels, which is then followed by a section of labels, listed in reverse order of their corresponding gotos in the first section, without any jumps in it. We thus set up a MatchFinder and add our three patterns and bind their match results to names we can use in our callback:

        MatchFinder mf;
        GotoChecker handler;

        mf.addMatcher(functionDecl(isDefinition()).bind("func"), &handler);
        mf.addMatcher(gotoStmt().bind("goto"), &handler);
        mf.addMatcher(labelStmt().bind("label"), &handler);
When we run our tool on some code, our GotoChecker::run method will be called with its MatchResult set to one of these three things. So, we check which one it is, and if it's the start of a new function, we note that we're now in the normal section of the function (as opposed to the error handling section), and we clear our stack of labels:
       const clang::FunctionDecl* func =
                Result.Nodes.getNodeAs("func");
        const clang::GotoStmt* g =
                Result.Nodes.getNodeAs("goto");
        const clang::LabelStmt* label =
                Result.Nodes.getNodeAs("label");
        if (func) {
                in_error_section = false;
                gotos.erase(gotos.begin(), gotos.end());
        }
If we get a goto, we first check that we're not in the error handling section, and then append its label to our stack of labels.
        if (g) {
                clang::LabelDecl* label = g->getLabel();
                if (in_error_section) {
                        clang::DiagnosticsEngine& d =
                                Result.Context->getDiagnostics();
                        unsigned int id = d.getCustomDiagID(
                                clang::DiagnosticsEngine::Warning,
                                "Found goto to label %0 inside "
                                "error hanling section"
                        );
                        d.Report(g->getLocStart(), id) << label;
                }
                gotos.push_back(label);
        }
Here we can also see an example of how to produce diagnostics from a Clang tool. The DiagnosticBuilder returned by DiagnosticsEngine::Report accepts a number of different Clang types, and will format them appropriately. In this case, we're giving it a NamedDecl, so it knows to put it in quotes. The source location we give to Report also means that Clang knows to quote from the source and highlight exactly what we're objecting to.

Now, the final part of the puzzle is to handle labels. It would be really nice if we could verify that there's no way for the code to fall though from the normal section to the error handling section, but that would involve scanning backwards though the statements and checking that the last is either a return, a call to a function known not to return, or a conditional where all branches end with a return or a function knon not to return or a conditional where... Let's skip that for now, and just get the name of the label.

       if (label) {
                if (!in_error_section) {
                        // TODO: Check somehow that all paths have returned
                }
                clang::LabelDecl* found = label->getDecl();
                std::string name = found->getNameAsString();
Since we're enforcing that goto is only used for error handling, let's check that the name of the label reflects this:
                if (
                        name.substr(0, 6) != "error_"
                        && name.substr(0, 8) != "cleanup_"
                        && name.substr(0, 4) != "err_"
                        && name != "exit"
                ) {
                        clang::DiagnosticsEngine& d =
                                Result.Context->getDiagnostics();
                        unsigned int id = d.getCustomDiagID(
                                clang::DiagnosticsEngine::Warning,
                                "Illegal label name: %0"
                        );
                        d.Report(label->getLocStart(), id) << found;
                }
This feature could be debated. Maybe it should be optional, and maybe the permitted prefixes should be configurable, but that's for version 2.0.

Now all that is left is to check that the label we found matches the top of the label stack we built up from the gotos in the normal section.

                if (gotos.size() == 0) {
                        return;
                }
                clang::LabelDecl* expected = gotos.back();
                in_error_section = true;

                if (found != expected) {
                        clang::DiagnosticsEngine& d =
                                Result.Context->getDiagnostics();
                        unsigned int id = d.getCustomDiagID(
                                clang::DiagnosticsEngine::Warning,
                                "Error handling sequence mismatch. "
                                "Expected %0, found %1"
                        );
                        d.Report(label->getLocStart(), id) << expected << found;
                }

                gotos.pop_back();
Apart from a handfull of boilderplate lines (~4 lines plus some includes and namespace uses), that's it. What I like is that all the parsing and scanning and other things that would normally get in our way just disappears and we can go straight to implementing our checking logic.

The full source can be had at the project page. Note that it requires LLVM and Clang from trunk, but there are good instructions for getting that.

In summary, the Clang tooling infrastructure definitly zooms. I expect to be using this quite a bit from now on, and if you're working with C or its derivates, I think you should too.

2 comments:

  1. As 3.2 is out now, there's no longer a need for getting the latest from the svn trunk.

    ReplyDelete

Note: only a member of this blog may post a comment.