Especially nice is the AST matching support that landed recently. With the AST matchers, you can describe patterns of code that you're interested in in a declarative DSL, and get a callback called for all instances of that pattern in the code you're parsing. It's a bit like XPath for C.
As a demonstration of this, I've written a little code checker that checks that code follows a particular error handling style, and warns of any transgressions. The error handling style I chose is the one I prefer in C, which I call the reverse label style. The reverse label style has been made popular though its use in the Linux kernel, and looks like the below example:
For each error case, we have a
int read_file_normal(const char* filename)
{
FILE* f = fopen(filename, "rt");
char buf[100];
if (f == NULL) {
fprintf(stderr, "Failed to open %s", filename);
goto error_open;
}
if (fread(buf, sizeof(buf), 1, f) == 0) {
fprintf(stderr, "Failed to read data from %s\n", filename);
goto error_read;
}
if (fclose(f) == EOF) {
fprintf(stderr, "Failed to close %s\n", filename);
}
return 0;
error_read:
fclose(f);
error_open:
return -1;
}
goto
to a unique label. Starting at that label is all cleanup that should be done in the current case. The labels are therefore listed in reverse order to the gotos. A bit more discussion can be found at the staila blog. As rightly pointed out there, it results in readable code that unfortunately easily can rot during maintenance.
Automatic code checking to the rescue. We want to find all function definitions, and in each function definition we want to see all uses of goto
and all labels, and verify that we first have one section of goto
s but no labels, which is then followed by a section of labels, listed in reverse order of their corresponding goto
s in the first section, without any jumps in it. We thus set up a MatchFinder
and add our three patterns and bind their match results to names we can use in our callback:
When we run our tool on some code, our
MatchFinder mf;
GotoChecker handler;
mf.addMatcher(functionDecl(isDefinition()).bind("func"), &handler);
mf.addMatcher(gotoStmt().bind("goto"), &handler);
mf.addMatcher(labelStmt().bind("label"), &handler);
GotoChecker::run
method will be called with its MatchResult
set to one of these three things. So, we check which one it is, and if it's the start of a new function, we note that we're now in the normal section of the function (as opposed to the error handling section), and we clear our stack of labels:
If we get a
const clang::FunctionDecl* func =
Result.Nodes.getNodeAs
goto
, we first check that we're not in the error handling section, and then append its label to our stack of labels.
Here we can also see an example of how to produce diagnostics from a Clang tool. The
if (g) {
clang::LabelDecl* label = g->getLabel();
if (in_error_section) {
clang::DiagnosticsEngine& d =
Result.Context->getDiagnostics();
unsigned int id = d.getCustomDiagID(
clang::DiagnosticsEngine::Warning,
"Found goto to label %0 inside "
"error hanling section"
);
d.Report(g->getLocStart(), id) << label;
}
gotos.push_back(label);
}
DiagnosticBuilder
returned by DiagnosticsEngine::Report
accepts a number of different Clang types, and will format them appropriately. In this case, we're giving it a NamedDecl
, so it knows to put it in quotes. The source location we give to Report
also means that Clang knows to quote from the source and highlight exactly what we're objecting to.
Now, the final part of the puzzle is to handle labels. It would be really nice if we could verify that there's no way for the code to fall though from the normal section to the error handling section, but that would involve scanning backwards though the statements and checking that the last is either a return, a call to a function known not to return, or a conditional where all branches end with a return or a function knon not to return or a conditional where... Let's skip that for now, and just get the name of the label.
Since we're enforcing that
if (label) {
if (!in_error_section) {
// TODO: Check somehow that all paths have returned
}
clang::LabelDecl* found = label->getDecl();
std::string name = found->getNameAsString();
goto
is only used for error handling, let's check that the name of the label reflects this:
This feature could be debated. Maybe it should be optional, and maybe the permitted prefixes should be configurable, but that's for version 2.0.
if (
name.substr(0, 6) != "error_"
&& name.substr(0, 8) != "cleanup_"
&& name.substr(0, 4) != "err_"
&& name != "exit"
) {
clang::DiagnosticsEngine& d =
Result.Context->getDiagnostics();
unsigned int id = d.getCustomDiagID(
clang::DiagnosticsEngine::Warning,
"Illegal label name: %0"
);
d.Report(label->getLocStart(), id) << found;
}
Now all that is left is to check that the label we found matches the top of the label stack we built up from the goto
s in the normal section.
Apart from a handfull of boilderplate lines (~4 lines plus some includes and namespace uses), that's it. What I like is that all the parsing and scanning and other things that would normally get in our way just disappears and we can go straight to implementing our checking logic.
if (gotos.size() == 0) {
return;
}
clang::LabelDecl* expected = gotos.back();
in_error_section = true;
if (found != expected) {
clang::DiagnosticsEngine& d =
Result.Context->getDiagnostics();
unsigned int id = d.getCustomDiagID(
clang::DiagnosticsEngine::Warning,
"Error handling sequence mismatch. "
"Expected %0, found %1"
);
d.Report(label->getLocStart(), id) << expected << found;
}
gotos.pop_back();
The full source can be had at the project page. Note that it requires LLVM and Clang from trunk, but there are good instructions for getting that.
In summary, the Clang tooling infrastructure definitly zooms. I expect to be using this quite a bit from now on, and if you're working with C or its derivates, I think you should too.
As 3.2 is out now, there's no longer a need for getting the latest from the svn trunk.
ReplyDeleteUpdate: Moved to Gitlab
ReplyDelete