System-level component testing

One of my favourite parts of development is writing automated blackbox functional tests in the component scope. That is: writing automated tests that use a component as it is meant to be used in a system. The definition of "component" that I tend to use in this context is "as much of the code that my team is working on as makes sense and as little else as possible".

If your code just uses other libraries etc., then it's usually very simple to just mock those parts, especially if you're using a dynamic language. I won't waste your time by talking about that. Instead, I'll talk about when it gets interesting - when you're writing system-level tools in a static language.

Since the tool support (and by "tool support" I mean Valgrind) is best on Linux, I first try to make the code build as a self-contained executable on Linux, no matter what the target OS is. For small embedded OSes, you can probably just reimplement the OS functions and you're set. For Windows, suit yourself. That leaves POSIX-like OSes, in which case the code should be fairly easily buildable on Linux. Here's where the fun starts.

So you have some code that calls open(2) on devices that don't exist on your workstation, connects sockets using address families unknown to civilization, or does all kinds of strange things that require root privileges and can't be easily chrooted. How can you possibly write a harness for that? LD_PRELOAD, that's how.

LD_PRELOAD is a little-used feature of the GNU dynamic linker (and others, e.g. the one in Mac OS X) that lets you specify a dynamic library that is injected into an executable before other libraries are loaded. That, in combination with the rule that whoever first defines a symbol wins, means that you can reimplement any function you like in a library that is part of your harness and have those functions be used instead of the versions defined in, say, libc. If the component you're testing opens /dev/thingamajig, then just add a function that looks like open(2), but instead of actually opening the device node just tells your test scripts about it. One useful way of doing that is to have the test running as a separate process and have a Unix Domain Socket (or pair of pipes if you prefer) where you can send messages about what the component tried to do and receive instructions about what to do with the call.
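
The script side of such a channel could look roughly like the sketch below (Python 3). The wire format is an assumption made up for illustration, not something the harness above prescribes: the preloaded library sends one line "open <path> <oflag> <mode>", and the script answers "handle" or "pass". FAKED_PATHS, decide and serve_one are hypothetical names.

```python
import socket

# Hypothetical wire format: "open <path> <oflag> <mode>\n" in,
# "handle\n" or "pass\n" out. Paths containing spaces are not supported.
FAKED_PATHS = {"/dev/thingamajig"}

def decide(request_line):
    """Return the reply for one request line sent by the preloaded library."""
    call, path, oflag, mode = request_line.split()
    if call == "open" and path in FAKED_PATHS:
        return "handle"   # the test scripts take over this call
    return "pass"         # forward the call to the real libc function

def serve_one(conn):
    """Read one newline-terminated request from conn and send back the decision."""
    request = conn.makefile("r").readline()
    reply = decide(request)
    conn.sendall((reply + "\n").encode("ascii"))
    return reply

if __name__ == "__main__":
    # socketpair() stands in for the Unix Domain Socket between the
    # component under test and the test scripts.
    component_end, script_end = socket.socketpair()
    component_end.sendall(b"open /dev/thingamajig 2 0\n")
    print(serve_one(script_end))
    print(component_end.recv(16))
```

In a real harness the component end would be written in C inside the preloaded library, and the script would keep serving requests in a loop for as long as the component runs.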

Since reimplementing everything can be both tedious and slow, you may want to forward uninteresting calls to the normal versions of the functions you've overridden. This can be done using dlsym(RTLD_NEXT, "some_function"). That will make the dynamic linker look up the next library that has a symbol called "some_function" and give you a pointer to that. Assigning that to a function pointer variable gives you a way to call, say, the plain old libc open(2) from your magic open(2) for any file that the test scripts deem uninteresting. Something like this:

#define _GNU_SOURCE   /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>

static int (*real_open)(const char *path, int oflag, ... );

/* Must run before the first call to open(). */
void init_hackery(void)
{
  real_open = dlsym(RTLD_NEXT, "open");
}

int open(const char *path, int oflag, ... )
{
  int mode = 0;
  va_list va;
  va_start(va, oflag);
  if(oflag & O_CREAT)
    mode = va_arg(va, int);
  va_end(va);

  /* Tell the test scripts about the call and let them decide what to do. */
  switch(inform_scripts_about_open(path, oflag, mode))
  {
    case HandleInScripts:
      return handle_open_in_scripts(path, oflag, mode);
    case PassOnToLibc:
      if(oflag & O_CREAT) return real_open(path, oflag, mode);
      else return real_open(path, oflag);
  }
  return -1;
}

With this in place, the test scripts will be informed about every single call to open, and when they decide to, they can take over and do something else (but you probably do want it to end up opening some kind of file descriptor in the process you're testing, as you may need to follow the usual rules for file descriptor numbering and reuse between both the files you fake and the ones you pass on to libc). Override read, write, ioctl and close in the same manner (mapping the file descriptor to some mock object in your scripts), and the component can get its devices and whatnot and you get your tests. Have fun!


Ada Server Pages

It seems to me that sometimes, shoe-horning web applications into an Object Oriented design just for the sake of it just doesn't make sense. A class that doesn't have both state and operations is not a class. A framework built on classes that aren't really classes isn't really using the language as it should.

When building a small site, don't you sometimes secretly think that just plain JSP and some static helper methods weren't all that bad? You're only half wrong. As long as you have a way out to a real language when you grow out of the shell you started in, it's not wrong to use simple tools for a simple end. It's just that for a purely procedural program, maybe Java isn't the right language.

So what's the solution? I'll tell you: Ada Server Pages. Some might call it The Thing That Should Not Be. I call it my first Ada program in almost 15 years. It compiles AdaSP pages into Ada source, builds that into shared libraries, loads them into the server and calls them to serve the pages. And it works! At least on my particular OS X box.

Ada Server Pages. It's here. Tell your friends.


Defence-in-depth for database-backed websites: Connections

As we've seen, connections can be scarce resources on some database/OS combinations. If we're keeping one connection for each session, then won't that limit the site to something like 1000 sessions? Well, there is a way around that.

In a typical site, there will be a large number of users logged in, who all have sessions in the application (especially with long session timeouts), but not that many are actually active at the same time. If we could disconnect the connections for users who are logged in but idle, and then reconnect when the users become active, then we'd get by using only as many connections as we expect to have simultaneously active users. That number is likely to be much lower, and a site that has 1000 users actively clicking on things in a, say, 5-minute period should probably run Oracle on Solaris anyway.

Basically we get a most-recently-used cache the size of the number of connections that our database and OS can provide for us, and the way we use it looks pretty much the same as a connection pool: get a connection, use it, and release it. The difference is that instead of blocking in the get until there is a connection for us, we may be reconnecting to the database (possibly after waiting, in case the active set is full and we're thrashing).
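
A minimal sketch of such a cache, in Python 3 rather than the language of the prototype, might look like this. ConnectionCache and the connect callable are hypothetical names, the sketch is single-threaded, and where a real implementation would wait for a free slot this one simply raises.

```python
from collections import OrderedDict

class ConnectionCache:
    """Keeps at most `capacity` database connections open, one per active user."""

    def __init__(self, capacity, connect):
        self.capacity = capacity
        self.connect = connect      # hypothetical callable: user -> open connection
        self.idle = OrderedDict()   # user -> connection, least recently used first
        self.in_use = 0             # connections currently handed out

    def get(self, user):
        conn = self.idle.pop(user, None)
        if conn is None:
            if len(self.idle) + self.in_use >= self.capacity:
                if not self.idle:
                    # Every connection is busy; a real implementation would wait here.
                    raise RuntimeError("all connections are in use")
                # Disconnect the user who has been idle the longest.
                _, victim = self.idle.popitem(last=False)
                victim.close()
            conn = self.connect(user)   # reconnect on behalf of this user
        self.in_use += 1
        return conn

    def release(self, user, conn):
        self.in_use -= 1
        self.idle[user] = conn          # becomes the most recently used entry
```

From the caller's point of view this is the same get, use, release cycle as with a pool; the only visible difference is that get may take as long as a database login.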

In a traditional setup, the web app of course knows how to log in to the database, since it's always using the same username etc. In the setup I'm proposing, only the users themselves know how to log on to the database. The web app could technically store the passwords, but that's madness from a security perspective. Cleartext passwords are to be discarded as soon as possible. Thus, we have to try some other way to log on to the database.

One solution that's pretty much made for this scenario is Kerberos. When a user logs in to the site, authenticate to the Authentication Service, get a Ticket-Granting Ticket, and store that one. Whenever you need to connect to the database on behalf of the user, use the user's TGT to get a ticket for the database. Should work in theory, but in practice it can be a nightmare both to set up and to get working in Java. It's possible that this would be smoother in Windows, where I imagine you could put the users in Active Directory and be done with the first part, but whether it will work still depends on the Kerberos APIs and how your database drivers use them.

So, if we're passing on Kerberos, we can go for PAM or just roll our own solution. With PAM we could build a module that will let us use one-time passwords, so that we authenticate with the password via PAM on login, get a cookie, and then use that cookie when reconnecting. On logout, the cookie gets invalidated.

For my prototype, I've skipped even that and gone for pre-salted passwords. Before I send a password to the database (including on enrollment), I hash it together with a random per-user salt. That salted password is what the database sees when authenticating the user, never the cleartext one. The salted passwords are then stored in the sessions of the users, and used when reconnecting. Thus, cleartext passwords are never stored, so an attack that would show the contents of session variables of other users would not immediately give away passwords that the users potentially could be using for other sites.
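
As an illustration of the pre-salting step, here is a small Python 3 sketch. The choice of PBKDF2 with SHA-256, the iteration count and the function names are assumptions made for the example; the prototype may well use a different hash. It also assumes that the per-user salt is stored somewhere the web app can read it before it logs in to the database.

```python
import hashlib
import os

def new_salt():
    """Random per-user salt, generated once at enrollment."""
    return os.urandom(16)

def presalt(password, salt):
    """Derive the value that is sent to the database instead of the cleartext password."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100000)
    return digest.hex()
```

At login the web app would compute presalt(password, salt), discard the cleartext password, use the result as the database password, and keep only that result in the session for later reconnects.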

Now that the prototype implementation can have very large numbers of simultaneous users, the real load testing and performance comparison of using a view filter versus working directly on the tables can commence.


1000 connections in PostgreSQL on OS X

PostgreSQL is not really geared towards more than a few dozen simultaneous connections on desktop operating systems. In this post, I'll show you how to push PostgreSQL 9 on Mac OS X 10.6 to handle 1000 connections on a single system running both the webserver and the database.

The first step to getting to 1000 connections is to make PostgreSQL actually try to do it. By default, it only allows 100 simultaneous connections. Change /Library/PostgreSQL/9.0/data/postgresql.conf from max_connections = 100 to max_connections = 1000.

PostgreSQL creates one process per connection, and the web server will need one TCP connection to each one. Thus, to have 1000 connections, we need to allow at least 1000 processes per user (for the PostgreSQL processes) and at least 1000 open file descriptors per process (for the app server). Each PostgreSQL process seems to use 32 fds, which makes 32000 for 1000 connections. With some margin in case it grows, we should permit 35000 fds in the system, plus some for other uses (in case you want to do other things on your machine, like logging in to it). The kernel seems to be picky about power-of-two increments, so I'll round the values up to things that it will accept.

Change /etc/sysctl.conf from




This will make sure the kernel reserves enough space etc. In addition, we need to allow processes to create enough sub-processes (as reported by ulimit -u). This limit is set by OS X's equivalent of init, called launchd. Change /etc/launchd.conf (or create it if you don't have it already) to say limit maxproc 2000 2000.

The changes to /etc/sysctl.conf and /etc/launchd.conf both take effect on bootup, so reboot the system and have fun with your 1000 connections!
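
After the reboot it can be worth checking that the new limits actually apply to the processes that need them. The following Python 3 sketch uses the standard resource module to read the soft limits of the current process; the function names are made up for this example. Run it as the user that runs PostgreSQL to check the process limit, and as the user that runs the app server to check the file descriptor limit.

```python
import resource

def current_limits():
    """Soft limits of this process: open files (ulimit -n) and processes (ulimit -u)."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE)[0],
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC)[0],
    }

def shortfalls(connections, limits):
    """Names of the per-process limits that are too low for `connections` connections."""
    too_low = []
    for name, value in sorted(limits.items()):
        if value != resource.RLIM_INFINITY and value < connections:
            too_low.append(name)
    return too_low

def system_fds_needed(connections, fds_per_backend=32):
    """System-wide fd estimate, using the 32 fds per PostgreSQL process seen above."""
    return connections * fds_per_backend

if __name__ == "__main__":
    print(current_limits())
    print(shortfalls(1000, current_limits()))
    print(system_fds_needed(1000))
```

An empty list from shortfalls means that the per-process limits are high enough; the system-wide fd budget is a separate kernel setting and is not checked here.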


Three-tiered testing

The three-tiered program structure has been used to great benefit in many types of programs. I'm using it for my test script. It has been of great help, and I'll show you how I've done it.

The top tier in my tests is the test cases. These read like a high-level description of the steps of the test case, like this:

def test_create_thread():
    session=login("p1", "p1")
    other=get_forum_id(session, "Other")
    before=get_num_threads(session, other)
    create_thread(session, other, "New Thread", "Newly added thread post")
    after=get_num_threads(session, other)
    return before+1==after

Log in, pick a forum and check how many threads there are. Post a thread and verify that the thread count has increased by one. Easy stuff. All the details about how a thread is created in the application are abstracted away, and what's left is just the code related to the test case. So what do the get_num_threads etc. functions in the middle layer (application adaptation might be a name for it) look like?

def get_num_threads(session, forum_id):
    return int(fetch_data(session, "forum?id=" + forum_id, "count(//html:tr[@class='thread'])"))

These functions tell the bottom layer which URL to fetch, and what parts of the results they are interested in. As you can see, there's an XPath query there. In some, like get_thread_id, where the text of a DOM node is too much, a regex can also be used to pick out parts of the text:

def get_thread_id(session, forum_id, title):
    return fetch_data(session, "forum?id=" + forum_id, "//html:th[@class='subject']/html:a[text()='%s']/@href" % title, "id=(\\d+)")[0].group(1)

Since I'm testing a web application, the functions in the adaptation layer are of course implemented by fetching and parsing web pages. For other applications, I expect this layer to be implemented by composing and decomposing structured messages sent on some link, direct function calls etc., but the role of the layer is the same: provide functions that correspond to functionality in the application, so that the upper layer can talk about things like reading a post instead of details about where the post is read from etc.

The bottom layer then is the workhorse functions like fetch_data and post. Here, I'll show fetch_data, which has proven itself to be very useful:

import re
import httplib
import urllib

import html5lib
import xpath

def fetch_data(session, url, xpath_query, regex=None, params={}):
    conn=httplib.HTTPConnection("localhost", 8080)
    encoded_params = urllib.urlencode(params)
    headers = {"cookie":session}
    conn.request("GET", "/myapp/" + url, encoded_params, headers)
    response = conn.getresponse()
    if response.status != 200:
        print response.read()
        raise Exception("Failed to fetch data. status=%d, reason=%s" % (response.status, response.reason))
    # Parse the body into a DOM tree that the XPath query can be run against.
    html=response.read()
    doc=html5lib.parse(html, treebuilder="dom")
    context = xpath.XPathContext()
    context.namespaces['html'] = 'http://www.w3.org/1999/xhtml'
    results=context.find(xpath_query, doc)
    if regex:
        # Narrow each matched node down to the part the caller asked for.
        r=re.compile(regex)
        results=map(lambda node:r.search(node.value), results)
    return results

Build the request using the URL from the caller. Send it, verify that it was OK, and parse the response. Pick out the part that the caller is interested in, and return it. In the old days, parsing HTML was practically impossible. Only a handful of companies had the resources necessary to write an HTML parser that could parse HTML as it is, not as it should be. Even though I'm attempting to have my application only send out valid HTML, it may fail (and we should assume it does here: this is the test suite after all!), so having a parser that can handle anything would of course be nice.

Enter the new HTML spec, where Hixie has done an astounding job in specifying how tagsoup can be parsed in a way that is compatible with the major browsers. Since we now have a spec for parsing HTML, little parsing libs based on this keep popping up everywhere. I'm using the Python html5lib, which can produce DOM trees, which in turn support XPath queries.

That's it for the how of three-tiered tests. Now the why: In addition to having easily readable test cases for the functionality test, it also helped with the load test. Having the middle layer in place meant that the load test I'm writing has simply been a joy to write. Had the functionality testing been done using copy-pasted HTTP/HTML-related code (as it was before I started restructuring it), I'd have to start over. Now, I had almost every function I needed, with names that make sense. Just look at it!

for i in range(num_users):
    username="load_user_%s_%d" % (instance, i)
    signup(username, password, "%s@example.org" % username)
    session=login(username, password)

for i in range(num_actions):

    if random.randint(1,10)<10:
        post_to_existing_thread(session, forum_id)
    else:
        create_new_thread(session, forum_id)

for session in sessions:

Here, post_to_existing_thread and create_new_thread are functions similar to the test cases in the functionality test. All in all, I had to add two new functions to the adaptation layer. The rest was reused, and the load test is (at least to me) plainly readable.

So: the three-layered approach to writing tests definitely zooms. Not only should you use it for your next project: you should apply it to the tests in your current one as soon as possible!