2012-11-21

Tracking down memory corruption by mprotecting your ADTs

In C, it's customary to design your code around Abstract Data Types, that is, modules that consist of a header file that declares the external interface of the module (consisting of an opaque struct and a set of functions operating on that struct), and an implementation file (which has the full declaration of the structure, the definitions of the functions in the header and any helper functions). The header would be something like this:
#ifndef INCLUDE_GUARD_STACK_H
#define INCLUDE_GUARD_STACK_H

#include <stddef.h>
#include <stdbool.h>

struct stack;

struct stack* stack_create(size_t size);
bool stack_push(struct stack* stack, int i);
bool stack_pop(struct stack* stack, int* i);
void stack_destroy(struct stack* stack);

#endif

The implementation, then, is:

#include "stack.h"

#include <stdlib.h>

struct stack
{
  int* elements;
  size_t used;
  size_t allocated;
};

struct stack* stack_create(size_t size)
{
  struct stack* stack = malloc(sizeof(struct stack));
  stack->elements = malloc(size * sizeof(int));
  stack->used = 0;
  stack->allocated = size;
  return stack;
}

bool stack_push(struct stack* stack, int i)
{
  if (stack->used == stack->allocated) {
    goto error_full;
  }
  stack->elements[stack->used++] = i;
  return true;
error_full:
  return false;
}

I'll leave the implementation of the rest of the functions to your imagination. Since only the forward declaration of the struct is in the header, no code outside the implementation can access the members of the struct.

Now, assume we have a memory corruption fault somewhere in the rest of the program which when triggered corrupts the elements pointer but doesn't have any other effects. Our program them seems to be working fine until some later time when it suddenly crashes due to an invalid memory access in stack_push. We'd really like to get the program to abort at the point of the original corruption of the elements pointer, but how can we do that?

One way of solving that is to use the fact that since the structure is opaque, there is no way that any code outside our implementation file has any legitimate use of touching any of the memory to which struct stack* points. Since no other code has any business accessing that memory, then maybe we can have the OS help us preventing it from doing that? Enter mprotect.

The mprotect function lets us control what types of access should be permitted to a region of memory. If we access the memory in any other way, the OS is free (and in some cases even required) to abort our program at the spot. If we keep the memory inaccessible at all times except for when we use it inside our implementation functions, then chances are we can catch the memory corruption as it happens. The mprotect man page does say that the memory it protects has to be page aligned, though. How do we do that? Via posix_memalign and getpagesize, like so:

#include "stack.h"
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

struct stack* stack_create(size_t size)
{
  struct stack* stack;
  posix_memalign((void**)&stack, getpagesize(), sizeof(struct stack));
  posix_memalign((void**)&stack->elements, getpagesize(), size * sizeof(int));
  stack->used = 0;
  stack->allocated = size;
  protect(stack);
  return stack;
}

Now we just have to implement and use the protect function mentioned above and its inverse, unprotect:

static void protect(struct stack* stack)
{
  mprotect(stack->elements, stack->allocated * sizeof(int), PROT_NONE);
  mprotect(stack, sizeof(*stack), PROT_NONE);
}
static void unprotect(struct stack* stack)
{
  /* Unprotect in the reverse order, or we crash and burn trying to read stack->elements */
  mprotect(stack, sizeof(*stack), PROT_READ|PROT_WRITE);
  mprotect(stack->elements, stack->allocated * sizeof(int), PROT_READ|PROT_WRITE);
}
bool stack_push(struct stack* stack, int i)
{
  unprotect(stack);
  if (stack->used == stack->allocated) {
    goto error_full;
  }
  stack->elements[stack->used++] = i;
  protect(stack);
  return true;
error_full:
  protect(stack);
  return false;
}

Since this ends up modifying bits in the MMU, it may not be suitable to have enabled on performance critical code, so an #ifdef NDEBUG switch that selects an empty implementation of protect and unprotect for non-debug builds could be advisable.

So does protecting your ADTs via mprotect zoom? Well, it does come in handy at times, and there's not much disadvantage to using it, so my verdict is: Zooms!