2010-07-29

Defence-in-depth for data-base backed websites: Introduction

Many years ago a college of mine (we cal call him Dr. Kroll, because that's what we used to call him) asked over beer a question that stuck with me for a long while.

When we build web applications that talk to databases, we generally let them do any action that they might ever need, at all times. Isn't that a bit reckless? Is there really a need to let the database permit the webserver to access Alice's data when it's processing a request from Bob?

Oracle has been talking about eliminating the One Big Application User (see e.g. Introducing Database Security for Application Developers), and OWASP talks about having the webapp use more than one account to limit what the application can do in their Guide to Authorization, but this is all about limiting what different parts of the code are allowed to do, in essence to the set of all operations that one piece of code could ever need. A more dynamic approach is suggested by Microsoft for application service providers that want to keep data from different customers separate by what they call Multi-Tenant Data Architecture.

In the approach Microsoft describes, the application connects to the database as the principal on whose behalf it's acting, so that the database can take that into account when deciding whether an operation should be authorized. In their example, the principal is the customer, and the filter is there so that users that belong to one customer cannot read data from other customers. I suggest that we extend that model to ordinary users for ordinary web sites.

I've been trying out a way of doing this for a little toy web application, and it seems to work quite well. The way it works is that we create one schema for holding the tables for the application, and another that holds views that are set up so that they structurally look exactly like the tables, but only show data that the application user is allowed to see. This requires one database user per application user, and that the webapp stops using a pool of generic connections, and instead uses one connection per logged-in application user. This is not as bad as it may first sound. It can even be added to an existing application with minimal changes - some changes to signup, login, and connection management, but nothing at all on a per-page basis. I'll start of with an easy example, and then go though different parts of the solution in a few upcoming posts.

The web app I built to test these ideas is a forum. One of the functions that commonly exists in fora is private messages. In my app, the messages are stored in a table that looks like this:


CREATE TABLE impl.messages
(
id serial not null,
"from" integer NOT NULL,
"to" integer NOT NULL,
posted timestamp without time zone NOT NULL,
subject character(80),
"text" text,
CONSTRAINT pk_pm PRIMARY KEY (id),
CONSTRAINT fk_from FOREIGN KEY ("from")
REFERENCES impl.users (id),
CONSTRAINT fk_to FOREIGN KEY ("to")
REFERENCES impl.users (id)
);

(I'm using PostgreSQL, by the way. The ideas should be possible to implement in any competent database, but I haven't tested on any other yet).

This table would hold every private message sent by any user to any other user. When the application tries to read a message, however, we should limit it to only reading messages sent to or from the currently logged in user. Thus, in the schema that the application users use, we have a view that enforces that rule:


CREATE VIEW kroll_user.messages AS
SELECT id, "from", "to", posted, subject, text
FROM impl.messages
WHERE "to"=(SELECT id FROM kroll_user.current_app_user) OR "from"=(SELECT id FROM kroll_user.current_app_user);

This gives us a view of the messages table that looks like that the current user is the center of the universe. Every message in the table is either to or from that user. The (SELECT id FROM kroll_user.current_app_user) will be explained in a later post, but for now, let's just assume that it will return the ID of the application user in the application's users table.

So what does this give us? In a perfect webapp, then not much. If every part of the application properly filters the data it reads so that no user can ever get away with something like replacing the ID parameter in the URL to the message display page to that of a message meant for another user, and no parameter is ever pasted into an SQL statement without proper escaping, etc. then wouldn't need this. We also would never have any successful SQL injection attacks or news about stolen database records.

If we apply the principle of least privilege to how the application accesses the database, then we only permit the application to do things that the user on whose behalf it's operating is authorized for. That means that no matter how bad the application is at filtering out requests for other user's data, the database will still not leak anything that the user was not meant to see. As a matter of principles, this is how we should do things. I'm not even sure why we aren't doing it already. As I'll show in the next few posts, it isn't even hard to do.