2010-07-29

Defence-in-depth for data-base backed websites: Introduction

Many years ago a college of mine (we cal call him Dr. Kroll, because that's what we used to call him) asked over beer a question that stuck with me for a long while.

When we build web applications that talk to databases, we generally let them do any action that they might ever need, at all times. Isn't that a bit reckless? Is there really a need to let the database permit the webserver to access Alice's data when it's processing a request from Bob?

Oracle has been talking about eliminating the One Big Application User (see e.g. Introducing Database Security for Application Developers), and OWASP talks about having the webapp use more than one account to limit what the application can do in their Guide to Authorization, but this is all about limiting what different parts of the code are allowed to do, in essence to the set of all operations that one piece of code could ever need. A more dynamic approach is suggested by Microsoft for application service providers that want to keep data from different customers separate by what they call Multi-Tenant Data Architecture.

In the approach Microsoft describes, the application connects to the database as the principal on whose behalf it's acting, so that the database can take that into account when deciding whether an operation should be authorized. In their example, the principal is the customer, and the filter is there so that users that belong to one customer cannot read data from other customers. I suggest that we extend that model to ordinary users for ordinary web sites.

I've been trying out a way of doing this for a little toy web application, and it seems to work quite well. The way it works is that we create one schema for holding the tables for the application, and another that holds views that are set up so that they structurally look exactly like the tables, but only show data that the application user is allowed to see. This requires one database user per application user, and that the webapp stops using a pool of generic connections, and instead uses one connection per logged-in application user. This is not as bad as it may first sound. It can even be added to an existing application with minimal changes - some changes to signup, login, and connection management, but nothing at all on a per-page basis. I'll start of with an easy example, and then go though different parts of the solution in a few upcoming posts.

The web app I built to test these ideas is a forum. One of the functions that commonly exists in fora is private messages. In my app, the messages are stored in a table that looks like this:


CREATE TABLE impl.messages
(
id serial not null,
"from" integer NOT NULL,
"to" integer NOT NULL,
posted timestamp without time zone NOT NULL,
subject character(80),
"text" text,
CONSTRAINT pk_pm PRIMARY KEY (id),
CONSTRAINT fk_from FOREIGN KEY ("from")
REFERENCES impl.users (id),
CONSTRAINT fk_to FOREIGN KEY ("to")
REFERENCES impl.users (id)
);

(I'm using PostgreSQL, by the way. The ideas should be possible to implement in any competent database, but I haven't tested on any other yet).

This table would hold every private message sent by any user to any other user. When the application tries to read a message, however, we should limit it to only reading messages sent to or from the currently logged in user. Thus, in the schema that the application users use, we have a view that enforces that rule:


CREATE VIEW kroll_user.messages AS
SELECT id, "from", "to", posted, subject, text
FROM impl.messages
WHERE "to"=(SELECT id FROM kroll_user.current_app_user) OR "from"=(SELECT id FROM kroll_user.current_app_user);

This gives us a view of the messages table that looks like that the current user is the center of the universe. Every message in the table is either to or from that user. The (SELECT id FROM kroll_user.current_app_user) will be explained in a later post, but for now, let's just assume that it will return the ID of the application user in the application's users table.

So what does this give us? In a perfect webapp, then not much. If every part of the application properly filters the data it reads so that no user can ever get away with something like replacing the ID parameter in the URL to the message display page to that of a message meant for another user, and no parameter is ever pasted into an SQL statement without proper escaping, etc. then wouldn't need this. We also would never have any successful SQL injection attacks or news about stolen database records.

If we apply the principle of least privilege to how the application accesses the database, then we only permit the application to do things that the user on whose behalf it's operating is authorized for. That means that no matter how bad the application is at filtering out requests for other user's data, the database will still not leak anything that the user was not meant to see. As a matter of principles, this is how we should do things. I'm not even sure why we aren't doing it already. As I'll show in the next few posts, it isn't even hard to do.

3 comments:

  1. To right, this is a better way to do it than One Big Application User. However I think that you sell Oracle a bit short, their Fine Grain Access Control and Virtual Private Database achieves exactly this (although in another manner than views).

    Anyway, my current thinking on this subject is that each tier (and maybe layer within a tier) should include an authorization mechanism. The idea is that even if one tier exposes more data than it should the next tier might get it right, making it harder to corrupt/expose data.

    So I would say that what you are doing is, in a very clever way, building an authorization mechanism into the database service (data tier). The only true separation of data is if the physical user stores his private data on a personal physical device. Everything else is just code making informed decisions about what you are allowed to see/modify - what I call an authentication mechanism.

    Another interesting aspect to this is authentication and how to federate Principals across (even remotely) tiers. I would propose that Java JAAS gets this pretty well although I haven't actually built anything of value using it so I might just be impressed with the sales pitch :-)

    Looking forward to the other posts!

    ReplyDelete
  2. Sure, most databases have ways of doing row-level access control, but using them would still require logging application users in to the database.

    The approach I'm taking here is to avoid any scary database things as far as possible, to make it palatable also for database-averse web developers.

    The ideas here could of course be taken further - instead of logging the user on to the database, you log them on to the Kerberos server, and then have all the support systems use that. In effect, you'd welcome the user in to your data center, and have the web app be their client software. Or, as you say, this could be taken to use different sites via federation. But those are different stores for different times ;-)

    ReplyDelete
  3. I do believe that you are right, good use of views will prevent unauthorized access to data, however, there is a problem with this kind of setup. I think the reason you usually end up with "one application user" is management of users with access to the database.
    It has nothing to do with if developers are scared of using the authorization mechanism that came with the database or not.
    Identity management is IMO the key problem. If you solve that I think it will be easy to implement many authorization mechanism along the wire between user & raw data storage.

    ReplyDelete

Note: only a member of this blog may post a comment.