2010-08-19

Defence-in-depth for data-base backed websites: Reading

Now that we can create users and log them in, let's try to let them read some data. The actual tables are locked away in the impl schema to which regular users have no access. Selected parts of the tables can be made available though views. As we saw in the introduction, a table containing private messages between users can be exposed as a view that selects only the messages to or from the current user like so:


CREATE VIEW kroll_user.messages AS
SELECT id, "from", "to", posted, subject, text
FROM impl.messages
WHERE "to"=(SELECT id FROM kroll_user.current_app_user)
OR "from"=(SELECT id FROM kroll_user.current_app_user);

So what's this SELECT id FROM kroll_user.current_app_user? current_app_user simply a view containing the appuser/dbuser mapping for the current user:


CREATE VIEW kroll_user.current_app_user AS
(SELECT appuser as id FROM mapping.users WHERE dbuser=current_user);

current_user is a special SQL-standardized function that is called without the usual parentheses and returns the login name of the current data base user as a string. Or, that's what Postgresql returns. There seems to be some disagreement on whether it should be the login name or the current security context, with Postgresql, DB2, and Firebird in the first camp and MS-SQL, MySQL and Oracle in the second. As long as we don't call it from a privileged stored procedure, that should not matter, however.

If we would like to filter out columns instead of rows, then this can be done by selecting null instead of the data for the column. Row-based and column-based filtering can also be combined. For the users table, I want to hide the email address of all users, except the logged-in user's own address. UNION to the rescue:


CREATE VIEW kroll_user.users AS (
select id, "name", null as email, moderator from impl.users
where id <> kroll_user.get_current_app_user()
) UNION (
select id, "name", email, moderator from impl.users
where id = kroll_user.get_current_app_user()
);
grant select on kroll_user.users to kroll_role_user;

Setting up read-only views of tables is pretty much straight-forward: decide what cells users should be allowed to see in each table, come up with SELECT statements that pick those, and make them into views.

Sequences are a bit trickier. Postgresql defines a sequnce to be "special single-row table" that can be used only via the functions nextval, currval, and setval. Unfortunately, Postgresql only allows currval to be run on actual sequences, and not on views of sequences. Since the sequences are data and thus stored in the impl schema to which users don't have access, we have to make a little workaround. Instead of virtualizing the sequence, we can virtualize the accessor functions:


create function kroll_user.currval_threads_id_seq() returns bigint as
'select pg_catalog.currval(''impl.threads_id_seq'');'
language sql security definer;

create function kroll_user.currval(unknown) returns bigint as
'select case
when CAST($1 as text)=''threads_id_seq''
then kroll_user.currval_threads_id_seq()
else pg_catalog.currval(CAST($1 as text))
end;'
language sql;

This creates a new function called currval in the user schema (which is the first schema in the users' search path) that checks if the requested sequence is the one we want to virtualize (threads_id_seq). If it is, then we use a privileged access function for that specific sequence (which is a database object that we can explicitly grant access to). If it isn't, then we delegate to the built-in currval (which in Postgesql is stored in the pg_catalog schema). The delegation isn't strictly necessary if users don't have access to non-virtualized sequences, but it ensures we got correct error reporting at least.

That's about all there is to reading and filtering data, which is the majority of what most web apps do, and most of the code is plain old standard SQL. Next up will be writing data, which will lead us in to heavy vendor-specific terrain.

4 comments:

  1. This also has the added benefit of moving more complex queries from the application into the database in the form of views. Hopefully leading to that developers that know SQL can focus on writing that and let the rest of us less knowledgable coders write other stuff.
    So not only does it lead to reduced risk of leaking data it also increases division of labour.

    ReplyDelete
  2. Well, yes and no. Yes, views can simplify things a bit, and they have a more flexible interface than stored procedures. I think it's important to keep views used for convenience separate from views used for implementing virtual private databases though. That can be done by having a virtualization layer directly on top of the tables like how I describe in these posts, and then put a convenience layer on top.

    When I first started writing my example app, I conflated the two and had things like an inbox view that contained all messages sent to the currently logged in user. That kind of worked, but I redid it the current way when I realized that I could achieve a schema that looks identical to the table schema, since that both makes the virtualization easier to audit, and makes it possible to retrofit this scheme to existing applications.

    ReplyDelete
  3. For sure, I was thinking of the virtual database; used this way the database implements the authentication in SQL/views. The usual alternative is that the SQL lives in the application with the WHERE clause embedded in application code.
    But as you say, virtualization and convinience views have different goals and a good "layering" or division of labour makes for a cleaner implementation.

    ReplyDelete
  4. Right. Both can be seen as moving parts of the WHERE clauses to the database.

    Since I'm pitching this a s defence in depth, I still keep the WHERE clauses in the app. That way, if there's some aspect that gets forgotten when looking at it from a table perspective, then maybe it will be remembered when looking at it from a query perspective.

    But convenience views would be good. SQL queries are a bit like having to say something in a single breath. I have some giant queries in the app (e.g. the thread list, which includes parts of both the first and the latest posts to each thread) that could have benefitted from being able to breathe a bit in between saying things.

    ReplyDelete