2010-03-11

The most natural of all keys

I sometimes hear, as an argument for using surrogate keys, that there is
no natural key for a table of persons. There is. It's a compound key
consisting of your mother's primary key and a sequence number telling
which of her chldren you are. In SQL:

create table person
(
parent integer[],
sequence int,
name varchar(100),
primary key (parent, sequence)
);

insert into person (parent, sequence, name) values ({}, 0, 'Lucy');
insert into person (parent, sequence, name) values ({0}, 0, 'Albert');
insert into person (parent, sequence, name) values ({0}, 1, 'Betty');
insert into person (parent, sequence, name) values ({1,0}, 0, 'Zim');


Here, Lucy is the first human ever (she has no human ancestor). Albert
and Betty are her children, and Betty has a son, Zim.

The only problem with this is that the foreign key cannot be
described in SQL. It actually has nothing to do with it being a
self-referencing key, but that the parent and
sequence columns of the referenced row should be
parts of the parent column of the referencing row. Something
like the following: foreign key (parent[1], parent[2:]) references person
(sequence, parent)
. Unfortunatly, only column names are permitted
in foreign key constranits, not arbitrary expressions. Implementing
the constraint as a CHECK instead is left as an excercise to
the reader.

2010-03-09

Const correctnes and callback interfaces in C

From time to time, I use and/or design callback interfaces in C. Usually, the signature of the callback will be similar to

typedef int (*callback)(int some_argument, void* user_data);


and the function that calls the callback is

int do_something_with_callback(int another_argument, callback f, void* user_data);


This is all well and good (as long as the documentation for do_something_with_callback specifies exactly how if will call f). But consider the following example

#include
int add(int a, void* user_data)
{
int* b=(int*)user_data;
return a + (*b);
}

typedef int (*callback)(int some_argument, void* user_data);

int apply_arithmetic_function(int initial_value, callback f, void* user_data)
{
return f(initial_value, user_data);
}

int main()
{
int v=1000;
printf("%d\n", apply_arithmetic_function(123, add, &v));
return 0;
}


This works, but as you can see, the add function does not write though its user_data pointer. Therefore, in isolation, it could be marked const, which is ususally a good thing. But is int add(int a, const void* user_data) compatible with the callback interface?


At first glance, it would seem like it, since something that takes a const pointer can be passed a non-const pointer (and gcc 4.3.4 with -Wall -Wextra -Werror -std=c99 -pedantic permitts it). But then again: what if the calling convention is different for const and non-const pointers? In a plain function call, it will work, since the compiler sees the signature of the actual function that is being called, but though the function pointer, it does not.


Looking at the (draft) spec, it is not permitted.
6.7.5.3§15 says:

For two function types to be compatible, both shall specify compatible return types. [127]


Moreover, the parameter type lists, if both are present, shall agree in the number of parameters and in use of the ellipsis terminator; corresponding parameters shall have compatible types.


127) If both function types are ‘‘old style’’, parameter types are not compared.


So, all the parameters in the signatures have to be "compatible". What does that mean? 6.7.3§9 says:
For two qualified types to be compatible, both shall have the identically qualified version of a compatible type ...

Ergo, the function signatures are not compatible, and 6.5.2.2§9 says:
If the function [that is to be called] is defined with a type that is not compatible with the type (of the expression) pointed to by the expression that denotes the called function, the behavior is undefined.


Conclusion: be careful with the types of callback functions. Copy-paste verbatim from the declaration, and write a comment at the site of the callback implementations saying that they are supposed to conform to a specific interface and that the parameter declarations may not be modified unless the interface is also changed.

2009-11-27

get an iso-8601 string in Java

Will they ever get it right. The Date handling in Java is a funny story but here is one way to get an ISO-8601 compatible date/time string.


GregorianCalendar cal = new GregorianCalendar(new Locale("sv", "SE"));
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
String tmpTime = formatter.format(cal.getTime());
tmpTime = tmpTime.replaceAll("\\+0000$", "Z");
tmpTime = tmpTime.replaceAll("(\\d\\d)$", ":$1");

2009-11-13

Zooming with SCons

I tried out Scons on a non-java project (i.e a project where it has a chance to succeed) to see if it really does zoom. And zooms it does! I am a bit picky with build systems, and I can't say SCons is without flaws, but just look at this (mildly anonymized) example:


I'm building a small middleware program. The environment I'm doing it in requires me to write my own stubs. I can't do much about that, but I want my build to ensure that each part of the client, server, middleware core, and stubs for the interfaces only include things from the directories they are allowed to depend on.


With GNU Make (my long-time favourite, since it in contrast to most other build tools actually performs its job), this is a mess. Since there can only be one pattern in a rule, you end up with something like


CLIENT_INCLUDES=-Iservice2_interface -Iservice2_stubs
SERVER_INCLUDES=-Iservice1_interface -Iservice2_interface
#...
$(BUILD)/client/%.o: client/%.c | $(BUILD)/client
$(CC) -c -o $ $^ $(CLIENT_INCLUDES) $(CFLAGS)
$(BUILD)/server/%.o: server/%.c | $(BUILD)/server
$(CC) -c -o $ $^ $(SERVER_INCLUDES) $(CFLAGS)
#...
$(BUILD)/client:
mkdir -p $@
$(BUILD)/server:
mkdir -p $@
#...


This is just for two sub-directories. And whenever something that is common to all changes, the rule for each sub-directory has to be updated. Not much fun. And no, a makefile for each directory is not a solution.


In SCons, it becomes (for all six sub-directories):


env=Environment()
VariantDir('build', '.', duplicate=0)·
objs=[]
for (dir, includes) in [ \
("server", ["service1_interface", "service2_interface"]), \
("client", ["service2_interface", "service2_stubs"]), \
("util", []), \
("middleware", ["service1_interface", "service1_stubs", "service2_interface", "service2_stubs"]), \
("service1_stubs", ["service1_interface"]), \
("service2_stubs", ["service2_interface"]) \
]:
local_env=env.Clone()
local_env["CPPPATH"].extend(includes)
sources=map(lambda s:"build/%s" % s, Glob("%s/*.c" % dir))
objs.extend(local_env.StaticObject(sources))
env.Program('my_test.exe', objs)


This code just loops over pairs of directory names and permitted includes, copies the global build environment (which is set up at the top, and can include useful things like env['CCFLAGS']="-Wall -Wextra -Werror"), appends the permitted includes to the list of include directories for this sub-directory. Next, it takes each C file in the sub-directory, and prefixes it with "build/" (this is one aspect I'm not that happy with), sets up rules for building object files for them, and then appends the names of those object files to the big list of all objects for the project. Finally, another rule is set up to build the binary from the objects.


A neat thing with SCons is that even though the rules are set up step by step in an imperative fashion (as can be expected from a system that uses Python as its configuration language), nothing is actually built directly by the StaticObject and Program calls above. Instead, what happens is that rules are appended to the environment. When the script terminates, SCons takes over and analyses the produced rule set. From that, it can build dependency chains, check what is already built, and re-build what is necessary. Just like it should.


That said, I haven't tried it on any more demanding jobs where intermediate files are built (which is what ant and other shell-script replacements that some mistake for build systems fail at), but the documentation for it looks promising, and the developers (and the Cons designers before them) seem to know what they are doing.


Veridct: Zooms!

2009-09-08

head in the cloud

My computer died and with it my mercurial repo, Trac and maven sites. So I figured I'll put my stuff "in the cloud". To try something new I decided not to use sourceforge or ops4j but look at google or Kenai. Since Kenai won't let you create a project without waiting for their approval I went with code.google.com
So far I've been pretty happy, the mercurial repo works ok, the issue tracker is fine for my humble needs (too bad there is no integration in NetBeans or Eclipse) and the Wiki is good enough.
Check it out at http://code.google.com/p/zoom-desktop/

To further the cloud experience I wanted to write some REST+JSON stuff and put it in google app engine, just for kicks. So I tried my hand at the google app engine plugin for eclipse and also the one for NetBeans. Both deliver what I expected and works quite nicely, thank you.
Since I haven't found my favourite framework for REST stuff in java I and a pal decided to start the Put JSON to REST project at http://code.google.com/p/pj2r/
It's chugging along, REST is fun and I'm learning that I didn't know half of what I thought I knew about HTTP.
The conclusion: google zooms. Jersey and JAX-RS zooms if you deal with XML but try to fine tune the JSON strings that JAXB spits out and you'll be pretty confused.

2009-08-18

How to punish competence

Here is a recurring pattern I've noticed. As soon as a coder grows a little bit competent at coding he or she usually starts to have oppinions about the structure supporting coding. Programmers are by nature problem solvers and strive to make things more effective and less repetetive.
Many programmers, as they mature, starts to form ideas on how, for example, project leadership could be improved (Scrum, RUP, XP), tools could be better (IDEas, build systems, languages) or infrastructure could be improved (SOAP, Application servers).
What then happens is that "the man" assignes these persons to improve the stuff above in projects. The problem is that "the man" has no understanding for that these things costs a lot of money to improve. "The man" has no knowledge at all at providing the necessary preconditions to succeed at these tasks.
I don't know how many projects I've been in where the best programmer (and for the record I'm not talking about myself) has been assigned to fix the build system, install the issue tracking and holding the hands of noobs and loosers to try to make them more productive.This for the simple reason that "no one else knows these things" but the more experienced developers.
This leads to project staffing where the noobs and general loosers do the actual coding, the experienced programmers do maintenance tasks and no one does what they are good at.
Usually the more experienced a programmer is the less they have to do with the actual craft of coding. They spend time in meetings with third party providers endlessly hashing over protocol details, following up on orders of insfrastructure equipment or explaining to money people that "things cost money".

It kills the joy, the spirit and the quality in any programmer that gets the least bit competent.

2009-06-02

Indexed regex searches

Just did a little Python implementation of the ideas in A Fast Regular Expression Indexing Engine. I only did the naïve thing and search the whole corpus each time in sel, but it works and it's cool.


Since mucking around with the regex library to get it to search an index for documents that an expression might match seemed to be too big a task (and I'm afraid of big, scary real-world regex engine implementations) I wrote a little parsing library modeled around Parsing Combinators. Easy to implement, easy to use, and very powerful. You should definitely use them the next time you need an ad-hoc parser.


Anyway, back to the searching. The proper way of implementing sel is probably to compute all the sel results for all k-grams for a given k in one swell foop. This would reduce the number of whole-corpus searches from approximately 100000 times in my current toy example to around 20 for any corpus. That would even make it usable for real-world applications. Maybe it's time to play with Lucene again...