2008-08-19

Packet Decoding and the Limits of (mainstream) Static Typing

From time to time, I write some code to decode packets of the form (type, length, data), where type is some field (usually an int) specifying what kind of packet this is, length is the number of data bytes in the packet, and data is a sequence of bytes containing the meat of the packet. This entry is about how to parse them into objects in a clean and type-safe manner.
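To make the format concrete, here is a minimal Java sketch of splitting a byte stream into (type, length, data) records. The field widths are an assumption (4-byte big-endian ints for both type and length), and the names RawPacket and FrameReader are made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// One undecoded packet: the type tag plus the raw data bytes.
final class RawPacket {
    final int type;
    final byte[] data;

    RawPacket(int type, byte[] data) {
        this.type = type;
        this.data = data;
    }
}

final class FrameReader {
    // Reads (type, length, data) records until the stream ends.
    // Assumption: type and length are 4-byte big-endian ints.
    static List<RawPacket> readAll(DataInputStream in) throws IOException {
        List<RawPacket> packets = new ArrayList<>();
        while (true) {
            int type;
            try {
                type = in.readInt();
            } catch (EOFException endOfStream) {
                // Running out of bytes while reading the type is treated as end of input.
                return packets;
            }
            int length = in.readInt();
            if (length < 0) {
                throw new IOException("negative length: " + length);
            }
            byte[] data = new byte[length];
            in.readFully(data); // throws EOFException if the packet is truncated
            packets.add(new RawPacket(type, data));
        }
    }

    public static void main(String[] args) throws IOException {
        // One packet: type 42, length 2, data {7, 9}.
        byte[] wire = {0, 0, 0, 42, 0, 0, 0, 2, 7, 9};
        List<RawPacket> packets = readAll(new DataInputStream(new ByteArrayInputStream(wire)));
        System.out.println(packets.get(0).type + " " + packets.get(0).data.length); // prints "42 2"
    }
}
```

This only does the framing; turning a RawPacket into an object of the right class is the part the rest of this entry is about.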


Naturally, when dealing with these things, I want to have a class for each packet type. Since the mapping between what's in the data field for a packet type and what the corresponding class contains is fixed, it seems natural to me to place the code for decoding and encoding in the class itself: a constructor for building an instance from the data field and a method for encoding a packet into a data byte array.


So given this, I want to write a little decoding loop that reads the type and then creates a new packet of the correct type using the length and data fields. This is where the problems start. Ideally, I would just have a map from type to class, index it with the type I just read and then call the constructor of that class, but in the statically typed languages I usually dabble in there is no way of specifying a contract that talks about constructors. This means that there is no way of describing the type of the map. Factory methods are also out of the question, since static methods can't be part of the contract either. Apart from talking about the instances of the types and the relations between types, there is not much more the type systems allow you to say (well, except in spec-abusing C++ and languages with dependent types).
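For what it's worth, reflection does let you build the map from type to class, but only by giving up the static check. The Java sketch below illustrates the gap and is not a recommendation; ReflectiveDecoder, BadPacket and the ids 42 and 7 are made up for the example. The compiler happily accepts a packet class without a byte[] constructor, and the mistake only shows up when a packet of that type is decoded.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

abstract class Packet {}

final class FooPacket extends Packet {
    static final int ID = 42; // hypothetical type id
    final byte[] data;

    FooPacket(byte[] data) {
        this.data = data;
    }
}

// Lacks the byte[] constructor; nothing in the type system objects to that.
final class BadPacket extends Packet {
    static final int ID = 7; // hypothetical type id
}

final class ReflectiveDecoder {
    private final Map<Integer, Class<? extends Packet>> classes = new HashMap<>();

    void register(int type, Class<? extends Packet> cls) {
        classes.put(type, cls);
    }

    // Looks up the class for the type and calls its byte[] constructor reflectively.
    Packet decode(int type, byte[] data) throws ReflectiveOperationException {
        Class<? extends Packet> cls = classes.get(type);
        if (cls == null) {
            throw new IllegalArgumentException("unknown packet type " + type);
        }
        // The "contract" is only checked here, at runtime.
        Constructor<? extends Packet> ctor = cls.getDeclaredConstructor(byte[].class);
        return ctor.newInstance((Object) data);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        ReflectiveDecoder decoder = new ReflectiveDecoder();
        decoder.register(FooPacket.ID, FooPacket.class);
        decoder.register(BadPacket.ID, BadPacket.class); // compiles without complaint
        System.out.println(decoder.decode(FooPacket.ID, new byte[] {1, 2}).getClass().getSimpleName());
        try {
            decoder.decode(BadPacket.ID, new byte[0]);
        } catch (NoSuchMethodException e) {
            System.out.println("BadPacket has no byte[] constructor");
        }
    }
}
```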


That seems to leave me with one option: using instances as springboards for creating instances of the packet classes. Something like the following would work:


abstract class Builder
{
    public abstract Packet Build(byte[] data);
}
class FooBuilder : Builder
{
    public override Packet Build(byte[] data)
    {
        return new FooPacket(data);
    }
}
The type of the map would then be Dictionary<int,Builder>, and all that is necessary is to create a builder class per packet type and register the builders.


First, let's look at registering the builders. Since the builders don't contain any fields (and thus aren't really objects, IMNSHO), they can be singletons, and each class could register itself with the map in a static constructor. Except that a static constructor only runs when its class is first used, and since no code is supposed to refer to the builders directly (which is kind of the point of having the map to begin with), they will never be loaded, and thus never register. So again, we have to abandon doing things statically and have someone (e.g. the class holding the map) fill the map with items, one by one. So much for isolating the instantiation.
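The same thing happens in Java, where a static initializer block plays the role of the static constructor. A small sketch of the problem (Registry and SelfRegistrationDemo are names made up for the example, and 42 is an arbitrary id): the builder registers itself in its static block, but the map stays empty until some code actually touches the class.

```java
import java.util.HashMap;
import java.util.Map;

abstract class Packet {}

final class FooPacket extends Packet {
    FooPacket(byte[] data) {}
}

interface Builder {
    Packet build(byte[] data);
}

// Hypothetical global registry that the builders try to add themselves to.
final class Registry {
    static final Map<Integer, Builder> BUILDERS = new HashMap<>();
}

final class FooBuilder implements Builder {
    // Runs only when FooBuilder is initialized, i.e. on its first real use.
    static {
        Registry.BUILDERS.put(42, new FooBuilder());
    }

    public Packet build(byte[] data) {
        return new FooPacket(data);
    }
}

final class SelfRegistrationDemo {
    public static void main(String[] args) {
        // Nobody has referred to FooBuilder yet, so it has not registered.
        System.out.println(Registry.BUILDERS.size()); // prints 0
        // Touching the class triggers the static block...
        new FooBuilder();
        // ...but having to touch it defeats the purpose of self-registration.
        System.out.println(Registry.BUILDERS.size()); // prints 1
    }
}
```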


So, now we're left with something like


abstract class Builder {/*...*/}
class FooBuilder : Builder {/*...*/}
class PacketFactory
{
    private Dictionary<int,Builder> builders = new Dictionary<int,Builder>();
    public PacketFactory()
    {
        // Every packet type has to be registered by hand here.
        builders.Add(FooPacket.Id, new FooBuilder());
    }
    public Packet Build(int type, byte[] data)
    {
        return builders[type].Build(data);
    }
}

which is (sort-of) fine, except for all the extra code that was necessary just for calling new. Well, lambda/delegate/closure/anonymous-inner-class/whatever to the rescue. In current Java, you could write something like

interface Builder
{
    Packet build(byte[] data);
}
//...
builders.put(FooPacket.Id, new Builder() { public Packet build(byte[] data) { return new FooPacket(data); } });

which is still a bit too much noise for my taste, or in C#

delegate Packet Builder(byte[] data);
//...
builders.Add(FooPacket.Id, delegate(byte[] data) { return new FooPacket(data); });

which is better. Still, if I could say that every Packet class must have a static member holding the ID of the packets it describes and a constructor that takes a byte array, there would be no need for the builder interface, the (possibly anonymous) class that just delegates the call, and all the resulting code noise.
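As an aside, later Java versions (8 and up) added lambdas and constructor references, which cut the noise further. The sketch below assumes such a Java version and uses java.util.function.Function in place of a hand-written Builder interface. The compiler now checks at the registration site that FooPacket really has a byte[] constructor, but each class still has to be registered by hand, so the complaint above stands.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

abstract class Packet {}

final class FooPacket extends Packet {
    static final int ID = 42; // hypothetical type id
    final byte[] data;

    FooPacket(byte[] data) {
        this.data = data;
    }
}

final class PacketFactory {
    private final Map<Integer, Function<byte[], Packet>> builders = new HashMap<>();

    PacketFactory() {
        // The constructor reference stands in for the whole builder class.
        builders.put(FooPacket.ID, FooPacket::new);
    }

    Packet build(int type, byte[] data) {
        Function<byte[], Packet> builder = builders.get(type);
        if (builder == null) {
            throw new IllegalArgumentException("unknown packet type " + type);
        }
        return builder.apply(data);
    }

    public static void main(String[] args) {
        Packet packet = new PacketFactory().build(FooPacket.ID, new byte[] {1, 2, 3});
        System.out.println(packet.getClass().getSimpleName()); // prints "FooPacket"
    }
}
```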

So some questions then: Why can't we say that a FooPacket is a Packet with 42 as its type Id, or that all Packet subtypes can be constructed from byte arrays? Also, why do type parameters only get to say things about the types the parametric type refers to, and not the names it uses when doing so? The Pair<A,B> kind of types would be so much more useful if they let the subclass decide what the accessors are called. Answers on a postcard.

2008-08-17

bzr commit -m "zoom"

So I've switched from Subversion to the Bazaar VCS. So far I haven't got the Eclipse plug-in to work: it complains that it can't find the xml-output Bazaar plugin. Bazaar, on the other hand, lists it as a plugin now that I've finally installed it correctly. Since my coding is a solo project, I haven't really run into any complicated merges, so I don't know whether Bazaar lives up to its own hype. To be really honest, I think I have yet to understand how all the concepts (trees, branches, commits and so on) hang together. But so far, and from what I've read, I really like Bazaar. I hope to put it to the test with some more complicated merges in the future, specifically the kind of refactoring you do in Java: moving a file and changing its content while merging in other textual changes to the same file. This is something I've done a lot in ClearCase, which handles such merges pretty well, although sometimes it really sucks. There are lots of other reasons to really hate ClearCase, but branching and merging is something it does pretty darn well, all things considered.
My non-blogging friend, whom I've managed to meet for a friendly game of drunken zombies, has hinted that he will start blogging. Looking forward to it :-)