get an iso-8601 string in Java

Will they ever get it right? Date handling in Java is a funny story, but here is one way to get an ISO-8601-compatible date/time string.

import java.text.SimpleDateFormat;
import java.util.GregorianCalendar;
import java.util.Locale;

GregorianCalendar cal = new GregorianCalendar(new Locale("sv", "SE"));
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
String tmpTime = formatter.format(cal.getTime());
// SimpleDateFormat's Z gives RFC 822 offsets: turn "+0000" into "Z"...
tmpTime = tmpTime.replaceAll("\\+0000$", "Z");
// ...and put a colon into any other offset: "+0130" becomes "+01:30"
tmpTime = tmpTime.replaceAll("(\\d\\d)$", ":$1");
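For what it's worth, on Java 8 and later the java.time API can produce the same thing without any regex cleanup; a minimal sketch (class name is made up):

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

public class IsoNow {
    // ISO-8601 timestamp with offset, e.g. "2009-05-01T12:34:56Z".
    // Truncate to seconds so no fractional part is printed.
    static String isoNow() {
        return OffsetDateTime.now(ZoneOffset.UTC)
                .truncatedTo(ChronoUnit.SECONDS)
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        System.out.println(isoNow());
    }
}
```

The predefined ISO_OFFSET_DATE_TIME formatter prints the UTC offset as "Z" by itself, which is exactly what the replaceAll calls above patch up by hand.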


Zooming with SCons

I tried out SCons on a non-Java project (i.e. a project where it has a chance to succeed) to see if it really does zoom. And zoom it does! I am a bit picky about build systems, and I can't say SCons is without flaws, but just look at this (mildly anonymized) example:

I'm building a small middleware program. The environment I'm doing it in requires me to write my own stubs. I can't do much about that, but I want my build to ensure that each part of the client, server, middleware core, and stubs for the interfaces only include things from the directories they are allowed to depend on.

With GNU Make (my long-time favourite, since it, in contrast to most other build tools, actually performs its job), this is a mess. Since there can only be one pattern in a rule, you end up with something like

CLIENT_INCLUDES=-Iservice2_interface -Iservice2_stubs
SERVER_INCLUDES=-Iservice1_interface -Iservice2_interface

$(BUILD)/client/%.o: client/%.c | $(BUILD)/client
	$(CC) -c -o $@ $< $(CLIENT_INCLUDES) $(CFLAGS)

$(BUILD)/server/%.o: server/%.c | $(BUILD)/server
	$(CC) -c -o $@ $< $(SERVER_INCLUDES) $(CFLAGS)

$(BUILD)/client $(BUILD)/server:
	mkdir -p $@

This is just for two sub-directories. And whenever something common to all of them changes, the rule for each sub-directory has to be updated. Not much fun. And no, a makefile for each directory is not a solution.

In SCons, it becomes (for all six sub-directories):

env = Environment()
env['CCFLAGS'] = "-Wall -Wextra -Werror"
VariantDir('build', '.', duplicate=0)
objs = []
for (dir, includes) in [
        ("server", ["service1_interface", "service2_interface"]),
        ("client", ["service2_interface", "service2_stubs"]),
        ("util", []),
        ("middleware", ["service1_interface", "service1_stubs",
                        "service2_interface", "service2_stubs"]),
        ("service1_stubs", ["service1_interface"]),
        ("service2_stubs", ["service2_interface"])]:
    subenv = env.Clone()
    subenv.Append(CPPPATH=includes)
    sources = map(lambda s: "build/%s" % s, Glob("%s/*.c" % dir))
    objs += subenv.StaticObject(sources)
env.Program('my_test.exe', objs)

This code just loops over pairs of directory names and permitted includes, copies the global build environment (which is set up at the top, and can include useful things like env['CCFLAGS'] = "-Wall -Wextra -Werror"), and appends the permitted includes to the list of include directories for that sub-directory. Next, it takes each C file in the sub-directory and prefixes it with "build/" (this is one aspect I'm not that happy with), sets up rules for building object files from them, and appends the names of those object files to the big list of all objects for the project. Finally, another rule is set up to build the binary from the objects.

A neat thing with SCons is that even though the rules are set up step by step in an imperative fashion (as can be expected from a system that uses Python as its configuration language), nothing is actually built directly by the StaticObject and Program calls above. Instead, what happens is that rules are appended to the environment. When the script terminates, SCons takes over and analyses the produced rule set. From that, it can build dependency chains, check what is already built, and re-build what is necessary. Just like it should.

That said, I haven't tried it on any more demanding jobs where intermediate files are built (which is what ant and other shell-script replacements that some mistake for build systems fail at), but the documentation for it looks promising, and the developers (and the Cons designers before them) seem to know what they are doing.

Verdict: Zooms!


head in the cloud

My computer died, and with it my Mercurial repo, Trac and Maven sites. So I figured I'd put my stuff "in the cloud". To try something new, I decided not to use SourceForge or OPS4J but to look at Google or Kenai. Since Kenai won't let you create a project without waiting for their approval, I went with code.google.com.
So far I've been pretty happy, the mercurial repo works ok, the issue tracker is fine for my humble needs (too bad there is no integration in NetBeans or Eclipse) and the Wiki is good enough.
Check it out at http://code.google.com/p/zoom-desktop/

To further the cloud experience I wanted to write some REST+JSON stuff and put it on Google App Engine, just for kicks. So I tried my hand at the Google App Engine plugin for Eclipse, and also the one for NetBeans. Both deliver what I expected and work quite nicely, thank you.
Since I haven't found my favourite framework for REST stuff in Java, a pal and I decided to start the Put JSON to REST project at http://code.google.com/p/pj2r/
It's chugging along, REST is fun and I'm learning that I didn't know half of what I thought I knew about HTTP.
The conclusion: Google zooms. Jersey and JAX-RS zoom if you deal with XML, but try to fine-tune the JSON strings that JAXB spits out and you'll be pretty confused.


How to punish competence

Here is a recurring pattern I've noticed. As soon as a coder grows a little bit competent at coding, he or she usually starts to have opinions about the structure supporting coding. Programmers are by nature problem solvers and strive to make things more effective and less repetitive.
Many programmers, as they mature, start to form ideas on how, for example, project leadership could be improved (Scrum, RUP, XP), tools could be better (IDEs, build systems, languages) or infrastructure could be improved (SOAP, application servers).
What then happens is that "the man" assigns these persons to improve the stuff above in projects. The problem is that "the man" does not understand that these things cost a lot of money to improve. "The man" has no knowledge at all of how to provide the necessary preconditions to succeed at these tasks.
I don't know how many projects I've been in where the best programmer (and for the record, I'm not talking about myself) has been assigned to fix the build system, install the issue tracking, and hold the hands of noobs and losers to try to make them more productive. This for the simple reason that "no one else knows these things" but the more experienced developers.
This leads to project staffing where the noobs and general losers do the actual coding, the experienced programmers do maintenance tasks, and no one does what they are good at.
Usually, the more experienced a programmer is, the less they have to do with the actual craft of coding. They spend time in meetings with third-party providers endlessly hashing over protocol details, following up on orders of infrastructure equipment, or explaining to money people that "things cost money".

It kills the joy, the spirit and the quality in any programmer that gets the least bit competent.


Indexed regex searches

Just did a little Python implementation of the ideas in A Fast Regular Expression Indexing Engine. I only did the naïve thing of searching the whole corpus each time in sel, but it works and it's cool.

Since mucking around with the regex library to get it to search an index for documents that an expression might match seemed too big a task (and I'm afraid of big, scary real-world regex engine implementations), I wrote a little parsing library modelled on parser combinators. Easy to implement, easy to use, and very powerful. You should definitely use them the next time you need an ad-hoc parser.

Anyway, back to the searching. The proper way of implementing sel is probably to compute all the sel results for all k-grams for a given k in one swell foop. This would reduce the number of whole-corpus searches from approximately 100000 times in my current toy example to around 20 for any corpus. That would even make it usable for real-world applications. Maybe it's time to play with Lucene again...
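Since Lucene came up: to make the indexing half of the scheme concrete, here is a hypothetical Java sketch (class and method names are made up) of a k-gram posting-list index. It returns candidate documents for a required literal; computing which k-grams a full regex requires (the sel part) is the hard bit the paper is actually about, and is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class KGramIndex {
    private final int k;
    // posting lists: k-gram -> ids of documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    public KGramIndex(int k) { this.k = k; }

    public void add(String doc) {
        int id = docs.size();
        docs.add(doc);
        for (int i = 0; i + k <= doc.length(); i++) {
            postings.computeIfAbsent(doc.substring(i, i + k),
                                     g -> new HashSet<>()).add(id);
        }
    }

    // Documents that contain every k-gram of a required literal.
    // These are candidates only: they must still be checked with the
    // real regex, but the whole corpus never has to be scanned.
    public List<String> candidates(String literal) {
        Set<Integer> result = null;
        for (int i = 0; i + k <= literal.length(); i++) {
            Set<Integer> p = postings.getOrDefault(literal.substring(i, i + k),
                                                   Collections.emptySet());
            if (result == null) result = new HashSet<>(p);
            else result.retainAll(p);
        }
        List<String> out = new ArrayList<>();
        if (result != null) for (int id : result) out.add(docs.get(id));
        return out;
    }

    public static void main(String[] args) {
        KGramIndex idx = new KGramIndex(3);
        idx.add("regular expression indexing");
        idx.add("irregular verbs");
        System.out.println(idx.candidates("regular"));
    }
}
```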


Multi-threaded tests

Here's a little trick I used once to test threaded code: wrap the thread library, and have the wrapper run one thread at a time, in a deterministic fashion (cooperative threading style, except that yield() is implicit in all thread library calls). That makes the test repeatable, and if you re-run the test for all permutations of the scheduling order, it can detect any deadlock that depends on which thread is scheduled at which yield(). On the minus side, you will never find bugs that depend on scheduling another thread in the middle of something else (a side effect of the determinism you get from cooperative threading), but my guess is that it's worth it in most cases.
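A minimal sketch of the idea, with everything made up for illustration: model each "thread" as a list of steps (the code between two yield points), and let a deterministic scheduler interleave them in a chosen order. Re-running with every permutation of the schedule explores all interleavings at yield granularity.

```java
import java.util.Arrays;
import java.util.List;

// Toy deterministic scheduler: a schedule is the order in which the
// scheduler picks tasks; the same schedule always produces the same run.
public class StepScheduler {
    public static void run(List<List<Runnable>> tasks, int[] schedule) {
        int[] pc = new int[tasks.size()]; // next step to run, per task
        for (int t : schedule) {
            if (pc[t] < tasks.get(t).size()) {
                tasks.get(t).get(pc[t]++).run();
            }
        }
    }

    public static void main(String[] args) {
        int[] counter = {0};
        int[] tmp = new int[2];
        // Two "threads" doing a non-atomic counter++: read, then write.
        List<List<Runnable>> tasks = Arrays.asList(
                Arrays.asList(() -> tmp[0] = counter[0],
                              () -> counter[0] = tmp[0] + 1),
                Arrays.asList(() -> tmp[1] = counter[0],
                              () -> counter[0] = tmp[1] + 1));
        run(tasks, new int[]{0, 1, 0, 1}); // interleaved reads: lost update
        System.out.println(counter[0]);    // prints 1, not 2
    }
}
```

The interleaved schedule {0, 1, 0, 1} reproduces the classic lost update every single time, while {0, 0, 1, 1} does not; that repeatability is the whole point of the trick.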


topless objects

In connection with Magnus's last post, I mentioned some thoughts I have had for a long time regarding web applications and OO. Since I started doing web applications in Java in earnest, I have never felt really comfortable with the "common" design pattern. It felt a lot better when doing Tapestry and Wicket, but the unease never really went away. In this post I'll try to put into words something that is still pretty vague in my mind, so don't expect any kind of consistent story.

A common wisdom some years ago was that web applications were built in three tiers: the GUI layer with JSP, Struts or some stuff like that; some sort of controller or business logic layer; and finally a data access layer. The lower layers might be EJB or Spring. Whichever framework you pick, the layers are usually pretty isolated. Data is passed between them in simple structs. This usually results in a lot of mindless copying of data from one struct named 'somethingTO' into another struct called 'somethingFormBean'.
The GUI handlers, the controllers and the data access code are usually stateless; even if an ORM is used, the state they hold is pretty simple. In other words, not a lot of object orientation going on.
The problem with this pattern, I think, is that it does not lend itself to reuse, and it contains a lot of boilerplate copying of data transfer objects with no apparent gain. Sure, the layers are isolated from knowing the implementation details of other layers. But any form of encapsulation has this goal, so the tiered design pattern isn't unique in this.

One 'reaction' to this is naked objects (http://en.wikipedia.org/wiki/Naked_objects), where basically the domain model is translated to classes. The business logic lives in the domain classes, and operations on domain objects are exposed on the web. This sure sounds like a lot more OO design applied to the web. I have no real experience working with this kind of framework, so I might have everything backwards, but I think the domain model here is related to business rules: operating on entities like 'customer', 'books' or whatever nouns sound relevant to the domain. I would like to call this the bottom-up approach: designing from the data layer upwards to the GUI layer.

The thing I would really like to see is a top-down approach to naked objects, maybe topless objects :-) That is, the domain model is not the nouns used to describe some data in a DB or back-end system. Rather, I would suggest that the domain in web applications is request and response. So the basic classes are request and response: each request is modelled with regard to its input data, that is, request and POST parameters, session parameters and so on. The response is modelled with regard to the information that is to be published from the web server. I think this is more related to the interaction description of a web site, use cases or something similar. So you start by modelling the information that is to be sent to the server and what is to be retrieved. Interactions that are similar with respect to dataflow can, for example, be implemented as related classes.

So, relating it to Magnus's post about lifecycles of objects: one could model the request/response as a state machine. The start state is the request coming in to the server. It proceeds through states like validating input, deciding where to go next, running business rules, retrieving/updating data, formatting for display, rendering HTML, and then ends. In this way the same state machine (which might be the same implementing class) collects information and decides where to go. So a web framework would end up as something that has a repository of state machines triggered from the HttpServletRequest.
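A hypothetical sketch of such a machine, with the state names taken from the steps above and purely linear transitions (a real one would branch on validation failures and so on):

```java
// Hypothetical sketch: the request/response lifecycle as an explicit
// state machine. Names and transitions are made up for illustration.
public class RequestMachine {
    enum State { VALIDATE, DISPATCH, BUSINESS_RULES, DATA, FORMAT, RENDER, DONE }

    private State state = State.VALIDATE;
    private final StringBuilder trace = new StringBuilder();

    // Advance one step; each state would collect information and decide
    // where to go next (here the transitions are just linear).
    public State step() {
        trace.append(state).append(' ');
        switch (state) {
            case VALIDATE:       state = State.DISPATCH; break;
            case DISPATCH:       state = State.BUSINESS_RULES; break;
            case BUSINESS_RULES: state = State.DATA; break;
            case DATA:           state = State.FORMAT; break;
            case FORMAT:         state = State.RENDER; break;
            case RENDER:         state = State.DONE; break;
            case DONE:           break;
        }
        return state;
    }

    public State handle() {            // run one request to completion
        while (state != State.DONE) step();
        return state;
    }

    public String trace() { return trace.toString().trim(); }

    public static void main(String[] args) {
        RequestMachine m = new RequestMachine();
        m.handle();
        System.out.println(m.trace());
    }
}
```

A framework along these lines would keep a repository of such machines and run one per incoming HttpServletRequest.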

Yeah, I don't know. Perhaps that would lend itself to more OO practices in implementing web stuff. I guess I have to build it and see :-)

Some completely unrelated notes:
A reminder about writing JDK proxies and handling InvocationTargetException

Check out the fork of HiveMind called Gaderian


Object states, roles, and access modifiers

Johan sometimes says that he thinks objects should have their life cycles explicit. When you first create an object, only some calls are valid (e.g. for a socket object, only calls for setting up how it should connect and the connect method). Later on, more functionality is available (in our case, methods for reading, writing, and closing the socket). This should be explicit. Something like this:

public interface SocketFactory {
    UnconnectedSocket createSocket();
}
public interface UnconnectedSocket {
    void bind(IPEndPoint localAddress);
    ConnectedSocket connect(IPEndPoint remote);
}
public interface ConnectedSocket {
    int read(byte[] buf, int offset, int count);
    int write(byte[] buf, int offset, int count);
    ClosedSocket close();
}
public interface ClosedSocket {
}
class Socket implements UnconnectedSocket, ConnectedSocket, ClosedSocket {
    // ...
}

This is similar to a way of writing three-layer web applications that I've used in the past: the presentation layer requests an object from the logic layer and writes the parameters of the query into it. It then passes it to the logic layer, which can augment the query, cache, etc. If necessary, the query is sent to the data layer along with a response object. The data layer uses the request object to look up some data, writes it into the response object, and returns. The logic layer then augments, caches (etc.) the response and gives it to the presentation layer. The easy way of doing it is to just have a single class for creating the query (presentation), reading the query (data), writing the response (data), and reading the response (presentation). In some cases, you might need to use different classes. A solution is to have the class in the presentation layer, and let it implement four interfaces for the different ways of using it. Something like:

interface WritableUserRequest {
    void setUsername(String username);
}
interface ReadableUserRequest {
    String getUsername();
}
interface WritableUserResponse {
    void setRealName(String realName);
}
interface ReadableUserResponse {
    String getRealName();
}
class User implements WritableUserRequest, ReadableUserRequest,
                      WritableUserResponse, ReadableUserResponse {
    private String userName;
    private String realName;
    public void setUsername(String username) {
        this.userName = username;
    }
    // Similar for all getters and setters
}

In essence, this is about having one object let different objects interact with it in different ways at different times. We can group the methods in different little packets and hand some packets to some clients, and others to others. Some methods will be in many packets, some in only one.

Which brings us to the final part of the title: access modifiers. Access modifiers let us do something like this, but in a very primitive way. We get four different groups of methods: one for everyone, one for only the same class, one for all the subclasses, and the funny, nameless exact-same-package-only group. A more flexible mechanism for grouping methods would save us all the needless typing that creating an interface for each method group entails. Come to think of it, there is another way: annotations. A follow-up with some annotation processing goodness is forthcoming, some time after I've written about error handling and published my other little hacks, which are the reason Johan got me into writing here in the first place.


Yet another reason to avoid Windows development

In the Java world, we're used to having a (sometimes long) list of exceptions specified, indicating which kinds of errors a method may report. I have some issues with the implementation of this in Java (more on this when I can be bothered to write a few pages), but at least the errors are part of the interface. This makes it possible to write correct code. Not so in Windows:

"The error codes returned by a function are not part of the Windows API specification and can vary by operating system or device driver. For this reason, we cannot provide the complete list of error codes that can be returned by each function. There are also many functions whose documentation does not include even a partial list of error codes that can be returned."

Avoid. Trust me.


jar indexing and hivemind

I had only read about jar indexing but never used it. It is supposed to speed up class loading when using several JARs. I can certainly tell you it sped up my program, since it went from working to failing very fast.
Let me explain. I have a Swing application that uses HiveMind as IoC container. HiveMind is sweet: it discovers injection information automatically from all JAR files on the classpath. It uses a pretty simple getResources call to find all files called META-INF/hivemodule.xml regardless of the JAR file they live in.
So I have an executable JAR file (you know, with the Main-Class attribute in the manifest). It has a Class-Path attribute that points to some other JAR files. This always worked, until I started converting my build scripts to Maven.
Since I mindlessly copied the example from the jar-plugin page (http://maven.apache.org/plugins/maven-jar-plugin/examples/manifest-customization.html), the Maven jar plugin created the jar index file INDEX.LST.
With this index file in the JAR, HiveMind fails almost immediately.
I guess it is because the index file did not contain the dynamically loaded resources; at least, that is the spec for INDEX.LST: if it exists, it is trusted to contain everything needed. Since the jar plugin did not index my dynamically loaded resources, they simply do not exist as far as the classloader is concerned.

You learn something new every day.


maven sites

So I decided to give maven another go for my small home projects. Last time I tried it was Maven 1 and I ran crying back to Ant+Ivy.

I wanted all those cool Maven sites for my projects, since the reporting stuff is really nice. Now, I have a multi-module build. In my old Ant builds, that layout was convenient for producing the web site and getting the navigation linked between parent and child projects. So I used the same directory layout, and that's when the strange stuff started. As usual with Maven, the documentation is lacking here and there, although it is much better now than in the Maven 1 days. But I understood from the documentation that Maven would produce module links for the reactor build.
Having come to grips with the site:stage funny stuff, I noticed that it did generate a module menu. However, it was empty. Try as I might, I just could not get it to generate a link to the module project.

I "solved" it by letting the child project have the parent project as parent in the POM.
Now, this is mighty strange to me. I do not actually want the projects in the higher directories to be parent POMs. I do not want to inherit settings that way. Actually, I do not want to inherit anything. I want the projects to be self-contained apart from the <modules> tags.

I guess what I really want is for the site mojo to find the <modules> tag, understand that it should generate a module link, then go to the module itself, look at the URL for that project, and use that as the link target. It might even go as far as looking at the URL to find out if it is a relative link, and if not, try to make it relative. But I do not understand why the child must point to the parent to generate the module links.
When I ran into problems with site:stage when one of the modules was itself a pom-packaged project with modules of its own, I gave up and made a flat directory structure. Maven site is cool, or rather, the reporting is cool and links to it are nice. Maven site in itself does not zoom.

Funny that my Ant-version of the site mojo is actually two pretty simple XSLT-files :-) I know that isn't a fair comparison but still.

So since I grew tired of subversion I tried bazaar. The IDE integration is lacking and the trac-plugin sort of works halfway I decided to try Mercurial. So far I've noticed that it works for simple use cases and that Ubuntu Intrepid packs version 1.0.1 and not the 1.1.2 release of mercurial. Hope they start using the newer stuff but I guess Canonical is more interested in pushing bazaar. Can't blame them since bazaar is sweet.