2017-09-13

pid 1 rage

Ok so most of the linux world has gone bonkers and converted to systemd (except slackware - the original rude boy).
I tried to like systemd (well, not really) but it is impossible. It is a gigantic tire fire of badness; it reminds me of the clusterf*** that is pulse-audio (hint, hint, know what I mean).

So I thought to myself; if I must have a gigantic monolithic opaque pid 1 process (flying in the face of all that is unix) then by jove it must be something else. Enter RancherOS as one of the dists that do not use systemd. It has a glorious solution, it runs docker as pid 1 and most of the OS services are actually docker containers. It is so fantastically insane that it is really brilliant.

So I'm in the process of converting "the cluster", i.e. my 3 raspberry pi machines, to RancherOS. It is not straightforward but this is the process I use. So caveat # 1. My workstation is a windows machine for the reason that I must play starcraft 2 and that does not run on linux. So I got a cheap USB connected SD card reader to write my SD cards. I use Rufus as the utility of choice to write images to the SD card. It has worked really well.

However, RancherOS on the pi does not auto-expand the root partition. There is also the little snag of having to write the initial cloud-config.yml file with the SSH keys so I can get at the pi after it has booted. In dire need of a linux machine that can hack the SD card I tried virtualization. Luckily VirtualBox can expose USB devices to a guest. So I booted up my virtual linux, mounted the SD card reader through USB and by magic it appears. So I can run gparted on the SD card from the virtual linux machine and resize the rootfs. After that it is just a matter of creating a /var/lib/rancher/conf/cloud-config.yml with all the SSH keys on the SD card. Plug it into the pi and boot.
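For reference, that cloud-config file really only needs the keys. A minimal sketch (the key and the user@host comment are obviously placeholders for your own):

```yaml
#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAAB3Nza... me@workstation
```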

And it freaking works. After a while the pi snagged an IP from the dhcp-server and I could SSH into it. Now to do this a couple more times and then all my servers are on RancherOS and systemd is just a bad memory on the bare metal servers. It will still haunt many of the docker images but maybe I can force supervisord to be pid 1 :-)

2013-11-12

gradle

I've just quickly tried out gradle as a build system. I have three observations so far.

1) NetBeans has pretty good integration

2) Repository configuration is in the build script. That is, in my book, a pretty big f*ckup; maven got rid of that with maven 2. I know I can write my own build script, but a new build system with no convention for configuring things that are specific to the build machine and not the source code... The irony of using Ivy and obscuring everything that is good with Ivy.

3) The run task... It is currently not possible (said the forum 1 year ago, and that's all I could find on this subject) to pass command line arguments to the application. You offer a "run" task but it is unusable.
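(For the record, the workaround I've seen floating around is to smuggle the arguments in through a project property - an untested sketch, where appArgs is a name I made up:)

```groovy
// in build.gradle, assuming the application plugin's JavaExec "run" task;
// invoke as: gradle run -PappArgs="foo,bar"
run {
    if (project.hasProperty('appArgs')) {
        args project.appArgs.split(',').toList()
    }
}
```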

If I have to write those things myself I might as well stick to bare Ant + Ivy.

2013-10-30

Designing OO APIs

I'm trying to find a good pattern for designing an API/framework that lets clients subclass to inherit convenient behavior.
These are my requirements:

The framework owns the collection of "Things". Like this:
class Framework {
  void addAThing(Thing t)...
}

A "Thing" is probably a composition of other stuffs. So it must have a Stuff getStuff() method. The framework uses a Thing's Stuff sometimes.

interface Thing {
   Stuff getStuff() ;
}

So one implementation of Thing is the simple version that gets stuff from the outside.
class Thing1 implements Thing {
   private final Stuff myStuff ;
   public Thing1(final Stuff myStuff) { this.myStuff = checkNotNull(myStuff) ; }
   public Stuff getStuff() { return this.myStuff ; }
}

The problem I have is that there can be many different Things, and the actual creation of Stuff is something that I want the subclasses to be able to implement. So I'll help them along with this:

abstract class ParentThing implements Thing {
   private final Stuff myStuff ;
   public ParentThing() { 
        this.myStuff = checkNotNull(createStuff()); 
   }
   public Stuff getStuff() { return this.myStuff ; }
   protected abstract Stuff createStuff() ;
}

The problem with that is that I'm relying on subclasses to be very nice about the implementation of createStuff(), since the method is called in the constructor. It shouldn't, for example, register "this" as a callback to some other thread in createStuff(), since the "this" instance might not have left the constructor when the callback occurs.

So maybe this then:

interface Thing {
    void initialize() ;
    Stuff getStuff() ;
}

and then I'll fix it so that the framework calls initialize() before it uses the Thing, like so
 
class Framework {
  void addAThing(Thing t){ t.initialize(); ... }
}

But then I can't use final in the ParentThing class anymore, leaving me with another state of Things, "created but not initialized yet". 

So maybe I actually should have 

interface ThingBuilder {
   Thing createAThing() ;
}

and then

class Framework {
  void addAThing(ThingBuilder t)...
}

Might not be so bad with closures, but without closures/lambdas it gets a bit messy on the client side.
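To see how the builder route plays out with lambdas, here is a self-contained sketch (the empty Stuff, the anonymous implementation and the println are mine, just to make it runnable):

```java
interface Stuff {}

interface Thing {
    Stuff getStuff();
}

interface ThingBuilder {
    Thing createAThing();
}

class Thing1 implements Thing {
    private final Stuff myStuff;
    Thing1(Stuff s) { this.myStuff = s; }
    public Stuff getStuff() { return myStuff; }
}

class Framework {
    void addAThing(ThingBuilder b) {
        // The Thing is fully constructed (and its final fields safely
        // published) before the framework ever touches it.
        Thing t = b.createAThing();
        System.out.println("added thing with stuff: " + (t.getStuff() != null));
    }
}

public class Demo {
    public static void main(String[] args) {
        Framework f = new Framework();
        // the lambda is the ThingBuilder
        f.addAThing(() -> new Thing1(new Stuff() {}));
    }
}
```

The point is that there is no "created but not initialized yet" state: the framework only ever sees fully constructed Things.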

Ideas are welcome. How to design an OO API? 

2013-04-04

Debian Squeeze, chrome and firefox woes

Ok so I am guilty of running stable software, I use Debian Squeeze. It is rather conservative in updating versions of things, doing a lot of testing, and for that reason everything "just works", even if I can't run the latest games and so on.

Google chrome has another strategy, they update their version of the browser all the time. Recently I got updated to Version 26.0.1410.43.
Now chrome says that I am running on an unsupported version of my operating system (Debian 6). It has a link to a page that states what versions are supported. That page says that Debian 6 is supported.
Yeah
Testing

So given the recent public relations stunts that Google has pulled (like ditching google reader - anyone remember orkut, still running....) I wasn't too keen on cutting them any slack, I'll just go back to firefox despite the abysmal WebGL performance.

Firefox isn't included in Debian because of ... principles. Mozilla will not license the Firefox logo in a Debian friendly manner because it's like trademark and their property and stuffs. Debian will not include software that has restricted licenses and trademarks and stuffs.
Principles, I can actually dig that. I'll just download it myself.

Happy like a little child finding a fresh puddle of mud on a spring day I headed over to mozilla.org to get me some firefox goodness.
Downloaded version 19, unpacked, clicked the binary, waited for things to break (they usually do). But it ran. That's pretty nice of them.
Only it looked really crappy but I didn't have time to look into what that was about.
Until now. I'm home with a cold, and between sneezing and other unpleasantness I tried to get Java applets running. Only to find out that the reason firefox looks like, pardon the language, shite, and that the java plugin doesn't even get recognized, is that it is a 32-bit build.
This is the year of our grace 2013.
I went 64-bit before my niece was born and she is in school now. I mean, come on!
The stupid, stupid, stupid download page automatically detects I'm on linux and starts a download without asking, and it picks the wrong arch. Ok, accidents happen, so where is the download link then? There freaking isn't any!
The solution was to google my way through countless questions until finding ftp://ftp.mozilla.org/pub/firefox/releases/
Clicking my way to the 64-bit version 19 beta build. Unpack that, then let firefox auto-update to version 20 and THEN the plugins work.
It also looks good now, because it doesn't try to load GTK from the wrong 32-bit libraries that it obviously can't find since this is, as said, a 64-bit clean machine.
Yeah
Testing

2013-01-30

Minimum viable web framework

In some projects I've been working on, there's a small part of common code and thinking that one could call a minimum viable web framework (well, technically, it's a library and not a framework, but I digress). There's absolutely nothing new in it, and if you've ever done even one Servlet app, you know it already. I see that as an advantage.

My minimum viable web framework consists of 8 lines of code and one principle. The code (quoted below) is a static method (on some utility class) that takes a path to which to forward a request, the request and response objects, and then a vararg list of alternating names and values to put as request attributes. The principle is: if something produces any HTML, then that's all that thing does (inverted: if you do anything else, then you don't produce HTML). Anything that produces HTML (in my case, JSPs) only reads values prepared for it from the request attributes (and if you have something that produces some output that isn't HTML, then it's by definition a special case and is exempt from this rule). It's OK to have lists and other simple data structures as request attributes, and it's OK to use simple looping constructs (e.g. a JSTL core forEach) in the rendering, but you can't do things like calling methods from there.

public static void forward(String path, HttpServletRequest request, HttpServletResponse response, Object... attributes)
throws ServletException, IOException 
{
        for(int i=0; i<attributes.length; i+=2) {
                request.setAttribute((String)attributes[i], attributes[i+1]);
        }
        request.getRequestDispatcher(path).forward(request, response);
}
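The name/value pairing convention can be shown standalone; here's a sketch with a plain Map standing in for the request attributes, so it runs without a servlet container (ForwardDemo and the attribute names are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ForwardDemo {
    // Same pairing loop as forward(), but collecting into a Map so it can
    // run anywhere: alternating name/value varargs become attributes.
    static Map<String, Object> attributes(Object... pairs) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            m.put((String) pairs[i], pairs[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Object> attrs =
            attributes("title", "All items", "count", 3);
        System.out.println(attrs);
    }
}
```

This prints {title=All items, count=3} - the JSP-side code would read exactly those names, and nothing else.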

Together, the code and the principle give a simple way of keeping application code and page formatting separate, and I've been quite happy with it in my projects. There are a lot of things missing that you might expect a framework to provide (e.g. input validation or data access). I see those as nice to have; the only essential part of a web framework is facilitating the separation of logic and rendering.

So, what do you think - is this web framework both minimal and viable? If it's not minimal - how can we minimise it? If it's not viable, then what's missing? And last but not least: does it zoom?

2013-01-22

OpenTSDB

At my daytime gig I have come into contact with two applications that really zoom.

One is Splunk and I'm sure the developers of that application drink awesome-sauce for breakfast. It is so awesome I won't even write more about it here because I just don't know what words to use. Suffice it to say that it is a log analysis tool that actually beats 'find, xargs, awk, grep' and all of those.

The other application I've come into contact with is OpenTSDB. OpenTSDB is a time series database; that means you put metrics into it that you want to plot over time. OpenTSDB uses HBase, the Apache Hadoop database, as its storage.
OpenTSDB has a web front end for plotting the data points that uses gnuplot. It is rather simplistic as a web front end, but it is mostly bug free, and despite a few little quirks it "just works" exactly like you want it to. It does one thing (send a query to OpenTSDB and plot a PNG image) and it does it well.

We use OpenTSDB to monitor our servers and applications. We put numbers into it like how much heap the server has, the number of busy threads, how many messages were put on the message queue, and the response time for each web service call.
It is extremely helpful in monitoring and post-mortem analysis of application behaviour. I mean like, really useful. I can correlate exactly the number of packets the load balancer sends to a certain host at 5s intervals with the number of busy threads, the heap size, the cpu load and a load of application specific metrics extracted from the JVMs using JMX.
The JMX collector, developed in house, is written by some pretty clever guys to be fast, but you can also write one, it isn't that hard. Remember to do everything asynchronously and use caching and you're good to go.
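The core of such a collector is tiny. A minimal sketch that reads one JMX value from its own JVM and emits it in OpenTSDB's telnet-style "put" format (the metric name and host tag are made up; a real collector would poll on a schedule, write to the TSD socket asynchronously and cache its MBean lookups):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapPoller {
    public static void main(String[] args) {
        // Read the current heap usage of this JVM via the platform MBean.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long usedHeap = mem.getHeapMemoryUsage().getUsed();
        long now = System.currentTimeMillis() / 1000;
        // OpenTSDB's telnet protocol: put <metric> <timestamp> <value> <tags>
        System.out.println("put jvm.heap.used " + now + " " + usedHeap + " host=myhost");
    }
}
```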

I just cannot stress enough how incredibly useful OpenTSDB is, not only for monitoring what happens now but for what happened that Sunday when a couple of servers went haywire and didn't respond. Given the precision of the correlation it is very easy to find the relevant log entries from the time stamps.

To put into perspective how good OpenTSDB & HBase are: we dump, I would say, more than 1 metric/second (usually we poll a specific metric every 5 sec or so - depending on what it is).
From each server, and we have > 35 servers.
To a single OpenTSDB + HBase server.

And it just freaking works. I can actually see exactly what the heap size was for host X on christmas eve.

Just tonight I installed OpenTSDB on my local machine/server. I don't need it at home; I did it just to pay tribute.

2012-11-21

Tracking down memory corruption by mprotecting your ADTs

In C, it's customary to design your code around Abstract Data Types, that is, modules that consist of a header file that declares the external interface of the module (consisting of an opaque struct and a set of functions operating on that struct), and an implementation file (which has the full declaration of the structure, the definitions of the functions in the header and any helper functions). The header would be something like this:
#ifndef INCLUDE_GUARD_STACK_H
#define INCLUDE_GUARD_STACK_H

#include <stddef.h>
#include <stdbool.h>

struct stack;

struct stack* stack_create(size_t size);
bool stack_push(struct stack* stack, int i);
bool stack_pop(struct stack* stack, int* i);
void stack_destroy(struct stack* stack);

#endif

The implementation, then, is:

#include "stack.h"

#include <stdlib.h>

struct stack
{
  int* elements;
  size_t used;
  size_t allocated;
};

struct stack* stack_create(size_t size)
{
  struct stack* stack = malloc(sizeof(struct stack));
  if (stack == NULL) {
    return NULL;
  }
  stack->elements = malloc(size * sizeof(int));
  if (stack->elements == NULL) {
    free(stack);
    return NULL;
  }
  stack->used = 0;
  stack->allocated = size;
  return stack;
}

bool stack_push(struct stack* stack, int i)
{
  if (stack->used == stack->allocated) {
    goto error_full;
  }
  stack->elements[stack->used++] = i;
  return true;
error_full:
  return false;
}

I'll leave the implementation of the rest of the functions to your imagination. Since only the forward declaration of the struct is in the header, no code outside the implementation can access the members of the struct.

Now, assume we have a memory corruption fault somewhere in the rest of the program which, when triggered, corrupts the elements pointer but doesn't have any other effects. Our program then seems to be working fine until some later time, when it suddenly crashes due to an invalid memory access in stack_push. We'd really like to get the program to abort at the point of the original corruption of the elements pointer, but how can we do that?

One way of solving that is to use the fact that, since the structure is opaque, no code outside our implementation file has any legitimate use for touching any of the memory to which a struct stack* points. Since no other code has any business accessing that memory, maybe we can have the OS help us prevent it from doing so? Enter mprotect.

The mprotect function lets us control what types of access should be permitted to a region of memory. If we access the memory in any other way, the OS is free (and in some cases even required) to abort our program on the spot. If we keep the memory inaccessible at all times except for when we use it inside our implementation functions, then chances are we can catch the memory corruption as it happens. The mprotect man page does say that the memory it protects has to be page aligned, though. How do we get that? Via posix_memalign and getpagesize, like so:

#include "stack.h"
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

struct stack* stack_create(size_t size)
{
  struct stack* stack;
  posix_memalign((void**)&stack, getpagesize(), sizeof(struct stack));
  posix_memalign((void**)&stack->elements, getpagesize(), size * sizeof(int));
  stack->used = 0;
  stack->allocated = size;
  protect(stack);
  return stack;
}

Now we just have to implement and use the protect function mentioned above and its inverse, unprotect:

static void protect(struct stack* stack)
{
  mprotect(stack->elements, stack->allocated * sizeof(int), PROT_NONE);
  mprotect(stack, sizeof(*stack), PROT_NONE);
}
static void unprotect(struct stack* stack)
{
  /* Unprotect in the reverse order, or we crash and burn trying to read stack->elements */
  mprotect(stack, sizeof(*stack), PROT_READ|PROT_WRITE);
  mprotect(stack->elements, stack->allocated * sizeof(int), PROT_READ|PROT_WRITE);
}
bool stack_push(struct stack* stack, int i)
{
  unprotect(stack);
  if (stack->used == stack->allocated) {
    goto error_full;
  }
  stack->elements[stack->used++] = i;
  protect(stack);
  return true;
error_full:
  protect(stack);
  return false;
}

Since this ends up modifying bits in the MMU, it may not be suitable to have this enabled in performance critical code, so an #ifdef NDEBUG switch that selects empty implementations of protect and unprotect for non-debug builds could be advisable.

So does protecting your ADTs via mprotect zoom? Well, it does come in handy at times, and there's not much disadvantage to using it, so my verdict is: Zooms!