Monday, January 16, 2012

Marriage Sort, a visualization

You'll need a modern browser that handles SVG (a recent Chrome, Firefox, or IE9 will do) to properly see this post.

On Saturday, Mike Bostock posted a link on Hacker News to an explanation of the Fisher-Yates Shuffle he had written. Alongside the explanation he included a very slick visualization where you could see it in action. Now, at a fundamental level, shuffle algorithms and sorting algorithms are simply the reverse of each other. So if this visualization does well on shuffle algorithms, why not a sorting algorithm as well?

A couple years ago I wrote a sorting algorithm, which I gave the tongue-in-cheek name Marriage Sort (an homage to the problem that inspired it - see the introduction post for more details). I was looking for a project to play with on the weekend, so I thought, why not try the visualization on that? Mike has kindly let me re-use (butcher?) his code, so here's the same visualization of Marriage Sort.

Disclaimer: The visualization I'm building off looks great across platforms, so if there is anything wrong with this one, it's from my changes. Go see the original if you don't believe me.

For those that don't remember the algorithm, here's a synopsis. There are two stages:
  • Stage 1:
    1. Take a prefix of size √m, where m is the size of the working set.
    2. Pick the maximum (in this case, most-right-leaning) of the prefix, and take it as the pivot.
    3. Walk through the list, pulling out anything greater than the pivot and moving it to the end of the working set.
    4. Move the pivot to the end of the working set.
    5. Go back to step 1 and repeat.
  • Stage 2: a bog-standard insertion sort.
As the passes proceed the pivots decrease in size and more values are matched, allowing the array to be put into a "mostly sorted" state. On average, every element is within √n spots of the correct position, which the insertion sort then corrects. Overall complexity O(n1.5) time, O(1) extra space.

Code for the visualization is at https://gist.github.com/1628602 for easy perusing.

Looks good to me! What do you think?

Tuesday, May 10, 2011

The Game of Life, part 2: HashLife

Last time I wrote I gave a brief introduction to the Game of Life and a very simple Python implementation for visualizing it. I will freely admit that was a teaser post; this post gets into the real meat of the topic with an overview of the HashLife algorithm and a much more interesting implementation.life 570

Introduction

This entry has taken me an embarrassingly long time to post. As is my habit, I wrote the code and 90% of the post, and then left it for months and months. Whoops!

If you haven’t played with a Game of Life viewer before they are legitimately fun to toy around with - I encourage you to check this one out (code is here). Since the last version everything is much improved. The viewer supports a larger set of controls (see the README for details) and basic file reading is implemented so it’s possible to try new starting patterns on the fly. And, as promised, I’ve implemented the HashLife algorithm to massively speed up iterations, so enormous patterns billions of generations forward are easily within your reach.

Algorithm

HashLife is a simple yet interesting algorithm. Invented in 1984 by Bill Gosper (of Gosper glider gun fame), it exploits repeated patterns to dramatically cut down the work required to support large patterns over vast numbers of iterations. Between the Wikipedia page and the enigmatically named “An Algorithm for Compressing Space and Time” in Dr. Dobb’s Journal I think it’s decently well explained, but it took me a couple read-throughs to really wrap my head around so I’m going to try to give an overview of the key insights it utilizes.

quadtree

At it’s heart, HashLife is built around the concept of a quadtree. If you’re unfamiliar with it, a quadtree takes a square region and breaks it into four quadrants, each a quarter the size of the original. Each quadrant is further broken down into quadrants of its own, and on down. At the bottom, in squares of some minimum size like 2x2, actual points are stored. This structure is usually used to make spatial queries like “what points intersect this bounding box” efficient, but in this case two other properties are taken advantage of. First, nodes at any level are uniquely defined by the points within their region, which means duplicated regions can be backed by the same node in memory. For the Game of Life, where there are repeated patterns and empty regions galore, this can drastically reduce the space required. Second, in the Game of Life a square of  (n)x(n) points fully dictates the inner (n-2)x(n-2) core one generation forward, the inner (n/2)x(n/2) core n/4 generations forward, irrespective of what cells are adjacent to it. So the future core of a node can be calculated once and will apply at any future point in time, anywhere in the tree.

Inner nodesTogether these properties allow for ridiculous speedups. Hashing and sharing nodes drastically reduces the space requirements, with exponentially more sharing the further down the tree you go. There are only 16 possible leaf nodes, after all! From this, calculating the future core for a node requires exponentially less time than a naïve implementation would. It can be done by recursively calculating the inner core of smaller nodes, where the better caching comes into play, and then combining them together into a new node. You might be wondering if the gains from caching are lost to the increasing difficulty of determining which nodes are equal, but with a couple careful invariants we actually get that for free. First, nodes must be immutable - this one’s pretty straightforward. Second, nodes must be unique at all times. This forces us to build the tree from the bottom up, but then checking if a new node duplicates an existing one is simply a matter of checking if there are any existing nodes that point to the same set of quadrants in the same order, a problem that hash tables trivially solve.

def __hash__(self):
# Hash is dependent on cells only, not e.g. _next.
# Required for Canonical(), so cannot be simply the id of the current
# object (which would otherwise work).
return hash((id(self._nw), id(self._ne), id(self._sw), id(self._se)))

def __eq__(self, other):
"""Are two nodes equal? Doesn't take caching _next into account."""
if id(self) == id(other):
return True
return (id(self._nw) == id(other._nw) and
id(self._ne) == id(other._ne) and
id(self._sw) == id(other._sw) and
id(self._se) == id(other._se))

Implementation

As before, the code I’ve written is for Python 2.6 and makes use of PyGame, although neither dependency is terribly sticky. The code lives in a repository on github, and I welcome any contributions you care to make. As the code here is complicated enough to be almost guaranteed a bug or two, there is a basic set of unit tests in life_test.py and the code itself is liberally sprinkled with asserts. Incidentally, removing the asserts nets a 20% performance gain (as measured by the time it takes to run the ‘PerformanceTest’ unit test), although I find the development time saved by having them is easily worth keeping them in forever. As noted later, the performance of the implementation isn’t all that important anyways. Which is a good thing, since I coded it in Python!

A comment on rewrites: during the transition from version 1 - a simple brute force algorithm - to version 2 - the Node class that implements HashLife - I had both algorithms implemented in parallel for a while. This let me have every second frame rendered by the old algorithm so I could ensure that at different times and different render speeds that the algorithms were coming up with the same results. I’ve seen this pattern used at work for migrating to replacement systems and it’s very much worth the extra glue code you have to write or the confidence it gives. John Carmack recently wrote about parallel implementations on his own blog, if you want to hear more on the topic.

Performance

The performance is hard to objectively detail for an algorithm like this. For example, it takes ~1 second to generate the billionth generation of the backrake 3 pattern, which has around 300,000,000 live cells; it takes ~2 seconds to generate the quintillionth generation with 3x10^17 live cells. But this is a perfect patterns to showcase HashLife - a simple spaceship traveling in a straight line, generating a steady stream of gliders. In comparison, a chaotic pattern like Acorn takes almost 25 seconds to generate just 5000 generations with at most 1057 alive at any time. As it stands the properties of the algorithm drastically outweigh the peculiarities of the implementation for anything I care to do. Although I must say, if you want to compare it to another implementation in an apples to apples comparison I’d love to hear the numbers you get.

As always, I’d love to hear what you think!

Saturday, March 12, 2011

The Game of Life, part 1

Update: See part 2 for the implemented HashLife algorithm.

The Game of Life is a fascinating system. It was invented by John Conway in 1970 and has been studied continuously ever since. For those reading who haven’t heard of it before, a brief explanation: The world is an infinite grid of points, all either alive or dead. After each generation – or ‘iteration’ if you’d prefer – cells are updated according to the following three rules:

  1. If a cell is alive and it has two or three live neighbours, it stays alive.
  2. If a cell is dead and it has exactly three live neighbours, it becomes alive (tripartite reproduction?).
  3. Any other cell is dead.

Blinker infinite growth Glider

From these simple rules amazing complexity can arise. Some configurations are stable, like the period two “blinker” [above left], or the period four “glider” [above right] that moves one row over and one row down with every cycle. Other configurations, like the one above centre, grow infinitely – this one spits out two gliders then lays down a zig-zag strip of blocks forever after.

There is more to the Game of Life than pretty patterns and curious growth, I must hasten to add. It has been studied by a host of people in a variety of fields and has gone on to start a new branch of mathematics (cellular automata) and spur discussions on whether a sufficiently complicated pattern could be considered intelligent. It has also been proven to be Turing complete, so any computation your computer can run can be run by simulating the Game of Life with the correct starting state.

I have implemented a basic python program for simulating the Game of Life on GitHub. It allows for infinite patterns, grows the field of view automatically, and allows speed to be controlled with Up/Down, but otherwise is a very simple implementation. The goal here is to eventually implement some of the more interesting algorithms for speeding up the simulation.  There are numerous such algorithms, although the one I find the most interesting is called Hashlife and exploits repeated patterns through space and time to achieve an exponential speedup in running the simulation. More details in part 2, whenever I write it :).

Tuesday, June 8, 2010

New Job!

Today I officially started at Google! Exciting stuff. I will be working on the DoubleClick Ad Exchange doing something-or-other for the time being — however as I am neither a senior engineer nor a corporate figurehead, I do not plan to write about my work. This should, however, explain my recent absence :).

That is all.

Wednesday, April 28, 2010

Robustly hot swapping binaries (with code)

Going down for swap
Coming up from swap!
This is generation 2 running 0.2 software (initial version 0.1)
Running state machine

  A while ago I remember reading an article by Nathan Wiegand on hot swapping binaries. This was a very eye-opening article for me – before reading it, hot swapping was one of those black arts I never really thought about, and certainly wouldn’t have thought was at all easy. I highly recommend you read it for yourself. Go ahead, I’ll wait.

There. Did you notice it? The elephant in the room? One thing the article doesn’t address is how to design programs to use this fancy new ability, without being fragile and crashing and all those bad things. I’ve been mulling this over since reading it, and have settled on a basic design that I’ll present here. But don’t worry, this isn’t just a thought design…I’ve actually coded it up and made sure it works as expected. Feel free to jump down and check it out before reading the design. Still, caveat emptor and all that.

Design Goals

For this design, I am focusing on three key goals:

  • Allow updating to any future version
  • Allow updating to any previous version
  • Make it easy to be crash-free

Simple enough? Updating forward is a pretty obvious goal, as is crash-free code. I want to allow updating backwards as well for the simple expedient that I don’t expect all new code to be bug free, and so it might be desirable to roll back when a bug is introduced.

Design

To achieve this, I’ve settled on a shortlist of constraints for the code:

  1. Use protocol buffers to store all state
  2. Provide suitable defaults for everything, and be robust against weird state
  3. Structure the code as a state machine

I did say the list was short. Let’s look at these in detail:

  1. Protocol buffers are an obvious choice for persisting state, as they are forward and backward compatible by design. Care must be taken to not re-use tag numbers and to never add ‘required’ fields, but this is an easy requirement to satisfy. Now, using protocol buffers to store everything does incur some overhead, but they are quite efficient by design and we really only need to store all state between state machine transitions so local variables are still quick.
  2. Hand in hand with 1), we cannot always expect to have the data we want in the format we want version to version. To accommodate this we must pick suitable defaults for fields, and if necessary be able to get new values at runtime. At the same time, if the meaning or format of a field is changing, it is probably better to use a new field. We will always try to handle weird data that may show up, but this shouldn’t be abused.
  3. Finally, structure the code as a state machine. This surfaces all state machine transitions as potential points for upgrading versions, and forces state to be in the protocol buffer when these transitions are crossed to ensure important data isn’t forgotten. And like everything else, the next state data can be stored in the protocol buffer.

There is one problem with 3), however. What happens when new states are added? Going forward is easy, but if we update to a previous version when we’re in one of these new states, it will have no idea where to start running again. We could try storing fallback states or something like that, but that seems too fragile. Instead, I would recommend not allowing updates to occur when transitioning to these new states. Then, a few versions down the line when you’re sure you won’t need to downgrade past where they were added, remove that restriction.

enum State {
    // Special states supported by the program infrastructure.
    STATE_NONE  = 0;
    STATE_DONE  = 1;
    STATE_ERROR = 2;
    // Program states. Unknown state transitions lead to ERROR and terminate the
    // program, so should be avoided at all costs.
    STATE_INIT          = 3;
    STATE_PROCESS_LINE  = 4;
    STATE_MUTATE_LINE   = 5;
  }
  optional State prev_state = 2 [default = STATE_NONE];
  optional State cur_state  = 3 [default = STATE_ERROR];

What About Threads?

You may have noticed that this design is inherently single-threaded. Threading can be added easily enough if the main thread owns all the work, and can easily and safely wait for or cancel all worker threads without losing anything. In that case, spin down the workers when you’re about to swap, and spin them up again when it completes. If your program doesn’t fit that description, however, this design may not be for you.

Testing?

Of course! I would recommend trying all transitions on a test instance first before upgrading the real process. You could also build in consistency checks that auto-revert if the state doesn’t meet expectations, regression tests for certain upgrade patterns, etc. This design is meant to make it easy to hot swap successfully, but it is no silver bullet.

Let's See the Code!

As always, the code is up on GitHub for you to peruse. It is broken into two demonstration applications, ‘v1’ and ‘v2’, that can be swapped between at will. While looping they respond to ‘u’ and ‘q’ (update and quit), although at times you may be prompted for other input. the makefiles build to the same target location, so build whichever one you want run next and press ‘u’ to swap to it.

The code is structured so you can use it as a framework to play with yourself easily enough. You should only need to write an init method, update the state machine and .proto file, and write the respective state methods to do the real work. The state machine and state methods will look something like this:

ReturnCode runStateMachine(ProgramState& state) {
    cerr << "Running state machine\n";
    // Put stdin into non-blocking, raw mode, so we can watch for character
    // input one keypress at a time.
    setStdinBlocking(false);
    while (true) {
        ProgramState::State next;
        switch (state.cur_state()) {
            case ProgramState::STATE_INIT:
                next = runState_init(state);
                break;
            case ProgramState::STATE_PROCESS_LINE:
                next = runState_process_line(state);
                break;
            case ProgramState::STATE_DONE:
                setStdinBlocking(true);
                return SUCCESS;
            case ProgramState::STATE_NONE:
            case ProgramState::STATE_ERROR:
            default:
                setStdinBlocking(true);
                return FAILURE;
        }

        ProgramState::State cur = state.cur_state();
        state.set_prev_state(cur);
        state.set_cur_state(next);

        // For now, simply let the user decide when to swap and quit. We can
        // always change this later.
        ReturnCode code = checkForUserSignal();
        if (code != CONTINUE) {
            setStdinBlocking(true);
            return code;
        }
    }
}

ProgramState::State runState_init(ProgramState& state) {
    cout << "Please provide a line of text for me to repeat ad-nauseum\n";
    string line;
    setStdinBlocking(true);
    getline(cin, line);
    setStdinBlocking(false);
    state.set_line_text(line);
    cout << "Thanks!\n";
    state.set_line_count(0);

    return ProgramState::STATE_PROCESS_LINE;
}

Easy, right? And here is an example transcript from going forward and then back between the versions in the repository (behind the scenes compiles not shown):

eric:~/code/hotswap/v1$ ../bin/hotswap.out
HotSwap example started - version 0.1
Initial call
Running state machine
Please provide a line of text for me to repeat ad-nauseum
All work and no play makes jack a dull boy
Thanks!
0: All work and no play makes jack a dull boy
1: All work and no play makes jack a dull boy
2: All work and no play makes jack a dull boy
u
Going down for swap
Coming up from swap!
This is generation 2 running 0.2 software (initial version 0.1)
Running state machine
3 mutations: All work and no play makes jack a dull boy
4 mutations: All work and no play maXes jack a dull boy
5 mutations: All workqand no play maXes jack a dull boy
6 mutations: All workqand nL play maXes jack a dull boy
u
Going down for swap
HotSwap example started - version 0.1
Coming up from swap!
Running state machine
7: All workqand nL play maXes jack a dull boy
8: All workqand nL play maXes jack a dull boy
9: All workqand nL play maXes jack a dull boy
q
Terminating with code 0


As you can see, version 0.2 mutates the line as it goes, while version 0.1 simply prints it forever. There are more differences than that, but you can find all that out from the code.

Enjoy! If you do end up playing with it, I’d love to hear about your experiences, or your thoughts on the design even if not.


This will probably be my last post for a while – on Saturday I leave the continent for 2 weeks and the city for 6. I will try to respond to emails and comments while I’m gone, but I may be a bit slower than usual.