Monday, January 30, 2012

On hosting static files

You may have noticed that I have a blog. Shocking, I know. It’s predominantly composed of text, as many are, but that’s not the only content I like to share. Posts are more approachable and memorable if they contain images, sorting visualizations require Javascript frameworks, etcetera. Unfortunately, all these support files need to be stored somewhere.

This blog is hosted on Blogger, a decision I’m very happy with. Blogger doesn’t do arbitrary file hosting (that I know of), but it does support pushing images to Picasa. So images are hosted by Picasa, another Google service, no sweat. Now, what about other files?

Hmm.

It turns out the answer to this question is a lot trickier than I would have hoped. I was hoping for another Google service, to keep my dependencies down, so I started with Sites.

Google Sites

Google Sites is a free and easy way to create and share webpages”, the tagline goes. And it is – we use it internally all the time. One quick form later, I have a site, and another couple clicks puts a single attachment on it. Great! Update the post to reference it, double-check in incognito mode, and my work here is done. Right?

Well…not quite. It turns out that Sites uses some interesting redirect magic to actually link you to your content. Redirect magic that, for reasons I don’t understand, doesn’t actually resolve when you’re using it to hotlink content. For exactly this reason I guess, I don’t know. Anyways, since I had visited this content while on Sites, my browser had it cached, and even incognito mode could access it, but it wouldn’t resolve anywhere else. Which is a good reason to test things on more than one computer, I suppose.

Ok, not sites. App Engine?

Google App Engine

App Engine does do static file hosting. Instructions are here – you must simply download the SDK, create the app.yaml file appropriately, upload your new App (the files must belong to something, after all), etc. This looks doable, but certainly non-trivial. I also cannot figure out how this is going to be priced, nor the latency to expect from it. So let’s keep looking.

I’m running out of Google services (I considered and rejected a few more). Time to look further afield. I don’t know of any good, trustworthy solutions offhand, and a quick search isn’t taking me anywhere I really want to go, so lets look at Amazon AWS. They have a service for every occasion, right?

Amazon AWS

As it turns out, yes, yes they do. A quick look through the options (there are so many!) says that Amazon Simple Storage Service (S3) will do nicely, optionally backed by CloudFront if I ever really need to scale. A quick signup, upload of my files, and one new CNAME, and I’ve got my files living under static.thelowlyprogrammer.com, backed by S3. Nice! Doubly so since the free usage tier should make this entirely free for the next year or so, and pennies after.

Finally, in researching this blog post, I found one more option. And it’s a Google service, which was my original goal. Let’s check it out!

Google Cloud Storage

New on the block, Google Cloud Storage appears to be a competitor to Amazon S3. Priced a tiny bit cheaper, the features seem roughly comparable. The documentation is a bit rough still, which partly explains why I didn’t figure this out earlier, but everything is there if you look hard enough. One significant distinction is that you do not choose where data is stored (unless you want to add restrictions), and it will get replicated wherever needed. Note that this includes automatic edge caching for popular files, so this is pretty much S3 and CloudFront all rolled into one. Fancy! It supports the same CNAME aliases, so I’ve got this set up as a hot-spare for my S3 data. I’ll leave S3 as primary for now since I’ve got it all configured and tested happily, but it looks like I’d be well served either way.

Maybe in a future post I’ll do a head-to-head comparison of S3 and GCS, if I can figure out a fair way of measuring performance all over the globe. Until then, I’m happy to stick with either.

Mission accomplished – static files hosted. Time for bed.

Monday, January 16, 2012

Marriage Sort, a visualization

You'll need a modern browser that handles SVG (a recent Chrome, Firefox, or IE9 will do) to properly see this post.

On Saturday, Mike Bostock posted a link on Hacker News to an explanation of the Fisher-Yates Shuffle he had written. Alongside the explanation he included a very slick visualization where you could see it in action. Now, at a fundamental level, shuffle algorithms and sorting algorithms are simply the reverse of each other. So if this visualization does well on shuffle algorithms, why not a sorting algorithm as well?

A couple years ago I wrote a sorting algorithm, which I gave the tongue-in-cheek name Marriage Sort (an homage to the problem that inspired it - see the introduction post for more details). I was looking for a project to play with on the weekend, so I thought, why not try the visualization on that? Mike has kindly let me re-use his code, so here's the same visualization of Marriage Sort.


Disclaimer: The visualization I'm building off looks great across platforms, so if there is anything wrong with this one, it's from my changes. Go see the original if you don't believe me.

For those that don't remember the algorithm, here's a synopsis. There are two stages:
  • Stage 1:
    1. Take a prefix of size √m, where m is the size of the working set.
    2. Pick the maximum (in this case, most-right-leaning) of the prefix, and take it as the pivot.
    3. Walk through the list, pulling out anything greater than the pivot and moving it to the end of the working set.
    4. Move the pivot to the end of the working set.
    5. Go back to step 1 and repeat.
  • Stage 2: a bog-standard insertion sort.
As the passes proceed the pivots decrease in size and more values are matched, allowing the array to be put into a "mostly sorted" state. On average, every element is within √n spots of the correct position, which the insertion sort then corrects. Overall complexity O(n1.5) time, O(1) extra space. Code for the visualization is at https://gist.github.com/1628602 for easy perusing.

Looks good to me! What do you think?