Update: I've created a live visualization of this algorithm, so you can see it in action - see Marriage Sort, a Visualization.
Two weeks ago, a link showed up on Hacker News on how to (mathematically) select the best wife. The key takeaway from the article is that knowing only the relative rankings of women you've dated (and assuming you can never go back to a previous one), you can maximize your chances of marrying the best one by letting N/e go by and then marrying the next one you meet that's better than all those. Further, to maximize your expected value you only need to let the first √N - 1 go by. After rattling around in the back of my mind for a couple weeks, yesterday this tidbit coalesced into an actual Idea. Could I turn this selection routine into a sorting algorithm?
And thus, Marriage Sort was born. I present it here purely out of interest, not because it is especially fast - if nothing else, because it is the only sorting algorithm I know of that has an average complexity of O(n1.5)1.
Algorithm
The basic idea of this algorithm is to repeatedly choose the maximum element from the first √N - 1 elements of our working set, and then scan to the end looking for elements bigger than it. When one is found, swap it to the end of the working set, and decrease the working set by one. After the pass is complete, swap out the maximum element from the first √N - 1 as well, and start again. When everything is finished (√N - 1 <= 0), use insertion sort to put the array into true sorted order.One pass of the Marriage Sort. Two elements are found and swapped to the end, followed by the maximum element from the first √N - 1.
This works because, as we have noted from the article linked above, each element bigger than the first √N - 1 is expected to be close to the largest remaining element in the array. By repeatedly taking these elements and moving them to the end, we put the array into an approximately sorted order, where elements should be reasonably close to their 'true' positions. When this is done insertion sort will do the final rearranging, moving elements the short distances to their true positions.
In pseudocode:
def marriagesort(array):
end = array.length
while true:
skip = sqrt(end) - 1
if skip <= 0: break
# Pick the best element in the first vN - 1:
bestPos = 0, i = 1
while i < skip:
if array[i] > array[bestPos]: bestPos = i
i += 1
# Now pull out elements >= array[bestPos], and move to the end:
i = skip
while i < end:
if array[i] >= array[bestPos]:
array.swap(i, end-1)
end -= 1
else:
i += 1
# Finally, move our best pivot element to the end
swap(bestPos, end-1)
end -= 1
# Finish off with insertion sort to put the elements into true sorted order
insertionsort(array)
Here you can see this algorithm working on a randomized array. Many thanks to Aldo Cortesi for the wonderful sorting algorithm visualization framework!
Update: Here is another visualization made using Aldo Cortesi's tools, of a 512 element array this time. If you take a close look at the insertion sort phase (the slope on the right side), you can see that it is composed out of a lot of little triangles. Each of these triangles corresponds to one pass of the marriage selection routine, and shows how far elements can be from their true sorted position.
Complexity
Some quick back-of-the-napkin calculations indicate that this algorithm takes O(n1.5) compares in the average case, although the worst case looks to be O(n2). For the first stage (i.e. without the final insertion sort) each pass will take at most n compares and move approximately √N items to the end, requiring about √N passes. I don't know if this is a tight bound - the passes will actually speed up and require fewer compares as the working set shrinks, however I didn't want to try including that in the calculations. Similarly, after each pass all the items moved are guaranteed to be greater than all the items left, capping the distance moved items are from "home" to √N on average for the insertion sort. This holds the insertion sort to O(n1.5) compares on average as well.Similarly, this algorithm takes O(n1.5) swaps in the average case (and O(n2) in the worst case). The number of swaps could certainly be decreased if a different final sort were used - the first pass only takes Θ(n) by itself - although in practice this isn't much use since the compares will still dominate the runtime.
Note that both of these graphs are log-log plots, so the important feature to look at is the slope of lines rather than the absolute position. For example, in the "Swaps" plot quicksort appears to be hovering around the n∙log(log(n)) line, however looking at the slope we can see that it is actually approximating n∙log(n), just with a better constant factor.
Note 2: The x axis of these graphs is the number of elements in the array, and the y axis is the number of comparisons/sorts required. I should have labeled these better.
Postscript
Some sample code implementing this algorithm is up on GitHub - this is the code to generate the log-log plots shown above.As always, questions and comments are welcome!





I'm Eric Burnett, a software engineer at Google. I'm interested in studying what makes programmers tick, and writing whatever code strikes my fancy. Read my 
20 comments: