Wednesday, April 14, 2010

Introducing: Marriage Sort


Update: I've created a live visualization of this algorithm, so you can see it in action - see Marriage Sort, a Visualization.

Two weeks ago, a link showed up on Hacker News on how to (mathematically) select the best wife. Tongue firmly in cheek, of course. The key takeaway from the article is that knowing only the relative rankings of items in a sequence (and assuming you can never go back to a previous one), you can maximize your chances of selecting the maximum one by letting the first N/e items go by and then picking the next one that's better than all of those. Further, to maximize your expected value you only need to let the first √N - 1 go by. After rattling around in the back of my mind for a couple of weeks, yesterday this tidbit coalesced into an actual Idea. Could I turn this selection routine into a sorting algorithm?

And thus, Marriage Sort was born. I present it here purely out of interest, not because it is especially fast - if nothing else, it is the only sorting algorithm I know of with an average complexity of O(n^1.5)¹.

Algorithm

The basic idea of this algorithm is to repeatedly choose the maximum element from the first √N - 1 elements of our working set, and then scan to the end looking for elements bigger than it. When one is found, swap it to the end of the working set, and decrease the working set by one. After the pass is complete, swap out the maximum element from the first √N - 1 as well, and start again. When everything is finished (√N - 1 <= 0), use insertion sort to put the array into true sorted order.
[Figure: One pass of the Marriage Sort. Two elements are found and swapped to the end, followed by the maximum element from the first √N - 1.]


This works because, as noted in the article linked above, each element bigger than the first √N - 1 is expected to be close to the largest remaining element in the array. By repeatedly taking these elements and moving them to the end, we put the array into an approximately sorted order, where elements should be reasonably close to their 'true' positions. When this is done, insertion sort does the final rearranging, moving elements the short distances to their true positions.

In pseudocode:
def marriagesort(array):
    end = array.length
    while true:
        skip = sqrt(end) - 1
        if skip <= 0: break

        # Pick the best element in the first √N - 1:
        bestPos = 0, i = 1
        while i < skip:
            if array[i] > array[bestPos]: bestPos = i
            i += 1

        # Now pull out elements >= array[bestPos], and move them to the end:
        i = skip
        while i < end:
            if array[i] >= array[bestPos]:
                array.swap(i, end-1)
                end -= 1
            else:
                i += 1

        # Finally, move our best pivot element to the end
        array.swap(bestPos, end-1)
        end -= 1

    # Finish off with insertion sort to put the elements into true sorted order
    insertionsort(array)
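
For anyone who wants to run it, here is the pseudocode as actual Python - a minimal sketch of my own, with the float-to-int cast and a simple insertion sort filled in (the real benchmark code is on GitHub, linked below):

from math import sqrt

def marriage_sort(array):
    # In-place marriage sort: repeated "marriage" passes, then insertion sort.
    end = len(array)
    while True:
        skip = int(sqrt(end)) - 1
        if skip <= 0:
            break

        # Pick the best element among the first sqrt(end) - 1:
        best_pos = 0
        for i in range(1, skip):
            if array[i] > array[best_pos]:
                best_pos = i

        # Pull out elements >= array[best_pos] and swap them to the end,
        # shrinking the working set as we go:
        i = skip
        while i < end:
            if array[i] >= array[best_pos]:
                array[i], array[end - 1] = array[end - 1], array[i]
                end -= 1  # don't advance i: recheck the element swapped in
            else:
                i += 1

        # Finally, move the pivot element itself to the end:
        array[best_pos], array[end - 1] = array[end - 1], array[best_pos]
        end -= 1

    # The array is now approximately sorted; insertion sort finishes the job.
    for j in range(1, len(array)):
        key, i = array[j], j - 1
        while i >= 0 and array[i] > key:
            array[i + 1] = array[i]
            i -= 1
        array[i + 1] = key

For example, given data = [8, 3, 7, 1, 9, 2, 6, 5, 4, 0], calling marriage_sort(data) sorts it in place to [0, 1, 2, ..., 9], just like the pseudocode above.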

Here you can see this algorithm working on a randomized array. Many thanks to Aldo Cortesi for the wonderful sorting algorithm visualization framework!

Update: Here is another visualization made using Aldo Cortesi's tools, of a 512-element array this time. If you take a close look at the insertion sort phase (the slope on the right side), you can see that it is composed of a lot of little triangles. Each of these triangles corresponds to one pass of the marriage selection routine, and shows how far elements can be from their true sorted position.

Complexity

Some quick back-of-the-napkin calculations indicate that this algorithm takes O(n^1.5) compares in the average case, although the worst case looks to be O(n^2). For the first stage (i.e. without the final insertion sort), each pass takes at most n compares and moves approximately √N items to the end, so about √N passes are required - roughly n^1.5 compares in total. I don't know if this is a tight bound - the passes actually speed up and require fewer compares as the working set shrinks, but I didn't want to try including that in the calculations. Similarly, after each pass all the items moved are guaranteed to be greater than all the items left, capping the average distance of moved items from their 'home' positions at about √N. This holds the insertion sort to O(n^1.5) compares on average as well.
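
As a rough sanity check on the compare count, the Python sketch above can be instrumented with a counting wrapper along these lines (the Counted class here is purely illustrative; the numbers in the plots below come from the benchmark code on GitHub):

import random

comparisons = 0

class Counted(object):
    # Wraps a value and counts the > and >= compares marriage_sort makes.
    def __init__(self, value):
        self.value = value
    def __gt__(self, other):
        global comparisons
        comparisons += 1
        return self.value > other.value
    def __ge__(self, other):
        global comparisons
        comparisons += 1
        return self.value >= other.value

for n in (1000, 4000, 16000):
    comparisons = 0
    marriage_sort([Counted(random.random()) for _ in range(n)])
    # If the O(n^1.5) estimate holds, this ratio should stay roughly flat:
    print(n, comparisons, comparisons / n ** 1.5)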

Similarly, this algorithm takes O(n^1.5) swaps in the average case (and O(n^2) in the worst case). The number of swaps could certainly be decreased if a different final sort were used - the first stage only takes Θ(n) swaps by itself - although in practice this isn't much use, since the compares will still dominate the runtime.

Note that both of these graphs are log-log plots, so the important feature to look at is the slope of the lines rather than their absolute position. For example, in the "Swaps" plot quicksort appears to be hovering around the n∙log(log(n)) line; however, looking at the slope we can see that it is actually approximating n∙log(n), just with a better constant factor.

Note 2: The x axis of these graphs is the number of elements in the array, and the y axis is the number of comparisons/swaps required. I should have labeled these better.

Postscript

Some sample code implementing this algorithm is up on GitHub - this is the code to generate the log-log plots shown above.

As always, questions and comments are welcome!

Footnotes

  1. Most sorting algorithms are either Θ(n^2) or Θ(n∙log(n)) in the average case. This one falls between those two extremes, making it faster (asymptotically) than algorithms like insertion sort or bubble sort but slower than quicksort or merge sort.

22 comments

  1. Is there an error in your pseudocode? It looks like the first inner while loop should be "while i < skip" instead of "while i < end". Am I missing something?

  2. You are correct, my pseudocode was broken. The first inner while loop should have been i < skip, and the second one for i < end was missing entirely. Thanks for catching that!

  3. That's Python, not pseudocode! :-D

  4. If I call it pseudocode I can skip details like float->int casts, and I don't have to make sure it actually runs ;)

  5. What did you use to create one+pass.png?

  6. Photoshop. I don't have any good tools for making diagrams like that, unfortunately, so that one was made by hand.

  7. Your footnote reference takes me to log in to Blogger. What?

  8. Ugh, that's the Blogger post interface trying to be "smart". I write the posts in HTML, but when I use the "Compose" interface to add the images, it has a tendency to corrupt the code I wrote. I can only hope there is a better editor coming someday.

    Should be fixed now.

  9. This is also known as the "secretary problem" and is nothing spectacularly new. More accurate formulae are given at http://en.wikipedia.org/wiki/Secretary_problem, showing the odds of selecting the best one are far from certain and actually approach zero if you don't know the value of N initially; if N is known initially, they converge on 0.368 as N increases.

  10. Just for your reference, shell sort (with particular gap sequences) gives O(n^1.5) worst case.

    Shell sort is probably my favourite sort, just because of the magic behind the gap sequences :)

  11. It'd be interesting to compare this to shell sort, another sorting algorithm whose complexity is greater than O(n log n) but less than quadratic.

  12. @Andy: So it does! I had looked at Shell sort briefly and saw it was O(n log(n)^2) in the best case, but I didn't realize it had other known sequences with differing complexities. Time to read up on Shell sort I think.

    @Jeremy: The most common Shell sort has complexity O(n log(n)^2), so for a rough comparison we can simply plot the two of them: http://www.wolframalpha.com/input/?i=plot+x^1.5,+x*(log(x)/log(2))^2+from+1+to+100000
    It would seem these two sorts are approximately equivalent in the average case (ignoring constant factor differences) for arrays up to about 100,000 elements. Hmm...I may have to try implementing Shell sort and see how it actually compares in practice. If I do, I'll report the results back here.

  13. Shell sort (with 2^j-1 increments) is one algorithm that has a complexity of O(n^{3/2}).

  14. How about posting an image using that "fruit salad" visualization from sortvis? I always found that a lot more informative than the spaghetti tangle one:

    http://corte.si/posts/code/sortvis-fruitsalad/index.html

  15. @Anonymous: sure, here is a visualization with 512 items:
    http://3.bp.blogspot.com/_4uUxYbT6E0Q/S8cpqJpVF1I/AAAAAAAAD7I/GdEUsBuqHQk/s1600/marriagesort.png

    Note that this only shows swaps, not compares, so the time isn't as dominated by the insertion sorts as it appears.

  16. @Anonymous: Ok, I have made a better version of the "fruit salad" visualization and added it to the post. Thanks, that was a good idea.

  17. Brilliant work. I am constantly amazed by the creativity of programmers in their free time! If only we mathematicians knew that much about programming :)

    Ruben

  18. Sorry about coming so late here, but I just saw this over on Programming Praxis (http://programmingpraxis.com/2010/08/20/marriage-sort/) where I did a Ruby version. One question though: in the picture for the one pass of the marriage sort, why isn't the max of the sqrt(N)-1 the value "7" rather than "8"? Wouldn't the sqrt(10)-1 just check the first two values of the array?

  19. Ahhh... that makes sense. It's interesting how hard this is to test. The insertion sort at the end covers up a multitude of sins.

    I know you put code up on GitHub and you're welcome (if it looks correct to you) to grab it and share it there also.

  20. Kinda cool, but not quite sure why you would want to use it. If it does run in O(n^1.5) then it is no improvement. There are already merge sort and heap sort, which run in O(n∙lg(n)) - asymptotically faster than O(n^1.5). So there isn't any improvement, nor can there be: there is a lower bound of n∙lg(n) on comparison-based sorting algorithms, so it is impossible to beat merge and heap unless you use a non-comparison-based sorting algorithm like counting sort or radix sort.

  21. You are quite correct. If my memory serves me right, I took artistic liberties when making that diagram to make the swap pattern obvious, but the partition really should be the first two elements only.

    Ruby port you say? Awesome! I shall have to check it out when I get back to a real computer.
