Hacker News new | past | comments | ask | show | jobs | submit login
Bubble Sort: An Archaeological Algorithmic Analysis (duke.edu)
47 points by mac01021 on Aug 28, 2017 | hide | past | favorite | 31 comments



For me it seems that bubble sort is mostly popular because it is very intuitive. It is the typical way of how would one sort a bunch of heavy bags with only scales, which is another question that comes up often.


I would think that some form of insertion sort is the most intuitive. That's what people use when they do things like sort cards in their hand.


Anecdotally, the professor of my algorithms class had us all sort cards manually and asked us to describe the algorithm we used when we began studying sorting algorithms. Most of us used either selection sort or insertion sort.


I would think bucket sort (into the suits) would be the most common. Whenever I am sorting named papers I always do it with 26 piles.


A-K in 13 piles then suits in 4 piles is a little faster. Or if your space limited you can group A-7 then 8-K.


That's more or less radix sort, no?


Yep, though radix and bucket sorts are closely related.


I suspect that if you took students who knew just enough programming to write a sort, but with no prior exposure to sorting algorithms, and asked them to write a sort, many would come up with a bubble sort (I think that was true of me.) If so, then it is worth mentioning simply to point out its inefficiency. Also, the bubble sort serves a useful pedagogical purpose for introducing the concept of time complexity analysis, especially if my guess is correct, as then it does not look like a contrived example.

After all, the goal of computing education is not just to churn out mechanical algorithm-implementors, and it is hard to teach good design without something to compare it to.


Bubble sort is still the only sort I will write by hand. It's fast to write and easy to read. It's in-place and stable. It has a stupidly low coefficient in front of that n² that keeps it in the running until about 100 elements.

Once I get beyond that, I select a pre-written sort that's tailored to my needs. Do I need stability? Do I need in-place? Do I need a local copy at all, or can I stream the results?


Do you know of any source that compares the coefficients for bubble sort, insertion sort, and selection sort? This sounds interesting.


Given that those are all average case n² algorithms, the article's figure is probably a good representation:

https://users.cs.duke.edu/~ola/bubble/bubble.html#fig:quadso...

Mannykannot is correct that the full coefficient is dependent on the machine and exact code. However, we can make some refinements based on the algorithms alone.

Bubble sort is n² / 2, as it processes one less element per pass. This is a pretty common pattern, so it's good to remember.

Insertion sort starts at zero elements and processes one extra element per pass. So it's also n² / 2. However, on average each pass only needs to process half the elements in order to find the correct insertion point. So it's actually n² / 4. If you use a binary search, it becomes n log n -- n passes, each perform a log n binary search. (This is usually called binary insertion sort.)

However, those numbers are in comparisons, which is typically what sort algorithms are measured by. It assumes that all memory operations are constant time. But you might also want to keep track of swaps or other memory movement. For instance, when you add the memory swaps back to the binary insertion sort, it's O(n²) again, because it still has to perform an average of n / 2 swaps on every pass -- O(n * (log n + n / 2)) = O(n²).

Choice of data structure can also make an impact. Insertion sort is not awesome on memory arrays, because it has to perform a block copy for every insert. However, on linked lists, that single-element shift happens for "free" when you perform the insert.


I think you will find it to be data-, implementation-, language-, and platform-dependent. You could consider implementing versions of each in a compiled language and look at the disassembled machine- or byte-code generated for them. If your target processor is pipelined and/or does branch prediction, speculative execution, caching or virtual memory, these could all affect the relative performance on a given input.


I remember how clever I felt when I discovered binary search, and how humiliated I was when I discovered that not only had I not been the first person to come up with it, but that it was so obvious pretty much everyone had "discovered" it independently. ;-)


I "rediscovered" tries in order to cheat at scrabble my freshman year. So it goes!

https://en.wikipedia.org/wiki/Trie


I'd expect insertion or selection sort to be easiest to invent; I remember coming up with selection sort as a kid. Where bubble sort comes out on top is being the simplest to code (if you have for-loops) -- not quite the same thing.


I remember that when I have started to code and had no internet or books, just examples of other code, and needed to sort something. I have written a bubble sort too. But back then being slow was sometimes a feature, as I knew nothing about threads or clocks, so the processor ticks commanded the speed of my shitty game.


Paradoxically, bubblesort can actually be pretty fast on small inputs (i.e. short arrays).

I once experienced this first hand in C codebase. I had to sort an array of structs where the array never contained more than ~50 members, and the comparison was very simple; in that case, bubblesort actually ran faster than the qsort(3) supplied by the compiler.

Then again, at such small scale, the performance win is so tiny it usually is not worth the effort.


qsort(3) is quite slow because it can not inline the compare function. If you created a 'generic' bubble sort that take a compare function pointer and no LTO optimization (or use an inlined c++ sort), the quick sort algorithm might be faster even on 50 elements.


I had suspected that might be a factor.

Either way, at 50 elements, if the difference in performance between qsort and bubblesort has significant impact on overall performance, there is probably something very wrong. ;-)


Intuitive? Bubble sort is the most counterintuitive sort I know after Shell sort. I still have to think hard about what it does every time I read it and I'm not sure if I remember it accurately to be honest. So I don't understand how people find it intuitive. Insertion sort is far, far more natural.


It's not even particularly clear that bubble sort halts. Unlike insertion sort, the loop variant is not trivial to write down. I suppose the best thing about it is that it is very clear that if it halts, the array is sorted.


Do you mean with only balances? With scales you would just weigh each bag once and sort the weights. In either case, why would bubble sort be preferred? I'd think you would want to minimize how many times you have to move the bags on and off the scale/balance.


Yes. I thought that "scales" was the same thing as balances when used for a single object. I see that it does not.


A "Pair of scales" does mean the same as a "balance", but "scales" on its own sually means a device containing an electronic (or spring) weighing mechanism.


Why is bubble sort what you would use for heavy bags?


The problem goes something like this: You have N number of bags of the same size but different (maybe) weights. You have nothing to write and only balances. The easiest way to sort them is to put them in a row, compare them two by to and swap them in order.


The only situation where I've actually employed bubble sort in production code was in a situation where the container to be sorted was guaranteed to be at most 5 elements (and usually 3) and was usually already sorted - and calling this sort was done a lot. All other sorts proved to have too much startup overhead or too much code that bloated caches.. so yes, I wrote a bubble sort to be used there and it performed quite well. But I would generally never recommend bubble sort for 99.9% of cases - but just once in a blue moon it actually can be the right answer.


I re-discovered bubble sort when I was 12, with a deck of cards and a C64. It wasn't until I got to college that I learned it had a name.

I believe it's a very intuitive algorithm, it has likely been rediscovered many times.


"Well, I think the bubble sort would be the wrong way to go."

--Barack Obama


I did not know about this and Googled the line out of curiosity. TIL


Same, googled and found this great story:

http://fortune.com/2015/08/18/mindware-nisbett/

> Shortly after Barack Obama announced he was running for president in the fall of 2007, Google’s CEO, Eric Schmidt, interviewed him in front of a large audience of Google employees. As a joke, Schmidt’s first question was, “What is the most efficient way to sort a million 32- bit integers?” Before Schmidt could ask a real question, Obama interrupted: “Well, I think the bubble sort would be the wrong way to go,” a response that was in fact correct. Schmidt slapped his forehead in astonishment, and the room broke out in applause. Later, in the question-and-answer period, Obama assured his audience, “I am a big believer in reason and facts and evidence and science and feedback,” and promised that he would run the government accordingly.

> In the audience that day was a product manager named Dan Siroker, who made a decision on the spot to go to work for Obama. “He had me at bubble sort.”




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: