With random pivot selection, quicksort runs in expected O(n log n) time for all inputs. (The expectation is over the random numbers used to pick the pivots, and doesn't involve the input at all.) Algorithms like this are called "Las Vegas" algorithms.
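To make that concrete, here's a minimal sketch of Las Vegas randomized quicksort (the function name and the list-comprehension style are just illustrative): the output is always correctly sorted; only the running time depends on the random pivot choices.

```python
import random

def quicksort(xs):
    # Las Vegas: correctness never depends on the coin flips,
    # only the running time does.
    if len(xs) <= 1:
        return list(xs)
    pivot = random.choice(xs)  # the expectation is over this choice, not the input
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```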
Yes. And if your adversarial input generator is good enough, it should work even in that case. (Of course, any input is "the worst case" if all inputs have the same expected time.)
Ahh, ok: it takes O(n log n) total to do all the selections, but each individual call is only O(l) in the length l of the partition being split. I was thinking it was the same n on every call, rather than the same total n at each depth. Err, nm, it's hard to put into words.
PS: It still doubles the running time, but constant factors don't matter in big-O notation.
You can get away with less than the median. For example, a pivot guaranteed to lie between the 25th and 75th percentiles is good enough, and one can be found faster.
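A cheap heuristic in this spirit is median-of-three (this is a sketch, not the guaranteed-percentile method the comment alludes to): the median of three random elements lands in the middle half with probability 11/16 on a random permutation, versus 1/2 for a single random pick.

```python
import random

def median_of_three(xs):
    # Heuristic pivot: median of three randomly chosen elements.
    # Not guaranteed to be in the 25th-75th percentile range,
    # but it's there more often than a single random element.
    a, b, c = random.choice(xs), random.choice(xs), random.choice(xs)
    return sorted([a, b, c])[1]
```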
Personally, I really like merge sort. It makes few assumptions, it's stable, it works well with sorted inputs, it can be split across multiple CPUs, and it works well with sequential input streams. Radix sort tends to be faster, especially for strings, but it doesn't work for many data structures without adapting the algorithm. E.g., radix sorting doubles requires bit manipulation of the data.
Also, one can always select the median in (deterministic) linear time, so quicksort need never degenerate to O(n^2).
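The deterministic linear-time selection being referred to is the median-of-medians (BFPRT) algorithm; here's a compact sketch of it. Using it to pick quicksort's pivot bounds the worst case at O(n log n), though the constant factors make it slow in practice.

```python
def select(xs, k):
    # Median-of-medians: deterministic linear-time selection of the
    # k-th smallest element (0-indexed).
    if len(xs) <= 5:
        return sorted(xs)[k]
    # Median of each group of 5, then recurse for the median of those medians.
    medians = [sorted(xs[i:i + 5])[len(xs[i:i + 5]) // 2]
               for i in range(0, len(xs), 5)]
    pivot = select(medians, len(medians) // 2)
    # Pivot is guaranteed to be between the 30th and 70th percentiles,
    # which is what makes the recursion linear overall.
    lows = [x for x in xs if x < pivot]
    pivots = [x for x in xs if x == pivot]
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    highs = [x for x in xs if x > pivot]
    return select(highs, k - len(lows) - len(pivots))
```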