Hacker News new | past | comments | ask | show | jobs | submit login
Qsort.h – Quicksort as a C macro (2019) (github.com/svpv)
85 points by olitech on May 7, 2023 | hide | past | favorite | 47 comments



This is very good: it uses the original Hoare partitioning algorithm which moves two indices at opposite ends of the array partition toward each other, rather than the ill-considered Lomuto partitioning:

    while (1) {                              \
      do q_i++; while (Q_LESS(q_i, q_l));    \
      do q_j--; while (Q_LESS(q_l, q_j));    \
     if (q_i >= q_j) break; /* Sedgewick says "until j < i" */ \
     Q_SWAP(q_i, q_j);                       \
    }                                        \
Lomuto is that algorithm that moves one index, swapping lower-than-pivot elements into a growing lower partition.

Lomuto ends up performing more comparisons.

What's more, Lomuto has quadratic behavior for sequences that are just repetitions of a value: every partitioning and sub-partitioning is a degenerate case. Hoare's algorithm deftly avoids this problem. You can explain that using the above snippet. If all elements are the same, then Q_LESS(q_i, q_l) is always false. Thus each of the two do/while loops executes one unconditional iteration, so that the two indices q_i and q_j march toward each other and meet in the middle. The repeating sequence thus nicely cut in half, and recursively so, ensuring O(N log N) behavior.

You can't easily banish the worst case from Quicksort, but implementations should not exhibit worst case behavior on inputs that have some obvious pattern, like a repeated value.


Lomuto partitioning has one major advantage over Hoare: Lomuto can be implemented in such that the inner loop is branchless, while the inner loops in Hoare are branchy and wreck the branch predictor.

Lomuto typically fixes the Dutch National Flag problem by keeping two 'midpoints'; left of the left midpoint is less than the pivot, right of the right midpoint is greater than the pivot, between the two midpoints is equal to the pivot. This makes most data inputs that result in the O(n^2) worst case of quicksort give O(n) performance instead.

Unfortunately the code is horrific looking. It's not something I'd ever want to do in a C macro.


And my algorithm pdqsort uses Hoare-style partitioning, is branchless, and properly solves the Dutch National Flag problem by being 0(n log k) for k unique values on average using only a single extra comparison per partitioning.

https://github.com/orlp/pdqsort


The link to Sedgewick's 1978 paper no longer works, but an archive of it can be found here: http://web.archive.org/web/20190713031319/http://penguin.ewu...


The paper says that more detailed discussion of partition methods is given in [15]. That looks like Sedwicks's Ph. D thesis which was on Quicksort?

I found this: https://sedgewick.io/wp-content/themes/sedgewick/papers/1975...

Basically a longer book on Quicksort with a lot more analysis.

He doesn't discuss any partitioning methods that do not move two indices toward each other from opposite ends of the subarray. It's not even on the table.


Of course the logo on the paper title page is a sequence of 8 lines with each doubling on thickness.

Though it'd be more visually and intuitively accurate if the lines started large and cut in half each step. :P


Nice catch, yes!


Along the same lines, I use Christopher Swenson's sort.h at https://github.com/swenson/sort/

  You get the choice of many sorting routines, including:

    Timsort (stable)
    Quicksort
    Merge sort (stable)
    In-place merge sort (not stable)
    Shellsort
    Binary insertion sort
    Heapsort
    Selection sort (this is really only here for comparison)
    Grail sort (stable)
    Sqrt Sort (stable, based on Grail sort, also by Andrey Astrelin).


Today I learned that for modern processors, quicksort code is small enough that the compiler can inline them if the definition is visible during compilation. See here for comparison of isort function implemented using plain static functions vs the macro in TFA: https://godbolt.org/z/arGjPGhKE . It's not as flexible as the macro form (not being able to pass two arrays at once like the sortByAge example in TFA), but I figure it should be friendlier to debuggers (though with all the inlining the experience is not going to be that great either).

IIRC, if LTO is enabled the calls to the less and swap functions can also be inlined even if the generic function is compiled to a separate translation unit. Haven't tried it for the quick sort code above, but I found that out for some other thing I experimented with.


Did you try inline as well? Why are they not as flexible as the macro? Couldn't the swap not also process two arrays?


The function is already inlined because the implementation is visible. You can check the assembly, it has no calls to the compare and swap functions.

I guess you're right, since the function accepts a void pointer, it could also point to an array of two array pointers. So yes, it can also process two arrays.


If you're doing metaprogramming using the C preprocessor, it's time to move to a more advanced language.


In many cases it's also enough to just enable LTO to give the compiler enough information for stamping out a specialized version.

But TBH I'm a bit tired of people shitting on the C preprocessor. It's a relatively simple text replacement tool, and provides an incredible amount of bang for the buck (e.g. many problems can be solved without having to add more specialized bells and whistles to the language). It's a trade-off like anything else in computing.


The C preprocessor was an easy and compact way to extend C back when memory was really really tight. You can't even fit a C++98 compiler into DOS's memory space, but you can fit a C compiler in 64K. (Well, an earlier version of C.)

C has been adding generics anyway, like _Generic.


>C has been adding generics anyway, like _Generic.

"_Generic" added function overloads to c, not generics. The naming is exceptionally poor.

The submission is an example of generic programming. It an implementatin of something that works on all types that fullfill certain constrains. In this case comparable and swapable. "All types" include user defined types.

"_Generic" does something completely different. The C++ equvivalent would be std::conditional_v<std::is_same<T, int>, myIntFunc, std::conditional<std::is_same<T, double>, myDoubleFunc ............ The exact opposite of a generic, I'd argue, as myIntFunc and myDoubleFunc do depend on the type. You might use the preprocessor to create these functions that differ only in type, but then again, you might do what OP did.


About a year ago I had a look at a C compiler for 16bit computers, featured in disk form on I think Adrian's digital basement YT channel.

The compiler was very basic, nothing like you'd expect from a compiler even from the dragon book.

So simple and it was a production compiler too!

Sadly I can't remember the name of it to reference here.


I wrote one back in 1983 or so for 16 bit DOS, too!

A few years later, I was at a C++ conference where they asked me to sit on an "Ask us anything" panel. I was there along with the developers of Microsoft C, Borland C, etc.

The first question was "do you still ship a version of your compiler that will run on a floppy disk system?"

Vendor 1 said sure, and launched into a long description of how the files could be shuffled about on the floppy to make it work.

Vendor 2 said sure, and launched into ...

My turn. I said sure, and said the floppy disk version costs $200 and comes with a hard disk drive. (That was the price of a hard disk in those days.)

That was the end of that, I never heard that question again from anybody.

Progress...


What versions did you offer otherwise? I don't recall CD becoming popular on computers until the 90s.


Wow, in 1983 I hadn’t even seen a hard disk in person!


I had one back then, but I did say "a few years later...".


Agree, but it's nowhere near TASM or even MASM macro processor ;0)


> many problems can be solved without having to invent new language features

You mean something like a type checker?


You're right, but also famously an interested party in the promotion of one of these "more advanced languages".

People often take bias to mean that a source is false or untrustworthy and I think this is a nice counterexample.


I knew someone would come up with this kind of "wisdom". For starters you do not know what other languages they might be using for development. And whatever they do they definitely do not need patronizing.


I've done a share of C metaprogramming with the preprocessor myself, and have dealt a lot with other peoples' C metaprogramming.

I've also seen people use pages of algrebra as a substitute for a couple lines of calculus.

I struggled for years with a soldering iron, always frustrated by bad solder joints. Then, I discovered a Weller thermostat controlled iron, and get a perfect joint every time.

Not everyone knows there are better ways to do things.


Your experience soldering is a much better response. You present a concrete suggestion for use a weller thermostat controlled iron. But you did not present a concrete suggestion for an alternative to the C pre-processor.

You could do this with C++ templates, for instance. But the LESS and SWAP operations may or may not get inlined. With the C pre-processor you can be certain that the operations get inlined, since no other alternative interpretation is available to the compiler.


> But the LESS and SWAP operations may or may not get inlined.

True. But modern inliners are pretty good, and if they can't inline it due to its complexity, it's pretty unlikely it'll be faster.

Note the performance comparison in the article.

Low level hand-optimizations paid off handsomely in the 1980s, but are usually best left to the compiler's optimizer these days.


The article is talking code and demonstrating results.

Talk is cheap, show the code!


A rote translation of Q_SORT3 would look like:

    /* Sort 3 elements. */
    void Q_SORT3(alias Q_LESS, alias Q_SWAP, Q_UINT) (Q_UINT q_a1, Q_UINT q_a2, Q_UINT q_a3)
    {
        if (Q_LESS(q_a2, q_a1)) {
            if (Q_LESS(q_a3, q_a2))
                Q_SWAP(q_a1, q_a3);
            else {
                Q_SWAP(q_a1, q_a2);
                if (Q_LESS(q_a3, q_a2))
                    Q_SWAP(q_a2, q_a3);
            }
        }
        else if (Q_LESS(q_a3, q_a2)) {
            Q_SWAP(q_a2, q_a3);
            if (Q_LESS(q_a2, q_a1))
                Q_SWAP(q_a1, q_a2);
        }
    }
A D template function is distinguished by having two parameter lists, the compile time parameter list and the runtime parameter list. Q_UINT would be a type parameter with its type inferred from the function arguments. Q_LESS and Q_SWAP are alias parameters, which can be an alias to any symbol or type. This is often used to pass lambdas.

Note that D allows the following ways to pass a parameter:

1. by value

2. by reference

3. by name

4. by type

The alias parameters are "by name". C++ has by name parameters in the form of template template parameters, along with the restriction that such a parameter can only be a template. D allows any symbol or type to be passed by name.

Passing the lambdas by name means a direct function call/inlining rather than an indirect call through a function pointer.

Anyhow, with this example you can see how the rest can be fairly easily translated.

So, you might ask, what's the advantage?

1. name hygiene

2. your debugger will see the symbols

3. no ugly \ line splicing

4. no wacky do-while(0) kludge

5. the names are actual symbols rather than transitory apparitions seen only by the preprocessor

6. color syntax highlighting in your editor works

7. the compiler will check the syntax and give error messages in terms of the symbols, not generated preprocessor text

8. It's not going to conflict with another symbol named Q_SORT


Parent comment is by the inventor of D, among other things…


SIMD being an exception still (unless maybe Intel compiler + C/C++ + proper annotations)


LLVM can do an awful lot with SIMD that isn't visible in clang.

We have experienced this with LDC in D.

What I've been told is that LDC produces better IR than clang (I haven't compared, I only know that LDC is practically magical from both mine and other peoples experiments).

About the only thing I've seen that has problems with inlining is inline assembly. Intrinsics are fine (which you don't need thanks to vectorization being practically magical as long as you do some annotations like assert and get the memory layout right).


> But you did not present a concrete suggestion for an alternative to the C pre-processor.

If someone asked, I would. But since I do have a dog in that hunt, I felt it would be more appropriate for others to make a suggestion, as there are several.


Oh come on, Walter, don't make us beg... Show us how you'd do it in D to achieve the exact same result.


I would probably use a separate templating language to manually generate variants of the function for each type I need, check those in git along with the template.


>"Not everyone knows there are better ways to do things."

In theory you might be right but I doubt it is applicable in this particular case.


Maybe they don't want to move to a more advanced language, maybe they want to mess around with the C preprocessor. :)


If you want to show cleverness, by all means!

I still use some primitive tools, and know there are better alternatives, but I'm not going to defend sticking with them.

BTW, I did win the Obfuscated C contest one year, and (naturally) used the preprocessor:

https://www.ioccc.org/1986/bright/bright.c


Must... resist... being... nerdsniped... oh no too late, now I have to figure out what it's doing without looking it up. :D I've never entered the OCCC but I've read a bunch of past entries and always wanted to enter someday.


just last night i was thinking how cool it would be for an IDE plugin that expands rust macros for easier grokking. now i need one for your #defines, lol


cargo expand?


C is still the King of languages.


I agree with that

None of the new languages managed to capture what makes C great

The few "modern" ones are interesting, Zig, Jai and Odin for example, I like how they took C designated initialization to the next level

Zig's comptime is also nice

D is nice, but is loosing its charm now that the competition is here.. also what used to be a D strength now is a weakness; it's slow to compile thanks to the std.. debugging still is a pain, and code can become quite overly verbose


Almost, it is the prince of memory corruption bugs, the throne belongs to Assembly.


"A witty quote proves nothing" ~Voltaire


I like C a lot. One big advantage of the language is that its complete syntax and standard runtime functions can fit in one brain.

That said, I hate the way a lot of programmers use the preprocessor. For me, macros should be just simple expressions or functions (i mean they sohould be invokeable like functions) . But very often, there's a lot of magic inside macros, making them hard to use, let alone understand.


On one hand, neat.

On the other hand, I think it's sad that so much time and brainpower is going on contriving ways to get bad tooling to do something, rather than on the actual meat of the problem solving.

You'd think that in 50 years somebody could have invented a better preprocessor, and save the need for all the while(0) and other trickery needed for macros.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: