This is very good: it uses the original Hoare partitioning scheme, which moves two indices from opposite ends of the partition toward each other, rather than the ill-considered Lomuto partitioning:
while (1) {                                                    \
    do q_i++; while (Q_LESS(q_i, q_l));                        \
    do q_j--; while (Q_LESS(q_l, q_j));                        \
    if (q_i >= q_j) break; /* Sedgewick says "until j < i" */  \
    Q_SWAP(q_i, q_j);                                          \
}                                                              \
Lomuto is the scheme that moves a single index, swapping lower-than-pivot elements into a growing lower partition.
Lomuto ends up performing more comparisons.
What's more, Lomuto has quadratic behavior for sequences that are just repetitions of a value: every partitioning and sub-partitioning is a degenerate case. Hoare's algorithm deftly avoids this problem. You can explain that using the above snippet. If all elements are the same, then Q_LESS(q_i, q_l) is always false. Thus each of the two do/while loops executes one unconditional iteration, so that the two indices q_i and q_j march toward each other and meet in the middle. The repeating sequence is thus nicely cut in half, and recursively so, ensuring O(N log N) behavior.
You can't easily banish the worst case from Quicksort, but implementations should not exhibit worst case behavior on inputs that have some obvious pattern, like a repeated value.
Lomuto partitioning has one major advantage over Hoare: Lomuto can be implemented in such a way that the inner loop is branchless, while the inner loops in Hoare are branchy and wreck the branch predictor.
Lomuto variants typically fix the Dutch National Flag problem by keeping two 'midpoints': everything left of the left midpoint is less than the pivot, everything right of the right midpoint is greater than the pivot, and everything between the two midpoints is equal to the pivot. This makes most inputs that would otherwise trigger quicksort's O(n^2) worst case run in O(n) instead.
Unfortunately the code is horrific looking. It's not something I'd ever want to do in a C macro.
And my algorithm pdqsort uses Hoare-style partitioning, is branchless, and properly solves the Dutch National Flag problem by being O(n log k) for k unique values on average, using only a single extra comparison per partitioning.
Basically a longer book on Quicksort with a lot more analysis.
He doesn't discuss any partitioning methods that do not move two indices toward each other from opposite ends of the subarray. It's not even on the table.
You get the choice of many sorting routines, including:
Timsort (stable)
Quicksort
Merge sort (stable)
In-place merge sort (not stable)
Shellsort
Binary insertion sort
Heapsort
Selection sort (this is really only here for comparison)
Grail sort (stable)
Sqrt Sort (stable, based on Grail sort, also by Andrey Astrelin).
Today I learned that with modern compilers, quicksort code is small enough that the compiler can inline it if the definition is visible during compilation. See here for a comparison of an isort function implemented using plain static functions vs. the macro in TFA: https://godbolt.org/z/arGjPGhKE . It's not as flexible as the macro form (not being able to pass two arrays at once like the sortByAge example in TFA), but I figure it should be friendlier to debuggers (though with all the inlining the experience is not going to be that great either).
IIRC, if LTO is enabled the calls to the less and swap functions can also be inlined even if the generic function is compiled to a separate translation unit. Haven't tried it for the quick sort code above, but I found that out for some other thing I experimented with.
The function is already inlined because the implementation is visible. You can check the assembly, it has no calls to the compare and swap functions.
I guess you're right, since the function accepts a void pointer, it could also point to an array of two array pointers. So yes, it can also process two arrays.
In many cases it's also enough to just enable LTO to give the compiler enough information for stamping out a specialized version.
But TBH I'm a bit tired of people shitting on the C preprocessor. It's a relatively simple text replacement tool, and provides an incredible amount of bang for the buck (e.g. many problems can be solved without having to add more specialized bells and whistles to the language). It's a trade-off like anything else in computing.
The C preprocessor was an easy and compact way to extend C back when memory was really really tight. You can't even fit a C++98 compiler into DOS's memory space, but you can fit a C compiler in 64K. (Well, an earlier version of C.)
>C has been adding generics anyway, like _Generic.
"_Generic" added function overloads to C, not generics. The naming is exceptionally poor.
The submission is an example of generic programming. It's an implementation of something that works on all types that fulfill certain constraints, in this case comparable and swappable. "All types" includes user-defined types.
"_Generic" does something completely different. The C++ equivalent would be std::conditional_t<std::is_same_v<T, int>, myIntFunc, std::conditional_t<std::is_same_v<T, double>, myDoubleFunc ............ The exact opposite of a generic, I'd argue, as myIntFunc and myDoubleFunc do depend on the type. You might use the preprocessor to create these functions that differ only in type, but then again, you might do what OP did.
I wrote one back in 1983 or so for 16 bit DOS, too!
A few years later, I was at a C++ conference where they asked me to sit on an "Ask us anything" panel. I was there along with the developers of Microsoft C, Borland C, etc.
The first question was "do you still ship a version of your compiler that will run on a floppy disk system?"
Vendor 1 said sure, and launched into a long description of how the files could be shuffled about on the floppy to make it work.
Vendor 2 said sure, and launched into ...
My turn. I said sure, and said the floppy disk version costs $200 and comes with a hard disk drive. (That was the price of a hard disk in those days.)
That was the end of that, I never heard that question again from anybody.
I knew someone would come up with this kind of "wisdom". For starters you do not know what other languages they might be using for development. And whatever they do they definitely do not need patronizing.
I've done a share of C metaprogramming with the preprocessor myself, and have dealt a lot with other peoples' C metaprogramming.
I've also seen people use pages of algebra as a substitute for a couple lines of calculus.
I struggled for years with a soldering iron, always frustrated by bad solder joints. Then, I discovered a Weller thermostat controlled iron, and get a perfect joint every time.
Not everyone knows there are better ways to do things.
Your soldering experience is a much better response. You present a concrete suggestion: use a Weller thermostat-controlled iron. But you did not present a concrete suggestion for an alternative to the C pre-processor.
You could do this with C++ templates, for instance. But the LESS and SWAP operations may or may not get inlined. With the C pre-processor you can be certain that the operations get inlined, since no other alternative interpretation is available to the compiler.
/* Sort 3 elements. */
void Q_SORT3(alias Q_LESS, alias Q_SWAP, Q_UINT)(Q_UINT q_a1, Q_UINT q_a2, Q_UINT q_a3)
{
    if (Q_LESS(q_a2, q_a1)) {
        if (Q_LESS(q_a3, q_a2))
            Q_SWAP(q_a1, q_a3);
        else {
            Q_SWAP(q_a1, q_a2);
            if (Q_LESS(q_a3, q_a2))
                Q_SWAP(q_a2, q_a3);
        }
    }
    else if (Q_LESS(q_a3, q_a2)) {
        Q_SWAP(q_a2, q_a3);
        if (Q_LESS(q_a2, q_a1))
            Q_SWAP(q_a1, q_a2);
    }
}
A D template function is distinguished by having two parameter lists, the compile time parameter list and the runtime parameter list. Q_UINT would be a type parameter with its type inferred from the function arguments. Q_LESS and Q_SWAP are alias parameters, which can be an alias to any symbol or type. This is often used to pass lambdas.
Note that D allows the following ways to pass a parameter:
1. by value
2. by reference
3. by name
4. by type
The alias parameters are "by name". C++ has by name parameters in the form of template template parameters, along with the restriction that such a parameter can only be a template. D allows any symbol or type to be passed by name.
Passing the lambdas by name means a direct function call/inlining rather than an indirect call through a function pointer.
Anyhow, with this example you can see how the rest can be fairly easily translated.
So, you might ask, what's the advantage?
1. name hygiene
2. your debugger will see the symbols
3. no ugly \ line splicing
4. no wacky do-while(0) kludge
5. the names are actual symbols rather than transitory apparitions seen only by the preprocessor
6. color syntax highlighting in your editor works
7. the compiler will check the syntax and give error messages in terms of the symbols, not generated preprocessor text
8. It's not going to conflict with another symbol named Q_SORT
LLVM can do an awful lot with SIMD that isn't visible in clang.
We have experienced this with LDC in D.
What I've been told is that LDC produces better IR than clang (I haven't compared, I only know that LDC is practically magical from both my own and other people's experiments).
About the only thing I've seen that has problems with inlining is inline assembly. Intrinsics are fine (which you don't need thanks to vectorization being practically magical as long as you do some annotations like assert and get the memory layout right).
> But you did not present a concrete suggestion for an alternative to the C pre-processor.
If someone asked, I would. But since I do have a dog in that hunt, I felt it would be more appropriate for others to make a suggestion, as there are several.
I would probably use a separate templating language to manually generate variants of the function for each type I need, check those in git along with the template.
Must... resist... being... nerdsniped... oh no too late, now I have to figure out what it's doing without looking it up. :D I've never entered the IOCCC but I've read a bunch of past entries and always wanted to enter someday.
just last night i was thinking how cool it would be for an IDE plugin that expands rust macros for easier grokking. now i need one for your #defines, lol
None of the new languages managed to capture what makes C great
The few "modern" ones are interesting, Zig, Jai and Odin for example, I like how they took C designated initialization to the next level
Zig's comptime is also nice
D is nice, but it's losing its charm now that the competition is here. Also, what used to be a D strength is now a weakness: it's slow to compile thanks to the std, debugging is still a pain, and code can become quite overly verbose.
I like C a lot. One big advantage of the language is that its complete syntax and standard runtime functions can fit in one brain.
That said, I hate the way a lot of programmers use the preprocessor. For me, macros should be just simple expressions or functions (I mean they should be invokable like functions). But very often there's a lot of magic inside macros, making them hard to use, let alone understand.
On the other hand, I think it's sad that so much time and brainpower is going on contriving ways to get bad tooling to do something, rather than on the actual meat of the problem solving.
You'd think that in 50 years somebody could have invented a better preprocessor, sparing us the while(0) and other trickery that macros need.