I suspect memory indirection would clobber the theoretical perf, but I'd be happy to be proved wrong.
My inclination is that this would be slower than "standard" high-perf radix sorting, but I'm not sure the high-level overview of this algorithm reflects the same level of implementation tuning.
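For comparison, here is a minimal sketch of what a "standard" LSD radix sort could look like. The 32-bit non-negative keys, the 8-bit digit width, and the name `lsd_radix_sort` are assumptions made for illustration, not details of the algorithm under discussion. Each pass reads the input sequentially and scatters into one of 256 buckets in a second buffer, with no pointer chasing. Python is used only for readability; a tuned implementation would be written in C or C++ and would look quite different.

```python
def lsd_radix_sort(keys, key_bits=32, digit_bits=8):
    """Sort non-negative integers below 2**key_bits, one digit per pass."""
    radix = 1 << digit_bits
    mask = radix - 1
    src = list(keys)
    dst = [0] * len(src)
    for shift in range(0, key_bits, digit_bits):
        # Histogram of the current digit.
        counts = [0] * radix
        for k in src:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum turns counts into bucket start offsets.
        total = 0
        for d in range(radix):
            counts[d], total = total, total + counts[d]
        # Stable scatter: sequential reads from src, bucketed writes to dst.
        for k in src:
            d = (k >> shift) & mask
            dst[counts[d]] = k
            counts[d] += 1
        # Ping-pong the two buffers for the next pass.
        src, dst = dst, src
    return src
```

Whether the proposed approach beats this kind of baseline would have to be measured on real hardware; the sketch only shows the memory access pattern being compared against.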