
Actually linear, or O(n log n)?



Actually linear: O(n + k).

Check my submission history. I keep trying to show people this work. Sadly, it never seems to take to a wide audience.

As a consequence, selective joins across 2 fields are also linear time.


Interesting. Can you point me at the timestamp or a paper which shows the algorithm?


It's complicated enough that you should just watch the whole talk. The paper is 90 pages.


It's been six years since the talk... Why hasn't someone drilled down through all the irrelevant typing goo, and shown how to sort integers in linear time? Where does the standard proof that optimal algorithms are Ω(n log n) break?


> Why hasn't someone drilled down through all the irrelevant typing goo, and shown how to sort integers in linear time?

I mean, if that's all you want, Radix sort and American Flag sort are Wikipedia fodder.
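To make that concrete, here's a throwaway counting-sort pass in Haskell (the building block of LSD radix; nothing to do with the discrimination paper, just the textbook fact that bounded keys can be sorted without ever comparing two elements):

    import Data.Array (accumArray, assocs)

    -- Counting sort for integer keys known to lie in [lo, hi]:
    -- O(n + k) time for k = hi - lo + 1, with no element-to-element
    -- comparisons anywhere.
    countingSort :: Int -> Int -> [Int] -> [Int]
    countingSort lo hi xs = concat [ replicate c v | (v, c) <- assocs counts ]
      where
        counts = accumArray (+) 0 (lo, hi) [ (x, 1 :: Int) | x <- xs ]

    main :: IO ()
    main = print (countingSort 0 9 [3,1,4,1,5,9,2,6])  -- [1,1,2,3,4,5,6,9]

Radix sort just repeats that pass once per digit, and American Flag sort is the in-place, cache-friendlier variant of the same idea.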

As for the "typing goo" is, uh, fairly important for how it's derived and what it does. A lot of advanced CS work uses it now, because it's quite powerful. A huge amount of pure-CS research is done in that world.

I guess we don't see this work in more contexts because it's just very, very difficult to get right. Edward Kmett, who is globally recognized for his talent in the field, did a Haskell implementation which we all use and love, but it's pretty excruciating to implement outside of a functional and generic language (and it is genuinely hard to do in a language without general side effects, so you get difficulty from both sides).

I think a lot of folks in industry just don't think it matters. Certainly when I talk to my peers they suggest it doesn't matter to their work at all.

As for "6 years", they author has continued to push the work. It's also been used in a variety of applications. You can cite search the original paper. https://pdfs.semanticscholar.org/f425/7af9221ca7fe21dc84a049...

https://scholar.google.dk/scholar?oi=bibs&hl=en&cites=180821...

One particularly interesting piece of work computes certain kinds of relational-algebra products in linear time, which is pretty cool: https://dl.acm.org/citation.cfm?id=1706372 (btw, sorry, ACM is down tonight; they're showing us their Cloudflare).

This is not new stuff in the world of graduate CS, any more than succinct data structures or wavelet trees are. It's seldom applied in industry, but many of these techniques and sorts are taught in upper-division courses in recent, algorithm-and-theory-intensive CS programs. That stuff just doesn't make it out of academia very fast these days. Academics face a lot of pushback from industry now over disruptive findings. Look at how Emin Gun Sirer has been blackballed by the cryptocurrency community, for example. Look at how steadfastly folks have stood by JWT despite Lentczner's team's work on a better alternative in Macaroons.


Where is the Haskell implementation? Are there public benchmarks of its asymptotic performance?

I'm pretty sure a worst-case linear sorting algorithm would have multiple NYT articles about it.

EDIT: The edits to the parent since I made this comment make it look a bit silly. It made more sense in the context I was initially responding to.

EDIT 2: Thanks for linking the paper. From there:

> As Proposition 1.1 and the corresponding well-known combinatorial lower bound of Ω(n log n) (Knuth 1998, Section 5.3.1) for comparison-based sorting show, we cannot accomplish efficient generic partitioning and linear-time sorting by using black-box binary comparison functions as specifications of equivalence or ordering relations. Instead, we show how to construct efficient discriminators by structural recursion on specifications defined compositionally in an expressive domain-specific language for denoting equivalence and ordering relations.

Which sounds like the advantage I speculated on in the grandchild below. I understand, now, and it's a nice result. Thanks for pointing it out.
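For anyone else following along, here's my toy reading of the construction (a sketch only, with a made-up Disc type; it is not the paper's actual code): a discriminator takes (key, value) pairs and returns the values grouped in key order, and discriminators for compound types are built by structural recursion on the type rather than by calling a comparison function:

    {-# LANGUAGE RankNTypes #-}
    import Data.Array (accumArray, elems)
    import Data.Word (Word8)

    -- A discriminator takes (key, value) pairs and returns the values
    -- grouped by equal keys, with groups emitted in key order.
    -- Note: no comparison function appears anywhere below.
    newtype Disc k = Disc { disc :: forall v. [(k, v)] -> [[v]] }

    -- Base case: bucket a bounded key type directly, one O(n + 256) pass.
    discWord8 :: Disc Word8
    discWord8 = Disc (\kvs ->
      map reverse . filter (not . null) . elems $
        accumArray (flip (:)) [] (0, 255) kvs)

    -- Product case: discriminate on the first component, then refine
    -- each of those groups by the second component.
    discPair :: Disc a -> Disc b -> Disc (a, b)
    discPair da db = Disc (\kvs ->
      concatMap (disc db) (disc da [ (a, (b, v)) | ((a, b), v) <- kvs ]))

    -- Sorting falls out: tag each element with itself and flatten.
    sortDisc :: Disc k -> [k] -> [k]
    sortDisc d xs = concat (disc d [ (x, x) | x <- xs ])

    main :: IO ()
    main = print (sortDisc (discPair discWord8 discWord8)
                           [(3, 7), (1, 2), (3, 1), (0, 9)])
    -- [(0,9),(1,2),(3,1),(3,7)]

The paper generalizes this well beyond bytes and pairs (sums, lists, abstract ordering specifications), but the compositional shape is the point.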


https://hackage.haskell.org/package/discrimination

Which, by the way, contains boatloads of novel work. It uses internal parallel execution to work around some of the problems with discrimination passes between two groups. Cutting edge stuff.
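Basic usage is about as unexciting as it should be. Something like this (function names from memory, so double-check the haddocks before copying):

    import Data.Discrimination (group, nub, sort)

    main :: IO ()
    main = do
      -- linear-time sort of Ints, with no comparisons under the hood
      print (sort [9, 3, 3, 1, 7 :: Int])
      -- O(n) de-duplication and grouping come out of the same machinery
      print (nub [9, 3, 3, 1, 7 :: Int])
      print (group "mississippi")

If I remember right, the package also exposes joins built on the same discriminators, which is where the linear-time selective joins mentioned at the top come from.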

It's Haskell. People like you have a chip on their shoulder about "typing goo." I'm sure one day someone will write a golang implementation, then go to some inventor-centric conference and be hailed as a genius, as per usual.

But, did you not know about Radix sort or American Flag sort? Counting sort is a common, stupid Silicon Valley interview question, and has been in the Ruby code katas since, like, 2003. So, as keen as I am to see this work widely distributed, it's not as unprecedented as you make it out to be.

Hey, want to know another surprising one? With a proper pre-encoding you can compress an (extremely long) binary value and still answer how many set bits occur in a given range (a popcount, or rank, query) in constant time. You can also, by extension, find the offset at which the Nth hot bit appears in O(1) time. Pretty surprising, huh? Especially when you realize that the entire space of 32-bit integers is only 512MB as a fully realized bitfield, 32x smaller than actually storing one instance of every integer.

See here: http://alexbowe.com/rrr/
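That link covers the real RRR construction; here is just the flavor of the rank trick in Haskell, the uncompressed prefix-popcount version it builds on (assuming the vector package; RRR then compresses the blocks underneath):

    import Data.Bits (popCount, shiftL, (.&.))
    import qualified Data.Vector.Unboxed as V
    import Data.Word (Word64)

    -- Uncompressed version of the rank trick: store a running popcount
    -- at every 64-bit word boundary; a query is then one table lookup
    -- plus one masked popCount, i.e. O(1).
    data RankVec = RankVec
      { rvWords  :: V.Vector Word64  -- raw bits, 64 per word
      , rvPrefix :: V.Vector Int     -- set bits strictly before word i
      }

    mkRankVec :: V.Vector Word64 -> RankVec
    mkRankVec ws = RankVec ws (V.prescanl' (+) 0 (V.map popCount ws))

    -- rank1 rv i = number of set bits among positions [0 .. i-1],
    -- valid for 0 <= i < 64 * number of words.
    rank1 :: RankVec -> Int -> Int
    rank1 (RankVec ws pre) i =
      pre V.! w + popCount ((ws V.! w) .&. ((1 `shiftL` b) - 1))
      where (w, b) = i `divMod` 64

    main :: IO ()
    main = do
      let rv = mkRankVec (V.fromList [0xFF, 0x0F])  -- bits 0-7 and 64-67 set
      print (rank1 rv 4)   -- 4
      print (rank1 rv 70)  -- 12

Select (the offset of the Nth set bit) is the inverse query and gets the same treatment with a little extra indexing.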


> But, did you not know about Radix sort or American Flag sort?

Oh, is it that the types somehow mandate roughly evenly-sized ordered partitions at any sample size? If so, I get it, now. Thanks.


No, I'm afraid they don't. I confess I regret this conversation. I've gotten very little out of it other than a confrontational attitude from you and a bunch of confusingly worded challenges.


It's easy to see that you two weren't on the same page for quite a bit. I can't see a confrontational attitude. Please avoid belittling the other commenter; they never asked for a long-winded explanation (slightly bragging?) but were obviously only lacking a bit of essential knowledge (O(n log n) is for comparison-based sorts).

Anyways, thanks for posting the links, although I'm not sure I will get anything out of it :-)


Sorry, people come at me with "typing goo" and I tend to assume the worst in this forum.


If you have any suggestions on how I could have been clearer, I'd welcome them. I'm always glad to learn of flaws in my writing.

Can you get linear time even on Haskell's arbitrary-precision `Integer`s?


In general, asymptotic analyses (including the well-known O(n log n) bound for comparison-based sorting) assume a model where operations like compare, swap, arithmetic, etc. are constant-time. So they only work for bounded integers, practically speaking.

If you want to know for the general case, you need to adapt your machine model and redo the analysis. Operations will likely not be constant-time but something like O(log x), where x is the magnitude of the integer at hand. But of course, that's up to you, what you want to do, and how you build your machine.


I think this simplification is made so the algorithm in question can be critiqued in isolation from any underlying machine details.

What's cool about discrimination sort is that it can, in fact, give these linearity guarantees for most kinds of data a computer can represent, including variable-length things like strings, as well as structs. There are some caveats, but they're easy enough to work with that nearly every default structure in Haskell gets an instance, as seen here: http://hackage.haskell.org/package/discrimination-0.3/docs/D...

Note particularly the instance for [a], and that you can extend it to something like Data.Text's codepoints (and this would work out of the box with something like ByteString's chunked form).
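Concretely, since String is just [Char], something like this should already run in time linear in the total number of characters (again from memory, so the import may need adjusting):

    import Data.Discrimination (sort)

    main :: IO ()
    main = print (sort ["banana", "apple", "cherry", "app"])
    -- expecting lexicographic order: ["app","apple","banana","cherry"]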


"typing goo". Generally complaining about the methods used by someone with a novel-peer reviewed result and demanding further and further reductive summarization on your terms, defining said terms with phrases like "typing goo" is bad form.

Further, implying a lack of credibility by suggesting there should already be press coverage is confrontational.

I lost my temper after you did these things a few times. I apologize.

And yes, it can get O(n) performance with arbitrary-length integers. These analyses exclude the underlying comparison costs.


Thanks for the feedback.

> implying a lack of credibility by suggesting there should already be press coverage is confrontational.

You edited out the context I was responding to, there. As I recall, you were originally explaining the "six years" in terms of it just being hard to get academic results into the mainstream. That seemed extremely unlikely for such a well-known result.

I'm not bothered by your invective or personal attacks, but editing your comments so that my responses are out of context is pretty disturbing.


I edited out a duplicated paragraph, with a missing sentence, that had been added by accident. I prepared that reply on mobile, and that sort of thing can happen because you don't see scroll bars.

The literal and figurative content weren't changed. I just removed the duplication and improved the flow of the post.


Oh, BTW, I love types. I'm programming in OCaml at the moment. I just couldn't see the relevance.


Did you try?


Thanks for selflessly posting! :)



