i think the key insight, at least for me, is that if you break each thing into a pile of lots of little pieces and then come up with n different ways of ordering the pieces, similar things are going to have the same piece show up on top of their piles across most of those orderings.
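here's a rough sketch of that idea in python, not anything canonical. the "pieces" are assumed to be overlapping 3-character shingles, and each "ordering" is faked with a salted hash instead of a real permutation. the names (`shingles`, `minhash`, `estimate_jaccard`) and the defaults (`k=3`, `n=128`) are made up for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Break text into the set of its overlapping k-character pieces.

    Assumes len(text) >= k, otherwise the set is empty.
    """
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _rank(piece, seed):
    """Where `piece` lands under ordering number `seed`.

    A salted 64-bit hash stands in for a random ordering of all pieces.
    """
    digest = hashlib.blake2b(
        piece.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(pieces, n=128):
    """For each of n orderings, keep the rank of the piece that ends up on top.

    Assumes `pieces` is non-empty.
    """
    return [min(_rank(p, seed) for p in pieces) for seed in range(n)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of orderings where both piles had the same piece on top."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = shingles("the quick brown fox jumps over the lazy dog")
    b = shingles("the quick brown fox jumped over the lazy dogs")
    print("exact:   ", jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash(a), minhash(b)))
```

for any single ordering, the chance that two piles share the same top piece is exactly their jaccard similarity, so the fraction of matching slots is an estimate of it, and more orderings means less noise.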
add in the banding stuff and some simple probability and voila! a cheap and embarrassingly parallel way to approximate jaccard similarity across huge datasets.
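the banding part can be sketched like this, assuming you already have a minhash signature (a list of ints) per document. the function names and the `bands`/`rows` parameters are hypothetical, just for illustration. the idea: chop each signature into bands of a few rows each, and call two documents a candidate pair if they agree on every row of at least one band.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(signatures, bands, rows):
    """Return the pairs of names whose signatures match on at least one band.

    signatures: dict of name -> list of ints, each of length bands * rows.
    """
    pairs = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            # The slice of the signature belonging to this band is the bucket key.
            key = tuple(sig[b * rows:(b + 1) * rows])
            buckets[key].append(name)
        for names in buckets.values():
            # Everything sharing a bucket in this band becomes a candidate pair.
            pairs.update(combinations(sorted(names), 2))
    return pairs


def candidate_probability(s, bands, rows):
    """Chance that two documents with Jaccard similarity s become candidates."""
    return 1 - (1 - s ** rows) ** bands


if __name__ == "__main__":
    # Toy hand-written signatures: "a" and "b" agree on the first band only.
    sigs = {"a": [1, 2, 3, 4], "b": [1, 2, 9, 9], "c": [7, 7, 7, 7]}
    print(candidate_pairs(sigs, bands=2, rows=2))
    for s in (0.2, 0.5, 0.8):
        print(s, candidate_probability(s, bands=20, rows=5))
```

the simple probability is the `1 - (1 - s^rows)^bands` curve: a band matches with probability `s^rows`, and you need at least one of the bands to match. that gives an s-shaped cutoff, so very similar pairs almost always collide and dissimilar ones almost never do. each band only looks at its own slice, which is where the easy parallelism comes from.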