Show HN: Cuckoo Filter in Ruby

onli · on May 16, 2017

I get that right that as a probabilistic data structure, the answer to `lookup(element)` to a prior inserted element might be false? How much faster is this implementation compared to alternatives? What's the use case?

nyc640 · on May 16, 2017

I believe it's the other way around. If you inserted an item, then 'lookup(element)' is always true. However, you can get false positives, meaning that you might also get true for items that have not been inserted. The probabilistic aspect is the false positive rate.

One example of a use case would be a database keeping a set of existing column names in memory and testing a query against this set using a bloom/cuckoo filter before actually doing a disk read. That way, it doesn't have to do a disk read for a non-existent column or row.

empath75 · on May 16, 2017

The paper that introduced the concept is linked.

https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf

onli · on May 16, 2017

I had a look at it before asking, but I still didn't know the answer to my questions. Especially the implementation specific ones.

But it is good to have the link in the comments :)

SimbaOnSteroids · on May 16, 2017

Unrelated but is Central Methodist University well regarded for their CS department? Or is this paper a bit of a one off.

onli · on May 16, 2017

I think you are misinterpreting for what cmu stands here. Look at the byline below the author names, they are from Carnegie Mellon University, Intel Labs, and Harvard University.

SimbaOnSteroids · on May 16, 2017

I am indeed, my mind was thinking way to local.

pkd · on May 16, 2017

As u/nyc640 said, it is actually the other way around. False positives are possible. False negatives are not. The original paper linked gives the false positive rates of upto 0.19% (0.09% for the semi-sorted version).

onli · on May 16, 2017

Okay. Thanks. I read the readme that this is more for academic purposes, and saw nyc640 explanation (thanks as well!). But when reading the Readme I wondered "How would this go together with something like https://github.com/SamSaffron/lru_redux ?" Do you have thoughts on that?

I use both the lru_redux as well as https://github.com/jnicklas/memoit in a ruby app of mine, and if another gem (like a memoization scheme using a Cuckoo Filter) would improve performance it would be fun to try it out. But it is of course possible that this is not applicable, or it would just not be faster. I think it should not be an advantage (given that the others will be at its core just a hash map, so you don't need that membership test), but that's why I asked.

pkd · on May 16, 2017

I can't really tell you if the Cuckoo Filter will improve performance since it depends on so many factors. But if you have some free time then why not try it out and find out for yourself? In case you do, please let me know what performance you derive out of it :)

I wrote it is mainly for academic purposes because the main advantage of Cuckoo Filter is that it takes a lot less space than other similar filters. It achieves that by using only a certain amount of bits(6) to store the fingerprint etc. In Ruby, you can't really control memory allocation on a bit/byte level so you lose that advantage.

The rest mainly remains the same. I suspect the bottleneck in this implementation is the hashing algorithm (FNV-1a). I chose it because of its relatively good collision resistance. Plugging in a faster hash algorithm would definitely improve performance.

pacuna · on May 16, 2017

never heard of this before. Really cool. I would love to see a bigger list of use cases.