I get that right that as a probabilistic data structure, the answer to `lookup(element)` to a prior inserted element might be false? How much faster is this implementation compared to alternatives? What's the use case?
I believe it's the other way around. If you inserted an item, then 'lookup(element)' is always true. However, you can get false positives, meaning that you might also get true for items that have not been inserted. The probabilistic aspect is the false positive rate.
One example of a use case would be a database keeping a set of existing column names in memory and testing a query against this set using a bloom/cuckoo filter before actually doing a disk read. That way, it doesn't have to do a disk read for a non-existent column or row.
I think you are misinterpreting for what cmu stands here. Look at the byline below the author names, they are from Carnegie Mellon University, Intel Labs, and Harvard University.
As u/nyc640 said, it is actually the other way around. False positives are possible. False negatives are not. The original paper linked gives the false positive rates of upto 0.19% (0.09% for the semi-sorted version).
Okay. Thanks. I read the readme that this is more for academic purposes, and saw nyc640 explanation (thanks as well!). But when reading the Readme I wondered "How would this go together with something like https://github.com/SamSaffron/lru_redux ?" Do you have thoughts on that?
I use both the lru_redux as well as https://github.com/jnicklas/memoit in a ruby app of mine, and if another gem (like a memoization scheme using a Cuckoo Filter) would improve performance it would be fun to try it out. But it is of course possible that this is not applicable, or it would just not be faster. I think it should not be an advantage (given that the others will be at its core just a hash map, so you don't need that membership test), but that's why I asked.
I can't really tell you if the Cuckoo Filter will improve performance since it depends on so many factors. But if you have some free time then why not try it out and find out for yourself? In case you do, please let me know what performance you derive out of it :)
I wrote it is mainly for academic purposes because the main advantage of Cuckoo Filter is that it takes a lot less space than other similar filters. It achieves that by using only a certain amount of bits(6) to store the fingerprint etc. In Ruby, you can't really control memory allocation on a bit/byte level so you lose that advantage.
The rest mainly remains the same. I suspect the bottleneck in this implementation is the hashing algorithm (FNV-1a). I chose it because of its relatively good collision resistance. Plugging in a faster hash algorithm would definitely improve performance.