What are Bloom filters, and why are they useful?

TTilus · on Feb 25, 2016

Bloom filters are awesomely nerdy stuff. But seriously, blog with no comments section? D'oh.

I would have liked to ask: how do you decide how long to "teach" the filter so that you don't degrade it? Also, when discussing efficiency, the blog post completely omits the overhead cost of "teaching" (or "warming up", if you have a cache mindset).

maxpagels · on Feb 25, 2016

Ideally, you'd initialise the filter with all the elements at startup. The cost of this depends on the number of elements to be added, multiplied by a constant factor for the number of hash functions k. So, for n elements, it would be O(kn), which is O(n) when k is treated as a fixed constant.
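To make the startup cost concrete, here is a minimal sketch of bulk-loading a filter. It is not from the blog post: the `BloomFilter` class, the `build_filter` helper, and the choice of deriving the k bit positions from a single SHA-256 digest via double hashing are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: m bits, k positions per item (illustrative only)."""

    def __init__(self, m, k):
        self.m = m                            # number of bits in the array
        self.k = k                            # number of hash positions per item
        self.bits = bytearray((m + 7) // 8)   # bit array packed into bytes

    def _positions(self, item):
        # Assumption: derive k positions from one SHA-256 digest using
        # double hashing (h1 + i * h2) instead of k separate hash functions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, so never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        # k bit writes per item, so n items cost O(kn) in total.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_filter(items, m, k):
    """Hypothetical helper: load every element once at startup."""
    bf = BloomFilter(m, k)
    for item in items:
        bf.add(item)
    return bf
```

Something like `build_filter(["apple", "banana"], m=1024, k=3)` would then answer `"apple" in bf` with `True`. Added items are never reported as absent; only the reverse error (a false positive) can happen.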

Adding elements degrades the filter, and deciding when to stop adding elements is purely down to what worst-case false positive probability you are willing to accept. The equation for this can be readily found online, but put simply, the probability p is a function not only of the number of bits in the filter's array and the number of hash functions, but also of the number of elements already added to the set.
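For reference, the usual approximation is p ≈ (1 - e^(-kn/m))^k, where m is the number of bits, k the number of hash functions and n the number of elements added. It assumes the hash functions behave independently and uniformly. A rough sketch of using it to decide when to stop adding (the function names here are made up for illustration):

```python
import math


def false_positive_rate(m, k, n):
    """Approximate false positive probability after n insertions.

    Standard approximation, assuming independent and uniform hashes:
    p ~= (1 - e^(-k*n/m))^k
    """
    return (1.0 - math.exp(-k * n / m)) ** k


def max_elements(m, k, p):
    """Largest n that keeps the approximate rate at or below p.

    Obtained by solving the formula above for n:
    n = -(m / k) * ln(1 - p^(1/k)), rounded down.
    """
    return int(-(m / k) * math.log(1.0 - p ** (1.0 / k)))
```

So with a fixed m and k, you would pick the worst-case p you can live with, compute `max_elements(m, k, p)`, and stop adding (or rebuild with a bigger array) once you reach that count.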