Co-author of the paper here. The repo https://github.com/FastFilter/fastfilter_c...

HenriNext · on Dec 20, 2019

Thanks, found the benchmarks (your blog links primarily to different repo, "/xorfilter").

My use case is a new kind of debugger (https://BugJail.com), which works by capturing 20+ million events per second from running application, and then reconstructing model of program execution from the raw events. With 20+ million events per second that 110ns per key is just not going to work..

thomasmg · on Dec 20, 2019

Very interesting! If both construction and lookup need to be very fast, and if you do have enough memory, then I would say a blocked Bloom filter would be best (without knowing your use case in detail). With a regular Bloom filter, you will have lots of CPU cache misses. Cuckoo and xor filter might make sense if you can do (batch) construction in background threads, multiple filters concurrently. That way you can save memory.

HenriNext · on Dec 20, 2019

The filters are used to seek from database files, so lookup/query side performance is not major issue (Xor filters were mainly interesting because of the smaller size). Normally a secondary index would do the job fine, but because of the high sustained rate there is no time to write it. And all CPU cores are already maxed out, so moving more to background doesn't provide net-benefit..

+Thanks everybody for your encouraging words :)

jacquesm · on Dec 20, 2019

That is a very interesting project that you have on the go there, it doesn't happen often that I see tools that I think might be game changers coming along and this just might be one of those.

HenriNext · on Dec 20, 2019

Thanks!

Btw you have a bug in your HN profile, it shows "Karma 158422", which seems to be about two orders of magnitude off? ;)

jacquesm · on Dec 20, 2019

I know, I asked dang to fix it but he refused. I'll have to live with it I guess.

What I like about your project is that I've always felt that debuggers and such give us the same 'von Neumann' view of running programmers that we feel our CPU has. It's a bit like trying to figure out why your car won't start when all you have is a view through a microscope. And besides that, it is quite often that which is not happening that is indicative of an issue and debuggers usually allow you to focus only very well on everything that does happen.

HenriNext · on Dec 20, 2019

Maybe you feel that way because that's exactly what debugger was designed for: CPU register access in assembler programs..

It's pretty insane that every other aspect of software engineering has changed so dramatically since the 1970s, but debugger has remained fundamentally unchanged for almost half a century now.

Also.. next BugJail beta will have SQL queries against captured program execution (select * from field_write join method_call...) to test hypothesis about runtime behavior. If you can express "that which is not happening" in SQL and get back 0 rows, then it didn't happen. Do you think this could solve your need?

powturbo · on Dec 20, 2019

It might be possible to use integer compression with block offsets like in the TurboPFor [1] demo.

TurboPFor offers also direct access to single values in a block.

1 - https://github.com/powturbo/TurboPFor

HenriNext · on Dec 20, 2019

Thanks for the link, but actually I already use integer compression.

Nelkins · on Dec 20, 2019

The website mentions a waiting list. Can you point me in the direction of where I can get email notifications about this project? Would be very interested in this for .NET. Is it similar to https://devblogs.microsoft.com/dotnet/debugging-net-apps-wit... ?

HenriNext · on Dec 20, 2019

Sorry, the waiting list is indeed down atm :( You can just send one-line support ticket at https://support.bugjail.com/hc/en-us/requests/new and I'll add you manually.

The underlying technology in BugJail and Microsoft TTD is somewhat similar (capturing based), but the functionality build on top of that is very different: Microsoft TTD still looks and works like "normal debugger plus back button", while BugJail is total redesign from scratch and looks nothing like any traditional debugger.

rdevsrex · on Dec 20, 2019

Wow. That is a really cool project. I can see why you need that level of performance.

flafla2 · on Dec 20, 2019

This is damn awesome. Keep up the good work!

hinkley · on Dec 20, 2019

Yes but grandparent is correct. The first word in the title is still “faster”.

jchw · on Dec 20, 2019

I know it's frowned upon to say this at HN, but I'm beginning to wonder who's reading the actual page. At least to me the page seems pretty clear about it:

>The xor filters takes a bit longer to build, but once built, it uses less memory and is about 25% faster in some demonstration test.

When they say "faster," they mean for querying. Which is actually pretty useful for a lot of use cases where bloom filters are used today.

HenriNext · on Dec 20, 2019

To be clear, I wrote specifically "construction performance".

The blog already stated that "build is bit slower", so i was mainly commenting that instead of "bit slower" it is actually order of magnitude slower (in some cases).

According to the paper, the query side performance is indeed faster, so they are not misleading in any way.