Two problems I see here, based on the research I've done in high-performance graph algorithms:
- It's hard to find a "good-enough" graph implementation. The best hashtable is only a handful of percent better than the built-in ones. The best graph impl is 1000x or more better than any generic built-in one could be, so there's much more incentive to specialize (and people already specialize hashtables for just a handful of percent speedup!)
- The baseline complexity of implementing a reasonable hashtable is fairly high, even for a small dataset. The baseline complexity of implementing a graph algorithm for a small dataset is pretty low (see the sketch below), and the real problems only show up later, at larger scale. So with graphs there's less incentive to learn a complex library's API when "I could just hack it myself," unlike with hashtables, where the API is simple and doing it myself is much harder.
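To make that second point concrete: the "hack it myself" version for a small graph really is just an adjacency list plus a traversal. A rough sketch (illustrative only, plain C++, not tied to any particular library):

```cpp
// "Hack it myself" graph for small data: adjacency list + BFS.
// Perfectly fine at small scale; the real problems (memory layout,
// cache behavior, parallelism) only appear on large graphs.
#include <cstdint>
#include <queue>
#include <vector>

std::vector<int64_t> bfs_distances(const std::vector<std::vector<int32_t>>& adj,
                                   int32_t source) {
    std::vector<int64_t> dist(adj.size(), -1);  // -1 means "not reached yet"
    std::queue<int32_t> frontier;
    dist[source] = 0;
    frontier.push(source);
    while (!frontier.empty()) {
        int32_t u = frontier.front();
        frontier.pop();
        for (int32_t v : adj[u]) {
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                frontier.push(v);
            }
        }
    }
    return dist;
}
```

That's the whole thing at small scale, which is exactly why people don't bother learning a heavyweight library's API for it.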
> The baseline complexity of implementing a reasonable hashtable is fairly high, even for a small dataset.
I would disagree with this; it's actually really easy to make one if you're willing to do away with many features (which aren't essential, but provide performance benefits). Implementing one is just something you never have to do in most modern languages.
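For a rough idea of what I mean by doing away with features: a bare-bones table with open addressing, linear probing, a fixed power-of-two capacity, and no resizing or deletion is only a few dozen lines. A sketch (illustrative, not a serious implementation):

```cpp
// Minimal open-addressing hash map: linear probing, fixed power-of-two
// capacity, no resize, no erase, no tombstones. Easy to write, but missing
// everything that makes a production table fast and robust.
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

struct TinyMap {
    static constexpr size_t kCapacity = 1 << 16;  // must stay well under-full

    TinyMap() : slots(kCapacity) {}

    void insert(int64_t key, int64_t value) {
        size_t i = std::hash<int64_t>{}(key) & (kCapacity - 1);
        while (slots[i] && slots[i]->first != key)
            i = (i + 1) & (kCapacity - 1);  // linear probing
        slots[i].emplace(key, value);
    }

    std::optional<int64_t> find(int64_t key) const {
        size_t i = std::hash<int64_t>{}(key) & (kCapacity - 1);
        while (slots[i]) {
            if (slots[i]->first == key) return slots[i]->second;
            i = (i + 1) & (kCapacity - 1);
        }
        return std::nullopt;
    }

  private:
    std::vector<std::optional<std::pair<int64_t, int64_t>>> slots;
};
```

Resizing, deletion (tombstones), and a good hash function are where most of the real complexity and the performance wins live, and that's exactly what a standard library hands you for free.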
The work you cited is very impressive and very welcome.
But you seem to be implying that `std::unordered_map` is the default choice one would use, which in my experience is not accurate -- it is well-known to have serious perf shortcomings, and everyone I know uses some other implementation by default. Even so, the delta from `std::unordered_map` to the improved hashtable in the blog post is impressive, and just shy of 10x.
Graph algorithms frequently have 10x improvements from one state-of-the-art approach to the next -- for example, here's one from my own research[1]. The delta between state-of-the-art and "good default" in graph algorithms would often be around 100-1000x. And comparing state-of-the-art to the equivalent of an `std::unordered_map` would be another 10-100x on top of that, so 1000-100000x total.
Whoa, thank you for sharing. I only knew that, just like with dictionaries, there are quite a few implementation choices when building a graph, depending on which operations the algorithm needs to do often and how sparse the data is.
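For example, from what I understand, a common specialization for sparse, static graphs is compressed sparse row (CSR), where all the neighbor lists get packed into two flat arrays instead of one allocation per vertex. My own rough sketch of the layout (illustrative only, not from the linked work):

```cpp
// Compressed sparse row (CSR): the neighbors of vertex v live in
// neighbors[offsets[v] .. offsets[v + 1]). One contiguous block,
// cache-friendly to scan, but edges can't be added cheaply later.
#include <cstdint>
#include <utility>
#include <vector>

struct CsrGraph {
    std::vector<int64_t> offsets;    // size = num_vertices + 1
    std::vector<int32_t> neighbors;  // size = num_edges
};

CsrGraph build_csr(int32_t num_vertices,
                   const std::vector<std::pair<int32_t, int32_t>>& edges) {
    CsrGraph g;
    g.offsets.assign(num_vertices + 1, 0);
    for (const auto& e : edges) ++g.offsets[e.first + 1];  // count out-degrees
    for (int32_t i = 0; i < num_vertices; ++i)             // prefix sum
        g.offsets[i + 1] += g.offsets[i];
    g.neighbors.resize(edges.size());
    std::vector<int64_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) g.neighbors[cursor[e.first]++] = e.second;
    return g;
}
```

Scanning a vertex's neighbors becomes a contiguous read, which I assume is where a lot of the gap over a generic pointer-heavy structure comes from, at the cost of not being able to add edges cheaply afterwards.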