Two problems I see here, based on the research I've done in high-performance graph algorithms:
- It's hard to find a "good-enough" graph implementation. The best hashtable is only a handful of percent better than the built-in ones. The best graph impl is 1000x or more better than any generic built-in one could be, so there's much more incentive to specialize (and people already specialize hashtables for just a handful of percent speedup!)
- The baseline complexity of implementing a reasonable hashtable is fairly high, even for a small dataset. The baseline complexity of implementing a graph algorithm for a small dataset is pretty low (see the sketch below), and the real problems only show up later, at larger scale. So with graphs there's less incentive to learn a complex library's API when "I could just hack it myself," unlike with hashtables, where the API is simple and doing it myself is much harder.
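To make that second point concrete: the "hack it myself" version for a small graph really is just an adjacency list plus a traversal. A rough sketch (illustrative only, plain C++, not tied to any particular library):

```cpp
// "Hack it myself" graph for small data: adjacency list + BFS.
// Perfectly fine at small scale; the real problems (memory layout,
// cache behavior, parallelism) only appear on large graphs.
#include <cstdint>
#include <queue>
#include <vector>

std::vector<int64_t> bfs_distances(const std::vector<std::vector<int32_t>>& adj,
                                   int32_t source) {
    std::vector<int64_t> dist(adj.size(), -1);  // -1 means "not reached yet"
    std::queue<int32_t> frontier;
    dist[source] = 0;
    frontier.push(source);
    while (!frontier.empty()) {
        int32_t u = frontier.front();
        frontier.pop();
        for (int32_t v : adj[u]) {
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                frontier.push(v);
            }
        }
    }
    return dist;
}
```

That's the whole thing at small scale, which is exactly why people don't bother learning a heavyweight library's API for it.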
> The baseline complexity of implementing a reasonable hashtable is fairly high, even for a small dataset.
I would disagree with this; it's actually really easy to make one if you're willing to do away with many features (which aren't essential, but provide performance benefits). Implementing one is just something you never have to do in most modern languages.
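For a rough idea of what I mean by doing away with features: a bare-bones table with open addressing, linear probing, a fixed power-of-two capacity, and no resizing or deletion is only a few dozen lines. A sketch (illustrative, not a serious implementation):

```cpp
// Minimal open-addressing hash map: linear probing, fixed power-of-two
// capacity, no resize, no erase, no tombstones. Easy to write, but missing
// everything that makes a production table fast and robust.
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

struct TinyMap {
    static constexpr size_t kCapacity = 1 << 16;  // must stay well under-full

    TinyMap() : slots(kCapacity) {}

    void insert(int64_t key, int64_t value) {
        size_t i = std::hash<int64_t>{}(key) & (kCapacity - 1);
        while (slots[i] && slots[i]->first != key)
            i = (i + 1) & (kCapacity - 1);  // linear probing
        slots[i].emplace(key, value);
    }

    std::optional<int64_t> find(int64_t key) const {
        size_t i = std::hash<int64_t>{}(key) & (kCapacity - 1);
        while (slots[i]) {
            if (slots[i]->first == key) return slots[i]->second;
            i = (i + 1) & (kCapacity - 1);
        }
        return std::nullopt;
    }

  private:
    std::vector<std::optional<std::pair<int64_t, int64_t>>> slots;
};
```

Resizing, deletion (tombstones), and a good hash function are where most of the real complexity and the performance wins live, and that's exactly what a standard library hands you for free.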
The work you cited is very impressive and very welcome.
But you seem to be implying that `std::unordered_map` is the default choice one would use, which in my experience is not accurate -- it is well-known to have serious perf shortcomings, and everyone I know uses some other implementation by default. Even so, the delta from `std::unordered_map` to the improved hashtable in the blog post is impressive, and just shy of 10x.
Graph algorithms frequently have 10x improvements from one state-of-the-art approach to the next -- for example, here's one from my own research[1]. The delta between state-of-the-art and "good default" in graph algorithms would often be around 100-1000x. And comparing state-of-the-art to the equivalent of an `std::unordered_map` would be another 10-100x on top of that, so 1000-100000x total.
Whoa, thank you for sharing. I only knew that, just like with dictionaries, there are quite a few implementation choices when building a graph, depending on which operations the algorithm needs to do often and how sparse the data is.
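For example, from what I understand, a common specialization for sparse, static graphs is compressed sparse row (CSR), where all the neighbor lists get packed into two flat arrays instead of one allocation per vertex. My own rough sketch of the layout (illustrative only, not from the linked work):

```cpp
// Compressed sparse row (CSR): the neighbors of vertex v live in
// neighbors[offsets[v] .. offsets[v + 1]). One contiguous block,
// cache-friendly to scan, but edges can't be added cheaply later.
#include <cstdint>
#include <utility>
#include <vector>

struct CsrGraph {
    std::vector<int64_t> offsets;    // size = num_vertices + 1
    std::vector<int32_t> neighbors;  // size = num_edges
};

CsrGraph build_csr(int32_t num_vertices,
                   const std::vector<std::pair<int32_t, int32_t>>& edges) {
    CsrGraph g;
    g.offsets.assign(num_vertices + 1, 0);
    for (const auto& e : edges) ++g.offsets[e.first + 1];  // count out-degrees
    for (int32_t i = 0; i < num_vertices; ++i)             // prefix sum
        g.offsets[i + 1] += g.offsets[i];
    g.neighbors.resize(edges.size());
    std::vector<int64_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) g.neighbors[cursor[e.first]++] = e.second;
    return g;
}
```

Scanning a vertex's neighbors becomes a contiguous read, which I assume is where a lot of the gap over a generic pointer-heavy structure comes from, at the cost of not being able to add edges cheaply afterwards.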