> As for "bad spatial and temporal locality," can you be more specific?
One could use a lexicographic Btree-like structure with multiple characters per node, sharing consecutive cache-lines. For looking up the word "bar", you would traverse the "b" key from the root node, then its child "a" node, then the "r" child of the "a" node. Most of the nodes near the top would remain hot in the cache, and deeper less-visited nodes could be joined or compacted based on some criteria. Of course there are many subtleties, special cases and complications and the solution would be far more complex in terms of source code length.
With a hash table, you effectively compute a hash for every word, and then jump at arbitrary locations throughout the whole table.
You've traded hashing for a memory access per byte. I did something similar in the 'optimized-trie' program for Rust: https://github.com/benhoyt/countwords/blob/master/rust/optim... --- It ended up being slower. But, it is very much impacted by cache effects. So the next step here would indeed be to figure out how to reduce cache misses. But it's non-trivial.
Like, maybe your idea works. Maybe. I don't know. You'd have to try it. But it's not clear to me that it will. And it doesn't support "is far from what an experienced C programmer would write if performance was paramount" IMO.
One could use a lexicographic Btree-like structure with multiple characters per node, sharing consecutive cache-lines. For looking up the word "bar", you would traverse the "b" key from the root node, then its child "a" node, then the "r" child of the "a" node. Most of the nodes near the top would remain hot in the cache, and deeper less-visited nodes could be joined or compacted based on some criteria. Of course there are many subtleties, special cases and complications and the solution would be far more complex in terms of source code length.
With a hash table, you effectively compute a hash for every word, and then jump at arbitrary locations throughout the whole table.