Trees, Hash Tables and Tries (benlynn.blogspot.com)
32 points by skorks on April 4, 2010 | 23 comments



In the real world, it's unlikely that a trie of any real size will be comparably performant to a hash table with a sane hash function: trie algorithms involve moving from node to node, each of which will require an unpredictable retrieval from non-sequential memory. Memory latency is pretty high these days (~200 cycles), so hash tables will almost always be quicker.

This is not to say that radix tries aren't extremely useful - they're compact and fast - but they're unlikely to match a hash table for raw performance.


I've done a bit of work with tries and hash functions and found that in-memory tries are almost universally faster than in-memory hashes, no matter how good the hash function is.

The difference, though, is on average around 20-30% if both implementations are well optimized. But tries come with lots of downsides that need to be managed well: they explode in memory use very quickly, requiring all kinds of memory optimizations to the naive trie to keep the tree depth short (usually with a small hash on the leaves). I remember building an in-memory trie library once in Perl that outperformed a native C hash library by 70%. The development team's jaws dropped when my code was running; they insisted it wasn't actually doing anything at all and that I was faking the output. "There is no way Perl code can outperform our C." It was an interesting learning experience all around. Suffix trees, likewise, are insane for full-text searching.

Hashes, on the other hand, have a few benefits: they are more compact in memory; on average they have fast lookups (though if you try to constrain the hash table growth algorithm too much, they can spend a lot of time on both collisions and on table growth operations); and they have many, many more high-quality implementations in most standard libraries, including many very good and very fast on-disk hashes that only give up another few percentage points (single digits, usually) of speed.

The problem with tries is not speed; there really isn't anything faster (in memory) that I'm aware of for lookup and retrieval problems. But tries are like sports cars: really fast, with lots of downsides. Hashes are like an upper-end practical family car.

If you try to put the trie on disk, you end up with all sorts of problems that you don't really see in modern on-disk hash implementations. Coming off of disk, hashes tend to be the better option (in fact, many of the on-disk trie interfaces I've seen actually just use hashes for the on-disk portion and optimize for tree depth in memory to keep things small).

So it depends: if the set of things you want to use for lookups is relatively small and you can cram it all into memory in a raw trie, it'll be a bit faster. If the set is big and you want to use it off of disk, go with a hash.

Also, if the values you wish to store don't map well to a trie, like floating point numbers, you pretty much have to go with a hash. Fortunately, the hash function in that case is really easy ;)
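To make that concrete, here's a minimal sketch of what "really easy" can mean in C: reinterpret the float's bits as an integer and mix them. The mixing constant is Knuth's multiplicative hash; the zero/NaN caveats are things you'd have to handle for real use.

    #include <stdint.h>
    #include <string.h>

    /* Hash a float by reinterpreting its bit pattern as an integer.
       memcpy avoids the undefined behavior of pointer type-punning.
       Caveat: +0.0 and -0.0 have different bit patterns, and NaN
       compares unequal to itself, so canonicalize those first if
       they matter for your keys. */
    static uint32_t hash_float(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);  /* bit-for-bit copy */
        return bits * 2654435761u;       /* Knuth multiplicative mix */
    }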


That's really interesting - but it's hard for me to understand. What makes them faster for lookup than a hash table with a really good, fast hash function (say, MurmurHash)? Does it depend on factors like string length? Basically, what's the environment?

My assumption is that it's faster because you might end up performing fewer string comparisons with the trie - if you have strings with common prefixes/sizes, this could cost you time on a hash with collisions. Although it seems like unless you have a lot of strings with common prefixes/sizes, this still ought to be pretty fast. And if you do have a lot of similar strings, you could store the hash value as well to speed this up.

My work on cache-sensitive data structures is in the integer space, so perhaps my ignorance when it comes to working with strings is showing :-).


Bingo: string comparisons will eat you alive. Even if you have a very small percentage of collisions in your hash (even close to 0%), you still have to check whether you have a collision, to determine whether you need to iterate to the next slot, no matter what. If you are hashing strings, you have to compare the entire string, and each character comparison is guaranteed to be at least one integer sub operation for 1-byte-wide character sets and two for Unicode.

But basically, tries compute almost as fast as just reading the string, while hashes have to read the string a few times AND compute the function AND deal with collisions (AND resize the hash, which is stupidly expensive).

So, to either insert or look up in a hash, you have to (a rough C sketch of the whole path follows this list):

1) Compute the hash from the string, which naively involves reading the entire string and computing the hash code. So for each character you have to move at least one or two bytes into a register, do something with them (likely some kind of integer op), store the result, then move the next byte or two in, do an operation with that, and so on. Let's not forget that I also have to check for end of string, which uses at least one sub operation per character as well for C-style strings.

Otherwise you keep a strlen counter in a register somewhere and do sub operations against that, either way.

1b) If you are smart, you simply restrict the length you compute from to some smaller prefix that will still do a pretty good job of assuring a reasonable level of uniqueness in the hash, but you can't ever guarantee it (uniqueness is the principal problem of hashing functions).

2) But still, let's assume I'm dealing with a nice small 16-character string (n=16). I have to compute many more operations than just the string read: at least 1 mov and 1 sub (to check for EOS), then a few operations for your hash, and probably some copies back into memory. So let's be generous and say 8 ops for each character; with n=16, that's at least 128 ops simply to compute the hash.

3) Then we have to do some pointer arithmetic or some table lookups or whatever to get to the memory location and store the string. So in all we're talking about 500-1000 ops to insert a 16-character string without even checking for collisions.

4) Factor that in, assume no collision, we might be on the order of 1200 executed ops in the end. If we have a collision, a couple thousand or more.

So for n=16, ops=~1200 minimum.

5) Even if there isn't a collision, I have to do a linear string compare against the string stored in the hash slot to assure myself that there wasn't a collision. So we have to start moving integers into registers and doing a bunch of subs again (and checking for EOS on both strings now). This could easily introduce another 50-100 ops to ensure no collision.

Lookups on the hash are also of a similar scale. Heaven help you if you have to expand the hash and move all that crap from one place to another.
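Here's the rough C sketch promised above: a djb2-style hash (a stand-in for whatever function you like) plus a linear-probing lookup. The names and probing scheme are mine, not from any particular library; note that the key gets read once by the hash and a second full time by strcmp.

    #include <stddef.h>
    #include <string.h>

    /* djb2-style string hash. Note the per-character work counted
       above: a load, an end-of-string test, and the hash math, for
       every byte of the key. */
    static size_t hash_str(const char *s)
    {
        size_t h = 5381;
        while (*s)                             /* load + EOS check   */
            h = h * 33 + (unsigned char)*s++;  /* hash math per char */
        return h;
    }

    /* Linear-probing lookup. Even on a "hit" we must re-read the
       whole key with strcmp to rule out a collision - the second
       full pass over the string. Assumes the table is never full;
       no resizing shown. */
    static const char *ht_find(const char **table, size_t nslots,
                               const char *key)
    {
        size_t i = hash_str(key) % nslots;     /* slot arithmetic    */
        while (table[i]) {
            if (strcmp(table[i], key) == 0)    /* full string compare */
                return table[i];
            i = (i + 1) % nslots;              /* probe next slot    */
        }
        return NULL;                           /* not present        */
    }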

For a trie, there is no such thing as a collision. In the naive version, you simply allocate an array the length of the character set + 1 (which takes a couple of ops, since you know that length ahead of time) and use the character (I'm assuming 1-byte characters, but the principle is the same for 2-byte) as an operand in some simple pointer arithmetic. Each array index stores a memory pointer to the next array (again of character set + 1 slots), and you follow those until you finish off the string. If you are inserting, point to one more array and set that extra +1 slot to something meaningful. If you are doing a lookup, you check that location to see if the slot is set to something meaningful (or you can just set the marker on the last character instead of one more level down the tree, whatever).

The point is, for a 16-character string you count say two ops for memory allocation, an op to find the slot in the array, and another op for allocation or value setting, for each character.

So in all, an insert or lookup happens almost as fast as a direct string compare: maybe on the order of n=16, ops < 100. Lookups can be even faster, since you get to short-circuit the lookup (read: stop reading the string) as soon as you can't find even one character.
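A minimal sketch of that naive trie in C, assuming 1-byte characters and using the extra +1 slot as the end-of-string marker (allocation-failure handling omitted):

    #include <stdlib.h>

    #define FANOUT 257  /* one slot per byte value, + 1 marker slot */

    typedef struct trie {
        struct trie *child[FANOUT];
    } trie;

    /* Insert: one array index and (at most) one allocation per
       character - the couple of ops per character counted above. */
    static void trie_insert(trie *t, const char *s)
    {
        for (; *s; s++) {
            unsigned char c = (unsigned char)*s;
            if (!t->child[c])
                t->child[c] = calloc(1, sizeof(trie));
            t = t->child[c];
        }
        t->child[256] = t;  /* any non-NULL value marks a stored key */
    }

    /* Lookup: short-circuits the moment a character has no edge. */
    static int trie_lookup(const trie *t, const char *s)
    {
        for (; *s; s++) {
            t = t->child[(unsigned char)*s];
            if (!t)
                return 0;
        }
        return t->child[256] != NULL;
    }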

I'm wagging a bit; I've never built either structure in asm. But if you analyze the number of operations each takes, the trie is insanely faster, even with memory access lags.

You can do some obvious speedups, like loading several characters into a register at a time for the integer arithmetic, or preloading the hash locations' value chains, or something...

One of the confusing points here is that the big-O estimates I've seen never bother to account for the string reads and comparisons. With a hash you basically have to do them a minimum of two times; with a trie, only once (and sometimes not even that much). Usually you see hashes quoted as O(1) best case and O(n) worst, but really they are O(k) and O(k*n), where k is the string length, while tries are genuinely O(k).

All that being said, as soon as you try to work with either structure on disk, hashes easily overtake tries in terms of speed since hard drive random access times are much worse and tries require lots and lots and lots of random access (for each character).

http://www.statemaster.com/encyclopedia/Trie


Fascinating - thanks for the detailed response, and that does make a lot of sense.

It seems that if string comparisons are the major cost, you could speed up collision handling on the hash table quite a bit by storing additional metadata. Storing the size of the string, for example, would allow you to skip a lot of comparisons: you can do a simple size comparison before actually comparing characters. This would also allow you to do slightly more funky string comparisons, such as comparing characters from the end first, or even starting in the middle, to skip common prefixes.

Alternatively, instead of storing the size you could store the computed hash value for each string, which would allow you to compare hash values rather than full strings. The likelihood of a collision on a 32-bit hash value (say) is pretty low, so you'd mostly be avoiding extraneous string comparisons. This would also speed up growing/shrinking the table quite a bit. Have you tried these sorts of changes, and what (if any) difference do they make?
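For what it's worth, both ideas fit in one bucket layout. A sketch, assuming chained buckets (the names are just illustrative):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Entry that caches the key's full hash and length, so most
       collisions are rejected with two integer compares and the
       expensive byte-by-byte compare runs only on a near-certain
       match. */
    typedef struct entry {
        uint32_t hash;      /* full 32-bit hash of key */
        uint32_t len;       /* strlen(key), cached     */
        const char *key;
        struct entry *next;
    } entry;

    static const entry *chain_find(const entry *e, const char *key,
                                   uint32_t hash, uint32_t len)
    {
        for (; e; e = e->next) {
            if (e->hash != hash || e->len != len)  /* cheap rejects */
                continue;
            if (memcmp(e->key, key, len) == 0)     /* rare full compare */
                return e;
        }
        return NULL;
    }

Growing the table would then reuse the cached hash values to compute new slots, so no string is ever re-read during a resize.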

I can see that these optimisations merely reduce the cost of collisions rather than the baseline non-collision cost, and it definitely makes sense that tries require fewer operations. What I wonder is whether, as you scale up the trie to very large datasets, the superiority of the trie tails off due to the memory latency? Or are tries basically always superior in memory, due to their depth never getting all that large?

(To be clear, I don't mean to doubt your work on this subject, since you're clearly rather more knowledgeable than I am - I'm just interested :-) )


No, not at all.

I think this is an incredibly interesting, and often overlooked, subject. I've even seen hashes that resolve collisions by sticking tries in the slots instead of linked lists for chaining. That way, the "am I checking for a collision?" part of the hash becomes "give me the memory location I want, 100% of the time." It essentially eliminates hash collisions in the classic sense, and the tries in the hash slots usually don't get very big, so you can delay resizing the hash for quite a while (just use a different trigger, like when the first trie holds more than n strings).
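A sketch of what that hybrid looks like, reusing the hash and trie sketches from earlier in the thread (declared here so the fragment stands alone):

    #include <stddef.h>

    /* From the earlier sketches in this thread: */
    typedef struct trie trie;
    size_t hash_str(const char *s);
    int    trie_lookup(const trie *t, const char *s);

    /* A table whose slots hold small tries instead of collision
       chains: hash once, then walk the slot's trie. There is no
       "compare, then probe the next slot" step at all. */
    typedef struct {
        trie  **slots;
        size_t  nslots;
    } trie_table;

    static int trie_table_lookup(const trie_table *t, const char *key)
    {
        size_t i = hash_str(key) % t->nslots;
        return t->slots[i] != NULL && trie_lookup(t->slots[i], key);
    }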

Some of the optimizations you mention make a lot of sense; I was generally responding with the more naive variations of the two concepts. One thing to keep in mind is that tries grow in memory amazingly fast. So fast that the rest of your system will have to start swapping just to keep the trie in memory. A list of a quarter million tokens with an average length of 6 will probably eat about 300-400MB before optimizations. If you restrict the tree depth to 4 or so and stick hashes on the leaves, you can keep it around ~100MB. But a quarter million tokens is not really all that much; real-life applications will usually involve millions of tokens, and a naive trie will rapidly outgrow the memory on 32-bit systems. It's the classic tradeoff, memory for speed. However, hashes are not really all that bad speed-wise, so the tradeoff isn't quite as good as one might hope. If you have to go from a hash to a trie to eke out an extra few percentage points of performance, you likely have bigger problems.

In practice I've actually only used a trie over a hash a couple of times. Mostly because in the applications I work on I never have to grow the hash, which is the most expensive hash operation, but also because there are already so many very good hash libraries in most languages, particularly persistent hashes, which can be very memory efficient. The practical difference in speed between the two is really not that noticeable for most applications unless you are doing billions of lookups a day (in which case you are probably looking for a different setup anyway, involving some kind of database).


That is (again) really interesting/informative. Thanks - this is one of those conversations that really restores my faith in communicating over the web :-).


Actually, your idea of just sticking the strlen on the hash nodes as a heuristic to prevent unnecessary collision checks is pretty good.

I too am glad to engage in this topic!


I've done a bit of work with tries and hash functions and found that in-memory tries are almost universally faster than in-memory hashes, no matter how good the hash function is.

Do you have any insight on why this would be? It's not very intuitive to me.

A hash-table with low load has relatively small hash buckets and so touches fairly little memory, and string keys are fairly local, so I would have expected memory costs to dominate. Insertion and deletion from a hash table should require fewer allocations than insertion and deletion from a trie.



Here's an interesting paper. It describes a cross between a trie and a hash table: instead of storing the string itself in the trie, it first hashes the string and then stores that in a trie (in a smart way). The memory usage is claimed to be very good, and the benchmarks in the paper seem to show that it's faster than a hash table for lookup...

http://lampwww.epfl.ch/papers/idealhashtrees.pdf
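The core trick, as I read the paper: the key's hash is what walks the trie, 5 bits per level, and each node keeps a 32-bit bitmap so it only allocates slots for children that actually exist. A simplified sketch (the real Ideal Hash Trees add resizing root tables and more; __builtin_popcount is a GCC/Clang builtin):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct node node;
    struct node {
        uint32_t bitmap;  /* bit i set => child for 5-bit chunk i     */
        node    *child[]; /* popcount(bitmap) entries, densely packed */
    };

    /* Descend one level using the next 5 bits of the key's hash. */
    static node *step(const node *n, uint32_t hash, int level)
    {
        uint32_t chunk = (hash >> (5 * level)) & 31;
        uint32_t bit   = UINT32_C(1) << chunk;
        if (!(n->bitmap & bit))
            return NULL;  /* no child for these 5 bits */
        /* Slot index = number of set bitmap bits below ours. */
        return n->child[__builtin_popcount(n->bitmap & (bit - 1))];
    }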


Another good example that can be more performant than a hash or a trie on a given alphabet is a directed acyclic word graph (DAWG). Think of it as a trie that reuses edges but has no cycles. This makes it very compact.

There's an implementation of one here that I've messed with, but it's a little expensive on the setup side.

http://www.pathcom.com/~vadco/dawg.html

The final result is that each node is stored in a 32-bit word, with the possible characters limited to a 5-bit field, a 1-bit flag for word endings, and (I believe) offsets in the rest of the bits to perform the directed lookups of the other characters in a limited A-Z, uppercase-only alphabet. These types of graphs tend to be extremely fast at runtime, but like I mentioned, the setup is costly. However, if you were embedding this in a device, it would definitely be a worthwhile investigation. This is one of those cases where a data structure fits a very constrained but well-defined requirement.
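I don't remember the exact field order he uses, but a hypothetical packing consistent with that description looks something like this:

    #include <stdint.h>

    /* Hypothetical 32-bit DAWG node: 5 bits of letter (A=0..Z=25),
       1 bit end-of-word, 1 bit end-of-sibling-list, and a 25-bit
       index to the node's first child. The real dawg.html layout
       may order or size these fields differently. */
    #define DAWG_LETTER(n)        (((n) >> 27) & 0x1F)
    #define DAWG_END_OF_WORD(n)   (((n) >> 26) & 1)
    #define DAWG_LAST_SIBLING(n)  (((n) >> 25) & 1)
    #define DAWG_CHILD_INDEX(n)   ((n) & 0x1FFFFFF)

With nodes that small, a large chunk of the graph fits in cache, which is presumably where the runtime speed comes from.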


Interesting. How much space saving do you see? DAWGs work for sets, but can they work for key-value maps? (I don't see how, because a final leaf can belong to multiple key prefixes, and therefore multiple associated values.)


Thanks for the link! I'll be interested to take a look.


What about Judy arrays [http://judy.sourceforge.net/]? They seem to be pretty seriously cache-optimized...


I've not encountered them before, but yes, they do seem to be very heavily optimised. I guess with relatively large nodes the quantity of cache misses is relatively low.


If I recall correctly, if you do not require dynamic insertion/removal (that is, you can make do with a one-time construction), you don't need to traverse pointers; you can put the trie into one contiguous block of memory.


That is absolutely true - it's been done with B+Trees, for example, as CSS-Trees. This does make finds faster (because you eliminate the space in the cache that pointers would require), but you're still going to be doing a lot of nonsequential access when performing finds.
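A sketch of what the pointer-free version can look like: children become 32-bit indices into one flat array, so the block can be built once, written to disk, and mapped back in as-is. This assumes a lowercase a-z alphabet; slot 0 holds the root, so index 0 doubles as "no child".

    #include <stdint.h>

    typedef struct {
        uint32_t child[27];  /* 'a'..'z' edges + end-of-word marker,
                                stored as indices, not pointers */
    } flat_node;

    static int flat_lookup(const flat_node *nodes, const char *s)
    {
        uint32_t i = 0;                    /* root lives in slot 0 */
        for (; *s; s++) {
            i = nodes[i].child[*s - 'a'];  /* assumes lowercase a-z */
            if (i == 0)
                return 0;                  /* 0 means "no child"   */
        }
        return nodes[i].child[26] != 0;    /* end-of-word marker   */
    }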


I will argue that properly implemented tries are more compact but not necessarily faster than hash tables. The fact is, there are some badly implemented tries that are even worse than hash tables in terms of memory consumption. Also, I'd like to point out that an actual trie implementation may not have a fixed fanout of k child nodes. I most often use k = 256 or 512 for the first few layers and 4~8 for the last few, in order to get "hash table"-like performance for the first few layers and "tree"-like performance for deeper lookups.
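A sketch of such a variable-fanout node (the names are illustrative, not from any particular library): wide nodes are indexed directly like a table, narrow nodes scan a short label array.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct vnode vnode;
    struct vnode {
        uint16_t fanout;    /* 256 near the root, 4..8 near leaves */
        uint8_t *labels;    /* labels[0..fanout), for narrow nodes */
        vnode  **children;  /* children[0..fanout)                 */
    };

    static vnode *vstep(const vnode *n, uint8_t c)
    {
        if (n->fanout == 256)            /* wide node: direct index,
                                            "hash table"-like       */
            return n->children[c];
        for (uint16_t i = 0; i < n->fanout; i++)
            if (n->labels[i] == c)       /* narrow node: short scan,
                                            "tree"-like             */
                return n->children[i];
        return NULL;                     /* no edge for this byte   */
    }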


I will argue that properly implemented tries are more compact but not necessarily faster than hash tables.

Where do you argue this?

The fact is, there are some badly implemented tries that are even worse than hash tables in terms of memory consumption.

Any data structure can be badly implemented. What use is comparing a badly implemented data structure to a well-implemented one?


It seems that in the article the author suggests that in certain cases tries are faster than a hash table.

Yes, you are right that every data structure can be badly implemented. But some data structures are easier to get wrong. For example, nobody has a problem with selection sort; it is just hard to get wrong. But for binary search, even a decent programmer may write (a + b) / 2. It's the same with tries: people commonly (intuitively) write things like node_t* child[26];
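For anyone who hasn't seen it, the binary search bug is integer overflow:

    /* (lo + hi) overflows a signed int when both are large, even
       though the midpoint itself is representable. */
    int mid_buggy(int lo, int hi) { return (lo + hi) / 2; }
    int mid_fixed(int lo, int hi) { return lo + (hi - lo) / 2; }

And node_t* child[26] silently assumes lowercase ASCII letters only; any other input byte indexes out of bounds.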


I have also been getting into tries recently (in the last 6 months), and if you are interested in learning about them I recommend Sedgewick's Algorithms in C (Parts 1-4) for a good introduction to the subject. It's available on Safari if you have access to that.


In the same way, any naively "constant space" algorithm that indexes into an array with a fixed number of pointers or indices is technically O(log n) in space, since you need log n bits to index n objects. So a space factor of log n can generally be treated as constant in practice.



