Clojure relies on persistent containers - i.e. immutable containers where every ...

itistoday · on May 16, 2010

OK, but let me restate my question then:

Does Cliff's approach result in a faster hash table than Clojure's? Is it "better"? In some situations or in all? If so, which?

When would you want to use Cliff's instead of Clojure's built in one? Would you at all?

barrkel · on May 16, 2010

Far faster. You'd want to use Cliff's version if you have thousands of threads all potentially mutating the table at the same time.

But thinking in these terms is crude; it's more important to understand the principles, hence my answer. It's like asking which is better: taking the train or driving in a car?

itistoday · on May 16, 2010

Sure, I understand, but I don't think my question was crude. I agree it's important to understand the principles.

I understand the basic idea of how Clojure's data structures work with STM and how they're immutable and how a "new" data structure is created upon mutation without copying the old one. But that understanding isn't helping me.

I don't understand Cliff's version well enough to be able to perform benchmarks in my head, nor do I understand either of the data structures well enough to understand the principles (as you said) to know when to pick one over the other.

So you said that when thread-count > 1000 I should pick Cliff's? Is it that simple? I would suspect not. What if there's only 2 threads involved? I'd just like a generalized overview of which is better for what types of situations, and by all means, to understand the "why" of it would be great too.

Or is the idea that because Cliff's version is mutable and does the whole "conflating state w/identity" thing, that it has no place in Clojure at all?

barrkel · on May 17, 2010

With STM and persistent data structures, you're fundamentally limited by how fast you can change a single memory location; and you can't freely modify the memory location because of the risk of missed updates. Cliff's table operates concurrently over the whole data structure - there's no single bottleneck, rather every slot in the table is potentially an update location.

When I said thousands of threads, I was stressing the massively concurrent potential of Cliff's design, but more importantly, the point where the hardware is fully subscribed and physical hardware threads are frequently trying to update the data structure. If there are only 2 threads involved on a dual-core machine, depending on everything else (constant factors), I would still expect Cliff's table to be faster.

Clojure's approach is at a different level of abstraction than Cliff's approach. Probably a better analogy would be whether it's better to manufacture a car or use a travel agent. Not everybody could safely manufacture a car, but almost anyone can use a travel agent with moderate phone, social and organizational skills. But they also solve problems with radically different scopes.

runT1ME · on May 16, 2010

I don't think you can have a hashtable in closure. From what I understand, none of their collections are array based, where as this is.

Clojure uses trees whereas java uses maps primarily.