Hacker News new | past | comments | ask | show | jobs | submit login

Clojure relies on persistent containers - i.e. immutable containers where every mutating operation returns a logically new container, which are cheap (logarithmic) when implemented in terms trees - with transactional memory slots, which may refer to such a container. An update might be in the form of a function mapping the old container value to the new container value (with a value added or removed), but it may be applied multiple times before it "gets in", according to transactional semantics (rollback if conflict etc.).

This approach is different; for the specific purpose of concurrent updates to a hash table, Cliff's approach is far more scalable to multiple threads. But in many ways, the fact that it's a hash table is beside the point. Cliff is presenting this as a different way of thinking about lock free concurrent programming: consider every possible state and transition, and make sure that they all make sense.

As a way of reasoning about multithreaded programs, Cliff's approach is not as scalable as Clojure's approach, but it's very useful for thinking about low-level implementation details.




OK, but let me restate my question then:

Does Cliff's approach result in a faster hash table than Clojure's? Is it "better"? In some situations or in all? If so, which?

When would you want to use Cliff's instead of Clojure's built in one? Would you at all?


Far faster. You'd want to use Cliff's version if you have thousands of threads all potentially mutating the table at the same time.

But thinking in these terms is crude; it's more important to understand the principles, hence my answer. It's like asking which is better: taking the train or driving in a car?


Sure, I understand, but I don't think my question was crude. I agree it's important to understand the principles.

I understand the basic idea of how Clojure's data structures work with STM and how they're immutable and how a "new" data structure is created upon mutation without copying the old one. But that understanding isn't helping me.

I don't understand Cliff's version well enough to be able to perform benchmarks in my head, nor do I understand either of the data structures well enough to understand the principles (as you said) to know when to pick one over the other.

So you said that when thread-count > 1000 I should pick Cliff's? Is it that simple? I would suspect not. What if there's only 2 threads involved? I'd just like a generalized overview of which is better for what types of situations, and by all means, to understand the "why" of it would be great too.

Or is the idea that because Cliff's version is mutable and does the whole "conflating state w/identity" thing, that it has no place in Clojure at all?


With STM and persistent data structures, you're fundamentally limited by how fast you can change a single memory location; and you can't freely modify the memory location because of the risk of missed updates. Cliff's table operates concurrently over the whole data structure - there's no single bottleneck, rather every slot in the table is potentially an update location.

When I said thousands of threads, I was stressing the massively concurrent potential of Cliff's design, but more importantly, the point where the hardware is fully subscribed and physical hardware threads are frequently trying to update the data structure. If there are only 2 threads involved on a dual-core machine, depending on everything else (constant factors), I would still expect Cliff's table to be faster.

Clojure's approach is at a different level of abstraction than Cliff's approach. Probably a better analogy would be whether it's better to manufacture a car or use a travel agent. Not everybody could safely manufacture a car, but almost anyone can use a travel agent with moderate phone, social and organizational skills. But they also solve problems with radically different scopes.


I don't think you can have a hashtable in closure. From what I understand, none of their collections are array based, where as this is.

Clojure uses trees whereas java uses maps primarily.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: