Erlang vs Java memory architecture (javacodegeeks.com)
71 points by javacodegeeks on April 13, 2011 | 29 comments



Very interesting. I'm surprised that there's no way to limit the size of Erlang queues and therefore memory usage. Surely simple limits like that are a necessity for any production environment? Can anyone with deeper knowledge about Erlang comment on that?


The idea is that there is something very wrong with your design if you allow that to happen. In telco networks, once a message is accepted into a system, it is expected to be processed. You should throw away any messages you can't handle (because of overload) at the edge of the system. The same principle can be applied for any system which is serving any sort of network traffic.

If this unbounded message generation is happening within the system, again there is something wrong. You shouldn't be in a situation where a process is producing messages which other parts of the system cannot consume.


> In telco networks, once a message is accepted into a system, it is expected to be processed. You should throw away any messages you can't handle (because of overload) at the edge of the system.

That's attractive in general, but is it necessarily possible?

Assume a planar grid with O(N^2) nodes, of which O(N) are edge nodes that can accept a message for a given node somewhere near the center. In other words, that "given node" is O(N) steps away from any edge node.

Suppose that said given node has limited capacity. How can it keep those edge nodes aware of its capacity without wasting capacity or requiring O(N^2) queuing at said given node? Remember - it takes O(N) time to get a message from the given node to any edge node, and during that time there are O(N^2) places for a message to be in the network at any moment.

Now repeat for the other O(N^2) interior nodes.


Yes, it is possible. You have to ensure that every edge node is configured in such a way that it allows only a certain amount of traffic to an "inner" node. And if it finds that the inner node isn't as responsive as it should be, it has to scale back how much traffic it will accept.

For real time traffic, where queuing doesn't make sense, there isn't anything else you can do really.


> You have to ensure that every edge node is configured in such a way that it allows only a certain amount of traffic to a "inner" node.

Let's assume that we have 5 edge nodes and that each is configured to send t traffic. The total amount of traffic that they can send is 5t. If the inner node can handle 5t, you won't need to drop traffic in the network. However, you're likely to drop traffic unnecessarily at an edge node.

Suppose that you have 4t worth of traffic continuously. Clearly the inner node can handle that.

However, consider what happens if all of the traffic during hour n tries to enter the network at edge node n.

For example, during hour 0, nodes 1-4 see no traffic and node 0 sees 4t worth of traffic. By configuration, node 0 drops 3t and sends t to the inner node. Since none of the other nodes are sending any traffic, the inner node only sees t worth of traffic even though it could handle 5t.

You can try to get around this problem by dynamically configuring the edge nodes, but since they're as much as N apart and they're responding to presented demand, that configuration is always stale.

You can guarantee that you don't overload the inner node, but only if you're willing to waste some of its capacity in some circumstances.


Yes, I agree. There are a few ways around that:

- Some out of band signalling between the edge and the core to indicate current load levels at the core.

- Implement a similar form of overload protection at the core. You might now say, "Aha! But you're throwing away messages you've accepted". The answer is no: the edge nodes are typically just relaying messages after some checks; application-layer processing is typically done in the core.

This thread is becoming slightly hand-wavy but I hope you get the gist of what I'm saying :)


> Some out of band signalling between the edge and the core to indicate current load levels at the core.

In-band/out-of-band doesn't make any difference - the edge nodes can't communicate fast enough. Remember, they're O(N) from each other. (Yes, two neighbors are adjacent, but there are nodes on the other side of the grid.)

Long wires don't help either - distance always matters.

> You might now say, "Aha! But you're throwing away messages you've accepted". The answer is no

Actually the answer is "yes" - The original claim was "In telco networks, once a message is accepted into a system, it is expected to be processed. You should throw away any messages you can't handle (because of overload) at the edge of the system."

> I hope you get the gist of what I'm saying

I get the gist - the problem is that you must either waste capacity, drop traffic in the interior, or both, because of distance and the fact that there are more interior nodes than edge nodes. (And you can't make every node an edge node, because of distance and fan-in.)

One of the subtle ways to waste capacity is to run the control signals faster than the data signals.


IIRC, the Erlang scheduler increases the cost of sending a message to a process as its queue grows, so that processes which are piling messages on a backlogged process give it a chance to catch up.


I don't think this is true. I wrote a blocking queue implementation for the purpose of limiting the # of messages or amount of memory used by a queue.


I'll see if I can find a citation, pretty sure it was in a paper I read about optimizations in the Erlang VM's scheduler. May be a recent thing.

Edit: Here's one (http://www.haskell.org/pipermail/haskell-cafe/2006-January/0...):

"Regarding the issue of why a logger process in Erlang does not get overwhelved, this is the reply I got from Raimo Niskanen (Erlang team at Ericsson):

There is a small fix in the scheduler for the standard producer/consumer problem: A process that sends to a receiver having a large receive queue gets punished with a large reduction (number of function calls) count for the send operation, and will therefore get smaller scheduling slots."

Sending to a mailbox costs reductions proportional to how many messages are already queued. While this isn't a perfect solution, it should influence the scheduler quite a bit. (Imagine trying to do that sort of thing in node.js!)


There are two solutions to this. The first is to have some kind of flow control between producers and consumers (send / wait for reply / continue). The second is to pull messages from the sender whenever the receiver is ready, perhaps with some rate-limiting or buffering.
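In Java terms (since that's the comparison running through this thread), a bounded BlockingQueue gives you roughly both at once: the consumer pulls items only when it's ready, and put() blocks once the buffer is full, so the producer gets throttled to the consumer's pace. A toy sketch, with made-up names and sizes:

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  public class BackpressureDemo {
      public static void main(String[] args) throws InterruptedException {
          // Bounded buffer: put() blocks when 100 messages are already waiting.
          final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(100);

          Thread producer = new Thread(new Runnable() {
              public void run() {
                  try {
                      for (int i = 0; i < 10000; i++) {
                          queue.put("msg-" + i); // blocks here if the consumer falls behind
                      }
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
              }
          });

          Thread consumer = new Thread(new Runnable() {
              public void run() {
                  try {
                      while (!Thread.currentThread().isInterrupted()) {
                          queue.take();    // pull a message only when ready for it
                          Thread.sleep(1); // simulate slow processing
                      }
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
              }
          });

          producer.start();
          consumer.start();
          producer.join();      // returns only once every message has been accepted
          consumer.interrupt(); // then shut the consumer down
      }
  }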


So the memory in Erlang is isolated on a per-thread basis, and it makes Erlang more scalable. While in Java we have only common/shared memory.


It's generally suggested that you make heavy use of the private and final keywords when writing Java code in general and concurrent Java code in particular.

ExecutorService + SynchronousQueue/LinkedBlockingQueue + Callables without mutable state = basically a functional coroutine model of programming in Java with non-lispy syntax.

If you're doing it right, you're not allocating new threads for every task (and neither is Erlang; they're "cheating" in roughly the same way I'm describing - their "threads" are really "tasks"). You push tasks onto your work queue, taking up very little memory per task, and your ExecutorService takes care of them. The only difference is you have to set up your ExecutorService yourself, which is a plus and a minus compared to the runtime implicitly managing one for you... sometimes it'd be convenient to have a default coroutine manager, I guess.
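To make that concrete, here's a minimal sketch of the shape I mean (class names and numbers are made up, not from any particular codebase): the task keeps everything in private final fields, and one fixed-size pool works through whatever gets queued behind it.

  import java.util.Arrays;
  import java.util.List;
  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;

  public class TaskDemo {
      // A Callable with no mutable state: its inputs live in private final fields.
      static final class Sum implements Callable<Long> {
          private final long from;
          private final long to;

          Sum(long from, long to) { this.from = from; this.to = to; }

          public Long call() {
              long total = 0;
              for (long i = from; i <= to; i++) total += i;
              return total;
          }
      }

      public static void main(String[] args) throws Exception {
          // One pool sized to the machine; tasks queue up cheaply behind it
          // instead of each getting its own OS thread.
          ExecutorService pool =
              Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
          try {
              List<Future<Long>> results = pool.invokeAll(Arrays.asList(
                  new Sum(1, 1000000), new Sum(1000001, 2000000)));
              for (Future<Long> f : results) System.out.println(f.get());
          } finally {
              pool.shutdown();
          }
      }
  }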


The GC in the Erlang VM is also per-process, which allows it to be soft real-time. An application can be using gigabytes of memory, but if it's split between thousands of processes, garbage collection can be very, very fast because the GC only needs to work through a small heap at a time.


You can also measure the amount of memory a process tends to use in typical operation, then spawn it with an appropriately sized heap. It never needs to do GC at all; it just dies and frees its heap when complete. Look at min_heap_size in spawn_opt/*.

It's the same concept as giving a copying garbage collector really large subspaces - GC only happens when one fills. Giving each process its own heap means that the GC pauses will still be brief.


And to my understanding it can be done on processes not currently scheduled so there is very little visible impact.


Also, Erlang "threads" are called processes but are neither OS threads nor OS processes - there is generally one scheduler per core, and the processes are swapped in and out by the scheduler.


They're more like coroutines, but they can be safely pre-empted because Erlang has very little mutable data.

Erlang's scheduler also takes dataflow (e.g. which processes have new messages to process?) into account, so some kinds of overhead are O(active processes) rather than O(processes).


> While in Java we have only common/shared memory.

That isn't entirely true. There are ThreadLocal variables in Java. However, these have some performance ramifications and have to be carefully managed so they don't leak, basically because the system wasn't designed with this kind of storage in mind.
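For example, the usual discipline on a pooled thread is to pair every set() with a remove() in a finally block, otherwise the value outlives the task that set it. A toy sketch, with made-up names:

  public class RequestContext {
      // One value per thread; on a pooled thread it would survive across tasks
      // unless it is explicitly cleared.
      private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<String>();

      static void handleRequest(String user, Runnable work) {
          CURRENT_USER.set(user);
          try {
              work.run();            // code deep in the call stack can read the value
          } finally {
              CURRENT_USER.remove(); // clear it, or the next task on this thread sees stale data
          }
      }

      static String currentUser() {
          return CURRENT_USER.get();
      }

      public static void main(String[] args) {
          handleRequest("alice", new Runnable() {
              public void run() {
                  System.out.println("serving " + currentUser());
              }
          });
          System.out.println("after the request: " + currentUser()); // prints null
      }
  }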


From my memories of Java ThreadLocals, these are still common/shared memory. The ThreadLocal property is a reference - a convenient way to scope access without locking - but the objects stored are still in shared memory.

For example, correct me if I'm wrong, but in Java if you do something like:

  MyAwesomeClass myObject = new MyAwesomeClass();
  myThreadLocal.set(myObject);
  myOtherObject.someField = myObject;
Then the thread local and the field value are references to the same object, in the global heap. Any thread can get to "myOtherObject.someField" and change my thread's local copy.

This specific example is probably bad design, but the underlying difference is between a shared memory design and a message passing share-nothing design, where the latter doesn't let you make that kind of mistake.

In Erlang the paradigm is different enough that my bad example doesn't really translate[1], but even if two similar references were made equivalent, they'd be references to _immutable_ objects rather than to mutable state - so from the programmer's point of view they're never shared at all.

In addition, if you want to share something then you send a message, so the other process (aka thread) always has its own distinct copy.

[1] The process dictionary is kind of the same as thread local storage, but the kind of state you'd assign via a reference on a mutable field is dealt with differently.


That is an explicit sharing of an object between the thread-local storage and the global heap by the programmer. If MyAwesomeClass is read-only in all methods, or it is copied when assigned to myOtherObject, then it's the same as Erlang.
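For example, a toy read-only version of MyAwesomeClass (just a sketch to illustrate the point) would make the published reference harmless, since no thread can change what it points to:

  // Toy immutable version: all fields private and final, no setters.
  public final class MyAwesomeClass {
      private final String name;
      private final int value;

      public MyAwesomeClass(String name, int value) {
          this.name = name;
          this.value = value;
      }

      public String getName() { return name; }
      public int getValue()   { return value; }

      // "Mutation" hands back a new object instead of changing this one,
      // which is roughly what updating an Erlang term does.
      public MyAwesomeClass withValue(int newValue) {
          return new MyAwesomeClass(name, newValue);
      }
  }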


That's an interesting point. If the Erlang language can stop the programmer making mistakes like unintentional reference publishing across threads, that sounds fantastic.


That's one of its major intentions. There's a fairly good summary here: http://ulf.wiger.net/weblog/2008/02/06/what-is-erlang-style-...


Java has thread-local storage, which is the same as Erlang's private heap in terms of concurrency isolation.


The variables themselves have concurrency isolation, but not the objects referenced from the variables. The thread local storage implementation in some JVMs uses the shared global heap.

There's no sharing at the application level or the implementation level in Erlang.


If you don't explicitly pass the objects referenced in TLS outside or to other threads, I don't see how it breaks concurrency isolation.

For JVM implementations that use a shared global heap, that may just impact performance due to locking. The concurrency isolation contract is not broken. The app still has full confidence that its TLS values are not changed by other threads. OT: if performance comes into the picture, one could claim excessive copying of immutable objects causes performance problems as well.


> If you don't explicitly pass the objects referenced in TLS outside or to other threads, I don't see how it breaks concurrency isolation.

Sure. That's the same as if you never share any resource in any shared global heap between threads.

The point is that Erlang's concurrency model allows the programmer to never make that mistake, because you're storing immutable resources not references to mutable data. See my longer post on the subject, further down the page. :)

> OT: if performance comes into the picture, one could claim excessive copying of immutable objects causes performance problems as well.

When you have a share-nothing, immutable-data concurrency model like Erlang's, heap type is an implementation detail (and you'd probably choose it based on your performance requirements, as you've alluded to). To wit, as TFA says, you can run the Erlang VM with either a shared heap or a separate heap model. Language semantics stay the same.


I am not sure this article has any real bearing for someone who wants to learn about Erlang. It might actually lead to more confusion than it's worth.


It looks like an entry about what Java can learn from Erlang.

It's not about learning Erlang.

I am pretty sure Java guys know quite well about various memory models. But early Java was kind of experimental, so backward compatibility affects what they can change now.



