Multiprocess versus Multithreaded... or why Java infects Unix with the Windows mindset

LogicHoleFlaw · on April 10, 2008

I like the Unix model. Lightweight processes with separate memory spaces, with well-defined interfaces between them. Pipes and signals. Message passing if you want to get fancy. Erlang, with its touted high-availability capabilities, follows this model.
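As a rough illustration of "separate processes with a well-defined interface", here is a minimal Python sketch of two processes joined by a pipe. The `PRODUCER` and `FILTER` programs and the `run_pipeline` name are made up for this example; neither stage can see the other's memory, only the bytes that flow through the pipe.

```python
import subprocess
import sys

# Hypothetical stage 1: a producer that prints three lines and exits.
PRODUCER = "print('alpha'); print('beta'); print('gamma')"

# Hypothetical stage 2: a filter that upper-cases whatever arrives on stdin.
FILTER = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())"


def run_pipeline():
    """Run producer | filter as two separate OS processes and collect the output."""
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", FILTER],
        stdin=producer.stdout,   # the pipe is the only link between the two
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close the parent's copy of the pipe so only the consumer holds the read end.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.splitlines()


if __name__ == "__main__":
    print(run_pipeline())
```

Either stage could be swapped for a program in any other language without the other one noticing, which is the point of keeping the interface at the process boundary.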

Threading is hard. I've written thread-oriented code and process-oriented code. The process-oriented code was far easier to develop and debug. Issues in threaded code are difficult to reproduce and even harder to fix.

I've been on both the OS engineering team and the app development team across several corporations. In a large managed Java environment, the OS team is powerless to stop a runaway process. To the OS, all that business logic is encapsulated in one giant opaque container. The kill -9 and fuser commands are barbaric ways to manage your sensitive computing needs. Unix broke away from the monolithic style of development, but the Enterprise Java culture embraces it. On the other hand, the app development team can live in its own little world, not trusting the OS and reinventing wheel after wheel.

These days it's apparent that Unix has won the OS wars, at least on the server side. With Unix, you can trust your operating system. Take advantage of it. It has amazing reliability and if you can just work with it instead of against it, you reap many benefits.

Now we just need to convince the app developers that the philosophy of "many small tools working together" is superior to the "one giant tool doing everything". You don't want to be a giant tool, do you?

davidw · on April 10, 2008

Erlang is super cool, but... to be picky, it does and it doesn't share the Unix model. It does in the sense that you don't share anything between processes. But it doesn't because all those processes live in one real process, which is why spinning them out is so cheap. It's also why you have to be very careful about linking to 3rd party libs in Erlang - all it takes is one blocking call, and poof, everything grinds to a halt. IIRC, one defense mechanism they've developed against this is to actually start app linkages up in separate Unix processes and talk to them via sockets.
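The defense described above (push the risky call into its own OS process so that a blocking call cannot stall everyone else) can be sketched in Python. This is only an analogy, not how Erlang does it, and it uses a child process's stdout rather than a socket to keep the example short; `call_isolated` is a hypothetical helper name.

```python
import subprocess
import sys


def call_isolated(code, timeout_s):
    """Run `code` in a separate Python process.

    Returns the child's stdout as a string, or None if the child did not
    finish within `timeout_s` seconds. The caller never blocks for longer
    than the timeout, whatever the child does.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry, run() kills the child and raises
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # A well-behaved call returns its result.
    print(call_isolated("print(6 * 7)", timeout_s=30))
    # A call that blocks is cut off; the caller carries on.
    print(call_isolated("import time; time.sleep(60)", timeout_s=1))
```

The price is a process launch and some serialization per call, which is the usual trade for that kind of isolation.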

yariv · on April 11, 2008

Strictly speaking, Erlang processes have a type of shared memory -- ets tables -- but the semantics are somewhat different from other languages that use traditional shared memory. However, when you need synchronization, you just use Mnesia transactions, which make working with this kind of shared memory a piece of cake. Also, Mnesia is distributed, a nice attribute that other STM implementations lack.

wmf · on April 10, 2008

Java does involve many small tools (JARs), but they all get loaded into one VM process. The issue is not modularity vs. giants but how modularity is implemented.

Tichy · on April 10, 2008

OK, suppressing my first impulse to shout "bullshit", so just asking: how is Java making threads expensive? Because of the startup time of the virtual machine? That is hardly a Java thing; it is a problem shared with scripting languages and other languages that run on a virtual machine. Within Java, threads are dead easy, though. Not as easy as in Erlang, but to be honest, it is very rare that multithreading becomes necessary in a web application anyway.

Second, how do processes help with writing web applications? I don't think he understood what frameworks are made for. Not saying that some Java frameworks are not bloated. Most frameworks I know also have only one hook, the method that has been mapped by the routing. What alternative to the "hook approach" does he suggest? What is the essence of the article, except "different is better", without saying what "different" would look like?

Back to the "expensive threads" issue: maybe it is an issue if you want to run CGI scripts from Apache. If you run scripts from a server written in Java, it is not an issue, because the JVM is already running.

st3fan · on April 11, 2008

"""Because of the startup time of the virtual machine?"""

Fact is, we have hundreds of JVMs running that have uptimes of months. So really, we don't care about a JVM and app server starting up in 15 seconds. That is short-term thinking. I'm rather fond of the long-term stability though :)

LogicHoleFlaw · on April 11, 2008

I believe his point is that the cheap buildup/teardown of processes is inherently safer and more mature than a monolithic clump of threads in terms of isolating processes and ensuring resource deallocation. The startup time of the VM is a red herring in this case. I too have seen impressive VM uptimes. Almost as good as the OS uptimes I've seen.
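To make the teardown point concrete, here is a small Python sketch in which a parent process stops a runaway child from the outside. The `RUNAWAY` program and the `stop_runaway` helper are hypothetical names for this example. Once the child is killed and reaped, the OS takes back its memory and file descriptors without any cooperation from the child's code.

```python
import subprocess
import sys

# Hypothetical runaway worker: spins forever and never exits on its own.
RUNAWAY = "while True: pass"


def stop_runaway(grace_s=0.5):
    """Start a runaway worker, give it `grace_s` seconds, then kill it.

    Returns the worker's exit status. On POSIX a negative value means the
    process was terminated by a signal.
    """
    worker = subprocess.Popen([sys.executable, "-c", RUNAWAY])
    try:
        worker.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        worker.kill()  # SIGKILL on POSIX; the child gets no say in the matter
        worker.wait()  # reap the child so it does not linger as a zombie
    return worker.returncode


if __name__ == "__main__":
    print("worker exit status:", stop_runaway())
```

Stopping one misbehaving thread inside a shared VM offers no equivalent guarantee, since the thread may hold locks or half-updated state that the rest of the process still depends on.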

Java threads are dead easy as threads go. However, I don't believe that they're going to be a sustainable solution as we move forward. As processor counts keep rising and the focus moves from scaling up to scaling out, we need to find more easily understood and debuggable abstractions for dealing with a massively parallel world.

tx · on April 10, 2008

I can't stand either the JVM or .NET: they're both exactly what Eric is saying - less mature, less featured, but nevertheless big and fat operating systems built on top of a real OS. If you look at them this way, it appears that your applications are NOT portable (your app will need the JVM OS to run on, it won't run on just Linux or BSD or Windows).

I much prefer a Python+C or Ruby+C pair running on a POSIX-compatible OS. It gives you portability, the higher level of abstraction provided by Python/Ruby, and the native speed of C. You are also free to pick between multiprocess and multithreading: nothing is forced down your throat.

On a side note, I believe that virtualization, and especially paravirtualization, makes JavaOS or .NETOS even more obsolete, since it gives you a real "OS in the box": I can build my system using the most obscure languages I can think of, package them all inside a Xen image and deploy with a click of a button.

wmf · on April 10, 2008

OTOH, virtualization with bare-virtual-metal VMs would eliminate the Unix-Java impedance mismatch.

anupamkapoor · on April 11, 2008

I don't see how it would, so can you please elaborate a bit on your idea? Thanks!

wmf · on April 11, 2008

There'd be no Unix, thus no Unix-Java mismatch. You'd only have Java tools (such as they are) to manage your Java code. Of course, you'd have different problems.

raganwald · on April 10, 2008

nickb, nice to see you posting some links :-)