Why threads vs events is a nonsensical question. (swtch.com)
92 points by xtacy on Aug 20, 2011 | 46 comments



Russ is pretty much describing Go as it existed in its developers' minds in 2007. He even explains nearly all of Go's (future) interesting features - with nearly identical syntax - before he even gets to an example.


Well, actually he's describing Alef as it existed in the Plan 9 system in the early 90s. The creators of Go were also involved in creating Plan 9, so presumably they drew on their earlier experience of channels in Plan 9 when they developed the concurrency system for Go.


Go existed in various forms for a very long time. I was learning Alef in the mid-nineties. When Go came out, it was pretty easy to recognize.

That was a great explanation, though.


A small usability note - I looked at the page and, seeing no way to do anything with it, reloaded it ("So little text, so many votes on HN - there must be something there!"), and then - still not knowing what it was - accidentally clicked on the page, which moved me to the next slide... There is no hint that it's a presentation and that you can move forward by clicking on it.


It's completely unusable on an iPhone.


You can also use the arrow keys.

(and there is a hint that there is more to it, in that it shows the slide # in the bottom right corner)


Turn off JavaScript. (I didn't even know it was a slide deck; I have JS off via the NotScripts extension in Chrome.)

To be fair to Russ, it's his slide deck from giving the talk, not a publication. It was only the 2nd IWP9; we've gotten a bit better since then :)


2nd IWP9 was a good one, though--and I got to meet Dennis :)


Right, it's never been either/or; they can be used together. Several years ago I implemented libevent support in MySQL using a pool of threads, allowing an order of magnitude more client processing. And for many years prior, Microsoft had APIs and examples supporting similar operations on Windows.


In my view, the biggest drawback of the evented approach is that logically synchronous code involving IO becomes very low-level spaghetti code. E.g.:

  log(write(process(read(source))))
becomes:

  read(source, function(input) {
    write(process(input), function(success) {
      log(success)
    })
  })


I don't see that as spaghetti code at all, especially when you want to handle other things while the read() or write() is blocked. Callback-style code allows you to see the interleaving of events. If you use threads then the interleaving is non-obvious.


I guess if you don't see the interleaving in log(write(process(read(source)))) our mental models are too different to merit further discussion of this aspect.

And of course thread based code can do other things as well during blocking IO - on other threads.


I meant to say "logically sequential" not "logically synchronous".


Use channels; you can make them block, so the first loop will wait for the logger to finish logging if you like:

    do {
       source = <- sources    // read a source from the sources channel
       if source
           data = read(source)
           toProcess <- data  // send it to the processing channel
       else
           toProcess <- nil   // propagate the end-of-input sentinel
    } while source

    fn processor {
        do {
            data = <- toProcess
            if data
                results <- process(data)  // don't call process(nil)
            else
                results <- nil
        } while data
    }

    fn writer {
        do {
           data = <- results
           if data
               log <- write(data)
           else
               log <- nil
        } while data
    }

    fn logger {
        do {
           msg = <- log
           if msg
                print msg
        } while msg
    }
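
In actual Go, the same pipeline is one goroutine per stage. A rough sketch (the string-based stage functions are placeholders of my own, and closing a channel plays the role of the nil sentinel):

    package main

    import "fmt"

    // Placeholder stages; real code would do actual IO.
    func read(src string) string  { return "data:" + src }
    func process(d string) string { return "processed:" + d }
    func write(r string) string   { return "wrote:" + r }

    func main() {
        sources := make(chan string)
        toProcess := make(chan string)
        results := make(chan string)
        logCh := make(chan string)

        go func() { // feed a couple of sources, then signal end of input
            sources <- "a"
            sources <- "b"
            close(sources)
        }()
        go func() { // reader stage
            for src := range sources {
                toProcess <- read(src)
            }
            close(toProcess) // close replaces the nil sentinel
        }()
        go func() { // processor stage
            for d := range toProcess {
                results <- process(d)
            }
            close(results)
        }()
        go func() { // writer stage
            for r := range results {
                logCh <- write(r)
            }
            close(logCh)
        }()
        for msg := range logCh { // logger: drain until the writer closes logCh
            fmt.Println(msg)
        }
    }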


My ideal version would be something like

  parallel for(item in source)
    log(write(process(read(item))))
And let the compiler/runtime figure out how to do that. It could generate something similar to your code, for instance.
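
You can get most of the way there by hand today with one goroutine per item. A minimal Go sketch, with placeholder stage functions of my own standing in for real IO:

    package main

    import (
        "fmt"
        "sync"
    )

    // Placeholder stages standing in for real IO.
    func read(item string) string { return "data:" + item }
    func process(d string) string { return "processed:" + d }
    func write(r string) string   { return "wrote:" + r }

    func main() {
        source := []string{"a", "b", "c"}
        var wg sync.WaitGroup
        for _, item := range source {
            wg.Add(1)
            go func(item string) { // one goroutine per item: the "parallel for"
                defer wg.Done()
                fmt.Println(write(process(read(item)))) // Println standing in for log()
            }(item)
        }
        wg.Wait() // block until every iteration has finished
    }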


ok, but now I have a generic logger that I can send all kinds of messages to. I can use the source channel anywhere and pop a new data source into it, etc. etc.

channels rule :)


The important question is: how can humans correctly write concurrent code?

We think of "threads" as one of the options. The thing is that we've mostly been using threads with locks (e.g. Java), and slide #3 points out where we have gone wrong:

  Drawbacks of threads
    ...
  Drawbacks of events
    ...
  Actually drawbacks of locks and top-level select loops!
In fact, it is "locks" that humans have trouble with.

Note the title of the presentation - Threads without Locks - suggesting another option. (as others have noted, this presentation describes Go, and I personally believe this is why Go will do well)
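
Concretely, instead of putting a lock around shared state, you hand the state to one goroutine and talk to it over channels. A toy Go sketch (the names are mine):

    package main

    import "fmt"

    // counter owns n outright; no other goroutine touches it,
    // so no lock is ever needed.
    func counter(incr <-chan int, query <-chan chan int) {
        n := 0
        for {
            select {
            case d := <-incr:
                n += d
            case reply := <-query:
                reply <- n
            }
        }
    }

    func main() {
        incr := make(chan int)
        query := make(chan chan int)
        go counter(incr, query)

        for i := 0; i < 10; i++ {
            incr <- 1
        }
        reply := make(chan int)
        query <- reply
        fmt.Println(<-reply) // prints 10: threads (goroutines) without locks
    }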

edit: wording, formatting


I'm not really sure, but it seems like the Erlang execution model (except for single-assignment, to which everything can be, and frequently is, automatically boiled down anyway).


This is exactly where I want to go with RingoJS: Many threads with private mutable scope, global read-only scope, worker/actor based thread interop, one event loop per thread.

Currently we still have shared mutable state that sometimes requires locking (unless you resort to a functional style of programming): http://hns.github.com/2011/05/12/threads.html


Threads tend to be event-driven anyway. If one used, say, ObjecTime back in the Olden Days, one could actually configure which FSMs had their own O/S thread and which were shared on a single thread.

This being said, event-driven as a design choice has much to recommend it.


No, threads tend to be:

   while( connection = accept_connection ){
       in_thread_run(handle_connection, connection)
   }
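
Which in Go is more or less literally the following (a sketch; handle stands in for whatever handle_connection would do):

    package main

    import (
        "log"
        "net"
    )

    // handle is a placeholder for the per-connection work.
    func handle(conn net.Conn) {
        defer conn.Close()
        conn.Write([]byte("hello\n"))
    }

    func main() {
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept() // blocks until a connection arrives
            if err != nil {
                log.Fatal(err)
            }
            go handle(conn) // in_thread_run(handle_connection, connection)
        }
    }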


That is event-driven code. A typical accept_connection only returns after receiving a connection event. The I/O code in handle_connection blocks until the event that data can be read/written. The event handling just happens in the OS instead of in userspace with poll/select/etc.


Eh? If you reason that way then all I/O code is evented and that makes the discussion meaningless. Event-styled code doesn't have anything that blocks except the event loop I/O multiplexer and everything returns to the event loop ASAP.


Does anyone have the talk or a video download?

sometimes you just can't get enough from slides...



YES. THANK YOU. I was just ranting about this elsewhere. People go "threads are evil" with vague rationales about getting locks right and such, and insist we all use separate heavyweight processes. It's ridiculous.

Selection of sane data structures and communication channels can get you virtually all of the safety and ease of separate processes WITH the performance benefits of a shared memory space.

It reminds me of people who criticize C++ for allowing memory leaks. There as here, selecting the right primitives and development strategy in advance makes the problem simply disappear.


> People go "threads are evil" with vague rationales about getting locks right and such, and insist we all use separate heavyweight processes.

Insisting processes be used != insisting OS processes be used. Although most languages don't give you any choice in the matter.

> Selection of sane data structures and communication channels can get you virtually all of the safety and ease of separate processes

It gives you none of the safety, as you have to be very careful to ensure no mutable data structure is ever shared unknowingly. When using processes, you can't share memory implicitly, which is safe.

> There as here, simply selecting the right primitives and development strategy in advance make the problem simply disappear.

That's bullshit. It may make the problem less prominent, but it can not make the problem disappear.


Virtually disappear. I haven't had a significant memory leak issue for years, and I program almost exclusively in C++.

Scoff if you like, but consider you may not know everything there is to be learned about the craft.


It may make the problem less prominent, but it can not make the problem disappear.

It would look quite weird and most other programmers would think you were crazy for having done it, but I think it's possible to write a C++ program that provably doesn't leak. You could define a custom operator new for every type which ensures that it gets allocated with some smart pointer or GC heap.

You could probably still use most of the C++ standard library; little of it returns something needing manual de-allocation (except perhaps new and the old C malloc itself, which can be banned in various ways).


I guess the code I work on is weird.. who knew.


Why would it look quite weird, and why would anyone think you were crazy? It's standard practice and quite easy in C++. Smart pointers and RAII are your friends.


I agree with you. That's how I code and I don't have any problem with such leaks. But the usual response by people who don't believe that is "well the language doesn't force you to use them".

I was thinking of the weird tricks that would have to be in place to plausibly prove that there was no unmanaged dynamic allocation going on.


So you go from distinguishing between processes on the language and OS levels, to categorically declaring you can't share memory implicitly with processes.

A facsimile of a process that isn't implemented as an actual process is going to be in a shared address space. Your pet bondage-and-discipline language might work to prevent one pseudo-process from interfering with another, but I don't see it being equal to full-blown processes, nor do I see it being substantively more trustworthy than making a few simple, easy decisions about how to structure your programs.


> categorically declaring you can't share memory implicitly with processes.

That's kind-of the whole point, and difference between threads and processes. If you have implicitly shared mutable memory with processes, your processes are broken and you have threads.

> I don't see it being equal to full-blown processes

Really?

> nor do I see it being substantively more trustworthy

That's interesting. So you don't see how the language enforcing a share-less discipline would be more trustworthy than people trying to do so informally?

> than making a few simple, easy decisions about how to structure your programs.

Such as not using any third-party code which has not been fully audited to that standard? What a simple and easy decision that is.


People go "threads are evil" with vague rationales about getting locks right and such, and insist we all use separate heavyweight processes. It's ridiculous.

Never mind the fact that (almost) all the things that (supposedly) make threads evil are still there when you use heavyweight processes.


The one thing that makes threads evil is shared memory, especially when doing so unknowingly.

You can't share memory unknowingly with processes, when you can share memory at all.


Unknowing is key. Assume you are a programmer who can reliably write threaded, shared, mutable memory code… and you have to make a library call.

Is it safe? Does it say it's safe? Do you believe it? What about the next release? If you single thread all calls to the library to be safe, is your program still provably deadlock free?

The unknowns eat up a lot of thinking.


If you are building an OS kernel or database kernel, or writing high-performance computational code (after having prototyped a low-performance version and made sure that the high-level algorithmic design is sound), then you must be picky about the libraries you use anyway.

If you are writing a quick script, then you don't want to be picking through your libraries' source code for thread safety.

But 99% of people who use threads are just looking to keep a GUI looking responsive, which only takes a couple of processes anyway. Sand-boxing different components into their own process seems to be the way players like Google (Chrome) and Apple (Lion) are going anyway.

While threads have their place, I think it's the same kind of place in which inline assembly should be considered.


They certainly do, but I'm very skeptical when I'm told that a language/runtime cannot do X because in 95% of all cases X is the wrong thing to do.

What about the other 5%? Working around the lack of threads in those remaining instances takes orders of magnitude more work. Anyone who has ever implemented complex data structures in shared memory or memory mapped files knows that. No pointers, no new/delete/malloc/free, no garbage collector, just a big blob.

It's definitely more difficult than using only a few well-documented, high-quality libraries in the parts of the code where it matters.


Except deadlocks have nothing to do with sharing memory, and everything to do with sharing state. Shared state is necessary whether your address space is shared or not, and deadlocks will always be a risk in complex systems, whether the components involved are in the same or different processes.

Anyway, I'm mostly a server-side engineer, so I can't speak well to the disastrous mess of GUI libraries and such, but the libraries I use are, indeed, thread-safe, and I'm quite confident that they will remain so. Their state is maintained through handles, not global variables.


I did a bunch of research for my PhD thesis on this, and came to the conclusion that "the probability of concurrency errors is proportional to the square of the amount of shared state". Threads share state by default, which brings significant risk.

Also, server-side engineers aren't necessarily safe; there are a number of C library functions that are not particularly reentrant. This has been known for decades because reentrancy is important in signal handling. The safest way of dealing with asynchronous signals, in fact, is to use global flag variables, treating a flag as an indication that you should e.g. call waitpid because at least one SIGCHLD has come in since you last went through the main loop. You may deny that this happens particularly often, but that particular hack was an important simplification of my life a few months ago.
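
(Incidentally, Go's os/signal package turns exactly that hack into a channel receive. A sketch of the Go version, not the C idiom itself, and Unix-only because of SIGCHLD:)

    package main

    import (
        "fmt"
        "os"
        "os/signal"
        "syscall"
    )

    func main() {
        // Signals arrive on a channel, so the global-flag trick
        // becomes an ordinary receive in the main loop.
        sigs := make(chan os.Signal, 1)
        signal.Notify(sigs, syscall.SIGCHLD)

        for {
            <-sigs // at least one SIGCHLD since we last looked
            fmt.Println("a child exited; reap it here (e.g. syscall.Wait4)")
        }
    }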


The "unknowingly" part is the big difference then.

Besides that, you still have to deal with synchronization and mutual exclusion. Or you use a message passing model, which you can do with threads too, though your points in your other comment apply there, of course.


It's shared mutable memory that's the problem. Shared constants are alright.


Sure, but semantically constants are about sharing values not memory, so they don't fall under "sharing memory" as far as I'm concerned.


That is a good abstraction for thinking about code. In the meanwhile, machines still have a "one memory, many cores" architecture, in which inter-process communication by reference is a practical optimization (like the ref counting of binaries in Erlang). Some of those design decisions will have to change when machine architectures move to a "many (core, memory)" form.


I believe most people are actually complaining that "threads are hard to use"; evil is just a possible outcome if they are used by incompetent programmers.



