Can anyone familiar with this say if it runs one coroutine per thread? How does ...

rumcajz · on Nov 18, 2015

It creates a stack per coroutine and if the coroutine would block, it switches to another coroutine instead.
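To make "a stack per coroutine" concrete, here is a minimal sketch using the POSIX `ucontext` functions (`getcontext`/`makecontext`/`swapcontext`, obsolescent but still shipped by glibc). This is not necessarily how libmill switches contexts. The names `coro_run`, `coro_yield`, `coro_body` and the 64 KB stack size are hypothetical. The coroutine gets its own heap-allocated stack, and at the point where it "would block" it saves its state and jumps back to the scheduler.

```c
#define _XOPEN_SOURCE 700   /* ucontext is an XSI (now obsolescent) API */
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define CORO_STACK_SIZE (64 * 1024)   /* hypothetical size, not libmill's */

static ucontext_t sched_ctx;   /* the "scheduler" (here: the caller) */
static ucontext_t coro_ctx;    /* the single coroutine */
static int trace[4];           /* records the order in which code runs */
static int ntrace;

/* What a coroutine does instead of blocking: save its own registers
   and stack pointer, then jump back to the scheduler. */
static void coro_yield(void) {
    swapcontext(&coro_ctx, &sched_ctx);
}

static void coro_body(void) {
    trace[ntrace++] = 1;
    coro_yield();            /* pretend a read() would block here */
    trace[ntrace++] = 3;
}                            /* returning follows uc_link to sched_ctx */

static void coro_run(void) {
    char *stack = malloc(CORO_STACK_SIZE);   /* one stack per coroutine */
    assert(stack != NULL);
    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = stack;
    coro_ctx.uc_stack.ss_size = CORO_STACK_SIZE;
    coro_ctx.uc_link = &sched_ctx;
    makecontext(&coro_ctx, coro_body, 0);

    swapcontext(&sched_ctx, &coro_ctx);  /* run until the first yield */
    trace[ntrace++] = 2;                 /* scheduler runs something else */
    swapcontext(&sched_ctx, &coro_ctx);  /* resume; runs to completion */
    free(stack);
}
```

After `coro_run()`, `trace` holds 1, 2, 3: the coroutine ran, yielded, the scheduler did other work, and the coroutine resumed on its own stack.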

As for threads, it's single-threaded (yet concurrent). If you want to take advantage of multicore machines, there's a step in the tutorial about that: http://libmill.org/tutorial.html#step7

infogulch · on Nov 18, 2015

One of the reasons goroutines are considered 'lightweight' is that they allocate a small stack to begin with, which is reallocated and moved if more space is needed. Of course, moving stacks isn't possible in C, mainly because C code can hold raw pointers into the stack, and those would dangle after a move.

What does libmill do here? Does it allocate a large stack right off the bat? What happens in a stack overflow, does it segfault or will it just trample over whatever is beneath it?

Edit: an ounce of investigation is worth a pound of answered questions, or something like that. It appears stack sizes are configurable but default to 256 KB [1] (for scale, Go defaults to 2 KB), and are retained in a cache of size 64 by default [#L72]. Also, the bottom page is used as a stack guard if POSIX and mprotect are available [#L90].
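A guard page like the one described can be set up with `mmap()` plus `mprotect()`. Below is a minimal sketch assuming POSIX and a downward-growing stack; `stack_alloc` and `stack_free` are hypothetical names, not libmill's code. Overflowing the stack then hits a `PROT_NONE` page and faults with SIGSEGV instead of trampling whatever lies below.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size(void) {
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Reserve a stack of at least `size` usable bytes plus one guard page
   at the low end. Stacks grow downward on common platforms, so running
   off the end touches the PROT_NONE page and raises SIGSEGV instead of
   silently overwriting whatever lies below. Returns the lowest usable
   address, or NULL on failure; *total_out receives the mapping size. */
static void *stack_alloc(size_t size, size_t *total_out) {
    assert(total_out != NULL);
    size_t page = page_size();
    size = (size + page - 1) / page * page;   /* round up to whole pages */
    size_t total = size + page;
    char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base, page, PROT_NONE) != 0) {   /* the guard page */
        munmap(base, total);
        return NULL;
    }
    *total_out = total;
    return base + page;
}

static void stack_free(void *stack, size_t total) {
    munmap((char *)stack - page_size(), total);
}
```

On Linux, an anonymous mapping like this mostly reserves address space: physical pages are only committed when first touched, which is why a 256 KB default stack is cheaper than it sounds.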

From a cursory glance, there appear to be some data races in the stack allocation code, though I could be mistaken. See, e.g., stack.c#L137.

[1]: https://github.com/sustrik/libmill/blob/master/stack.c#L47

evmar · on Nov 18, 2015

Note that large stacks only consume address space, not physical memory (pages are committed only when touched), so it's not such a problem (especially on 64-bit).

I've never used libmill, but from a glance at the docs, it appears they suggest using multiple processes for parallelism, not multiple threads, so there is no such thing as a data race.

rumcajz · on Nov 18, 2015

I don't think there's a data race, given that the whole thing is single-threaded. Multiple processes should be used for parallelism (see step 7 of the tutorial).

As for the default stack size, it's hard (but doable) to work with smaller stack sizes in C. For example, the humble printf() can allocate a buffer several kB long on the stack.

rpedela · on Nov 18, 2015

select(), poll(), etc. are available in C, and they make non-blocking I/O possible. I don't know how libmill is implemented, but I assume it uses one of those functions for non-blocking I/O.

sunnyps · on Nov 18, 2015

So it looks like there are facilities for doing some I/O, listening on a socket, etc. But I don't see any facilities for waiting on locks, semaphores, etc. Maybe they could build that in a future release, or maybe they consider it antithetical to the idea of "share via communicating as opposed to communicate via sharing".

exDM69 · on Nov 18, 2015

> But I don't see any facilities for waiting on locks, semaphores, etc

There's a very good reason for this. First of all, you can use file descriptors as locks, semaphores, condition variables, etc., using pipe(2) and other functions. On Windows, you can use WaitForMultipleObjects and friends on Mutex objects.
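As an illustration of the pipe(2) trick, here is a sketch of a counting semaphore where the number of bytes sitting in the pipe is the count. The `pipe_sem` type and its functions are hypothetical. Because the read end is an ordinary fd, a select()/poll()-based scheduler can wait on it alongside sockets.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical counting semaphore on top of pipe(2): the number of bytes
   in the pipe is the current count. Keep the count below the pipe's
   capacity (64 KiB by default on Linux), or post() will block. */
typedef struct { int rd, wr; } pipe_sem;

static int pipe_sem_init(pipe_sem *s, unsigned initial) {
    assert(s != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    /* Non-blocking read end: trywait() fails with EAGAIN instead of
       blocking the whole thread when the count is zero. */
    int fl = fcntl(fds[0], F_GETFL);
    int ok = fl >= 0 && fcntl(fds[0], F_SETFL, fl | O_NONBLOCK) == 0;
    for (unsigned i = 0; ok && i < initial; i++)
        ok = write(fds[1], "", 1) == 1;      /* one byte per token */
    if (!ok) {
        int saved = errno;
        close(fds[0]);
        close(fds[1]);
        errno = saved;
        return -1;
    }
    s->rd = fds[0];
    s->wr = fds[1];
    return 0;
}

/* post / V: add one token. */
static int pipe_sem_post(pipe_sem *s) {
    return write(s->wr, "", 1) == 1 ? 0 : -1;
}

/* trywait / P without blocking: take one token if available.
   Returns 0 on success, -1 with errno == EAGAIN when the count is 0.
   A coroutine would then wait for POLLIN on s->rd and retry. */
static int pipe_sem_trywait(pipe_sem *s) {
    char c;
    return read(s->rd, &c, 1) == 1 ? 0 : -1;
}
```

Every post and wait here is a system call, which is exactly the cost discussed next.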

But select/epoll/kqueue/WaitForMultipleObjects is a system call that goes into the kernel. That is very inefficient if you want, e.g., mutual exclusion using a mutex. In a modern OS, a mutex is built so that the uncontended case never enters the kernel: on Linux it is implemented using a futex ("fast userspace mutex"), which is an atomic integer in userspace that only calls into the kernel if the mutex is contended (a Windows CriticalSection similarly spins in userspace before falling back to a kernel wait). An uncontended futex can be locked and unlocked in less than 20 nanoseconds. A system call can be 100x slower than that.
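To show where the fast path comes from, here is a Linux-only sketch of a futex-based mutex along the lines of the three-state design in Ulrich Drepper's "Futexes Are Tricky". The `futex_mutex` type and functions are hypothetical; glibc has no `futex()` wrapper, so the sketch calls `syscall(SYS_futex, ...)` directly. Locking and unlocking an uncontended mutex is a single atomic instruction each; the kernel is entered only when threads actually collide.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical Linux-only mutex. State: 0 = unlocked,
   1 = locked with no waiters, 2 = locked and possibly contended. */
typedef struct { atomic_int state; } futex_mutex;

static void futex_call(futex_mutex *m, int op, int val) {
    syscall(SYS_futex, (int *)&m->state, op, val, NULL, NULL, 0);
}

static void futex_mutex_lock(futex_mutex *m) {
    int c = 0;
    /* Fast path: one atomic compare-and-swap, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended and sleep in the kernel.
       FUTEX_WAIT returns at once if the state is no longer 2,
       so a wakeup between the exchange and the wait is not lost. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_call(m, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void futex_mutex_unlock(futex_mutex *m) {
    int old = atomic_fetch_sub(&m->state, 1);
    assert(old != 0);   /* unlocking an unlocked mutex is a caller bug */
    /* Fast path: 1 -> 0 means nobody is waiting, so no system call. */
    if (old != 1) {
        atomic_store(&m->state, 0);
        futex_call(m, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

In real code you would just use `pthread_mutex_t`, which on glibc is built on the same primitive.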

Some languages, e.g. Haskell, have a green threading system that implements mutexes and I/O consistently on top of a userspace scheduler. But this depends on the language and runtime implementation.

cthrowago · on Nov 18, 2015

It's single-threaded, so coroutine locks are trivial and other locks "can't block" (against coroutines, anyway).

cthrowago · on Nov 18, 2015

Sort of. They enable non-blocking network I/O. Disk I/O in Unix always blocks (aio excepted). Disk I/O in Windows can be async, but not via select or poll.
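The reason readiness APIs don't help with disk I/O is that POSIX specifies that regular files always poll as ready for both reading and writing. The sketch below (hypothetical helper names) shows this: a regular file reports POLLIN and POLLOUT immediately, even though a read() may still have to wait for the disk, while an empty pipe correctly reports "not ready".

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* poll() one fd with a zero timeout and return the reported events. */
static short poll_now(int fd, short events) {
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
    int n = poll(&pfd, 1, 0);
    assert(n >= 0);
    return pfd.revents;
}

/* A regular file reports readable and writable right away. */
static int regular_file_polls_ready(void) {
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    short r = poll_now(fileno(f), POLLIN | POLLOUT);
    fclose(f);
    return (r & POLLIN) && (r & POLLOUT);
}

/* Contrast: an empty pipe reports "not readable yet". */
static int empty_pipe_polls_ready(void) {
    int p[2];
    if (pipe(p) != 0)
        return -1;
    short r = poll_now(p[0], POLLIN);
    close(p[0]);
    close(p[1]);
    return (r & POLLIN) != 0;
}
```

So a poll()-driven coroutine scheduler would resume a coroutine reading a file immediately, and the following read() would then block the whole thread.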