Can anyone familiar with this say whether it runs one coroutine per thread? How does a coroutine get its own stack? If coroutines share threads, is it possible to switch away from a coroutine in the middle of it, say when it's blocked on IO, the way Go does?
I'm not an experienced Go developer, but my understanding is that in Go it's possible to have millions of goroutines each blocked on IO. I don't see how a C function could mimic that. I thought stack moving was an essential requirement of any runtime that wants to mimic Go in this way.
I've looked at some of the special macros like coroutine and go, and it looks like libmill runs the entire function fn when you call go(fn). So if fn is blocked on IO, then your entire thread is blocked on IO, and you can soon grind to a halt if you have even a few blocked coroutines.
It creates a stack per coroutine and if the coroutine would block, it switches to another coroutine instead.
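To make "a stack per coroutine, switch when it would block" concrete, here is a minimal sketch using POSIX ucontext. This is not libmill's code or API; worker, run_demo, trace and STACK_SIZE are names made up for the illustration, and the yield is done by hand where a real library would do it inside its IO calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (256 * 1024)  /* illustrative; same order as the default discussed in this thread */

static ucontext_t main_ctx, co_ctx;
static int trace[8];             /* records the order in which the pieces ran */
static int trace_len;

/* Runs on its own heap-allocated stack and yields instead of blocking. */
static void worker(void) {
    assert(trace_len < 8);
    trace[trace_len++] = 1;
    swapcontext(&co_ctx, &main_ctx);   /* "would block": hand control back */
    trace[trace_len++] = 3;
    /* returning from here resumes uc_link, i.e. main_ctx */
}

/* Returns 0 on success, -1 if the stack could not be allocated. */
static int run_demo(void) {
    char *stack = malloc(STACK_SIZE);
    if (!stack) return -1;
    trace_len = 0;

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;     /* the coroutine's private stack */
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;        /* where to go when worker returns */
    makecontext(&co_ctx, worker, 0);

    swapcontext(&main_ctx, &co_ctx);   /* run worker until it yields */
    trace[trace_len++] = 2;            /* main runs while worker is parked */
    swapcontext(&main_ctx, &co_ctx);   /* resume worker; it runs to completion */
    trace[trace_len++] = 4;

    free(stack);
    return 0;
}
```

After run_demo() the trace reads 1, 2, 3, 4: the worker was suspended mid-function and later resumed, all on one thread.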
As for threads, it's single threaded (yet concurrent). If you want to take advantage of multicore machines, there's a step in the tutorial about that: http://libmill.org/tutorial.html#step7
One of the reasons goroutines are considered 'lightweight' is that they allocate a small stack to begin with which is reallocated and moved if more space is needed. Of course, stack-moving isn't possible in C for various reasons.
What does libmill do here? Does it allocate a large stack right off the bat? What happens on a stack overflow: does it segfault, or will it just trample over whatever is beneath it?
Edit: an ounce of investigation is worth a pound of answered questions, or something like that. It appears stack sizes are configurable, but default to 256 KB [1] (for scale, Go defaults to 2 KB), and are retained in a cache of size 64 [#L72] (by default). Also, the bottom page is used as a stack guard if POSIX and mprotect are available [#L90].
From a cursory glance there appear to be some data races in the stack allocation code, though I could be mistaken. E.g. stack.c#L137
Note that large stacks only use address space, not memory: physical pages are committed only once they are actually touched. So it's not such a problem (especially on 64-bit).
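The guard page and the address-space point can be sketched together like this. It is an assumption-laden illustration, not libmill's stack.c: stack_alloc and stack_free are hypothetical names, and it assumes a POSIX system with anonymous mmap.

```c
#define _GNU_SOURCE              /* for MAP_ANONYMOUS on some libcs */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `size` bytes for a coroutine stack and make the lowest page
   inaccessible. Returns the base of the mapping, or NULL on failure. */
static void *stack_alloc(size_t size) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size < 2 * (size_t)page) return NULL;

    /* Anonymous mapping: reserves address space; physical pages are
       only committed when they are first touched. */
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;

    /* Stacks grow downward on common platforms, so an overflow hits the
       lowest page first. With PROT_NONE that access faults instead of
       silently overwriting whatever lies below. */
    if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
        munmap(base, size);
        return NULL;
    }
    return base;
}

static void stack_free(void *base, size_t size) {
    munmap(base, size);
}
```

The usable stack is everything above the first page; running off the bottom produces a segfault rather than corruption.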
I've never used libmill but from a glance at the docs it appears they suggest using multiple processes for parallelism, not multiple threads, so there is no such thing as a data race.
I don't think there's a data race, given that the whole thing is single-threaded. Multiple processes should be used for parallelism (see step 7 of the tutorial).
As for the default stack size, it's hard (but doable) to work with a smaller stack size in C. For example, the humble printf() can allocate a buffer several kB long on the stack.
select(), poll(), etc. are available in C, which enable non-blocking I/O. I don't know how libmill is implemented, but I assume it is using one of those functions for non-blocking I/O.
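For anyone who hasn't used these calls, here is roughly what the non-blocking pattern looks like with poll(). It is a generic POSIX sketch, not a claim about libmill's internals; set_nonblocking and read_when_ready are made-up helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Try to read; if the read would block, wait up to timeout_ms for the fd
   to become readable and retry once. Returns the number of bytes read,
   0 on EOF, or -1 on error or timeout. */
static ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms) {
    ssize_t n = read(fd, buf, len);
    if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
        return n;

    /* This is the point where a coroutine scheduler could switch to
       another coroutine instead of sitting in poll(). */
    struct pollfd p = { .fd = fd, .events = POLLIN };
    if (poll(&p, 1, timeout_ms) <= 0)
        return -1;
    return read(fd, buf, len);
}
```

The key property is that read() returns immediately with EAGAIN rather than parking the thread, so the caller decides what to do while waiting.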
So it looks like there are facilities for doing some IO, listening on a socket, etc. But I don't see any facilities for waiting on locks, semaphores, etc. Maybe they could build that in a future release or maybe they consider it antithetical to the idea of "share via communicating as opposed to communicate via sharing".
> But I don't see any facilities for waiting on locks, semaphores, etc
There's a very good reason for this. First of all, you can use file descriptors as locks, semaphores, condition variables, etc. using pipe(2) and other functions. On Windows you can use WaitForMultipleObjects, etc. on Mutex objects.
But select/epoll/kqueue/WaitForMultipleObjects is a system call that goes into the kernel. That is very inefficient if you want e.g. mutual exclusion using a mutex. In a modern OS, a mutex (or CriticalSection on Windows) has a userspace fast path. On Linux it is built on a futex ("fast userspace mutex"), which takes the lock with an atomic operation in userspace and only calls into the kernel if the mutex is contended. An uncontended futex can be locked and unlocked in less than 20 nanoseconds. A system call can be 100x slower than that.
Some languages, e.g. Haskell, have a green threading system that implements mutexes and I/O consistently on top of a userspace scheduler. But this depends on the language and runtime implementation.
Sort of. They enable non-blocking network IO. Disk IO in Unix always blocks (aio excepted). Disk IO in Windows can be async, but not via select or poll.
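As a rough sketch of the pipe(2) trick: each byte sitting in the pipe is one semaphore token, and because the read end is an ordinary file descriptor it can be handed to select/poll along with sockets. pipe_sem and its functions are hypothetical names for this illustration, not an existing API.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef struct { int rd, wr; } pipe_sem;   /* hypothetical type */

/* Release: add one token to the pipe. Returns 0 on success. */
static int pipe_sem_post(pipe_sem *s) {
    char token = 1;
    return write(s->wr, &token, 1) == 1 ? 0 : -1;
}

/* Try to acquire without blocking.
   Returns 0 if a token was taken, 1 if none was available, -1 on error. */
static int pipe_sem_trywait(pipe_sem *s) {
    char token;
    ssize_t n = read(s->rd, &token, 1);
    if (n == 1) return 0;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return 1;
    return -1;
}

static void pipe_sem_destroy(pipe_sem *s) {
    close(s->rd);
    close(s->wr);
}

/* Create the semaphore with `tokens` initial tokens. Returns 0 on success. */
static int pipe_sem_init(pipe_sem *s, int tokens) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    s->rd = fds[0];
    s->wr = fds[1];
    int fl = fcntl(s->rd, F_GETFL, 0);
    if (fl < 0 || fcntl(s->rd, F_SETFL, fl | O_NONBLOCK) < 0) {
        pipe_sem_destroy(s);
        return -1;
    }
    for (int i = 0; i < tokens; i++) {
        if (pipe_sem_post(s) != 0) {
            pipe_sem_destroy(s);
            return -1;
        }
    }
    return 0;
}
```

With one initial token this behaves like a mutex; to wait for a token, poll the rd descriptor for POLLIN and then call pipe_sem_trywait.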
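To illustrate why the uncontended case is so cheap, here is only the userspace fast path, written with C11 atomics. It is a sketch, not a full futex-based mutex: fast_lock and its functions are made-up names, and the slow path that would sleep in the kernel under contention is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int state; } fast_lock;   /* 0 = free, 1 = held */

/* Uncontended acquire: a single atomic compare-and-swap, no system call.
   Returns true if the lock was taken. A real mutex would fall back to a
   kernel wait here when this returns false. */
static bool fast_trylock(fast_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, 1);
}

/* Release: a single atomic store. A real mutex would also wake any
   waiter sleeping in the kernel. */
static void fast_unlock(fast_lock *l) {
    atomic_store(&l->state, 0);
}
```

Going through a pipe or a Mutex handle means paying for a kernel transition on every acquire and release, even when nobody else wants the lock.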