Good stuff, thanks for sharing. Random comments based on a quick glance through the code if I may.
chan_t* chan_init(int capacity);
should be "size_t capacity", since negative capacity values are invalid. This also eliminated the need for the first check in the function.
chan->buffered
is duplicate in meaning to chan->queue being non-NULL.
Why chan's mutexes and pcond are malloc'ed and not included in chan's struct? They are also not freed when channel is disposed, unless I'm missing something. Mutexes are also not destroyed when buffered chan is disposed, but they are created for it.
(Edit)
Also, my biggest nitpick would be that chan_t structure should really not be in chan.h header since it's meant to be transparent to the app and the API operates exclusively with pointers to chan_t. Then, if it is made private (read - internal to the implementation), then you can divorce buffered and unbuffered channels into two separate structs, allocate required one and then multiplex between them in calls that treat buffered and unbuffered channels differently. I.e.
struct chan_t
{
// Shared properties
pthread_mutex_t m_mu;
pthread_cond_t m_cond;
int closed;
// Type
int buffered;
};
struct chan_unbuffered_t
{
struct chan_t base;
pthread_mutex_t r_mu;
pthread_mutex_t w_mu;
int readers;
blocking_pipe_t* pipe;
};
...
or better yet, just include a handful of function pointers into chan_t for each of the API methods that need more than the shared part of chan_t struct, initialize them on allocation and put type-specific code in these functions. It will make for a more compact, concise and localized code.
Thanks, I'm not much of a C programmer, so I appreciate the feedback. I'm not sure what you're referring to with the mutexes and pcond not being on the struct because they are. Can you clarify? You are right with the disposal though, I need to clean that stuff up :)
that saves some mallocs and dereferences. If you are new to C, the notion of taking the address of fields in a struct may be unnerving, but you get used to it :)
I'm more of a c++ programmer, but I think i can chime in on the mutex/pcond being "part" of the struct. Essentially, OP means that why arent they on chan's stack? In the struct you have pointers to mutex and pcond which means you have to malloc them. OP is suggesting they not be pointers.
If he's gonna do it that way (Which I agree with and also recommend) he could probably use the container_of macro (You can implement it without gcc extensions - it has slightly less type checks however). Using the container_of he can safely turn the chan_t into a chan_unbuffered_t or chan_buffered_t or etc. assuming he does the proper checks beforehand to make sure it's actually inside of that type.
Also worth noting, he really shouldn't be using '_t' since it's reserved by the standard.
The container_of macro is intended for cases where contained objects are at some (possibly compile-time variable) offset into a structure. Eliding type-checking, it reduces to the simple cast in this particular case.
Not to say that it wouldn't work, just that it isn't really necessary - casting a pointer to a structure to a pointer to its first member (and back) is perfectly portable. Perhaps even idiomatic. (Though the casts in that gist seem to be backwards?)
That's exactly what we have here though, the 'base' chan_t structure is contained inside of the chan_unbuffered_t structure. And I disagree with your second point, I consider casting structures directly like that to be bad practice. It's ugly, error prone, and not guaranteed to work by the standard since struct ordering isn't guaranteed (IE. The standard doesn't say that 'base' will be at the beginning of the struct.).
That said, it is 'portable' since in general no compilers reorder the struct members, but container_of is guaranteed to work, works for multiple members in a struct exactly the same way, and will reduce to a simple cast when your offsetof is zero so there isn't any performance penalty in situations like this. IMO there's no reason not to use container_of in this case.
The standard permits padding structures, but not reordering their members. That's why container_of is necessary (to avoid constantly-updated manual offset calculations or writing redundant code). Any implementation that reorders structure members is non-compliant (and likely incompatible with much existing code).
To quote the relevant section (C99 6.7.2.1.13, same wording present in C89):
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
I'll repeat that I consider direct casts to be somewhat idiomatic. Macros are more traditional for offset members.
Using container_of is better because it doesn't lead to a broken code when someone (accidentally) splices another field at the top of the struct.
Or put differently, direct casting comes with an assumption, while container_of doesn't. The fewer obscure assumptions are there in the code, the better. I think we can all agree on that :)
I agree re container_of and I put it in initially, but then saw tyletreat's "not quite C programmer" and took it out. I remember being majorly confused by this struct-by-field recovery way back when I was just starting up with C.
Libtask[1] by Russ Cox (one of the creators of Go) may provide some inspiration. It implements channels but uses coroutines instead of threading. As a consequence it has some cool stuff like networking and it does all of the hard parts of saving/restoring registers for task switching.
Just a forewarning: that library, like most C cooperative threading libraries, is using ucontext.
ucontext is unbearably slow, to the point where it's just as fast to use real threads and mutexes, even on a single-core system. There is nothing lightweight about it. (The technical reason is because they call into kernel functions to perform their magic.)
The actual logic of saving/restoring registers and the stack frame requires about 5-15 instructions per platform, and is very easy to write in assembly. And for the platforms you don't do this for, there's a non-standard trick to modifying jmpbuf which works on x86/amd64, ppc32/64, arm, mips, sparc, etc. These techniques are literally hundreds of times faster.
Try libco instead. It's the bare minimum four functions needed for cooperative threading, which lets you easily build up all the other stuff these libraries provide, if and only if you want them. And if you benchmark libco against all of the other stack-backed coroutine libraries, I'm sure you will be stunned at just how bad ucontext really is. That people keep using ucontext in their libraries tells me that they have never actually used cooperative threading for any serious workloads.
libtask isn't using the ucontext syscall (normally), just the ucontext_t struct. If you look at the source code, it provides local assembly versions of the context swapping code for the most common architectures.
That said, your library is quite nice in that it just provides the co-routine wrappers and nothing else, which I appreciate. Also, your switch code has less instructions (are there cases it doesn't handle?), but if you look at libtask's code, it probably not going to be 100x faster. :-)
But it looks like you're right, context.c is actually defining its own version of swapcontext to replace the system function that can use other backends. I'm sorry for the mistake, thanks for correcting me.
> are there cases it doesn't handle?
No, it conforms to the platform ABIs. I even back up the xmm6-15 registers that Microsoft made non-volatile on their amd64 ABI (compare to how much lighter the SystemV amd64 ABI's switch routine is; the pre-assembled versions are in the doc/ folder.)
Their own fibers library even has a switch to choose whether to back those up or not, which I think is quite dangerous.
Still, nothing can top SPARC's register windows for being outright hostile to the idea of context switching.
...
Oh, and it also supports thread-local storage for safe use with multiple real threads.
> it probably not going to be 100x faster.
No, definitely not compared to his ASM versions. Even with a less efficient swap, once you get past the syscall, most of the overhead is simply in the cache impact of swapping out the stack pointer, so his will likely be very close. In fact, even I had to sacrifice a tiny bit of speed to wrap the fastcall calling convention and execute non-inline assembly code (to avoid dependencies on specific compilers/assemblers.)
I do still strongly favor jmpbuf over ucontext for the final fallback, but with x86, ppc, arm, mips and sparc on Windows, OS X, Linux and BSD, you've pretty much covered 99.999% of hardware you'd ever use this on. That and libtask lacks Windows and OS X support.
I came here to link to libyaml too but I'm glad I was beaten to it. I'm not much of a C programmer but I love reading and studying concurrency. Reading Russ Cox's code is one of my favorite things to do.
Quick question: isn't channel's raison d'etre the fact that they need not be backed by system threads, that they are considerably cheaper and one can fire tens of thousands of them without worrying too much about performance. In other words aren't they primarily intended as a tool for solving a concurrency problem rather than a parallelism problem.
Although, as far as I recall Go does map a bundle of them to a thread and can thus handle parallelism as well.
Just to clarify, this is not a complaint, I am trying to get a better idea of these abstractions.
@tylertreat
> I believe you're confusing goroutines with channels. Goroutines are lightweight threads of execution. Channels are the pipes that connect goroutines.
That is indeed correct. I was thinking more in the line of fibres that meld the concepts of a coroutine and a channel into a single abstraction. So the idea is Chan connects threads rather than coroutines.
Channels are useful because they allow for easy synchronous communication between threads of execution. Whether the threads of execution are dedicated POSIX threads or drawn from a thread pool or simulated by an event-driven state machine (as in ClojureScript) is a different consideration.
One of our students[0] at Monkey Project Organization[1] through Google Summer of Code[2], wrote a complete co-routine implementation for the Duda I/O[3] Web Services stack.
The feature is called "dthread" or "duda-thread" which aims to expose Channels and generic co-routines functionalities, you can check the following relevant links:
That's very interesting work. I did a similar thing for Kore for its background task implementation so that page handlers could talk to its tasks it fired off and read its responses accordingly.
nice!, its good to see many people working on that area. We got a lot of support from the author of LWan (http://lwan.ws) project during the development process.
Thanks for sharing, that's a really interesting read. Also would have been very helpful while writing my own implementation, although maybe it's better I didn't beforehand. It's more fun to work on a problem without any preconceptions.
That is not "pure C". "pthread.h" (POSIX threads) is not part of the C language standard, it is a system-specific header commonly found on UNIX systems. Windows has a different threading system for example.
To be fair, you'll probably have better luck finding people who can compile it if you use pthreads.h vs. threads.h. IIRC glibc doesn't have threads.h implemented, and I don't know of any C standard libraries that do. pthreads is fairly universal even if it's not standard C, and at the very least it's much more universal then threads.h is at the moment.
Correct, I was just going to chime in about this. Musl is very impressive with its compact size, correctness, Drepper-less-ness =P, and speed of implementing new standards.
Why chan's mutexes and pcond are malloc'ed and not included in chan's struct? They are also not freed when channel is disposed, unless I'm missing something. Mutexes are also not destroyed when buffered chan is disposed, but they are created for it.
(Edit)
Also, my biggest nitpick would be that chan_t structure should really not be in chan.h header since it's meant to be transparent to the app and the API operates exclusively with pointers to chan_t. Then, if it is made private (read - internal to the implementation), then you can divorce buffered and unbuffered channels into two separate structs, allocate required one and then multiplex between them in calls that treat buffered and unbuffered channels differently. I.e.
or better yet, just include a handful of function pointers into chan_t for each of the API methods that need more than the shared part of chan_t struct, initialize them on allocation and put type-specific code in these functions. It will make for a more compact, concise and localized code.(Edit 2)
Here's what I mean with function pointers - https://gist.github.com/anonymous/5f97d8db71776b188820 - basically poor man's inheritance and virtual methods :)