I agree there are additional complexities with M:N. I was assuming a lock-free queue, since you'll need one for scalable work stealing.
I was also assuming a single kernel thread performs I/O via epoll/kqueue/etc., and either has its own queue from which other threads steal, or simply pushes results onto a random queue when the requested I/O completes. This accomplishes the migration I believe you were describing.
When I/O needs to be done, you enqueue the uthread on the I/O queue and write to a reserved file descriptor to notify the I/O thread, which then reshuffles its file descriptors and calls epoll/kqueue again.
I'm not sure whether this would scale as well as what you're doing, since you mentioned an epoll instance per kernel thread, but I wouldn't be surprised if it got close given how I/O-bound the workload is.
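For concreteness, here's a minimal sketch of that scheme on Linux, using an eventfd as the "reserved file descriptor". The uthread type, queue helpers, and scheduler hooks (`io_queue_pop`, `ready_queue_push`, `uthread_fd`, `uthread_events`) are hypothetical stand-ins for whatever the runtime provides, not any real library's API:

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the runtime's types and queues.
struct UThread;                            // user-level thread descriptor
extern UThread* io_queue_pop();            // pop a uthread that requested I/O
extern void ready_queue_push(UThread*);    // hand a runnable uthread to a scheduler queue
extern int uthread_fd(UThread*);           // fd the uthread is waiting on
extern uint32_t uthread_events(UThread*);  // EPOLLIN / EPOLLOUT it asked for

static int epfd;    // the single poller's epoll instance
static int wakefd;  // "reserved" fd other kernel threads write to after enqueuing I/O

void poller_loop() {
    epfd   = epoll_create1(0);
    wakefd = eventfd(0, EFD_NONBLOCK);

    epoll_event ev{};
    ev.events   = EPOLLIN;
    ev.data.ptr = nullptr;  // nullptr marks the wakeup fd
    epoll_ctl(epfd, EPOLL_CTL_ADD, wakefd, &ev);

    std::vector<epoll_event> events(128);
    for (;;) {
        int n = epoll_wait(epfd, events.data(), static_cast<int>(events.size()), -1);
        for (int i = 0; i < n; ++i) {
            if (events[i].data.ptr == nullptr) {
                // Woken by another kernel thread: drain the eventfd and
                // "reshuffle" -- register the fds of newly enqueued uthreads.
                uint64_t tmp;
                read(wakefd, &tmp, sizeof tmp);
                while (UThread* ut = io_queue_pop()) {
                    epoll_event nev{};
                    nev.events   = uthread_events(ut);
                    nev.data.ptr = ut;
                    // A real implementation would use EPOLL_CTL_MOD for fds
                    // that are already registered.
                    epoll_ctl(epfd, EPOLL_CTL_ADD, uthread_fd(ut), &nev);
                }
            } else {
                // I/O is ready: make the waiting uthread runnable again.
                ready_queue_push(static_cast<UThread*>(events[i].data.ptr));
            }
        }
    }
}

// Called by a worker thread after pushing a uthread onto the I/O queue.
void notify_poller() {
    uint64_t one = 1;
    write(wakefd, &one, sizeof one);
}
```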
> I was also assuming a single kernel thread performs I/O via epoll/kqueue/etc., and either has its own queue from which other threads steal, or simply pushes results onto a random queue when the requested I/O completes. This accomplishes the migration I believe you were describing.
> When I/O needs to be done, you enqueue the uthread on the I/O queue and write to a reserved file descriptor to notify the I/O thread, which then reshuffles its file descriptors and calls epoll/kqueue again.
If I remember correctly, what you described is similar to how Go performs I/O polling, since it also uses work stealing, except that there is no dedicated poller thread; the kernel threads take turns doing the polling whenever they become idle.
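Roughly, that "idle worker takes a turn polling" pattern looks like the sketch below. This is not Go's actual runtime; `local_or_stolen_work`, `netpoll`, and the mutex guarding the poll are illustrative names for this discussion:

```cpp
#include <mutex>
#include <thread>

// Illustrative stand-ins, not Go's runtime API.
struct UThread;
extern UThread* local_or_stolen_work();   // try own queue, then steal from others
extern UThread* netpoll(int timeout_ms);  // run epoll/kqueue, return a ready uthread or nullptr
extern void run(UThread*);

std::mutex poll_mutex;  // at most one idle worker polls at a time

void worker_loop() {
    for (;;) {
        if (UThread* ut = local_or_stolen_work()) {
            run(ut);
            continue;
        }
        // No runnable work anywhere: volunteer to be the poller for a while.
        if (poll_mutex.try_lock()) {
            UThread* ut = netpoll(/*timeout_ms=*/10);
            poll_mutex.unlock();
            if (ut) run(ut);
        } else {
            std::this_thread::yield();  // someone else is already polling
        }
    }
}
```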
Also, with epoll/kqueue there is no need to enqueue uThreads: you can simply store a pointer (to a struct holding reader/writer pointers to uThreads) in the epoll event you register and recover it later from the notification. That way, you only set the pointer when I/O actually needs to be done.
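For example, with epoll the per-fd state can ride along in `epoll_data.ptr`. The `FdWaiters` struct and the `make_runnable` hook below are hypothetical illustrations of the idea, not the library's actual types:

```cpp
#include <sys/epoll.h>

struct uThread;  // user-level thread descriptor

// Hypothetical per-fd record: one slot for a blocked reader, one for a writer.
struct FdWaiters {
    uThread* reader = nullptr;
    uThread* writer = nullptr;
};

// Register an fd once; the waiter slots are filled in only when a uThread
// actually blocks on I/O, so nothing needs to be enqueued.
void register_fd(int epfd, int fd, FdWaiters* w) {
    epoll_event ev{};
    ev.events   = EPOLLIN | EPOLLOUT | EPOLLET;  // edge-triggered, both directions
    ev.data.ptr = w;                             // recovered verbatim on notification
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

// In the poller: recover the struct from the event and wake the right side(s).
extern void make_runnable(uThread*);  // hypothetical scheduler hook

void handle_event(const epoll_event& ev) {
    auto* w = static_cast<FdWaiters*>(ev.data.ptr);
    if ((ev.events & (EPOLLIN | EPOLLERR | EPOLLHUP)) && w->reader) {
        make_runnable(w->reader);
        w->reader = nullptr;
    }
    if ((ev.events & (EPOLLOUT | EPOLLERR | EPOLLHUP)) && w->writer) {
        make_runnable(w->writer);
        w->writer = nullptr;
    }
}
```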
I agree a single poller thread does not scale as well as a per-kthread epoll instance, and it takes some careful thought to arrive at a low-overhead design that does not trade performance for scalability.