I think you'd be hard pressed to find a workload where that behavior needs to be generalized to the degree you're talking.
If you're serving HTTP requests, for instance, simply serving each request on its own thread with its own event loop should be sufficient at scale. Multiple requests each with CPU-bound tasks will still saturate the CPUs.
Very little code teeters between CPU-bound and io-bound while also serving few enough requests that you have cores to spare to effectively parallelize all the CPU-bound work. If that's the case, why do you need the runtime to do this for you? A simple profile would show what's holding up the event loop.
But still, the runtime can't naively parallelize coroutines. Coroutines are expected not to be run in parallel and that code isn't expected to be thread safe. Instead of a gather on futures, your code would have been using a thread pool executor in the first place if you'd gone out of your way to ensure your CPU-bound code was thread safe: the benefits of async/await are mostly lost.
I also don't think an event loop can be shared between two running threads: if you were to parallelize coroutines, those coroutines' spawned coroutines could run in parallel. If you used an async library that isn't thread safe because it expects only one coroutine is executing at a time, you could run into serious bugs.
Interesting. I don't disagree, in general, but I actually have worked with a lot of applications that like to do this. Specifically in the world of ML/AI inference there's a lot of moving between external querying of data (features) and internal/external querying of models. With recommendation systems it is often worse -- gather large data, run a computation on it, filter it, get a bulk API request, score it with a model, etc...
This is exactly where I'd like to see it.
I'd like to simultaneously:
1. Call out to external APIs and not run any overhead/complexity of creating/managing threads
2. Call out to a model on a CPU and not have it block the event loop (I want it to launch a new thread and have that be similar to me)
3. Call out to a model on a GPU, ditto
And use the observed resource CPU/GPU usage to scale up nicely with an external horizontal scaling system.
So it might be that the async API is a lot easier to use/ergonomic then threads. I'd be happy to handle thread-safety (say, annotating routines), but as you pointed out, there are underlying framework assumptions that make this complicated.
The solution we always used is to separate out the CPU-bound components from the IO-bound components, even onto different servers or sidecar processes (which, effectively, turn CPU-bound into IO-bound operations). But if they could co-exist happily, I'd be very excited. Especially if they could use a similar API as async does.
If you're serving HTTP requests, for instance, simply serving each request on its own thread with its own event loop should be sufficient at scale. Multiple requests each with CPU-bound tasks will still saturate the CPUs.
Very little code teeters between CPU-bound and io-bound while also serving few enough requests that you have cores to spare to effectively parallelize all the CPU-bound work. If that's the case, why do you need the runtime to do this for you? A simple profile would show what's holding up the event loop.
But still, the runtime can't naively parallelize coroutines. Coroutines are expected not to be run in parallel and that code isn't expected to be thread safe. Instead of a gather on futures, your code would have been using a thread pool executor in the first place if you'd gone out of your way to ensure your CPU-bound code was thread safe: the benefits of async/await are mostly lost.
I also don't think an event loop can be shared between two running threads: if you were to parallelize coroutines, those coroutines' spawned coroutines could run in parallel. If you used an async library that isn't thread safe because it expects only one coroutine is executing at a time, you could run into serious bugs.