I think you'd be hard pressed to find a workload where that behavior needs to be...

elijahbenizzy · 2024-07-14T19:55:58 1720986958

Interesting. I don't disagree, in general, but I actually have worked with a lot of applications that like to do this. Specifically in the world of ML/AI inference there's a lot of moving between external querying of data (features) and internal/external querying of models. With recommendation systems it is often worse -- gather large data, run a computation on it, filter it, get a bulk API request, score it with a model, etc...

This is exactly where I'd like to see it.

I'd like to simultaneously:

1. Call out to external APIs and not run any overhead/complexity of creating/managing threads 2. Call out to a model on a CPU and not have it block the event loop (I want it to launch a new thread and have that be similar to me) 3. Call out to a model on a GPU, ditto

And use the observed resource CPU/GPU usage to scale up nicely with an external horizontal scaling system.

So it might be that the async API is a lot easier to use/ergonomic then threads. I'd be happy to handle thread-safety (say, annotating routines), but as you pointed out, there are underlying framework assumptions that make this complicated.

The solution we always used is to separate out the CPU-bound components from the IO-bound components, even onto different servers or sidecar processes (which, effectively, turn CPU-bound into IO-bound operations). But if they could co-exist happily, I'd be very excited. Especially if they could use a similar API as async does.