You are right that you are effectively stuck in a single coroutine (non-stack) frame. But you can chain multiple such coroutine frames, because one coroutine can co_await another.
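To make that concrete, here is a minimal sketch of a hand-rolled task type, just enough to show one coroutine frame co_awaiting another. (Real code would use something like cppcoro::task; the names here and the lack of error handling are mine, not anything standard.)

```cpp
#include <coroutine>
#include <iostream>
#include <utility>

struct task {
    struct promise_type {
        std::coroutine_handle<> continuation; // whoever co_awaited us
        int value = 0;

        task get_return_object() {
            return task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; } // lazy start
        struct final_awaiter { // on completion, resume the awaiting frame
            bool await_ready() noexcept { return false; }
            std::coroutine_handle<> await_suspend(std::coroutine_handle<promise_type> h) noexcept {
                return h.promise().continuation ? h.promise().continuation
                                                : std::noop_coroutine();
            }
            void await_resume() noexcept {}
        };
        final_awaiter final_suspend() noexcept { return {}; }
        void return_value(int v) { value = v; }
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> handle;
    explicit task(std::coroutine_handle<promise_type> h) : handle(h) {}
    task(task&& other) noexcept : handle(std::exchange(other.handle, {})) {}
    task(const task&) = delete;
    ~task() { if (handle) handle.destroy(); }

    // Awaiting a task records who to resume later, then jumps into it.
    bool await_ready() { return false; }
    std::coroutine_handle<> await_suspend(std::coroutine_handle<> awaiting) {
        handle.promise().continuation = awaiting;
        return handle; // symmetric transfer into the awaited coroutine
    }
    int await_resume() { return handle.promise().value; }
};

task inner() {
    co_return 41;             // the "deeper" coroutine frame
}

task outer() {
    int v = co_await inner(); // one coroutine frame awaits another
    co_return v + 1;
}

int main() {
    task t = outer();
    t.handle.resume();        // drive the chain from ordinary code
    std::cout << t.handle.promise().value << '\n'; // prints 42
}
```

The key pieces are the continuation handle stored by await_suspend and the final awaiter that resumes it: control is handed back up through a chain of heap-allocated coroutine frames rather than up a call stack.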
Thanks for this great article! I've been circling my way around understanding coroutines for a while, and this really put it all into place.
I think this co_awaiting is the most confusing part for folks: in most languages with stackful coroutines, it's easy to see how one can
1. call "co_await" on a coroutine at a high-level API boundary
2. have that coroutine execute normal code all the way down to some expensive part (e.g. I/O)
3. at that particular point, "deep" down in the expensive part, just call "yield" and hand control all the way back to the user thread that co_awaited at step 1, usually with some special capability in the runtime.
I believe the way you can do this in C++20 terms is to co_yield a promise all the way back to the originating "co_await" site, but I may be confused about this still...
It's totally clear to me why they didn't choose this for C++: keeping track of some heap-allocated stack frames could prove unwieldy.
I wish more effort went into explaining and promoting coroutines. Right now the advice seems to be "be brave and DIY, or just use cppcoro until we figure this out".
Also, if it's not clear, within a coroutine you can call any function (or coroutine) you want. It's just that to co_yield, you have to be in the coroutine itself, not deep in the call stack.
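A small sketch to illustrate, assuming a C++23 compiler with std::generator (cppcoro::generator behaves the same way under C++20; expensive_step is just a placeholder name):

```cpp
#include <generator>
#include <iostream>

int expensive_step(int x) {
    // Plain function: it cannot co_yield back to the consumer of values(),
    // it can only return to the coroutine that called it.
    return x * x;
}

std::generator<int> values(int n) {
    for (int i = 0; i < n; ++i)
        co_yield expensive_step(i); // co_yield happens here, in the coroutine body
}

int main() {
    for (int v : values(4))
        std::cout << v << ' ';      // prints: 0 1 4 9
}
```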
Isn't it like that in most languages? I'm thinking about Python, C#, JS. If you call a blocking function from an async function, you cannot yield from deep inside the blocking function.
Why is this a big deal in C++? Am I missing anything that makes C++ coroutines less powerful than other mainstream solutions? Or are people comparing their power with e.g. Lisp or Go?
It's a big deal because, while it has some downsides, being stackless means they can have next to no overhead, so it can be performant to use coroutines to write asynchronous code for even very fast operations. The example given in https://www.youtube.com/watch?v=j9tlJAqMV7U&t=13m30s is launching multiple coroutines that each issue a prefetch instruction and then process the fetched data, so you get clean code that keeps several prefetches in flight at once. Whereas in Python (don't get me wrong, I love Python) you might use a generator to "asynchronize" slow operations like requesting and processing data from remote servers, C++ coroutines can be fast enough to asynchronously handle "slow" operations like requesting data from main memory.
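For a rough idea of what that looks like, here is a heavily simplified sketch of the pattern from the talk, not its actual code: the prefetch_task type and the round-robin driver below are mine, and __builtin_prefetch is a GCC/Clang builtin.

```cpp
#include <coroutine>
#include <cstddef>
#include <iostream>
#include <vector>

struct prefetch_task {
    struct promise_type {
        bool done = false;
        std::size_t result = 0;
        prefetch_task get_return_object() {
            return {std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_value(std::size_t r) { result = r; done = true; }
        void unhandled_exception() { std::terminate(); }
    };
    std::coroutine_handle<promise_type> handle;
};

// One binary-search lookup: prefetch the element it is about to touch,
// then suspend so another lookup's memory access can be in flight meanwhile.
prefetch_task lookup(const std::vector<int>& data, int key) {
    std::size_t lo = 0, hi = data.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        __builtin_prefetch(&data[mid]);   // start the memory fetch...
        co_await std::suspend_always{};   // ...and yield to the other lookups
        if (data[mid] < key) lo = mid + 1; else hi = mid;
    }
    co_return lo;
}

int main() {
    std::vector<int> data(1 << 20);
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = int(i) * 2;

    std::vector<prefetch_task> batch;
    for (int key : {6, 42, 1000, 123456})
        batch.push_back(lookup(data, key));

    // Round-robin driver: keeps several independent memory accesses in flight.
    bool any_running = true;
    while (any_running) {
        any_running = false;
        for (auto& t : batch)
            if (!t.handle.promise().done) { t.handle.resume(); any_running = true; }
    }
    for (auto& t : batch) {
        std::cout << "found at index " << t.handle.promise().result << '\n';
        t.handle.destroy();
    }
}
```

Each lookup suspends right after issuing its prefetch, so while one is waiting for memory the driver resumes the others, and several cache misses get serviced concurrently.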
Wow, that talk is a fantastic link. He actually gets negative overhead from using coroutines, because the compiler has more freedom to optimize when humans don't prematurely break the logic into multiple functions.
All those languages also have stackless coroutines. Notably, Lua and Go have stackful coroutines.
It is sort of a big deal because the discussion of whether to add stackful or stackless coroutines to C++ was an interminable and very visible one. Stackless won.