The cost of context switching is mostly artificial. Coroutines save you the kernel/user transitions, but user-mode threads avoid those transitions too. So what else are you saving by using coroutines instead of threads? Not register swapping, and not cache warmup. You are saving stack allocations, but those don't cost time.
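
To make the "no kernel/user transition" point concrete, here is a minimal sketch of cooperative switching done entirely in user space. It uses Python generators as a stand-in; they are stackless coroutines, not real user-mode threads (which would also carry their own stack and saved registers). The names `worker` and `run_round_robin` are made up for the illustration.

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative task: do one unit of work, then yield control."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to the scheduler; the switch stays in user space

def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task where it left off
            switches += 1
            ready.append(task)  # still alive, so requeue it
        except StopIteration:
            pass                # task finished; drop it
    return switches

log = []
switches = run_round_robin([worker("a", 3, log), worker("b", 3, log)])
```

Every switch here is an ordinary function call and return inside the interpreter, so the scheduler never asks the OS to switch anything. This only shows that switching can be done without a kernel transition; it says nothing about the register or cache costs.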