IOCP was indeed ahead of its time and should have been copied by Linux a long ti...

benlwalker · on April 13, 2023

IOCP certainly was ahead of its time, but it only does the completion batching, not the submission batching. io_uring is significantly better than anything available on Windows right now.

mappu · on April 13, 2023

> io_uring is significantly better than anything available on Windows right now.

Windows 11 copied io_uring almost 1:1 - https://windows-internals.com/ioring-vs-io_uring-a-compariso...

Matthias247 · on April 13, 2023

It comes with its own set of challenges. In the integration I've seen, it basically meant that all the latency in the system went into io_uring_enter() call which blocked then for far longer than any individual than any other IO operation we've ever seen. Your application might prefer if it pauses 50 times for 20us (+ syscall overhead) in an eventloop iteration instead of a single time for 1ms (+ less syscall overhead), because that means some IO will just sit around for 1ms and will be totally unhandled.

The only way to avoid big latencies on uring_enter is to use the submission queue polling mechanism using a background kernel thread, which also has its own set ofs pro's and con's.

anonymoushn · on April 13, 2023

This sounds abnormal, are you using io_uring_enter in a way that asks it not to return without any cqes?

I don't have much of a feel for this because I am on the "never calling io_uring_enter" plan but I expect I would have found it alarming if it took 1ms while I was using it

benlwalker · on April 13, 2023

For many syscalls, the primary overhead is the transition itself, not the work the kernel does. So doing 50 operations one by one may take, say, 10x as much time as a single call to io_uring_enter for the same work. It really shouldn't be just moving latency around unless you are doing very large data copies (or similar) out of the kernel such that syscall overhead becomes mostly irrelevant. If syscall overhead is irrelevant in your app and you aren't doing an actual asynchronous kernel operation, then you may as well use the regular syscall interface.

There are certainly applications that don't benefit from io_uring, but I suspect these are not the norm.

Matthias247 · on April 13, 2023

You need to measure it for your application. A lot of people think „syscalls are expensive“ because that’s repeated for your years, but often it’s actually their implementation and not the overhead.

Eg a UDP syscall will do a whole lot of route lookups, iptable rule evaluations, potential eBPF program evaluations, copying data into other packets, splitting packets, etc. I measured this to be fare more than > 10x of the syscall overhead. But your mileage might vary depending on which calls you use.

As for the applications: these lessons where collected in a CDN data plane. There’s hardly any applications out there which are more async IO intense.

GoblinSlayer · on April 13, 2023

They are the norm if you're google, most others are low load.

muststopmyths · on April 13, 2023

Winsock RIO available since 2012 would beg to differ. It only applies to network I/O but I suspect IOCP is more than sufficient for files

dontlaugh · on April 13, 2023

Sure, io_uring is better. IOCP still captured the biggest benefit, completion triggering.

benlwalker · on April 13, 2023

Without the submission batching you lose your system call reduction and that is far and away the biggest benefit.

dontlaugh · on April 13, 2023

The way I see it, a bigger benefit is avoiding readiness-triggered IO. That can add a variable number of syscalls, wakes up threads, etc.

Of course, batching is also great.