LShift is terrified by asynchronous libraries performance (lshift.net)
14 points by bandris on Dec 15, 2008 | 9 comments



Couple things.

First, there isn't enough context in the LShift article to judge whether Marek has a valid point. He's conceded up front that both Libev and libevent show logN growth for adding new fd's to the loop, and seems to be arguing that it could be brought down to constant time. So? He hasn't presented evidence that the overhead he's talking about is significant in the real world. This is compute vs. I/O in an I/O-bound environment.

Next, he also seems to be taking his cues from the second set of benchmarks on the Libev page, in which every socket has a random timeout. This is a worst-case scenario for both Libev and libevent. Contrary to both the Libev page and Marek's analysis, setting timeouts on individual I/O events is not the only way to implement "idle timeouts" (which many protocols don't even have); it is in fact one of the poorer approaches.

Finally, he's taken two very specific benchmarks and generalized them to the whole design approach. Libevent is considered "high-performance" because async I/O is a high-performance design --- especially compared to demand-threading --- and libevent was the first stable, cross-platform implementation of it. It's also "high-performance" because it implements the high-performance kernel interfaces, like epoll. But the internals of libevent, while reasonable, are not (at least last time I checked) particularly optimized.

You can get around most of these problems by building on top of libevent; for instance, at Arbor Networks, we ditched libevent's (verbose, clumsy) timer interface for a better data structure, and simply kept libevent notified of the next closest timeout.
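
Something like this, from memory and not our actual code (the my_timers_* calls are stand-ins for whatever queue structure you pick): keep the deadlines yourself, and arm a single evtimer for whatever expires next.

    /* Sketch only, libevent 1.x API; my_timers_* stands in for your own
     * timer structure (heap, wheel, whatever).                          */
    #include <event.h>

    /* returns time until the next deadline, or NULL if none are pending */
    extern struct timeval *my_timers_next_deadline(void);
    /* fires the callbacks of everything whose deadline has passed */
    extern void my_timers_expire_due(void);

    static struct event next_timeout_ev;

    static void rearm(void)
    {
        struct timeval *next = my_timers_next_deadline();
        if (next != NULL)
            evtimer_add(&next_timeout_ev, next);   /* the only timer libevent ever sees */
    }

    static void on_timeout(int fd, short what, void *arg)
    {
        my_timers_expire_due();
        rearm();
    }

    int main(void)
    {
        event_init();
        evtimer_set(&next_timeout_ev, on_timeout, NULL);
        rearm();
        return event_dispatch();
    }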

"Terrified" seems like a very silly word to use here, but I'm always eager to get schooled.


This article seems a bit over the top. I implemented the MySQL interface to libevent for mysql 6.0, and we got really good performance numbers vs. thread per connection: http://krow.livejournal.com/572937.html

I've spent a fair amount of time inside the libevent library source, and I'm pretty sure adding new connections is constant time as far as the library goes. There is, however, the underlying cost of the event struct memory allocation and the cost of the low-level APIs, but otherwise, IIRC, they keep it down to a single malloc and link the event into the head of internal linked lists for fixed-cost insertion.

The observed log(N) insertion cost most likely comes from the low-level poll APIs, and there is pretty much nothing libevent can do to make that faster. However, you can choose the low-level API used on some platforms via build options, so theoretically you can use a poll API better suited to your workload.
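
If I recall correctly, with libevent 1.x you can also steer the backend choice at runtime through environment variables that event_init() checks, which is handy for comparing backends against your workload. Rough sketch, from memory:

    /* Sketch (libevent 1.x, from memory): EVENT_NO* variables make
     * event_init() skip a particular backend.                        */
    #include <event.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        setenv("EVENT_NOEPOLL", "1", 1);   /* also EVENT_NOKQUEUE, EVENT_NOPOLL, ... */
        event_init();
        printf("using backend: %s\n", event_get_method());
        return 0;
    }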


My impression was that Marek assumed every read event would always have a specified timeout, and his insertion time referred to inserting into the timer queue.


I see, but if it's a timer problem then with libevent you can roll your own timer queue optimized for your workload. The libevent core is small and well designed (but not well documented), so integrating custom timer mechanisms is actually pretty straightforward.

However, the claim is made that connections are added and removed more often than polling is conducted (definitely the opposite of the average MySQL workload), so I'd guess most low-level high-performance poll APIs aren't optimized for that sort of workload.


The only place where I'd intuit that the event queue is modified as often as raw I/O is performed is for async writes (you're potentially adding/removing them every other timeslice).
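
The usual shape of it is something like this, sketched from memory (struct conn and its fields are made up): the EV_WRITE event only stays registered while there's output actually queued, so it gets added and deleted as the buffer fills and drains.

    /* Sketch (struct conn and its fields are invented, libevent 1.x style):
     * keep the write event registered only while output is pending.       */
    #include <event.h>
    #include <string.h>
    #include <unistd.h>

    struct conn {
        int fd;
        struct event write_ev;   /* set up elsewhere via event_set(..., EV_WRITE, ...) */
        char outbuf[8192];
        size_t outlen;
    };

    static void flush_output(struct conn *c)
    {
        ssize_t n = write(c->fd, c->outbuf, c->outlen);
        if (n > 0) {
            memmove(c->outbuf, c->outbuf + n, c->outlen - (size_t)n);
            c->outlen -= (size_t)n;
        }
        if (c->outlen > 0)
            event_add(&c->write_ev, NULL);   /* still blocked: keep write interest */
        else
            event_del(&c->write_ev);         /* drained: drop it until there's more data */
    }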

Obviously I agree with you on the timer queue thing; I got a performance win simply by shoplifting the timer queue implementation out of MIT Athena; it took maybe a couple hours to integrate and test, and I didn't need to touch the libevent internals to do it.


> Contrary to both the Libev page and Marek's analysis, setting timeouts on individual I/O events is not the only way to implement "idle timeouts"

Just a quick question: what would you consider a superior approach to idle timeouts? Thanks for your insights!


Use another data structure to track connections by liveness, which you can measure by timestamping on each RX event. Relax the requirement that a connection idle timeout of 60 seconds must fire in exactly 60,000 milliseconds; 60, 61, 65, nobody cares.
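
Roughly like this (a sketch, names invented): connections sit on a list in recency order, every RX event moves the connection to the tail, and one coarse periodic sweep kills whatever has gone cold at the head.

    /* Sketch (names invented): touch() on every RX, sweep_idle() from one
     * coarse periodic timer. The timeout fires "around" 60s, not at
     * exactly 60.000s, which is the point.                              */
    #include <sys/queue.h>
    #include <time.h>

    struct conn {
        int fd;
        time_t last_rx;
        TAILQ_ENTRY(conn) idle_link;
    };

    extern void close_connection(struct conn *c);   /* hypothetical teardown */

    static TAILQ_HEAD(, conn) idle_list = TAILQ_HEAD_INITIALIZER(idle_list);

    static void touch(struct conn *c)        /* call on every RX event */
    {
        c->last_rx = time(NULL);
        TAILQ_REMOVE(&idle_list, c, idle_link);
        TAILQ_INSERT_TAIL(&idle_list, c, idle_link);   /* most recent at the tail */
    }

    static void sweep_idle(int timeout_secs) /* run from one coarse timer */
    {
        time_t now = time(NULL);
        struct conn *c;
        while ((c = TAILQ_FIRST(&idle_list)) != NULL &&
               now - c->last_rx >= timeout_secs) {
            TAILQ_REMOVE(&idle_list, c, idle_link);
            close_connection(c);
        }
    }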



> The problem is that currently asynchronous libraries often use binary heap as a representation of internal priority queue.

This is a grossly ignorant statement. I have written my share of event loops (including epoll- and IOCP-based ones), and choosing a heap for timer management is an obviously dumb thing to do if you care about performance. Also, removing objects from the event loop before delivering a callback is even more dumb, for exactly the reasons he listed in the post. Essentially what he's bitching about is a lame-ass, un-optimized implementation of an event engine.

Also, interestingly enough he actually reinvented a simplified form of Linux timer management code - http://lwn.net/Articles/156329.
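
If I'm remembering the kernel's scheme right, it's basically a bucketed timer wheel. A very stripped-down, single-level sketch of the idea (the real thing cascades several wheels so the array doesn't have to cover the longest timeout):

    /* Stripped-down sketch: bucket timers by expiry tick, so add and
     * cancel are O(1) and each tick only walks one bucket. WHEEL_SIZE
     * must cover the longest timeout in ticks in this simplified form. */
    #include <sys/queue.h>

    #define WHEEL_SIZE 256

    struct timer {
        unsigned long expires;              /* absolute tick of expiry */
        void (*fn)(struct timer *);
        LIST_ENTRY(timer) link;
    };

    static LIST_HEAD(, timer) wheel[WHEEL_SIZE];
    static unsigned long current_tick;

    static void timer_add(struct timer *t)  /* O(1) insert */
    {
        LIST_INSERT_HEAD(&wheel[t->expires % WHEEL_SIZE], t, link);
    }

    static void timer_del(struct timer *t)  /* O(1) cancel */
    {
        LIST_REMOVE(t, link);
    }

    static void tick(void)                  /* run once per clock tick */
    {
        struct timer *t = LIST_FIRST(&wheel[current_tick % WHEEL_SIZE]);
        while (t != NULL) {
            struct timer *next = LIST_NEXT(t, link);
            if (t->expires <= current_tick) {
                LIST_REMOVE(t, link);
                t->fn(t);
            }
            t = next;
        }
        current_tick++;
    }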



