Not exactly; this is where Zed presumably shows that using pipes is equivalent to real-world testing and has nothing to do with using the loopback/localhost device.
Personally I think that's nonsense. I downloaded the code and checked it out; there is nothing in there that changes my mind on the whole thing. The meat is in the C code, and it's the same as what was linked from the article. The rest is a wrapper script and some stuff to visualise the results, which are what he showed them to be. The premise that web servers have the majority of their fds active is where the issue lies.
FWIW I did some measuring and the results are pretty much in line with what I expected (though the spread is still puzzling, I really intend to find out the cause of that): http://news.ycombinator.com/item?id=1573145
If the test harness is wrong, then every other test that everyone doing epoll comparisons with poll has done is wrong. This is the one they've all been using. So either you write your own test to evaluate the same thing, without confounding it with other external factors (like this test does), or you use this test and prove me wrong by finding different data.
However, I'm glad you finally went and got some metrics from one webserver instead of the useless rhetoric you were spewing before. Not that it proves a damn thing since it has no external validity, but at least it's a step in the right direction.
I'll also say that your finding of 40% is much higher than what you implied in your previous comments. You made it seem like it's close to the 10% range, but 40% is damn near ready to leap over into equal performance levels. It's interesting that even after that you still can't see the value of either making epoll faster or using both, but whatever, you can't convince every idiot on the internet, even with Science.
You should read more carefully. 40% was the highest observed, 10% the lowest. Also, I said 'the majority', which is not making it seem like it is only 10%. So it's not '10%' or '40%': for this particular workload, on average you're hovering somewhere between 25 and 30%; since you seem to prefer the higher number, make that 30%. And if you read a bit more you would see that there are some factors that could pull that number down considerably in different situations, and only one major one (large file serving, streaming) that would pull it up.
If I get around to it (I'm on a holiday right now) I'm going to instrument some more sites to measure their active-to-total ratios to see if there is some kind of number (or a set of them) that we might be able to agree on when it comes to real life web traffic. One of them is a filedump, which serves up large media files to clients, that's the use case that I can come up with in which your scheme might pay off, and in which it is possibly advantageous (even if it is a very small advantage) to use poll.
You keep 'challenging' people to come up with numbers. So I did, from a real life webserver serving real customers, lots of them. If 60% is the cutoff point (and we're assuming that you are right about that) then 40% is comfortably under it, not 'much higher' than what I wrote in my previous comments; the majority (60 to 90%) of the fds are idle. So it's not into 'equal performance levels' yet.
But as far as I'm concerned even that is probably moot (though that needs more testing), because I suspect that even this fairly high traffic server spends only a very small amount of time polling the OS, and a much larger amount shovelling bits in and out of the system.
You keep using that word 'science' as though measuring anything at all is automatically science.
But what you're doing is more like selecting the data that fits your 'who needs epoll' and 'epoll is bad' hypothesis and rejecting out of hand any experience and/or experimental data that seems to prove you wrong.
That's not science. Science is to use all the data available to you, not just the juicy bits. Not just those bits that allow you to make bold claims about discoveries when in fact all you've done is to show that the underlying implementation of one system call is different and has different trade-offs than another, which may be specific to the way your system is set up.
I agree that to test something you need to reduce it to the minimum of code required to do your tests. But when you're going that route you are doing what so many other database, CPU and application benchmarkers forget. The real world is nothing like a benchmark.
You seem to suggest that this is 'not a localhost test' (http://news.ycombinator.com/item?id=1573195) and you say so pretty loudly. That would mean that there may be some validity to your data that I have not been able to find to date in the stuff you've provided.
I'm challenging you to provide the data that apparently is still missing that proves that this was not 'a localhost test'.
And if you don't supply that data then I think you should take back (and try to do it graciously for a change) the claim that this wasn't a localhost test, or you're going to have to come up with some definition of what you consider a localhost test to be.
For me, that definition is simple, a localhost test is a benchmark that never generates any network traffic.
Having followed most of the comments on both these threads, I don't think it's fair to characterize Zed's position as saying "epoll is always bad". I think a better characterization would be: "epoll is not automagically better under every imaginable scenario".
You may argue that nobody is making any such claim, but I can tell you that the sections of the blogosphere I skim via HN, reddit, etc. do imply it. Maybe the authors just assume everyone can infer that every decision has tradeoffs and there's no need to discuss them in depth for this particular case, or maybe—and the nature of the blogosphere leaves me inclined to think that this is the case—most of those authors never consciously thought about those tradeoffs because epoll happens to be a good fit for their particular problem. Either way, Zed is the first person I've seen trying to explicitly identify and quantify those trade-offs for poll vs. epoll. As such, I think he's doing the community a service.
If this research leads to a better implementation of epoll, or a server that requires less tuning to do well under a particular load, that is great for everyone. Even if it doesn't lead to any great improvements, a better-educated developer population that makes more informed decisions is still a net win for the world.
Right on. Myself, I always figured epoll was a bit slower than poll when most/all of the fds are ready all the time, from reading the code.
Now, there's an experiment to figure out what the threshold is, and frankly, it's a bit lower than I had hoped (I would have guessed 0.9, 0.8 at worst), which is exactly why having the actual data is awesome.
I appreciate the work you're doing. I'm not sure I've worked on a system that would really benefit from Mongrel2, but I think it's a pretty interesting concept, and there's always a chance I work on something more interesting in the future. Even if I don't, I almost always learn something or gain a different perspective from reading your stuff, and that's reason enough to be thankful.
> I'm challenging you to provide the data that apparently is still missing that proves that this was not 'a localhost test'.
I think what he was saying in relation to it being not 'a localhost test' is that it isn't a test that touches the network stack, even to the loopback interface -- you seem to be equating localhost with anything that happens on a single local machine regardless of whether it has anything to do with the network or not. A difference in use of terminology, perhaps.
Yes, nobody seems to get confounding at all. I'm testing poll vs. epoll on selecting active file descriptors to compare their performance. I'm not testing anything else, not claiming anything else. To then test that with a full on server that has an Amazon EC2 cluster blasting requests with HTTP parsing and serving files would completely confuse the analysis.
You don't measure a specific thing by inventing some full on "real world" evaluation. You test that specific thing. Why is it this simple concept seems to baffle so many coders?
Your results seem correct to me, even though I'm a bit disappointed at the actual number (I was hoping for 0.8 or 0.9, not 0.6), and the code looks good. I've been meaning to run it for myself but haven't had the time to do so; I should get on it shortly.
I'm still of the opinion that a >0.6 ATR means you're fucked anyway (where processing all those fds can't be done as fast as data arrives), but I'm trying to come up with a good small test for that hypothesis (like you did).
Until I do, it's just an opinion/hypothesis, and it's kind of hard to argue about it strongly either way.
You know jack squat about statistics. That's basically how I summarize your statement. You don't have a mean, median, mode, or standard deviation in your "10% 40%". You don't understand confounding and want to test the performance of poll vs. epoll on file descriptors by testing an entire server. You think science is to use available data, when it's nothing of the sort, and in fact the way you remove confounding is to remove data.
Basically, you're a FUD slinger who's got a beef and doesn't know enough to hold his own in a real discussion about measurement and analysis.