
> Report the maximum value.

In benchmarks, assuming you run the same workload each time, you often want the minimum value. Anything else just tells you how much system overhead you encountered.
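For what it's worth, here is a minimal sketch of the "take the minimum" approach using Python's standard `timeit` module. The `workload` function and the `min_time` helper are made-up names for illustration, and the repeat counts are arbitrary.

```python
import timeit


def workload():
    # Hypothetical stand-in for the code being benchmarked.
    return sum(i * i for i in range(10_000))


def min_time(fn, repeats=7, number=20):
    """Return (best per-call time in seconds, raw totals per repeat)."""
    # timeit.repeat returns one total per repeat, each covering `number` calls.
    totals = timeit.repeat(fn, repeat=repeats, number=number)
    # The smallest total is the run that saw the least interference.
    return min(totals) / number, totals


if __name__ == "__main__":
    best, totals = min_time(workload)
    print(f"best per-call time: {best * 1e6:.1f} us over {len(totals)} repeats")
```

Everything above the minimum in `totals` is the same workload plus whatever else the machine was doing at the time.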

(Complete agreement that applying statistics without knowing anything about the distribution can mislead.)




[This article](https://tratt.net/laurie/blog/entries/minimum_times_tend_to_...) explains why using the minimum time may not be such a great idea.


Just looking at the chart, your article makes a case that minimum is much better than maximum, and in fact if you report one number, the minimum is the best number to report (in that particular example).

If we go into details: sure, one number is not ideal, but neither is a confidence interval, because people will assume a normal distribution behind it. If you report two numbers, it's perhaps better to report the mode and the median instead of a confidence interval (which falsely implies a normal distribution).
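A rough sketch of what reporting the mode and the median could look like in Python. Timings are continuous, so the mode has to be estimated somehow; here I assume a simple fixed-width histogram, and both the `binned_mode` helper and the sample values are invented for illustration.

```python
import statistics
from collections import Counter


def binned_mode(samples, bin_width):
    """Estimate the mode as the centre of the most populated histogram bin."""
    # Map each sample to the index of the fixed-width bin it falls into.
    bins = Counter(int(s // bin_width) for s in samples)
    index, _count = bins.most_common(1)[0]
    return (index + 0.5) * bin_width


# Hypothetical timings in milliseconds: a tight cluster plus two slow outliers.
samples = [10.1, 10.2, 10.1, 10.3, 10.2, 10.1, 14.8, 22.5]

mode_estimate = binned_mode(samples, bin_width=0.5)
median = statistics.median(samples)
```

The estimate depends on the bin width you pick, which is one reason the mode is harder to report than it sounds.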


I don't fully agree. Imagine there are two programs: Acme and Blob. Acme theoretically runs faster, but it also degrades much worse under interference from other system loads. Blob doesn't run as fast at the theoretical maximum, but it handles system load very well and doesn't degrade too badly.

There's an argument for Blob being the better choice if this will run in production on a system that might encounter unrelated loads. Predictable performance is frequently more useful than theoretical maximum performance.

That said, I still agree a little bit. The minimum value is also a useful metric, and if you have the opportunity to report two numbers, the minimum-maximum pair is a great choice.
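To make the Acme/Blob point concrete, here is a tiny Python sketch. The numbers are entirely made up to match the scenario described, not measurements of anything, and `min_max` is just an illustrative helper.

```python
# Hypothetical timings in milliseconds, invented for illustration only.
acme = [100, 102, 101, 180, 240]  # fastest at best, degrades badly under load
blob = [120, 121, 122, 125, 130]  # slower at best, but stays predictable


def min_max(samples):
    """Return the (minimum, maximum) pair for a list of timings."""
    return min(samples), max(samples)


acme_range = min_max(acme)
blob_range = min_max(blob)
```

On the minimum alone Acme wins; the pair shows that its worst case is far worse than Blob's.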


I disagree. I think you want to report as much as you reasonably can about the distribution, and then you have to let the reader make up their mind which they think is most important.

You can have your own idea about which is most important and build your comparisons on that choice, but you should give the reader enough information that the work you put in is still valuable to them in case they should make a different choice.

If we both have equal information about the distribution, then we can make our own choices about the data being presented.
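As a sketch of "report as much as you reasonably can", something like the following summary would let readers pick the statistic they care about. It uses only Python's standard `statistics` module (`quantiles` needs Python 3.8 or later); the `summarize` name and the choice of fields are my own assumptions, not a standard.

```python
import statistics


def summarize(samples):
    """Return a small summary of a list of timings (needs at least 2 samples)."""
    # quantiles(n=4) returns the three quartile cut points; the middle one
    # is dropped here because the median is computed directly below.
    q1, _q2, q3 = statistics.quantiles(samples, n=4)
    return {
        "n": len(samples),
        "min": min(samples),
        "q1": q1,
        "median": statistics.median(samples),
        "q3": q3,
        "max": max(samples),
    }
```

Publishing the raw samples alongside the summary is better still, if there is room for them.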


To be pedantic, you want the mode of the distribution. You are right though, the minimum is a much closer approximation of the mode in most situations.



