
This is exactly the algorithm we developed at LogNormal (now part of Akamai) 10 years ago for doing fast, low-memory percentiles on large datasets.

It's implemented in this Node library: https://github.com/bluesmoon/node-faststats
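
To make the idea concrete, here's a rough sketch of one common low-memory approach (fixed-width buckets over a known value range). This is just an illustration, not the library's actual code; the range, bucket count, and names are made up:

    // Sketch of a bucketed percentile estimator: keep counts in fixed-width
    // buckets and read percentiles off the cumulative counts. Memory is
    // O(bucketCount), independent of how many values are pushed.
    class BucketedPercentiles {
      constructor(min, max, bucketCount) {
        this.min = min;
        this.width = (max - min) / bucketCount;
        this.counts = new Array(bucketCount).fill(0);
        this.n = 0;
      }

      push(value) {
        // Clamp out-of-range values into the first/last bucket.
        let i = Math.floor((value - this.min) / this.width);
        i = Math.max(0, Math.min(this.counts.length - 1, i));
        this.counts[i]++;
        this.n++;
      }

      percentile(p) {
        // Walk the cumulative counts until we pass p% of the samples,
        // then return the midpoint of that bucket as the estimate.
        const target = (p / 100) * this.n;
        let seen = 0;
        for (let i = 0; i < this.counts.length; i++) {
          seen += this.counts[i];
          if (seen >= target) {
            return this.min + (i + 0.5) * this.width;
          }
        }
        return this.min + this.counts.length * this.width;
      }
    }

    // Usage with made-up latencies in ms:
    const est = new BucketedPercentiles(0, 10000, 1000);
    for (const ms of [12, 35, 35, 48, 90, 1200]) est.push(ms);
    console.log(est.percentile(95)); // approximate 95th percentile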

Side note: I wish everyone would stop using the term Average to refer to the Arithmetic Mean. "Average" just means some statistic used to summarize a dataset. It could be the Arithmetic Mean, Median, Mode(s), Geometric Mean, Harmonic Mean, or any of a bunch of other statistics. We're stuck with AVG because that's the function used by early databases and Lotus 123.
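
For instance, on the same made-up numbers those different "averages" already disagree:

    // One dataset, three different summaries.
    const xs = [1, 2, 4, 8];

    const arithmetic = xs.reduce((a, b) => a + b, 0) / xs.length;   // 3.75
    const geometric = Math.exp(
      xs.reduce((a, b) => a + Math.log(b), 0) / xs.length);         // ~2.83
    const harmonic = xs.length / xs.reduce((a, b) => a + 1 / b, 0); // ~2.13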




No, we’re stuck with it because “average” was used colloquially for the arithmetic mean for decades.

I wish people would stop bad-mouthing the arithmetic mean. If you have to convey information about a distribution and you’ve got only one number to do it, the arithmetic mean is for you.


I think it still depends on the nature of the data and your questions about it, and median is often the better choice if you have to pick only one.


“It depends” is always right, but which function is better for arbitrary, unknown data?

At least the arithmetic mean is fine for Gaussian distributions, and it conveys a sense of the data even on non-Gaussian ones. But the median doesn’t work at all on some common distributions, like the scores of a very difficult exam (where the median is 0).

For the mean, at least every data point contributes to the final value.
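
To make that concrete with made-up numbers:

    // Made-up scores for a very hard exam: most people scored 0.
    const scores = [0, 0, 0, 0, 0, 0, 0, 12, 35, 80];

    const mean = scores.reduce((a, b) => a + b, 0) / scores.length;

    const sorted = [...scores].sort((a, b) => a - b);
    const mid = sorted.length / 2;
    const median = sorted.length % 2
      ? sorted[Math.floor(mid)]
      : (sorted[mid - 1] + sorted[mid]) / 2;

    console.log(mean);   // 12.7 -- pulled up by the few non-zero scores
    console.log(median); // 0    -- says nothing about anyone who passed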

Just my 2c


Yes, Lotus 123 came out 38 years ago :)



