
say you load 1 million records and their average is 5.1. you then load another million and their average is 4.5. the combined average is (5.1 + 4.5) / 2 = 4.8. rinse and repeat for every chunk, then average the per-chunk averages. as long as each run processes about the same number of records, your numbers will not be skewed. only the last chunk will most likely be smaller, say 10k records instead of 1M, and giving it the same weight as the full chunks introduces a small error. but still it is the simplest solution with a good enough outcome.
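a rough python sketch of the chunked averaging, in case it helps. the function names and the idea of passing chunks as non-empty lists of numbers are my own assumptions for illustration, not anything from a real pipeline. the second function is a variation that keeps a running sum and count instead, which avoids the error from a short last chunk.

```python
def mean_of_chunk_means(chunks):
    """Plain average of per-chunk averages (hypothetical helper).

    Exact only when every chunk has the same number of records.
    Assumes every chunk is non-empty.
    """
    means = [sum(chunk) / len(chunk) for chunk in chunks]
    return sum(means) / len(means)


def weighted_mean(chunks):
    """Variation: carry a running sum and count across chunks.

    This weights each chunk by its size, so a short last chunk
    does not skew the result.
    """
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    return total / count


# Equal-sized chunks: both approaches agree.
equal = [[5.0, 5.2], [4.4, 4.6]]   # chunk averages 5.1 and 4.5
# Uneven chunks: the plain average of averages drifts.
uneven = [[1, 1, 1, 1], [5]]
```

with `equal` both functions give roughly 4.8. with `uneven` the first one gives 3.0 while the true average is 1.8, which is the skew you get when chunk sizes differ a lot.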



essentially that is how integrals are calculated in mathematics, if I remember correctly. you take a curve and divide the area under it into columns. the thinner the columns, the smaller the deviation (the curve is rounded and the top of each column is flat, so every column has an inherent error). you calculate the area of each column, total them, and you get the area under the function. same principle as radians with a circle. you are merely splitting the work into smaller pieces that you can process.
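the column idea can be sketched in a few lines of python. `riemann_sum` is a name I made up, and sampling each column at its midpoint is my own choice (left or right edges work too, just with a bigger error per column).

```python
def riemann_sum(f, a, b, n):
    """Approximate the area under f between a and b using n columns.

    Each column has the same width and its height is f evaluated
    at the column's midpoint (an assumption of this sketch).
    """
    width = (b - a) / n
    heights = (f(a + (i + 0.5) * width) for i in range(n))
    return sum(heights) * width


def square(x):
    return x * x


# The exact area under x^2 between 0 and 1 is 1/3.
coarse = riemann_sum(square, 0.0, 1.0, 10)
fine = riemann_sum(square, 0.0, 1.0, 1000)
```

`fine` lands closer to 1/3 than `coarse`, which is the "thinner columns, smaller deviation" point.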



