
Sampling lowers your confidence resolution, period. When you're testing hypotheses, the biggest constraint can be that the effect you're looking for is too small to fall within the resolution your confidence intervals give you. Improving that resolution, even by a little, can be worth a lot.
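To put rough numbers on this: for a simple A/B comparison of conversion rates, the half-width of a normal-approximation (Wald) confidence interval shrinks with the square root of the sample size, so analysing a 1% sample widens the interval about tenfold. A minimal sketch, assuming two independent arms with the same hypothetical baseline rate p; the function names and numbers are illustrative, not from the thread.

```python
import math

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


def ci_half_width(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of a Wald CI for a single proportion p estimated from n observations."""
    return z * math.sqrt(p * (1.0 - p) / n)


def diff_ci_half_width(p: float, n_per_arm: int, z: float = Z_95) -> float:
    """Half-width of a Wald CI for the difference of two independent proportions.

    Hypothetical simplification: both arms share the rate p and have n_per_arm observations,
    so the variance of the difference is 2 * p * (1 - p) / n_per_arm.
    """
    return z * math.sqrt(2.0 * p * (1.0 - p) / n_per_arm)


if __name__ == "__main__":
    p = 0.05            # hypothetical baseline conversion rate
    full_n = 1_000_000  # hypothetical users per arm when using all the data
    for frac in (1.0, 0.1, 0.01):
        n = int(full_n * frac)
        # The half-width grows by sqrt(1 / frac) as the sample shrinks.
        print(f"{frac:>5.0%} sample: n={n:>9,}  +/- {diff_ci_half_width(p, n):.5f}")
```

With these hypothetical numbers, a lift of a few tenths of a percentage point sits comfortably outside the full-data interval but inside the 1% sample's interval, which is exactly the "effect too small for the resolution" problem.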



Because you stated something absolutely, I feel the need to round off the edge. Sampling can increase your confidence resolution if it lets you combine signals from more data sources in a larger model that would be infeasible without sampling.


I think the difference is in the aims. With traditional statistics, you're trying to estimate some quantity of interest in the population, while with "big data" you're typically trying to make predictions for individual users. This can be done with traditional statistics (in fact, the predict method in R does exactly that), but it becomes easier to match users with books they might like if you have data on what books everyone in your population likes rather than just a sample.

Now, whether the inferential premises of statistics hold up on website data (and population data), which is typically neither random nor representative, is another story.



