Hacker News new | past | comments | ask | show | jobs | submit login

The data probably wouldn't look so clean if it were skewed. If Google were doing something interesting it probably wouldn't be skewed only by a little bit.



Admittedly, I did not read the paper linked. But my point is not about google doing something funny. Even if we assume that ids are truly random and uniformly distributed this does not mean that the sampling method doesn't have to be iid. This problem is similar to density estimation where Rejection sampling is super inefficient but converges to the correct solution, but MCMC type approaches might need to run multiple times to be sure to have found the solution.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: