
Here's a site a friend and I built a while back to test some open datasets against Benford's Law:

http://www.testingbenfordslaw.com/

Most seem to match fairly closely. We accept pull requests with new datasets if anyone wants to contribute.




> Distance of stars from Earth in light years

that's weird


It's actually explainable, I think!

Assume stars are distributed roughly evenly in space. That's not true of the universe overall, where they're clumped into galaxies, but it's fine for the dataset we're explaining, which maxes out at 3000 light-years from Earth.

Approximate the galaxy as a flat plane. This isn't a bad approximation, since the galaxy is much wider than it is thick.

Which stars are at a given distance D from Earth? Those that lie on a circle of radius D centered on Earth. But that count is just going to be proportional to the radius (circumference = 2*pi*D). So the count of stars does follow a power-law distribution, and that power is 1. There are K*D stars at distance D, where K is a constant proportional to the overall density of stars in our neighborhood.

(I'm glossing over the fact that a circle is infinitely thin and stars are finite in number and so any given circle will probably have 0 stars on it.)

If you instead model the galaxy as a sphere, you get the same sort of result: count(stars) at distance D is proportional to the surface area of a sphere of radius D, 4*pi*D^2. Still a power law, this time with power 2. (The galaxy is about 1000 ly thick though, according to Wikipedia, and the data go out to 3000 ly. So a sphere is probably not great here.)

So why do power law distributions tend to obey Benford's law? I'll punt on that: https://en.wikipedia.org/wiki/Benford%27s_law#Explanations
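For anyone who wants to poke at the flat-disc and sphere models above, here is a rough sketch that computes the first-digit probabilities they predict and puts them next to Benford's expected values. The function names, the `k_min` cutoff, and the hard upper limit `d_max` are my own assumptions for illustration, not anything taken from the site or its dataset.

```python
import math

def benford_probs():
    """Benford's law: P(first digit = d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_probs(power, d_max, k_min=-12):
    """First-digit probabilities when count(D) is proportional to D**power on (0, d_max].

    Hypothetical helper: power=1 is the flat-disc model, power=2 the sphere model.
    """
    exponent = power + 1  # the CDF of a D**power density grows like D**(power + 1)

    def cdf(x):
        # Fraction of stars closer than x, clipped at the cutoff d_max.
        return (min(x, d_max) / d_max) ** exponent

    k_max = math.floor(math.log10(d_max))
    probs = {}
    for d in range(1, 10):
        # Numbers with first digit d lie in [d * 10**k, (d + 1) * 10**k) for integer k.
        probs[d] = sum(
            cdf((d + 1) * 10.0**k) - cdf(d * 10.0**k)
            for k in range(k_min, k_max + 1)
        )
    return probs

if __name__ == "__main__":
    expected = benford_probs()
    disc = first_digit_probs(power=1, d_max=3000)
    for d in range(1, 10):
        print(d, round(expected[d], 4), round(disc[d], 4))
```

Calling `first_digit_probs(1, 3000)` gives the flat-disc prediction and `first_digit_probs(2, 3000)` the sphere one. With a hard cutoff like this, the answer depends a lot on where the cutoff falls within its decade, so it's worth trying a few values of `d_max` before drawing conclusions.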


Here's my explanation: assuming a random distribution, the probability of finding a star at a given distance is proportional to the surface area of the sphere having that radius. So it follows a square law.

Correct me if I'm wrong, but I think any function with an increasing rate of change (i.e. second derivative > 0) will yield a distribution with the same ordering of digits as Benford's if random numbers are taken from it.
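The conjecture above is easy to try out empirically. A minimal sketch, assuming "random numbers taken from it" means applying the function to uniform samples on [0, 1); the helper names, sample size, and seed are made up for illustration.

```python
import random
from collections import Counter

def leading_digit(x):
    """First significant decimal digit of a positive number."""
    # Scientific notation puts the leading digit first, e.g. '3.000000000000000e-01'.
    return int(f"{x:.15e}"[0])

def digit_frequencies(f, n=100_000, seed=0):
    """Leading-digit frequencies of f(u) for n uniform samples u in [0, 1)."""
    rng = random.Random(seed)
    counts = Counter()
    total = 0
    for _ in range(n):
        y = f(rng.random())
        if y > 0:  # skip zeros, which have no leading digit
            counts[leading_digit(y)] += 1
            total += 1
    return {d: counts[d] / total for d in range(1, 10)}

def is_decreasing(freqs):
    """True if frequencies fall strictly from digit 1 to digit 9, as in Benford's law."""
    return all(freqs[d] > freqs[d + 1] for d in range(1, 9))

if __name__ == "__main__":
    freqs = digit_frequencies(lambda u: u * u)  # square law, second derivative > 0
    print(freqs, is_decreasing(freqs))
```

Swapping in other functions (convex or not) for the lambda is a quick way to look for counterexamples. Keep in mind that digits with close frequencies can swap order from sampling noise alone, so a larger `n` helps.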



