Benford's law

lukasm · on May 24, 2014

"Benford's Law can be used to show that binary is the best base for doing floating point math."

http://blogs.msdn.com/b/ericlippert/archive/2005/01/13/float...

eudox · on May 24, 2014

>In the United States, evidence based on Benford's Law has been admitted in criminal cases at the federal, state, and local levels.

This, to me, is the most interesting part of the article.

lunz · on May 24, 2014

It's said that German finance authorities use it to uncover tax fraud.

jacquesm · on May 25, 2014

It's a well-known tool in the arsenal of any forensic accountant.

VonGuard · on May 25, 2014

Most large financial institutions use it, as well.

jds375 · on May 24, 2014

It is very surprising that sequences such as the Fibonacci numbers and the powers of two follow this law. Some that don't are the digits of numbers like pi and e. These are believed to be normal numbers, meaning every digit appears equally often. However, this hasn't been rigorously proven and is still an open problem.[1]

[1] http://en.m.wikipedia.org/wiki/Normal_number

pavpanchekha · on May 24, 2014

The distribution of the digits of pi and e is not the same as the distribution of first digits in a set of numbers; only the second is subject to Benford's Law. While both normal numbers and Benford's Law are interesting mathematical objects and deal with distributions of digits in a number, that is the complete extent to which they are related.

mikeash · on May 25, 2014

Just about any distribution where the distance between sequential numbers grows as the numbers grow will follow the law. Fibonacci and powers of two both fit that.
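
A quick way to check this for both sequences is to tabulate leading digits and compare them with Benford's predicted frequency log10(1 + 1/d). This is only a sketch: the function names are made up for illustration, and 1000 terms is an arbitrary sample size.

```python
import math
from collections import Counter

def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])

def benford(d: int) -> float:
    """Benford's predicted frequency for leading digit d (1..9)."""
    return math.log10(1 + 1 / d)

def digit_frequencies(values):
    """Observed leading-digit frequencies as {digit: fraction}."""
    values = list(values)
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts.get(d, 0) / len(values) for d in range(1, 10)}

def powers_of_two(count):
    """Yield 2**0, 2**1, ..., 2**(count - 1)."""
    value = 1
    for _ in range(count):
        yield value
        value *= 2

def fibonacci(count):
    """Yield the first `count` Fibonacci numbers, starting 1, 1, 2, ..."""
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b

if __name__ == "__main__":
    n = 1000  # arbitrary sample size
    pow2 = digit_frequencies(powers_of_two(n))
    fib = digit_frequencies(fibonacci(n))
    print("digit  benford  2^n    fib")
    for d in range(1, 10):
        print(f"{d}      {benford(d):.3f}    {pow2[d]:.3f}  {fib[d]:.3f}")
```

Exact integer arithmetic is used on purpose, so the leading digits are not affected by floating-point rounding.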

tbrock · on May 25, 2014

I worked at a hedge fund and we used this to figure out whether other funds were falsifying their returns or not. The most notable deviation was Bernie Madoff's.

mikeash · on May 25, 2014

Before or after he got caught?

bayesianhorse · on May 24, 2014

Benford's law is so well known today that many a "forger" will evade it easily. One way is to generate random numbers and search for a set that fits both your goal and Benford's law.

I think this is what the German ADAC did when they falsified test results around a general "idea" of what they wanted to see.

ajtulloch · on May 25, 2014

Terry Tao has written an excellent (if mathematically advanced) post on Benford's law that is worth looking at for a more rigorous presentation.

http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-...

ericchiang · on May 24, 2014

Probably the best explanation of the intuition behind Benford's law. Worth a watch if you've got the time:

https://www.youtube.com/watch?v=XXjlR2OK1kM

Terr_ · on May 25, 2014

To reuse an old post:

> [I]t has to do with relative growth/shrinkage and the base of the positional-numbering system you're using. If you have a random starting value (X) multiplied by a second random factor (Y), most of the time the result will start with a one.

> You're basically throwing darts at logarithmic graph paper! The area covered by squares which "start with 1" is larger than the area covered by squares which "start with 9".
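
The dart picture can be made quantitative. Assuming the darts land uniformly along the logarithmic axis (which is the idealization behind Benford's law, not something the quoted post states), the chance of leading digit d is the width of the band from d to d + 1 on a base-10 log scale:

```latex
P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\!\left(1 + \frac{1}{d}\right),
\qquad d = 1, \dots, 9

P(1) = \log_{10} 2 \approx 0.301,
\qquad
P(9) = \log_{10}\frac{10}{9} \approx 0.046
```

The nine bands tile one full decade, so the probabilities sum to log10(10) = 1.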

brycethornton · on May 24, 2014

Here's a site a friend and I built a while back to test some open datasets against Benford's Law:

http://www.testingbenfordslaw.com/

Most seem to match fairly closely. We accept pull requests with new datasets if anyone wants to contribute.
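
For anyone who wants to try a dataset locally first, here is a minimal sketch of one common way to score the fit: a chi-square statistic over the nine leading digits. This is not necessarily what the site does; the function names are hypothetical, and the 5% significance threshold is an assumption chosen for illustration.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at the 5% level.
# The choice of 5% is an assumption made for this sketch.
CRITICAL_5PCT_8DOF = 15.507

def first_digit(x: float) -> int:
    """First significant decimal digit of a nonzero number."""
    # Scientific notation puts the first significant digit at index 0.
    return int(format(abs(x), ".15e")[0])

def benford_chi_square(values) -> float:
    """Chi-square statistic of observed leading digits against Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]  # zero has no leading digit
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    samples = {
        "powers of two": [2.0 ** k for k in range(1000)],
        "integers 1..999": list(range(1, 1000)),
    }
    for name, data in samples.items():
        stat = benford_chi_square(data)
        verdict = "consistent" if stat < CRITICAL_5PCT_8DOF else "deviates"
        print(f"{name}: chi-square = {stat:.2f} ({verdict} at the 5% level)")
```

With very large datasets a chi-square test flags even tiny deviations, so the statistic is better read as a rough score than as a verdict.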

pera · on May 24, 2014

> Distance of stars from Earth in light years

That's weird.

wging · on May 24, 2014

It's actually explainable, I think!

Assume stars are distributed roughly evenly in space. This is not true of the universe overall (they're clumped in galaxies), but it's fine for the dataset we're explaining, which maxes out at 3000 light-years from Earth.

Approximate the galaxy as a flat plane. This isn't a bad approximation since it's much flatter than it is wide.

Which stars are at a given distance D from Earth? Those that lie on a circle of radius D centered at Earth. But that count is just going to be proportional to the radius (circumference = 2*pi*radius). So the count of stars does follow a power-law distribution, and that power is 1. There are KD stars at distance D, where K is a constant proportional to the overall density of stars in our neighborhood.

(I'm glossing over the fact that a circle is infinitely thin and stars are finite in number and so any given circle will probably have 0 stars on it.)

If you instead model the galaxy as a sphere, you get the same sort of result: count(stars) at distance D is proportional to the surface area of a sphere of radius D, 4*pi*D^2. Still a power law. (The galaxy is about 1000 ly thick though, according to Wikipedia, and the data go out to 3000 ly. So a sphere is probably not great here.)

So why do power law distributions tend to obey Benford's law? I'll punt on that: https://en.wikipedia.org/wiki/Benford%27s_law#Explanations
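
The model above can be checked numerically rather than taken on faith. Under its assumptions (uniform density out to a hard cutoff), the fraction of stars within distance D is (D / cutoff)^k, with k = 2 for the flat disk and k = 3 for the sphere. The sketch below sums that over every decade to get first-digit probabilities; the function name is hypothetical, and the 3000 ly cutoff is taken from the dataset's stated maximum. The result depends on where the cutoff falls within a decade, so it is worth comparing the output against Benford's values instead of assuming a match.

```python
import math

def first_digit_probs(cutoff: float, k: int) -> dict:
    """First-digit probabilities for distances with CDF (D / cutoff) ** k.

    k = 2 is the flat-disk model and k = 3 the sphere model; both assume
    stars are uniformly dense out to `cutoff` and absent beyond it.
    """
    def cdf(d: float) -> float:
        return (min(d, cutoff) / cutoff) ** k

    top_decade = math.ceil(math.log10(cutoff))
    probs = {}
    for digit in range(1, 10):
        total = 0.0
        # Decades below 10**-3 hold a negligible share and are skipped.
        for j in range(-3, top_decade + 1):
            lo = digit * 10.0 ** j
            hi = (digit + 1) * 10.0 ** j
            total += cdf(hi) - cdf(lo)
        probs[digit] = total
    return probs

if __name__ == "__main__":
    disk = first_digit_probs(3000.0, k=2)
    sphere = first_digit_probs(3000.0, k=3)
    print("digit  benford  disk    sphere")
    for d in range(1, 10):
        benford = math.log10(1 + 1 / d)
        print(f"{d}      {benford:.3f}    {disk[d]:.3f}   {sphere[d]:.3f}")
```

Trying other cutoffs (1000.0, 9999.0) is a quick way to see how much the hard edge of the sample shapes the digit counts.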

darsham · on May 24, 2014

Here's my explanation: assuming a uniform random distribution, the probability of finding a star at a given distance is proportional to the surface area of the sphere having that radius. So it follows a square law.

Correct me if I'm wrong, but I think any function with an increasing rate of change (i.e., second derivative > 0) will yield a distribution with the same ordering of digits as Benford's if random numbers are taken from it.

jcr · on May 25, 2014

Here's a fairly recent link about using Benford's law to detect fraud.

http://www.theregister.co.uk/Print/2014/05/14/theorums_1_ben...