
Probably because the mathematicians don't seem to have the ability to explain why it is not sound without devolving into obscure symbols and jargon.



If this were HTML and written to the point instead of stuffed in a PDF and written up in this roundabout manner, it would be at the top of HN with 100+ votes :) I read the whole PDF, but here's the TL;DR anyway -

Randomization improves algorithms. This is so obvious to CS folks that it's taught in basic CS 101. The examples I'm familiar with: picking a random pivot element for quicksort; drawing a circle inside a square and picking n random points inside the square, where four times the number of points inside the circle will equal pi as n becomes large; the Miller-Rabin primality test for those RSA-calculator-type problems where you pick keys for encryption, which randomly samples some number of possible witnesses and calls the number prime if none turn out to be witnesses; or Monte Carlo methods to compute integrals of functions for which no closed-form formula exists. There are tons of examples where you introduce a little bit of randomness and it speeds up your algorithm.
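
The random-pivot trick, for instance, is a one-liner. A minimal sketch in Scala (functional-style rather than in-place, just to show where the randomness goes):

    import scala.util.Random

    // Quicksort with a uniformly random pivot: expected O(n log n) comparisons on
    // any input, because no fixed input can reliably force bad splits.
    def quicksort(xs: Vector[Int]): Vector[Int] =
      if (xs.length <= 1) xs
      else {
        val pivot = xs(Random.nextInt(xs.length))
        val (smaller, rest) = xs.partition(_ < pivot)
        val (equal, larger) = rest.partition(_ == pivot)
        quicksort(smaller) ++ equal ++ quicksort(larger)
      }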

So Hong and Page wrote a paper where they want you to randomly hire employees (whom they call problem-solvers, or agents), because diversity trumps ability?! So if you have 1000 applicants, instead of hiring based on some metric like ivy-league/test-scores/IQ/GitHub/whatever, you just hire randomly, because the randomness introduces diversity, which trumps ability?!! To prove this nonsensical point, they introduce an artificial math problem where agents proceeding randomly obtain the right answer, and not doing so gets you stuck. Ergo, diversity > ability. Sheesh.


This is in Notices of the AMS, which is about as mainstream (together with SIAM Review) as mathematical publications get.

> Randomization improves algorithms.

This is a contentious claim. Random choices are actually very rarely the best - they are often good enough and versatile, but seldom optimal.

For example, consider Monte Carlo integration. You get O(N^{-1/2}) convergence. If you use a deterministic set of points explicitly designed to have low discrepancy (aka "Quasi-Monte Carlo"), you can get O(N^{-1 + logarithmic stuff}) convergence.
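
To make that concrete, here's a rough sketch - my own toy example (integrating x^2 on [0,1], true value 1/3, with the base-2 van der Corput sequence standing in for the low-discrepancy points):

    import scala.util.Random

    def f(x: Double): Double = x * x                       // true integral on [0,1] is 1/3

    // Radical inverse of n in base 2 - the van der Corput low-discrepancy sequence.
    def vanDerCorput(n: Int): Double = {
      var i = n
      var denom = 1.0
      var acc = 0.0
      while (i > 0) { denom *= 2; acc += (i % 2) / denom; i /= 2 }
      acc
    }

    val n = 100000
    val mc  = (1 to n).map(_ => f(Random.nextDouble())).sum / n
    val qmc = (1 to n).map(i => f(vanDerCorput(i))).sum / n
    println(f"MC error:  ${math.abs(mc - 1.0 / 3)}%.2e")
    println(f"QMC error: ${math.abs(qmc - 1.0 / 3)}%.2e")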

http://www.chrisstucchio.com/blog/2014/adversarial_bandit_is...

Eliezer Yudkowsky also wrote a great critique of this issue, though I can't find it right now.


I believe the article you're referring to is: http://lesswrong.com/lw/vp/worse_than_random/

Choice quote:

> As a general principle, on any problem for which you know that a particular unrandomized algorithm is unusually stupid - so that a randomized algorithm seems wiser - you should be able to use the same knowledge to produce a superior derandomized algorithm.

An interesting counterexample to this, one which to me illustrates a very powerful aspect of randomness, is the Monte Carlo revolution in computer Go (AI for the ancient Asian board game). For years, computer progress stagnated, as the techniques were mostly focused on encoding human knowledge into code. While most of this human knowledge is generally correct, it introduces significant bias in the way positions are evaluated. The way these rules of thumb interact is very hard to predict, and tree search algorithms are quite good at finding positions that are incorrectly evaluated. Because of this, the worst-case behavior of an evaluation function is much more important than its best-case or average behavior.

Computers started making lots of progress when a new technique was used: rather than evaluating positions with a large set of heuristics, programs evaluated them by playing out random games. Go positions are quite hard to evaluate from simple rules of thumb, and this random-game approach gives a much more balanced, long-term view of positions. And most importantly, there is much less bias.
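
The core idea fits in a few lines. This is emphatically not Go or any real engine - just a toy take-1-2-or-3 stones game of my own - but it shows what "evaluate a position by random playouts" means:

    import scala.util.Random

    // Toy game: players alternately take 1-3 stones; whoever takes the last stone wins.
    // Returns true if the player to move at the root ends up winning a random playout.
    def randomPlayout(stones: Int, rootToMove: Boolean): Boolean =
      if (stones == 0) !rootToMove          // the player who just moved took the last stone
      else {
        val take = 1 + Random.nextInt(math.min(3, stones))
        randomPlayout(stones - take, !rootToMove)
      }

    // A position's "value" is just the mover's winning frequency over many random games -
    // crude, but unbiased in the sense discussed above.
    def monteCarloValue(stones: Int, playouts: Int = 10000): Double =
      (1 to playouts).count(_ => randomPlayout(stones, rootToMove = true)).toDouble / playouts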

Obviously, an entirely random game, with a uniform distribution over possible moves, is easy to improve upon. But computer go programmers noticed an interesting phenomenon: while certain types of knowledge incorporated into the random move distribution (to make it "more intelligent", as judged by a human) were helpful, others were not (even after taking into account the computational cost of adding the knowledge), and it wasn't always clear why. The same observation about heuristic evaluation noted above applied: having a balanced distribution of move choices, with a reasonable probabilistic lower bound of effectiveness, is more important than making an intelligent choice that is usually correct, but has unpredictable, extreme worst-case performance.

So we see that randomness does have an important property: it avoids the downside of "knowledge" that generally seems correct but can go horribly wrong in unexpected ways.

I don't know of a good writeup of this phenomenon. My understanding of it is mostly assembled from following informal discussions on the computer-go mailing list for several years. In a quick search through my gmail archives I can't find much on the subject, but here's an interesting post about related topics in the computer chess world (that incidentally doesn't talk about randomness, but illustrates well the benefits of avoiding bias): http://www.talkchess.com/forum/viewtopic.php?topic_view=thre...


I don't know a lot about this Go example, but using a uniform distribution for the evaluation function sounds surprisingly similar to another phenomenon I observed.

Suppose you have a linear evaluation function - h(x) is the value of something. Suppose also the coefficients of h are all positive. Then you'll be right 75% of the time (averaged over all possible h, drawn uniformly from the unit simplex) if you just approximate h=[h1,h2,...] by u=[1,1,...,1].

http://www.chrisstucchio.com/blog/2014/equal_weights.html
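
If anyone wants to poke at that numerically, here's a rough simulation sketch. The setup is my own stand-in (two items with uniform random features, h drawn from the simplex via normalized exponentials), so don't expect it to reproduce the exact 75% figure:

    import scala.util.Random

    // Draw positive weights uniformly from the unit simplex (normalized Exp(1) samples).
    def simplexSample(d: Int): Vector[Double] = {
      val e = Vector.fill(d)(-math.log(1.0 - Random.nextDouble()))
      e.map(_ / e.sum)
    }

    // How often does equal weighting u = [1,...,1] rank two random items the same way
    // as the "true" linear score h?
    def agreementRate(d: Int, trials: Int): Double =
      (1 to trials).count { _ =>
        val h = simplexSample(d)
        val a = Vector.fill(d)(Random.nextDouble())
        val b = Vector.fill(d)(Random.nextDouble())
        val trueDiff  = h.indices.map(i => h(i) * (a(i) - b(i))).sum
        val equalDiff = a.sum - b.sum
        trueDiff * equalDiff > 0
      }.toDouble / trials

    // agreementRate(5, 100000)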

So I agree with this claim - uniform distributions are fairly robust to errors. But I don't think that's particularly related to randomness - Monte Carlo is only needed to integrate the distribution.

It's also worth noting that adversarial situations (like Go or Chess) are considerably different than most other cases. In a true adversarial problem, there is no probability distribution - the opponent is omnipotent. The purpose of randomness is simply to reduce the power of the adversary's intelligence - in a completely random world, intelligence is useless.


That's not too different from a pretty old observation in the chess world: the presence/absence of an evaluation term is more important than the weighting given to it.

> So I agree with this claim - uniform distributions are fairly robust to errors. But I don't think that's particularly related to randomness - Monte Carlo is only needed to integrate the distribution.

Ah, that's an interesting distinction, thanks. I'll have to think about this some more. But given a situation where exact integration is intractable (like chess or Go), I'm not too sure what the difference really is, because it is those cases (on first thought) where the uniform distribution is useful--if you can see to the end, you don't need to care about bias, right? I mean, "randomness" in the strictest sense is not really necessary; all these programs I speak of used deterministic pseudorandom generators of course. It's really just about ensuring lack of bias given finite sampling. I'm happy to hear your take on it though--you definitely seem to have a lot more knowledge of math/statistics/etc. than I do.

(That does remind me of another fascinating tidbit from the Go world: programmers noticed that using a low-quality PRNG, like libc's LCG rand(), produced significantly weaker players than more evenly-distributed PRNGs, even though it would seem that playing lots of random games of indeterminate length (with the PRNG called at least once per move) would not correlate at all with the PRNG's distribution.)

The adversarial-or-not issue is also good food for thought. I'm not convinced that it explains much in this case, though, since I believe most of these observations were made by playing computer-computer games with each program using very similar algorithms, or with old hand-tuned programs against the newer Monte-Carlo based programs.


> But given a situation where exact integration is intractable (like chess or Go), I'm not too sure what the difference really is, because it is those cases (on first thought) where the uniform distribution is useful--if you can see to the end, you don't need to care about bias, right?

Put it this way - suppose I can cook up a deterministic quadrature rule, e.g. quasi-Monte Carlo or an asymptotic expansion. I assert that quasi-Monte Carlo will work just as well as Monte Carlo, and probably better given its faster convergence.

If I'm right, this is a situation of "yay for uniform distributions". If I'm wrong, it's a "yay randomness" situation. It's nice to know which situation you are in - if I'm wrong, there is no point cooking up better deterministic quadrature rules.

Incidentally, LCGs are known to be useless for Monte Carlo due to significant autocorrelation. So it's quite possible that people using an LCG are incorrectly estimating their evaluation term.
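
For a concrete taste of the low-order-bit problem: with the textbook rand() constants (power-of-two modulus, odd multiplier and increment), the lowest bit of the state simply alternates - which is one reason implementations typically discard the low bits, and why something like rand() % num_moves can be an especially bad way to pick a random move. A quick sketch:

    // Textbook LCG with the constants often quoted for C's rand(), modulus 2^31.
    def lcgStream(seed: Long): Iterator[Long] =
      Iterator.iterate(seed)(x => (x * 1103515245L + 12345L) & 0x7fffffffL).drop(1)

    // With odd a and c and a power-of-two modulus, the lowest bit alternates 0,1,0,1,...
    println(lcgStream(42L).take(16).map(_ & 1L).toList)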

Also for me, it's nice to know these things just for theoretical purposes and to enhance my understanding.


Yeah, but having some random noise still often helps.


> Randomization improves algorithms.

The better intuition is "randomisation fixes algorithms that get stuck for superficial reasons" - if you wrote that instead of the diversity thing as the intuitive conclusion of the Hong and Page theorem, you could get away with it in a theoretical CS survey paper or introductory textbook. The set of assumptions they make about the different \Phi would be fairly reasonable, for instance, if we were talking about a number of different flawed heuristics for a search problem.
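
To put that reworded intuition in code - this is my own toy, not the actual Hong-Page setup - here's a greedy searcher on a ring of random values whose fixed step set gets it stuck at points that are only "optima" relative to those particular steps, versus the same searcher re-drawing its step set at random whenever it stalls:

    import scala.util.Random

    val ring = Vector.fill(200)(Random.nextDouble())

    // Greedy hill-climbing on the ring: keep taking the best improving step from the
    // given step set; stop when none of the steps improves the current value.
    def climb(start: Int, steps: Seq[Int]): Int = {
      var p = start
      var improved = true
      while (improved) {
        val best = steps.map(s => (p + s) % ring.length).maxBy(ring)
        if (ring(best) > ring(p)) p = best else improved = false
      }
      p
    }

    def fixedSteps(start: Int): Double = ring(climb(start, Seq(1, 2, 3)))

    // Same climber, but when it stalls it re-draws three step sizes at random and tries again.
    def randomizedSteps(start: Int): Double = {
      var p = start
      for (_ <- 1 to 10) p = climb(p, Seq.fill(3)(1 + Random.nextInt(12)))
      ring(p)
    }

    val starts = Vector.fill(1000)(Random.nextInt(ring.length))
    println(starts.map(fixedSteps).sum / starts.length)       // avg value reached with fixed steps
    println(starts.map(randomizedSteps).sum / starts.length)  // avg value reached with random re-draws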


I have not read the circle-square theorem, but surely you are leaving something out. With a large n, many points will fall inside the circle, and four times that quantity cannot logically get closer and closer to 3.14.


You have to divide by n. Here:

    // 1 if a uniform random point in the unit square falls inside the inscribed circle
    def pointInsideCircle = { val (cx, cy) = (0.5, 0.5); val (x, y) = (math.random, math.random); if ((x - cx) * (x - cx) + (y - cy) * (y - cy) < 0.25) 1 else 0 }

    (1 to 1000).map(_ => pointInsideCircle).sum * 4 / 1000.0          // e.g. 3.132
    (1 to 1000000).map(_ => pointInsideCircle).sum * 4 / 1000000.0    // e.g. 3.141612
So as n goes from 1000 to a million, your pi accuracy improves from 3.13 to 3.1416. As Chris Stucchio pointed out elsewhere on this page, this converges at O(N^{-1/2}), i.e. very slowly.

You can also try Buffon's needle for another classic randomized way to get pi.
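
That one is also only a few lines. A sketch of my own (unit-length needle, unit line spacing; note it's still an O(N^{-1/2}) Monte Carlo estimate):

    import scala.util.Random

    // Buffon's needle: a unit-length needle dropped on lines one unit apart crosses
    // a line with probability 2/pi, so pi is roughly 2 * drops / crossings.
    def buffonPi(drops: Int): Double = {
      val crossings = (1 to drops).count { _ =>
        val distToLine = Random.nextDouble() * 0.5        // needle centre to nearest line
        val angle = Random.nextDouble() * math.Pi / 2     // angle vs. the lines (yes, using Pi to estimate pi - it's just a sketch)
        0.5 * math.sin(angle) >= distToLine
      }
      2.0 * drops / crossings
    }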


It's 4*the proportion of points that fall in the circle.


I also read the paper. I didn't have any trouble understanding it. The point is that it can be explained 10x better in 10x less space, just like you did.


Have you read the linked paper and the one it's discussing?

I find the critique vastly more readable.


I think that's a very important point.

Indeed, when you try to explain something simply to people, they often won't believe that it's really that simple - after all, there are all those symbols!



