Could anyone point me on how to solve this(either a solution or preferrably a pointer on how to get to it):
"Q: Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7."
First I thought it was simple but the I got stuck, maybe I'm just tired.
No matter how I twsit and turn it I seem to get only an even distribution over 5 numbers.
Execute f 7 times, add all numbers, call that x. Then you do mod(x,7)+1, and you get a random number between 1 and 7. If the original function was unbiased, this one is going to be unbiased too.
This won't work. f gives 1 to 5. 7 * f gives 7 to 35. But 7 to 35 will not be evenly generated. Think about it: There are more ways to get a 20 than there are to get a 7 or a 35. Same thing with rolling 2 die.
That was my second thought too (my first thought was to upvote the comment), but at least in the 2-die case, the modulus takes care of it. If you work out the probabilities, the chance of getting 0mod2 = 1/36 (2) + 3/36 (4) + 5/36 (6) + 5/36 (8) + 3/36 (10) + 1/36 (12) = 18/36 = 1/2. Same goes for getting 1mod2. So you end up with a fair result even though the chances for each individual outcome are biased.
I didn't want to go through all 5^7 possibilities for the 7-die case, but I figured it's likely enough that he's right that I'd keep my mouth shut.
experimental evidence says that it is in fact a uniform distribution:
r = random.Random()
def one_to_five():
return r.randint(1, 5)
def mod_seven():
return (sum(one_to_five() for x in xrange(7)) % 7) + 1
def test_dist(lst):
return [(x, lst.count(x)) for x in [1,2,3,4,5,6,7]]
i've tried it out experimentally in google docs and it looks really good for big numbers. No real proof tho and it might be that the errors counter each other by chance:
Ok, a less elegant one then, but one that works for a reasonably simple reason. f gives 1 to 5, if it gives 5, try again, and so on, until you have a number from 1 to 4. Then, do that mod 2. You have a random binary digit, that's unbiased.
Now use that process to get 3 binary digits. You get a random number from 0 to 7. If the random number is 0, start again... eventually you'll get a number from 1 to 7, and all numbers have the same chances.
f * f gives you 25 boxes, number 11, 12, ... 15, 21,...25...55. Assign 3 each of 21 of those to 1..7. If f * f doesn't fall into one of those 21 boxes, repeat until it does.
But, if you want to uniformly map a random number from set X to set Y where (IIRC) lcm(|X|, |Y|) != |X|, it seems you need an infinite worst-case running time.
Here's an informal proof that you can't have a finite upper bound to the number of iterations. After n iterations, you have |X|^n possible outcomes. But, since lcm(|X|, |Y|) != |X|, |X|^n cannot be divided evenly by |Y| (since its factors are the same). So some outcomes in Y must be more likely than others.
(This is not nearly complete, but hopefully it's enough to show how it might be right.)
Of course, in practice it is highly unlikely that you will get past more than a couple iterations before determining an outcome.
Sure. Sorry, I didn't take the time to work this out on paper before posting or I would have realized that the condition itself is wrong. The condition lcm(|X|,|Y|) != |X| instead should be that |Y| has some prime factor that |X| does not.
Here is an explanation with the new condition:
Let p be any prime factor of |Y| that |X| does not have. It follows from Euclid's First Theorem[1] that p cannot divide |X|^n for any n [2]. Since every integer (> 1) has a unique prime factorization, it follows that |X|^n can't divide |Y|, because the prime p divides |Y| but not |X|^n.
[1] http://mathworld.wolfram.com/EuclidsTheorems.html
[2] We are given that p does not divide |X|^1. Suppose that p also does not divide |X|^(n-1) for some n > 1. |X|^n = |X|^(n-1) * |X|^1, so by Euclid's First Theorem, if p divides |X|^n it must divide either |X|^(n-1) or |X|^1. We know it divides neither, so p does not divide |X|^n. By induction, this is true for all n > 0.
Observe the following lookup table, which is a 5x5 matrix.
12345
67123
45671
23456
7RRRR
R means "reroll".
Constant time execution in average case. Mathematically provable that it is as unbiased as your rand5() function. Technically not guaranteed to terminate but if you dock me points for that you're technically not guaranteed to survive to the end of the interview, are you.
My team uses this as a screening question, so I won't give you the answer - but you forgot to mention that both the rand5 and rand7 should have a uniform distribution. :)
I'll give you a hint, though - the solution is not elegant at all. Which is part of the point.
1) Given a number 1 to 5, get a new one if it's 5, else throw away the MSB, you get 2 bits of randomness per number. Subtract one and call it a.
(Now you have two bits of randomness evenly distributed among the set {00, 01, 10, 11})
2) Get another number 1 to 5, toss it if it's 5, take the LSB. Call it b.
(Now you have another bit of randomness, evenly distributed among {0,1})
3) if b == 1 and a == 11, start over. else return (4b)+a+1
Python implementation (with the -1 then +1 factored out):
def one_to_seven(debug=False):
a = one_to_five()
while a == 5: a = one_to_five()
b = one_to_five()
while b == 5: b = one_to_five()
b = (b % 2) * 4
if a+b == 8: return one_to_seven()
return a+b
Generate two random numbers with your rand5 function, let them be the two digits of your base 5 number. Each number from 0-24 has a 1/25 chance of being picked.
If the number is greater than 6 throw it away and repeat, otherwise you have your random number.
Works, but not as efficient as possible. Look at patio11's solution; you only need to throw it away and repeat if it's >= 21. Otherwise you can just take n mod 7.
The elegant solution escapes me for now, but you could treat the odds and evens from the 1-5 function as a biased bit stream, apply a von Neumann corrector and then use the resulting stream.
First I thought it was simple but the I got stuck, maybe I'm just tired. No matter how I twsit and turn it I seem to get only an even distribution over 5 numbers.