> so that if another person is shown your sequence of digits from 1 to 6, he/she should not be able to tell whether these numbers were produced by a real die or just “made up” by somebody.
That instruction is a flaw in the experiment. It's always impossible to tell, for any given sequence, whether it was produced by a fair die. There's nothing an experimental subject can do to make the impossible more impossible.
> the kind of sequence you’d get if you really rolled a die
Well, there is no such sequence. The instruction is incoherent.
You could take it as meaning "Construct a sequence that you think will convince others that it came from a random source". That would be coherent. And then it would be legitimate to eliminate responses that were all-heads ("clearly didn't even try"). But then what are you measuring? The comparative understanding of older and younger people concerning random sources, or the Gambler's Paradox? Their comparative expertise in human psychology? Their comparative willingness to move the mouse-pointer over the screen?
When you throw a coin 100 times, each sequence you get is equally likely. However, you can look at properties of the sequence which are more likely to be one way than the other. For instance, it's more likely that the number of heads and tails are about equal than not. The reason is that there are more sequences, in general, where that is true, than those where heads or tails strongly prevail.
With the right property, you can make statements such as: This sequence is statistically likelier to be human made than random.
One such property is for instance the number of changes from heads to tails, or vice versa. In expectation, random sequences change heads to tails about 50% of all flips. For humans, the expectation is much higher. Hence, if you compare two sequences where one has 51% changes and one 63%, it is mathematically (/statistically) accurate to say that the latter one is likelier human made.
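To make the "number of changes" property concrete, here is a minimal sketch of how it could be measured. The function name `switch_rate` is made up for illustration; it just reports the fraction of adjacent flips that differ, which is the quantity being compared above.

```python
def switch_rate(flips):
    """Fraction of adjacent pairs whose two outcomes differ."""
    if len(flips) < 2:
        raise ValueError("need at least two flips")
    changes = sum(1 for a, b in zip(flips, flips[1:]) if a != b)
    return changes / (len(flips) - 1)


# A strictly alternating sequence switches every time, a constant one never does.
print(switch_rate("HTHTHTHT"))
print(switch_rate("HHHHHHHH"))
print(switch_rate("HHTHTTHT"))
```

For a fair coin the expected value is 0.5, so a long sequence that sits well above that leans toward the "human-made" guess, without ever proving it.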
Your point doesn't refute OP's argument. Your final statement "you can say the latter is more likely random" is not the same as "you can say this sequence is not random". I think lots of people (especially programmers) who know about true RNG vs expectations of RNG might intentionally put in strings of same numbers, or not include the full set, because we know it's what often happens during plain RNG. It isn't clear what the goal of the sequence is, hence the confusion in the comments.
Exactly, and their "good" dice roll sequence, 3 1 5 6 2 6 3 4 4 1 contained the full set which should only happen ~1/4 of the time for 10 rolls. It also contained no number more than twice, which should happen < 7% of the time. This looks to me like they purposely tried to make this sequence look like their idea of "random".
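Both figures can be checked exactly rather than estimated. The sketch below assumes a fair six-sided die and 10 independent rolls; the helper names are mine, not anything from the study.

```python
from itertools import product
from math import comb, factorial

FACES, ROLLS = 6, 10


def p_all_faces_seen(faces=FACES, rolls=ROLLS):
    """P(every face appears at least once), by inclusion-exclusion."""
    hits = sum((-1) ** k * comb(faces, k) * (faces - k) ** rolls
               for k in range(faces + 1))
    return hits / faces ** rolls


def p_no_face_more_than(limit, faces=FACES, rolls=ROLLS):
    """P(no face appears more than `limit` times)."""
    total = 0
    # Enumerate per-face counts that respect the limit and sum to `rolls`,
    # then add the multinomial coefficient for each such count vector.
    for counts in product(range(limit + 1), repeat=faces):
        if sum(counts) == rolls:
            ways = factorial(rolls)
            for c in counts:
                ways //= factorial(c)
            total += ways
    return total / faces ** rolls


print(f"full set in 10 rolls:    {p_all_faces_seen():.4f}")
print(f"no number more than 2x:  {p_no_face_more_than(2):.4f}")
```

This gives roughly 0.27 for the full set and a bit under 0.07 for "no number more than twice", in line with the estimates above.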
I'm curious about how they scored this section because my overall age was reported to be 60+ with the sequence 2 1 5 2 6 2 2 4 6 6.
I also scored 60+ (actual age is in my 30s). I had similar thoughts and also did things like not use up all the numbers and repeat numbers more than twice exactly because I've looked at a lot of random number sequences in my life and I was trying to make it look like one of those.
> I've looked at a lot of random number sequences in my life and I was trying to make it look like one of those.
This is perhaps the difference between pseudo and statistically random. No idea which of those the study or the experiment is trying to validate btw.
And IIRC, interestingly they write that human capacity to create random numbers declines 25+. I can imagine that the older we are the more we look for something to make our decisions look more random based on what we've learned so far - more time, there was more time to look at more random number sequences - and the less random the outcome will be.
> And IIRC, interestingly they write that human capacity to create random numbers declines 25+. I can imagine that the older we are the more we look for something to make our decisions look more random based on what we've learned so far - more time, there was more time to look at more random number sequences - and the less random the outcome will be.
This is what they are testing, and at least based on the data they've got so far, it looks like it increases up to 25-ish and then stays pretty flat.
Another possibly interesting observation is that their preliminary data set (just eyeballing it, but) looks to have gotten
1) a flatter response
2) generally, less random responses
Which leads me to wonder if the live stats have been skewed more random as there might be some correlation between "interested in this sort of thing" and "has some idea what a random distribution ought to look like," and possibly this knowledge doesn't go away with age.
(human capacity to create random numbers declines 25+)
Not really; what they're testing is what kinds of response differently-aged people give to their question. So it's important what the question actually is; and it's important if the question might, for example, seem to older people to be a waste of their time.
They're not measuring what they claim to be measuring.
My opinion is that "real" randomness looks less random than artificial randomness.
That's why Apple changed iTunes random music shuffle to be less random because people complained that it wasn't random enough and replayed songs too close together.
> This is perhaps the difference between pseudo and statistically random.
Not quite. Pseudo-randomness is defined as being indistinguishable from a uniform distribution, meaning that the next item in a sequence is no more predictable than a statistically random selection.
I always pressed the same button. Let's say 10x 1. You get a rating below 60 then. Just in case you need the sequence to come towards your real age group when you redo the experiment.
Perhaps to guess random is also in the property of the age of someone clicking on a website? Perhaps someone should create an experiment that finds their experiment flawed ^^
What does that mean, "you can tell this sequence is not random"? If you show me a blue hat, and ask me if it's blue, I'll say yes, I can say it's blue. But there is always a chance it's not actually blue. It's very conceivable that I'm in a situation where I say with confidence that something is blue, but it isn't.
You always only ever speak in probability. Of course you can't say the sequence isn't random, because every sequence can be the result of a random process. "can you tell which is random" to me is equivalent to asking "is one of the sequences such that it is rational to choose it over the other as being random".
It's about rational decisions. Consider the frequentist view, where a probability p for an event A means that out of k trials, pk will show A (in expectation), and furthermore if k -> infinity, then the proportion of trials that show A will converge to p. If you want to choose the right sequence as being random as often as possible, it is rational to choose the one I described above, because it will, overall, be the one that is MORE OFTEN the random one compared to the other.
For a blue hat, it either is or isn't blue (and there's some rather strong evidence - whether or not it looks blue). Like, with a sequence of 6 digits, if you don't know whether the source was random or not, then that's like NOT showing me your hat, and asking me whether it's red or blue.
For a single sequence of six digits, it might or might not have come from a random source. You can't get any edge on that judgement by just inspecting the sequence. Only inspecting the source (the hat, if you like) can give you an advantage. Perhaps you're colour-blind, or the lighting is weird; so there's still uncertainty. But that's equivalent to examining the source of the digits, determining that it's really a random source, but making a mistake in your determination. That's uncertainty on a different level.
Red-pill blue-pill is a sort of meta-uncertainty.
> "is one of the sequences such that it is rational to choose it over the other as being random"
Most people don't care about this shit; it doesn't matter to them what random means, nor whether it's sequences or sources that can be said to be random. But for some people it does matter, and they have to try to use language precisely.
All [red|blue] hats are either red or blue. But no sequence is random or non-random; it's the source of the sequence (the process, if you like) that can be random or non-random.
If 111111 is emitted by a random process, then you can call that a "random sequence" if you like. If I emit 126692 from my ass (not a random process), that's not a "random sequence" in any sense, whatever statistical properties it has. You can't tell which is of random origin by inspection. The experimental subjects face an impossible challenge, and I can't see what conclusions you can draw from their responses.
Regarding "random process": (and sorry for commenting to myself)
I'm not taking a position on what a "random process" is; for these purposes, a PRNG, an LFSR or even the last three bits of the system-clock would do as well as radioactive decay.
By this logic the expression "being able to tell" should be banned from the English vocabulary, because no-one is able to tell anything with 100% certainty. Requiring 100% certainty as a precondition of using this expression is silly.
It depends on the framework. I can tell a geometric figure is a square because it’s a quadrilateral with right angles and sides of equal length. You could ask me a question like “Is a rectangle with a side of length 1 and a diagonal of root 2 a square?” and I can tell it is.
Ask me “Was 1 1 1 1 produced by a random process?” and it’s impossible to tell in the way I did with the square.
> It depends on the framework. I can tell a geometric figure is a square because it’s a quadrilateral with right angles and sides of equal length.
You're claiming to be able to craft a mathematical proof with 100% certainty. Although the thing you are proving appears to be obviously true (assuming a certain mathematical framework), the probability that you made a mistake is not 0%. You might falsely believe that the probability of making a mistake in a simple proof like this is 0%, but you would be wrong, and we have plenty of historical examples of mathematicians "proving" something and thinking that there is 0% chance of errors in the proof, only later being shown that they were incorrect.
You’re mixing up two layers of uncertainty. There’s an outer uncertainty. This would include things like I made a mistake, this is all a dream, etc. This outer uncertainty pervades all problems.
It’s often useful to ignore that outer uncertainty. We create a framework where we take certain things as true (shared reality, mathematical axioms). This framework may or may not have uncertainty inside of it, which we could call inner uncertainty.
Questions of probability have inner uncertainty. Questions of geometry do not. This makes them qualitatively different.
If you frame the initial task as something like “do your best to lead people to believe your sequence is random”, that makes sense. If the task is “make it so they can’t tell if it’s random”, that’s a bit off in some way. At the very least, it’s because you’ve presented the spotting of randomness as something that can truly be done to a logical conclusion (random/not or true/false). This violates both the outer and inner uncertainties of randomness.
Interestingly, the article computes the odds incorrectly: "... hit the same number on seven consecutive spins [...] the odds of which happening are 114 billion to one...", which actually are the odds of having 7 consecutive 19s or the same (unspecified) number on 8 consecutive spins.
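The arithmetic is quick to check. This sketch assumes a 38-pocket wheel (0, 00 and 1-36), which is my guess at what the article had in mind because it is the pocket count that lands near 114 billion; the function name is made up.

```python
POCKETS = 38  # assumption: American-style wheel, not stated in the quote


def one_in(spins, specific_number):
    """Return N such that the streak has probability 1/N.

    If the number is not fixed in advance, the first spin is "free":
    it only decides which number the rest of the streak has to match.
    """
    constrained_spins = spins if specific_number else spins - 1
    return POCKETS ** constrained_spins


print(f"7 spins, a named number:   1 in {one_in(7, True):,}")
print(f"7 spins, any same number:  1 in {one_in(7, False):,}")
print(f"8 spins, any same number:  1 in {one_in(8, False):,}")
```

Under that assumption, "the same number on seven consecutive spins" is about 1 in 3 billion, and the 114 billion figure belongs to the other two cases.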
No seriously, humans can have 100% truth about core things that have an insane amount of empirical support, e.g. gravity being real. But it is accepted that for any knowledge that hasn't been empirically tested ad nauseam, when we use universal quantifiers, we generally tolerate some credible kinds of exceptions, contextually.
If it's light when I wake up, I would say that I can tell it's daytime, despite the possibility that it's still nighttime but a sufficiently near star has gone supernova or that the house next door is on fire.
This is a good illustration of a binary epistemology vs a continuous one.
> It's always impossible to tell, for any given sequence, whether it was produced by a fair die.
Something like "you can't make any determination, because it's random".
Whereas under the second worldview you can make statements about how likely things are, despite uncertainty.
For some reason the binary worldview seems to be incredibly common. My sibling commenter exhibits the same issue.
> you can make statements about how likely things are
Sure. And it's true that some sequences are more likely than others to have been emitted by a random process. [Edit] All sequences from a random process are equally likely. It's still true that some sequences are more-likely to have come from non-random processes.
The point is that randomness isn't a property of the sequence; it's a property of the process.
Could be. Though if you think long enough, with this worldview you can't decide anything, ever. And it's irrational. You can't say for sure which is random, but you can say for sure on which you should bet your money if you have to.
I was definitely confused and assumed it was about 'looking like' randomness.
But I did a lot of double clicking of things, because I felt that in 'real life' you're not going to get 1 roll of each number, but odd things happen.
But this is a bit moot - the people clicking 'all the same number' have obviously come to some different conclusion as the others - i.e. 'all possible values are the same' and therefore.
So what the study is really 'testing' probably, is how people react to the question.
They really need to change the question substantially in order to get randomness.
I don't see any insightful aspect in the experiment or the debate.
It's pedantic -> some people read the question differently and do different things.
I disagree. You want people to click numbers s.t. if you asked them 10k times, a uniform distribution would emerge. But that is not what's happening. They think all numbers have the same probability, but if you click only 1, then the probability of your choice being random is low.
Edit:
Maybe this will convince you: You said each sequence of numbers is equally likely, hence, we can't tell. I'm going to disagree with that statement.
Let's say I give you a coin, and tell you: I've flipped this coin 100 times in a row, 10k times. And you look at the flips, and each flip result is 1111...111. Would you guess it's random, or biased? The probability of that happening is as high as any other sequence, but clearly, if you'd guess it was random, you'd be a fool. This is exactly what is happening here, just on a smaller scale: 111111111 being the result of the coinflip has a lower probability of being random than the result 100101101110.
11111 has to happen at some point if the experiment is really random. The probability that it happens with YOU is low, however. Thus, it is rational to decide that the sequence is not random. Because in most cases, it won't be.
Well, I'd guess that it's not a coin-flip at all; even a biased coin won't produce 10,000 heads and no tails, unless it's a two-headed coin.
Let's go back to the actual case in hand: suppose you provide me with "111111", and not 10,000 1s. I simply have no way at all of determining whether that is more or less likely to have come from a random source. So I would decline your bet. If it was 10,000 1s, then maybe I'd be a fool to bet it was of random origin; but you can't convince me that a string of 6 1s is or isn't of random origin. So no bet.
This is all irrelevant. We're discussing a single sequence of 6 digits. There are not enough samples to perform statistical analysis. Probability doesn't come into it.
> For instance, it's more likely that the number of heads and tails are about equal than not
This isn't even true. Heads and tails being equal over 100 flips is something you'll see something like 8.33% of the time (not based on probability, I just ran a simulation of 100 flips 10,000 times and got 833 instances of them being equal)
edit: I missed the key word, "about". Sure, they're more likely within maybe 5-6 of one another than not.
Yes :) The exact probability for having 50%/50% is (100 nCr 50)/2^100, or about (as you found through experimentation) 0.08.
The more flips you include in the experiment, the more likely you'll get a result close to 50%/50%, by the way. In the limit, you have a variance (i.e. spread of results away from the expected value, which is 50%) of 0. This is called the law of large numbers. As the generic name suggests, it's pretty central to mathematics haha.
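For anyone who wants the exact numbers instead of a simulation, a small sketch (assuming a fair coin; `p_heads_between` is a name I made up):

```python
from math import comb


def p_heads_between(lo, hi, flips=100):
    """Exact P(lo <= number of heads <= hi) for a fair coin."""
    return sum(comb(flips, k) for k in range(lo, hi + 1)) / 2 ** flips


print(f"exactly 50 heads in 100:     {p_heads_between(50, 50):.4f}")
print(f"45 to 55 heads in 100:       {p_heads_between(45, 55):.4f}")
print(f"450 to 550 heads in 1000:    {p_heads_between(450, 550, flips=1000):.4f}")
```

The first line is the roughly 0.08 from above. The other two show the "about equal" point: staying within five percentage points of 50% is already more likely than not at 100 flips, and close to certain at 1000.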
>It's always impossible to tell, for any given sequence, whether it was produced by a fair die.
If the sequence is long enough you can model how likely it is to have been produced by a fair die. Are all numbers equally distributed? Are some numbers more likely to follow or not follow other numbers? Are some patterns repeating?
Of course any sequence can be produced by a fair die, but you can still create some objective metric that will tell you how truly random a sequence is, and the longer it is the more accurate it will be. It's what tests like Diehard are about after all.
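As a rough illustration of such a metric, here is a sketch of the simplest one: a chi-square frequency test on the face counts. This is only one check, nothing like the full Diehard battery, and the 11.070 threshold is the standard 5% critical value for 5 degrees of freedom. It also needs a reasonably long sequence (the usual rule of thumb is an expected count of at least 5 per face) to mean anything.

```python
from collections import Counter

CHI2_CRIT_DF5_P05 = 11.070  # upper 5% point, chi-square with 5 degrees of freedom


def chi_square_uniform(rolls, faces=6):
    """Chi-square statistic of observed face counts against a fair die."""
    counts = Counter(rolls)
    expected = len(rolls) / faces
    return sum((counts.get(face, 0) - expected) ** 2 / expected
               for face in range(1, faces + 1))


balanced = [1, 2, 3, 4, 5, 6] * 10
all_sixes = [6] * 60

for name, rolls in (("balanced", balanced), ("all sixes", all_sixes)):
    stat = chi_square_uniform(rolls)
    verdict = "suspicious" if stat > CHI2_CRIT_DF5_P05 else "consistent with fair"
    print(f"{name}: chi2 = {stat:.1f} -> {verdict}")
```

Note that the perfectly balanced sequence passes this test even though it is an obvious repeating pattern, which is why real test suites look at many properties at once.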
Can you roll a fair die a thousand times and only get 6s? Well yes of course. It'll never happen though.
>Can you roll a fair die a thousand times and only get 6s? Well yes of course. It'll never happen though.
Somewhat related, but humans are also REALLY bad at generating randomness and one of our big tells is an aversion to repeats. If you ask someone to pick 0-9 randomly, repeatedly, they will rarely repeat numbers. But in a truly random sample a repeat is likely 10% of the time, and a three-peat will happen roughly once every 100 numbers. An average person will rarely if ever repeat a number and sure as heck won't ever come up with it three times in a row.
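Those rates are easy to see in a simulation. A minimal sketch, using Python's built-in PRNG as the stand-in for "truly random" and a fixed seed so it is repeatable (the function name is mine):

```python
import random


def repeat_rates(n, seed=0):
    """Simulate n uniform digits 0-9 and measure how often repeats occur.

    Returns (fraction of digits equal to the previous one,
             fraction of digits equal to the previous two).
    """
    rng = random.Random(seed)
    digits = [rng.randrange(10) for _ in range(n)]
    pairs = sum(digits[i] == digits[i - 1] for i in range(1, n))
    triples = sum(digits[i] == digits[i - 1] == digits[i - 2] for i in range(2, n))
    return pairs / (n - 1), triples / (n - 2)


pair_rate, triple_rate = repeat_rates(200_000)
print(f"immediate repeats: {pair_rate:.3f}")
print(f"three in a row:    {triple_rate:.4f}")
```

The two rates should come out near 0.10 and 0.01, i.e. a repeat about one time in ten and a three-peat about once per hundred digits.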
Yeah but even then most people cap the repeats if they’re faking. If 100 people have to roll 10 random numbers, it is very likely some people will have 3-4 in a row of the same number. It’s less likely none will. My brother in law teaches stats and he runs this scenario with a seminar he teaches. He then runs the results through a simple formula he built in excel and can ascertain who faked vs. did it for real with about 95% certainty IIRC. I love little games like that haha
And funnily enough, you'll often hear this trait as being desirable in a pseudo-random number generator. People often want something that will jump around fairly unpredictably but that will come close to outputting all possible numbers once before getting into re-runs.
Quasi-Monte Carlo has a rate of convergence close to O(1/N), whereas the rate for the Monte Carlo method is O(N^(−0.5))
For such applications it's best to use quasi-random numbers (a.k.a. low-discrepancy sequences) such as the Halton sequence or the Sobol sequence instead of pseudorandom numbers.
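For the curious, the Halton sequence is short enough to write out. This is a bare-bones sketch of the textbook construction (radical inverse in coprime bases), not a tuned implementation; the function names are mine.

```python
def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` around the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n, bases=(2, 3)):
    """First n points of the Halton sequence, one coordinate per base."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


for point in halton(8):
    print(f"({point[0]:.4f}, {point[1]:.4f})")
```

In base 2 the first coordinates are 0.5, 0.25, 0.75, 0.125, ...: each new point drops into the biggest remaining gap, which is the "cover everything before re-running" behaviour people tend to expect from "random".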
Thank you for the link - I had not heard of this kind of sequence. It looks like something I'd like to know about, but I think it's beyond my schoolboy-level mathematical abilities. Anyway, I guess I'll have a peek in the rabbit-hole.
I’ve coded up this exact algorithm, it’s really fun. It’s useful for “shuffling” in the music sense (not the cards sense).
I think it’s actually the prototypical real-world software engineering problem. User says they want X (random music). X is a term in software, so you give them that (you get a random song). They’re not happy. You dig and find out they really want A, B, and C (next song is unknown, songs don’t repeat too soon or too infrequently). This new problem is harder to verify (how soon is too soon?).
Editing in tips on solving this sort of problem. You can turn vague requirements into precise requirements. Rather than make the precise requirements exactly equivalent to the vague ones, it's easier to make them more restrictive. Is playing a song again within 50% of the length of the playlist "too soon"? Maybe. How about within 80% of the length of the playlist? Definitely not. We can give ourselves the requirement "Songs must always play again between 80% and 125% of the length of the playlist." Much easier to solve, much easier to test.
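The nice thing about the restrictive version is that it can be checked mechanically. A sketch of such a check, where I'm assuming the gap is counted in number of plays between two occurrences of the same song (the name `replay_gaps_ok` and its parameters are made up):

```python
def replay_gaps_ok(plays, playlist_len, lo=0.80, hi=1.25):
    """True if every replay of a song comes between lo and hi playlist-lengths
    after its previous play. Songs that are never replayed are not checked."""
    last_seen = {}
    for position, song in enumerate(plays):
        if song in last_seen:
            gap = position - last_seen[song]
            if not lo * playlist_len <= gap <= hi * playlist_len:
                return False
        last_seen[song] = position
    return True


songs = list("ABCDEFGHIJ")
print(replay_gaps_ok(songs * 3, playlist_len=10))          # every gap is exactly 10
print(replay_gaps_ok(["A", "B", "A"], playlist_len=10))    # "A" returns after 2 plays
```

It only inspects a finite play history, so it works as a test oracle for whatever shuffling algorithm you end up writing rather than as the algorithm itself.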
Sometimes the extra restrictions make the problem harder (not usually, I've found). Still, this is a great trade because understanding requirements is harder than solving well-defined problems.
[To the point of this whole post] Requirements can be turned into testable properties even if it's not programmatic. "When I look at a list of chosen songs, there must be no obvious patterns." Who says what's obvious? You do! Then, have someone else do the same.
Consider extreme cases. Extreme cases tend to be the most or least important. If they're least important, create a new set of easier requirements or drop it altogether. "If 3 - 10 songs, always play within double the playlist, no obvious patterns, never twice in a row. If 2 alternate, if 1 repeat."
I actually had an engineering ask from somebody to produce "random sequences" which turned out to be anything but random. Short version: letter sequences, no repeats in a sequence or in adjacent sequences.
Took months and a lot of patience to extract that crucial piece of information from the customer. I actually coded a recursive algorithm which to my surprise generated every possible (three letter) sequence in an acceptable sequence, enough for them for several lifetimes.
> Extreme cases tend to be the most or least important.
There's an overlap in meaning between random, strange and unknown. "This random guy came up to me..." Keeping that in mind helps when talking to people about "random".
What is an example of a situation in which this is desirable:
> come close to outputting all possible numbers once before getting into re-runs
In a dice rolling game, you want as close to true random as your PRNG can get. In card drawing, you typically want EXACTLY all possible cards once before getting "repeats". Where do you want something in between?
Last time I encountered this was creating random IDs for things, either as a random string of characters or when selecting from lists of attributes and animals like all the tools that'll name things like "curious possum". It's not actually a hard requirement to hit every possibility before repeating, but if you do see 2 or 3 clusters in a sample it gives people the impression it isn't random.
> Can you roll a fair die a thousand times and only get 6s?
Anecdote time:
A few years ago my friends and I were discussing how easy it is to roll 5 dice simultaneously and get the same result on all of them. This is a possible way to win a popular game here: https://en.wikipedia.org/wiki/Generala We estimated how often you can roll the dice and the probability, and we estimated that you must try for 2 or 3 hours to get that result.
We were young, had a lot of free time, so one of my friends started rolling 5 dice while he was talking with us and eating. After about 2 hours he got the 5 equal dice in a roll. (IIRC he tried again, with a similar result.)
(Note that 2 or 3 hours is consistent with how this outcome is used in the game to win automatically. It's possible to get this in a normal game, but it's not super common.)
Also, each time you add a die, the time increases exponentially (by a factor of 6). With 6 dice it's 12-18 hours, like a day. With 7 dice it's most of a week. With 8 dice it's like a month. With 9 dice it's about half a year. 1000 dice will take much longer (and x6 if you want to get only 6s instead of any repeated number).
(Note that parallelizing this across all humans only adds about 13 dice. If everyone on Earth starts rolling 22 dice, it will take something like half a year until one of us gets 22 equal dice.)
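The scaling can be written down directly: with n fair dice the chance that they all match is 6/6^n, so the expected number of throws is 6^(n-1), and each extra die multiplies the wait by six. The sketch below turns that into hours using a pace of 500 throws an hour, which is a number I picked only because it puts five dice in the 2-3 hour range; it is not from the anecdote.

```python
def expected_rolls_all_equal(dice, sides=6):
    """Mean number of throws until every die shows the same face."""
    # P(all match) = sides / sides**dice, so the expected wait is the reciprocal.
    return sides ** (dice - 1)


ROLLS_PER_HOUR = 500  # hypothetical pace, chosen for illustration

for dice in range(5, 11):
    rolls = expected_rolls_all_equal(dice)
    hours = rolls / ROLLS_PER_HOUR
    print(f"{dice:>2} dice: {rolls:>10,} throws, about {hours:,.1f} hours")
```

These are averages for nonstop rolling; any single attempt can of course finish much sooner or much later.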
Any single sequence of numbers has the exact same probability of being produced by a fair die (that's how the definition of "fair" goes). The probability of getting all 6 is the same of any other one you get.
For sure, but I think it's more helpful to think about "classes" of sequences. Sequences which have a uniform distribution of digits (within some margin of error), sequences which do not have repeating patterns, sequences that do not contain the same digit twice in a row etc... Any single one of these sequences is as likely as any other, but some "classes" are vastly bigger (and therefore, more probable) than others. By deciding which classes of sequence any result belongs to, you can decide if it's likely to have been produced by a fair die or not.
This intersects with the concept of entropy: assuming that you have a box containing a gas whose particles move randomly about the volume of the box, then at some point you take a snapshot of the position of every single particle in the box and you discover that they're all in the right half of the box, the left side being in a vacuum. Would you assume that it's just random chance? It could be. It certainly isn't.
Meanwhile any of the trillions and trillions of snapshots showing particles more or less uniformly distributed within the box are all more "random looking" and are what is expected from such an experiment. These configurations as a group occupy the vast majority of the phase space for the contents of the box.
That's true given that we have a fair die. But the question is, given some results, what is the probability that the die is fair? And there are statistical tools for that.
You work with crypto according to your profile. I hope that when you see a random generator return a series of a hundred 6s, you go and check what's wrong with it instead of assuming you just got lucky this time ;-)
This is much more relevant in cryptography than in statistics. If your PRNG always returns the same 4, it's buggy, but the really problematic outputs look exactly like random numbers.
(Looks like my profile is out of date, by the way.)
We disagree on the definition (or divination) of "dice" and "sequence" and "roll".
The sequence length of a single die roll is 1. The probability of any particular roll is the same.
Let's do the next part with two-sided dice because it will be quicker.
The sequence length of a roll of two dice is 2. The roll is a bag; the probability of either [1,1] or [2,2] is lower than the probability of [1,2], because it's a set membership and [1,2] has the same members as, and is the same as, [2,1].
> It's always impossible to tell, for any given sequence, whether it was produced by a fair die. There's nothing an experimental subject can do to make the impossible more impossible.
That's just not true.
Or feel free to play a game with me. We'll roll a 20 sided die. If it comes up 20, you give me a ten. If it comes up any other number, I'll give you a dollar. Nice EV on that!
Oh, the die has come up 20, 20, 20, 20, 20, 20, 20 the last seven times. Do you play?
It's just a question about what we perceive as random. It has nothing to do with the probability of the sequence being produced by a die, only with the probability of the sequence being produced by a human. A 20 20 20 sequence is not less random, it's just more likely to be produced by someone with incentive to cheat.
How did the researchers measure the "randomness" of a particular sequence in this experiment?
> Formally, the algorithmic (Kolmogorov-Chaitin) complexity of a string is the length of the shortest program that, running on a universal Turing machine (an abstract general-purpose computer), produces the string and halts.
I think we want (and expect) RNGs to produce sequences with high algorithmic complexity (which we regard as "random") and ignore the (almost impossible) possibility that they fail to do so.
An impossible to create modified RNG which always produces sequences with high algorithmic complexity would better meet our expectations of randomness, but would be less random because it could not produce uniform sequences (among others).
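Kolmogorov complexity itself is uncomputable, but compressed size is a common stand-in, since a compressor's output is one particular "program" that reproduces the string. The sketch below uses zlib for that and is purely my illustration of the idea, not how the researchers scored anything. It is also useless on a 10-digit answer, where the compressor's fixed overhead swamps everything, so it uses 1000 digits.

```python
import random
import zlib


def compressed_size(digits):
    """Bytes needed by zlib at maximum compression: a crude upper-bound proxy
    for algorithmic complexity, only meaningful for long strings."""
    return len(zlib.compress(digits.encode("ascii"), 9))


rng = random.Random(1)  # fixed seed so the comparison is repeatable
uniform = "6" * 1000
mixed = "".join(rng.choice("123456") for _ in range(1000))

print("all sixes:   ", compressed_size(uniform), "bytes")
print("PRNG digits: ", compressed_size(mixed), "bytes")
```

The run of sixes shrinks to a handful of bytes while the PRNG digits barely compress, which matches the intuition that the first is "simple" even though a fair die could produce either.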
I would; would you? The Gambler's Paradox says you shouldn't (I'm assuming your 20-sided die isn't crooked).
Incidentally, you haven't made your case that you can ever tell whether a given sequence was produced by a fair die. You've just asserted it, and then suggested a game that doesn't illuminate anything.
It would be incredibly stupid to take the bet, as it's way more likely that the sequence was not produced by a fair die than that it was (i.e. the die is rigged in the example)
Just because it's impossible to know for certain, doesn't mean you can't make a prediction with very high chance of being correct.
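To put a number on "way more likely", here is a Bayes' rule sketch. Everything about the prior is an assumption of mine: I take one die in a million to be rigged, and I assume a rigged die always shows 20. Change those and the answer changes.

```python
def posterior_fair(streak, sides=20, prior_rigged=1e-6):
    """P(die is fair | it showed the same face `streak` times in a row)."""
    like_fair = (1 / sides) ** streak   # a fair die hits that face every time
    like_rigged = 1.0                   # assumption: a rigged die always does
    fair = (1 - prior_rigged) * like_fair
    rigged = prior_rigged * like_rigged
    return fair / (fair + rigged)


for streak in (0, 1, 3, 5, 7):
    print(f"after {streak} twenties: P(fair) = {posterior_fair(streak):.6f}")
```

Even starting from a one-in-a-million suspicion, seven 20s in a row push the probability of a fair die below one in a thousand. The flip side is that the conclusion rests entirely on having some model of what the non-fair alternative looks like.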
I'm not really a gambling man, but I'd expect a crooked 20-sided die to produce a biased sequence, not a running straight. I don't know if it's possible to make a die that always rolls the same, and I'd expect any such die to fail a superficial inspection (all sides but one bulge; one side is larger than the others; the die has a weird magnetic field; the die is heavily weighted on one side).
So I'd still expect a running straight to be rare, even with a crooked die.
Of course, if I watched the die produce 7 20s in a row, and was then asked to bet on the next roll NOT being 20, I'd be stupid to assume the die was fair without inspecting it.
All this is beside the point; the instructions invite the subject to produce a sequence that they think will convince people it was produced by a roll of dice. But there is no sequence that SHOULD have that power to convince.
> I don't know if it's possible to make a die that always rolls the same
The easiest way would be to put the same number on every side, which would probably fail a superficial human inspection (but might pass a surprising number of machine inspections).
> All this is beside the point; the instructions invite the subject to produce a sequence that they think will convince people it was produced by a roll of dice. But there is no sequence that SHOULD have that power to convince.
If you genuinely believe that, we can easily set up a sequence of bets where you will win infinite amounts of money from me. But of course, you don't genuinely believe that, so you aren't interested in making bets around your supposed "beliefs".
I guess my "supposed" beliefs must be the beliefs you suppose I have. Whatever.
If you're offering me a bet, and you can easily set it up, then what bet are you proposing? You haven't been very specific. I'm no Turf Accountant[0], but I can spot a three-card-trick when I see one.
> the instructions invite the subject to produce a sequence that they think will convince people it was produced by a roll of dice. But there is no sequence that SHOULD have that power to convince.
Let's gather a random sample of 20 people. I will produce 10 manually generated sequences of dice rolls and 10 actual dice roll sequences. The sequences are added to a list and the list is shuffled. We will present each person with a sequence from the list (sampling without replacement), and the person should guess whether the sequence was manually generated or produced with a dice roll. For each person who correctly identifies a manual sequence as a manual sequence, I will pay you $1. For each person who mis-identifies a manual sequence as an actual dice-roll sequence, you will pay me $1000. As you said, you believe no sequence should have the power to convince a person of such a thing, you will obviously never have to actually pay me $1000, you would just collect 20 x $1 from me. I'd be happy to continue this up to infinity in batches of 20, so you will eventually get infinite dollars from me.
> For each person who correctly identifies a []manual sequence[] as a manual sequence, I will pay you $1. For each person who mis-identifies a []manual sequence[] as an actual dice-roll sequence, you will pay me $1000.
Payment only happens for the manual sequences here.
You promised me "infinite amounts of money", I only stand to win 20 bucks.
Also, you have specified that these are 20 random people; so I guess I don't get to brief them in advance that they MUST say manual each time. So you have replaced me, the bettor, with a panel of 20 people whose average IQ is 100, and who don't have my interests at heart. Why would I take that bet?
If my random panel say manual each time, you stand to lose.
But as I say, I don't bet often. Only once a year, only on gee-gees, and only as much as I'm willing to lose (because I always lose).
Look, you're merely pretending to disagree. You're pretending to believe something akin to "it's impossible to craft a sequence of numbers that convinces observers of its randomness". But you don't actually believe this [or whatever minor variation of that statement that you'd find agreeable in rhetoric]. If you did believe it, you would find a wager that we could do to settle this disagreement. But no such wager can possibly be formulated, because you don't actually believe what you pretend to believe.
It's not about my beliefs, except my belief that the die is fair.
I made a statement about a simple bet, using a fair 20-sided die, on the outcome of the 8th roll after the die has just come up 20 seven times. I'll take the bet that it doesn't come up 20 on the 8th roll. This scenario with 20 random people and 10 sequences appears to be a different scenario. I'm not sure how the odds work in that scenario, and I don't fancy that bet. That's all.
> I made a statement about a simple bet, using a fair 20-sided die, on the outcome of the 8th roll after the die has just come up 20 seven times. I'll take the bet that it doesn't come up 20 on the 8th roll.
How is this related to the topic at all? Our disagreement concerns the ability of humans to produce numbers that look random, our disagreement doesn't concern the ability of a fair die to produce numbers that look random. Of course a fair die is going to produce numbers that look random, that's not related to this discussion at all!
It's related only to your challenge with a contrived and complicated betting scenario.
"The topic" is whether the challenge faced by the experimental subjects makes any sense. It doesn't; the challenge is to produce a string of six symbols that others can't distinguish from randomness. There is no such string. Your contrived betting scenario doesn't illuminate the issue; it's an attempt to distract attention, and IMO it's not in good faith.
> "The topic" is whether the challenge faced by the experimental subjects makes any sense. It doesn't; the challenge is to produce a string of six symbols that others can't distinguish from randomness. There is no such string.
I strongly disagree with that statement. On a surface-level inspection, some strings appear to have more entropy than others ("can be distinguished from randomly generated strings"). This absolutely is a real thing, and it's measurably real. If you genuinely believe what you say, then we can easily wager on it and find out who's right.
> Your contrived betting scenario doesn't illuminate the issue; it's an attempt to distract attention, and IMO it's not in good faith.
You asked me to produce a specific betting scenario, and I did my best to entertain you with that. I crafted the scenario in good faith insofar as I tried my best to answer to your request. I suppose you can still argue that it's not in good faith because I never had any expectation that you would take the wager. But like I said before, that's because there is no way to formulate our disagreement as a betting scenario that you would accept, because you don't genuinely believe the claim you are making here, so you will simply weasel out of any wager.
I withdraw my accusation of bad faith, and I apologise for that remark.
I call on you to withdraw your claim that my remarks were made in bad faith.
I still don't know why you felt the need to contrive a complicated betting scenario, when we were discussing a "simpler" scenario that already involved an icosahedral die. If we're using betting scenarios to model good faith, then aren't simple scenarios more useful than complex ones? Ergo, coin-toss is the most appropriate.
But trying to talk about this stuff in terms of physical things like coins or dice inevitably turns into discussion about unfair coins and crooked dice, or whether the caster can influence the outcome; so argument by analogy quickly leads to dead ends, in this area.
> I call on you to withdraw your claim that my remarks were made in bad faith.
I don't want to offend you, but I still do not think you believe the claim you are making. If you actually do believe it, we should be able to formulate our disagreement in the form of a wager and use the scientific method to determine which one of us is correct.
> I still don't know why you felt the need to contrive a complicated betting scenario, when we were discussing a "simpler" scenario that already involved an icosahedral die. If we're using betting scenarios to model good faith, then aren't simple scenarios more useful than complex ones? Ergo, coin-toss is the most appropriate.
The die scenario you formulated is not suitable, because it is unrelated to our disagreement. We're looking for a wager that illustrates the disagreement, e.g. you would be taking one side of the wager and I would be taking the other side of the wager. Your die scenario is not like this, because both of us would be taking the same side of the wager in that scenario; it has no connection to our disagreement at all.
I tried my best to formulate a simple scenario that would illustrate our difference and allow us to wager on which one of us is right. Sure we could use coin-toss instead of dice. Feel free to formulate a scenario.
> But trying to talk about this stuff in terms of physical things like coins or dice inevitably turns into discussion about unfair coins and crooked dice, or whether the caster can influence the outcome; so argument by analogy quickly leads to dead ends, in this area.
The disagreement concerns whether a person can produce a sequence of numbers in a way such that another person will/will not be able to determine whether that sequence of numbers was manufactured or generated by a random process. A generator like coin toss or dice roll is very appropriate here.
This is the telltale sign of someone who is wrong but refuses to admit it. They begin bringing up completely tangential points and ideas to distract from the original argument that they now realize they lost.
The obvious thing to do is then conduct a physical test on the die, for example trying to spin it on an axis with the 20 face near the top of the axis. An obviously loaded die will not be mass-symmetrical and will not spin well.
In general this is called an independent test for systematic bias and it's something often left out of statistical arguments.
> Oh, the die has come up 20, 20, 20, 20, 20, 20, 20 the last seven times. Do you play?
If you let me float the die in a cup of water and spin it to determine it's not weighted so the 20 comes up, or have some magical means of assuring me the die is not weighted, then 100% yes, I would play.
It's entirely possible that a fair die rolls 20 7x in a row, but it's more probable that you're cheating.
It introduces a 10-billion-human second-century, i.e. about 3.1 x 10^19 human-seconds.
If your chance times it is greater than 1 then it's definitely plausible to be done by someone, somewhere.
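The arithmetic behind that rule of thumb can be sketched as follows. The assumption that every person makes one attempt at seven d20 rolls per second is mine, purely to turn human-seconds into a number of trials; the variable names are illustrative.

```python
# Assumption (not from the thread): one attempt at seven rolls per person per second.
SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600
human_seconds = 10e9 * SECONDS_PER_CENTURY   # 10 billion people for a century

p_seven_twenties = (1 / 20) ** 7             # fair d20 showing 20 seven times running
expected_occurrences = human_seconds * p_seven_twenties

print(f"human-seconds:          {human_seconds:.3g}")
print(f"P(seven 20s in a row):  {p_seven_twenties:.3g}")
print(f"expected occurrences:   {expected_occurrences:.3g}")
```

The product is far greater than 1, so under these assumptions someone, somewhere, plausibly sees the streak from a fair die, even though any one observer should still be suspicious.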
Yeah, except that can happen. It's a vanishingly small probability, but it's not impossible. I would feel that the die is crooked, but feelings are not proof.
Hey, out of curiosity, why did you choose a 20-sided die? I assume it was just to match the "ten bucks" bit. I'm not snarking, I just began thinking about 20-sided dice.
I do not believe there is a solid with 20 faces, and all faces congruent (I haven't checked). So we end up with a solid that is roughly ball-shaped, with faces of different sizes and shapes.
[Edit] I checked; I suspected I'd made a fool of myself. An icosahedron has 20 congruent faces, of course.
A ball-shaped die is much more likely to topple, and so more likely to be influenced by small differences in weight distribution, selective corner-shaving, whatever.
I can't think of a way of judging the fairness of a 20-sided die other than casting it many times, and analysing the results. I'd be much more confident in my ability to judge by inspection the fairness of a 6-sided die.
You’re comparing apples to oranges. The correct analogy is: you pick any other sequence of numbers between 1 and 20 and then tell me you’re more likely to win because your sequence is more random.
> That instruction is a flaw in the experiment. It's always impossible to tell, for any given sequence, whether it was produced by a fair die. There's nothing an experimental subject can do to make the impossible more impossible.
Baloney!
Say you measure the traffic to your website in the morning and the evening, every day for a week. And this is what you see:
day: 1 1 2 2 3 3 4 4 5 5 6 6 7 7
time: M E M E M E M E M E M E M E
visitors: 51 73 58 72 50 78 55 74 55 77 52 73 55 76
It could be that the traffic you get is uniformly random, and each day the number of visitors you get is a uniformly random number between 50 and 80. Sure, all the morning numbers are less and the evening numbers are more and there are suspiciously no numbers in the 60s. All that could be a coincidence.
You know perfectly well, though, that you're getting more traffic in the evenings.
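To put a number on "that could be a coincidence": if the 14 counts were exchangeable draws with no ties, every way of choosing which 7 of them are the morning values would be equally likely, and only one of those choices puts all mornings below all evenings. The sketch below uses the figures from the table above; ignoring ties is an approximation, since the counts are integers.

```python
from math import comb

# Visitor counts copied from the table above.
mornings = [51, 58, 50, 55, 55, 52, 55]
evenings = [73, 72, 78, 74, 77, 73, 76]

# Observed: is every morning count below every evening count?
fully_separated = max(mornings) < min(evenings)

# Under exchangeability (and ignoring ties), only 1 of the C(14, 7)
# morning/evening labelings puts the 7 smallest values in the morning.
labelings = comb(14, 7)
p_by_chance = 1 / labelings

print(f"fully separated: {fully_separated}")
print(f"labelings: {labelings}, chance of full separation: {p_by_chance:.5f}")
```

This is essentially a tiny permutation test: the data are possible under pure chance, but a one-in-thousands coincidence is a poor bet next to "evenings are busier".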
You've answered a different question. The question is: can you construct a sequence of dice-casts that an adversary can't distinguish from a real dice-cast? Answer: you can't.
for some X. Feel free to explain[0] how you are able to distinguish which is which.
The real issue is that most people don't bother to produce random numbers in a way that's actually secure (which, to be fair, is rather tedious if you don't have a computer handy, and downright prohibitively impractical if you want to do it all in your head, so why would you bother?), either in the study or in general.
0: If you'd like a more black-box distinguishment, I can provide a longer list; obviously an adversary can get the right answer 50% of the time just by chance.
Can you construct a sequence that an adversary CAN distinguish from a real dice-cast?
If you can't distinguish a spoof from the real article, then that blade has two edges. It's impossible to distinguish them, so the instruction in the pudding test that you are to make a sequence that is indistinguishable from a dice-cast is meaningless, because any sequence is indistinguishable from a dice-cast.
If you ask people to do impossible things before breakfast, then it's not sensible to do an analysis of what they end up doing. It's a waste of time.
> Can you construct a sequence that an adversary CAN distinguish from a real dice-cast?
With high probability (over the distribution of possible dice-casts to distinguish it from), yes:
A: 6 4 3 2 1 4 5 1 6 5
B: 1 1 1 1 1 1 1 1 1 1
Same procedure (A=constructed,B=10d6,xchg A <=> B if another d6 is >3), but this time I clearly don't need to explain how the constructed sequence was constructed.
The adversary will sometimes get it wrong because 10d6 came up "1 1 1 1 1 1 1 1 1 1" itself (or something equally-plausible like "6 6 6 6 6 6 6 6 6 6"), but they'll do much better than random guessing. Whereas being able to do better than random guessing against a CSPRNG/stream cipher means the cryptographic primitive is completely broken. (I'm not sure it's a direct break for a hash function, but it's still pretty bad.)
Now I'm curious: in your opinion what is funny about the line "Nine Nine Nine Nine Nine Nine" in the following cartoon strip, vs. something like "Two Nine Eight Three Seven Eight":
Martin Gardner had a wonderful demonstration in one of his columns, where he had people write out 36 "random" numbers, then fold them into a 6x6 square. There were inevitably doubles or triples of the same digit going vertically, and almost never horizontally, since repeating a digit didn't seem random enough to the person generating the sequence.
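For comparison, here is a small simulation of what a genuinely random 6x6 grid looks like. It assumes decimal digits 0-9 drawn uniformly (the column may have used a different range), and the helper names are made up for this sketch. With 30 neighbouring pairs in each direction and a 1-in-10 chance of a match, about 3 repeats are expected horizontally and vertically alike.

```python
import random

def adjacent_repeats(grid):
    """Return (horizontal, vertical) counts of equal neighbouring digits."""
    n = len(grid)
    horiz = sum(grid[r][c] == grid[r][c + 1] for r in range(n) for c in range(n - 1))
    vert = sum(grid[r][c] == grid[r + 1][c] for r in range(n - 1) for c in range(n))
    return horiz, vert

def random_grid(rng, n=6):
    """An n x n grid of uniformly random decimal digits (assumed range 0-9)."""
    return [[rng.randrange(10) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 20_000
total_h = total_v = 0
for _ in range(trials):
    h, v = adjacent_repeats(random_grid(rng))
    total_h += h
    total_v += v

mean_h = total_h / trials
mean_v = total_v / trials
print(f"mean horizontal repeats: {mean_h:.2f}")
print(f"mean vertical repeats:   {mean_v:.2f}")
```

A hand-written grid with plenty of vertical repeats and almost no horizontal ones stands out against this symmetric baseline, which is the asymmetry the demonstration exploits.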
> That instruction is a flaw in the experiment. It's always impossible to tell, for any given sequence, whether it was produced by a fair die. There's nothing an experimental subject can do to make the impossible more impossible.
I don't think it's a flawed instruction -- i.e., I believe most participants will read it as intended, and it will produce interesting results. I read it as follows:
"Suppose we show a person your sequence and a sequence generated via an actual random number generator. Minimize the average probability that that person correctly guesses which sequence was generated by a human, across all possible real random sequences."
I guess most other people would read it that way, even if it is not formally correct from a statistical perspective. And I believe it gets at what they're trying to measure.
> I guess most other people would read it that way
Not me. But as I said upthread, that's the only way I can construe the question as being coherent. And I agree that the question thus construed invites the experimental subject to make predictions about the behaviour of others; so the question tests the subject's accuracy in psychological judgements.
If you construe it as being a coherent question.
I think it's silly to construe it that way; I just think it was a crap question, and you can't draw any conclusions at all from the results.
This may be too deep a topic, and maybe I don't understand it compared to the megaminds here on Hacker News, but as far as I get it, when it comes to randomness every sequence has the same probability, which is simply the inverse of the number of sequences of that length.
In that case it seems to me impossible to tell from only the information of a single sequence.
But if you have multiple sequences, where one is the binary coding of a Mozart symphony, another is the binary encoding of a Shakespeare sonnet, and another is the binary encoding of the days of the week or a DNA sequence, then with enough data points like that I think you can start to say: well, they're probably not random, because we see coherent patterns.
The second aspect is that a coherent pattern depends on perspective. So what seems random to us may be a very common physical constant to an alien civilization. So if we see a binary sequence that represents pi, maybe we say oh well, that's not random because it's accurately pi to like a thousand places... but if we see another sequence that looks random, maybe it's a physical constant to an alien civilization that's accurate to a thousand places as well. But we don't know that, so it looks random...to us. So randomness doesn't just depend on entropy. I think, from a realistic point of view, probability is not enough; it also depends on something which is perhaps a little bit harder to measure, which is perspective. Random depends on the context.
I think your way of looking at it requires an explosion of alien civilisations to find one in which our random 1000 digits is their special number. If you generate 1 extra digit, you'd need 10x as many civilizations to expect a match.
Getting back to the experiment, obviously it's measuring the human mind so the context is the person's mind. They might enter their bank account number which looks random to the researchers but is actually cheating because they're not using their brain as the source of randomness.
That's an interesting point about the random sequence selected what's the chance it matches an alien civilization constant. I didn't think about that way. I guess I thought about it more from your other example point of view, where say an alien civilization pretends to be a human and gives you a number that they say is random (just like the guy giving you his bank account number) but actually the alien civilization gave you a physical constant.
It's interesting that there's a difference but it's sort of like the difference between you picking a random sequence from a random source and the probability of that being actually random versus you being given a supposedly random sequence by somebody and the probability of that being actually random. I think.
Yea I guess we couldn't tell if somebody gave us a number that was special to them but appeared random. After all, the alien's physical constant (or rather, the arbitrary definitions of their units) would have effectively been chosen by a random number generator just like ours are.
If you pick a random sequence from a random source, then it's truly random by definition, isn't it? Even if it happens to be 1111111. It might score badly on a randomness measure but no randomness measure is truly perfect.
Let's not confuse P(observed_rolls|used_dice) with P(used_dice|observed_rolls). P(observed_rolls|used_dice) is always the same, independent of observed rolls, assuming the dice are fair. But P(used_dice|observed_rolls) can vary, because other ways of generating rolls which are under consideration may be biased towards certain answers, and this allows you to perform inference.
For example, suppose the rolls you will be shown were either generated by fair dice or by the program "always return 4" [4]. The rolls you are given are "4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4". Are you really thinking you'd make the same prediction for this sequence of rolls as you would for the sequence "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1", or is there perhaps some SMALL INKLING OF A HINT as to which answer is correct?
The simple fact of the matter is that, in the real world, if you see a sequence like "4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4" you can be surprisingly confident that it was not generated by die rolls. This is because there are plenty of other ways to get sequences and those other hypotheses didn't just pay a Bayes factor penalty of a trillion. Seeing the instructions as incoherent is the mistake of trying to over-isolate the study to the abstract mathematical realm, instead of the world people actually operate in. If someone tells you their luggage combination is 1234, do you really think it's meaningless to guess that it was the default combination as opposed to being generated by secure die rolls? Do you not form opinions about whether or not someone is using secure randomly generated passwords when you find out their password is "password2"?
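The two-hypothesis example above (fair die vs. a program that always returns 4) can be worked through exactly. This is a sketch under a stated assumption: a 50/50 prior between the two generators, which the comment does not specify; the function name is illustrative.

```python
from fractions import Fraction

def posterior_fair_die(rolls, prior_fair=Fraction(1, 2)):
    """P(fair die | rolls) when the only alternative is 'always return 4'.

    prior_fair is an assumed prior, not something given in the thread.
    """
    like_fair = Fraction(1, 6) ** len(rolls)   # same for every sequence
    like_always4 = Fraction(1) if all(r == 4 for r in rolls) else Fraction(0)
    numerator = prior_fair * like_fair
    return numerator / (numerator + (1 - prior_fair) * like_always4)

sixteen_fours = [4] * 16
sixteen_ones = [1] * 16

print(float(posterior_fair_die(sixteen_fours)))  # tiny: the die is a very poor explanation
print(float(posterior_fair_die(sixteen_ones)))   # 1.0: 'always 4' cannot produce a 1
print(6 ** 16)                                   # Bayes factor against the die for sixteen 4s
```

P(rolls | fair die) is identical for both inputs, yet the posteriors are opposite, which is exactly the distinction between P(observed_rolls|used_dice) and P(used_dice|observed_rolls). The Bayes factor 6^16 is roughly 2.8 trillion, in line with the "penalty of a trillion" mentioned above.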
All sequences are equally likely to be produced by a fair die but humans are very biased in the kinds of sequences they produce. It might be impossible to ever be certain but you can certainly look at a sequence of all sixes (or that contains more complicated patterns) and estimate that it was much more likely to be produced by a human than a die.
I would have thought that “looking random” would have been calculated by just checking if there is a bias in people’s answers. If someone chooses something that has never been seen before vs something that has been picked a hundred times, then it might be “more random” as the bias clearly comes from humans
There's no way of telling. I'll guess you didn't, but that's a guess about people, not about sequences of digits.
See, it's not about how likely it is that OP rolled 123456123456; it could have been any "unlikely" number (where I suppose "unlikely" is a psychological quality). So OP keeps rolling until an "unlikely" number comes up, and exclaims "Wow, how unlikely was that?". Well, it's impossible to know, without knowing how many "unlikely" numbers there are; but it's much more likely than 123456123456 is.
Aren't all sequences really equally unlikely? So whatever you roll, it's as improbable as all-1s?
I agree with the article that the study is flawed in its unwillingness to exclude the all-H and all-T answers. But I’ll go further: the original study is just silly.
“Make a sequence that looks random” is sort of a nonsensical ask. Looks random to whom? To our algorithm, is what they meant. There’s no such thing as a randomness test that can look at a sequence and decide “is it random?”, so this algo measures something else and uses it as a proxy for “looking” random to…the study authors, I guess? There’s no ground truth here; it’s chasing a ghost.
I don’t know what information we could even hypothetically gain from knowing older people score lower according to this algo -- perhaps that the paper’s authors are younger than 60 and thus picked a different “randomness-looking” algo than they would if they were older? At best that older people have an equally incorrect but qualitatively different idea of “random looking”?
Of course we did not learn that; we only learned that older people pick all-H or all-T more. But my point is that there wasn’t really anything interesting at stake anyway.
111111 is just as likely as any other number. However, in practice, humans are far more likely to think of 111111 than other numbers, so we exploit the difference between the probability a human guesses a number vs. the probability of rolling a number on a fair dice. "Did I just roll a sequence X or am I lying?" vs. "What is the probability I roll sequence X next?" are quite different questions, if by "lying" you mean you are not drawing the sequence from a fair, random source.
111111 is just as likely as any other *SEQUENCE* of numbers. This is a little confusing because the "sequential" requirement is somewhat masked by the repeating sequence used by the example. However, the odds of rolling six 1's are 0.00002143347.
Considering the non-sequential set of rolls 625631, you have the odds of exactly two 6's at 0.201, and of exactly one each of 2, 5, 3 and 1 at 0.402 apiece.
0.402^4 * 0.201 = 0.00524928641, or ~244 x more likely.
There are many ways of slicing and dicing things so that you group different sequences, e.g. group permutations as the same as you have done. But by "number" I was referring to a sequence of symbols drawn from a uniform distribution, so order matters. (On the website, you also choose options sequentially, anyways.)
All 1s isn't equally as likely as the entire class of outcomes that are not all 1s, but you can also say that about every other outcome (as long as we're talking about distinguishable dice or a sequence.)
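The sequence-versus-class distinction can be made exact by counting orderings. The sketch below counts how many distinct ordered sequences share the same faces as 625631; the helper name is made up for illustration. Note that multiplying the per-face binomial probabilities, as done upthread, treats the face counts as independent and so is only an approximation; the direct count gives a factor of 360.

```python
from collections import Counter
from math import factorial

def orderings(rolls):
    """Number of distinct ordered sequences using exactly the faces in `rolls`."""
    count = factorial(len(rolls))
    for multiplicity in Counter(rolls).values():
        count //= factorial(multiplicity)
    return count

total_sequences = 6 ** 6                       # all ordered outcomes of six d6 rolls
p_single_sequence = 1 / total_sequences        # 111111, 625631, or any other one
p_class_625631 = orderings("625631") / total_sequences

print(orderings("111111"), orderings("625631"))
print(f"P(any one specific sequence)    = {p_single_sequence:.8f}")
print(f"P(some ordering of 6,2,5,6,3,1) = {p_class_625631:.8f}")
```

So the unordered outcome is more likely than all-1s only because it bundles many equally likely sequences together; each individual ordering still has probability 1/46656.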
You are really confused about this. In general a randomly generated sequence on average will have equal 1s and 0s. Try generating 100K sequences and then count the sequences with:
You missed the parent’s point. “Equal 1s and 0s” is not a sequence; it’s a class of sequences. So the fact that sequences with that property are more common than the specific sequence “all 1s” is true but doesn’t answer the question. 11110000 is a different sequence than 10101010. The question this thread is exploring is whether 1111111 is more likely than any other sequence, e.g. than 1111000 in particular (or insert any other sequence). And of course the answer is no.
The point the parent is making is that 1) represents a class of results rather than a single result, and any single member of that class is equally as likely as 2) or 3). Obviously the class as a whole is more likely than any other single result, but that's a different assertion.
I believe what pessimizer means is that while "all 1s" is not as likely as "equal 1s and 0s", it is just as likely as any individual string of 1s and 0s - for example, 111111 is just as likely as 100110.
If you know it has been generated by a valid random number generator, then no.
But if you know there is a chance it came from something other than a valid random number generator, then you would have to classify sequences like 11111111 in a "highly suspicious" category.
The fact it’s equally likely as any other sequence means it should be very unlikely to appear in the study’s 3429 samples, and extremely unlikely to show up more than once.
Precisely. A difference only arises when comparing sets of strings. In our example, the number of elements in the set of uniform strings aaaa...a of length L is equal to the number of distinct symbols N, while the number of distinct strings of length L is N^L. So if you ask "will the string be uniform?" (which is the only reason you thought up aaaa..a in the first place) you find the probability is exponentially small for long strings: N^(1-L). Really if you tend to choose a string for any particular reason other than chance, it can be exploited as long as the reason can be guessed (e.g. the dice rolls 31415 write out the digits of pi.).
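The N^(1-L) figure is easy to confirm by brute force for small cases. This is just an enumeration sketch; the function name and the chosen sizes are illustrative.

```python
from fractions import Fraction
from itertools import product

def uniform_string_probability(n_symbols, length):
    """Exact probability that a uniformly random string has all symbols equal,
    found by enumerating every string of the given length."""
    total = 0
    uniform = 0
    for s in product(range(n_symbols), repeat=length):
        total += 1
        if len(set(s)) == 1:
            uniform += 1
    return Fraction(uniform, total)

for n, length in [(2, 8), (6, 4), (6, 6)]:
    by_enumeration = uniform_string_probability(n, length)
    by_formula = Fraction(n) ** (1 - length)   # N^(1-L)
    print(n, length, by_enumeration, by_formula)
```

Each individual uniform string is no rarer than any other string; it is the small size of the set (N members out of N^L) that makes "the string is uniform" an unlikely event.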
Helps to think of it in binary, 11111111 (bin) == 255 (dec). Also helps to define the space of possible outcomes, 00000000 -> 11111111. Then ask, is this a discrete event or a sequence of discrete events? I.e. if we have a roulette wheel with 256 slots, and throw a ball in, then the chance of the ball falling in any slot is 1/256.
But what if we say, we're going to generate that binary sequence by 8 successive flips of a coin, and we are aiming for 11111111 specifically? Then we have to multiply eight times. (0.5)^8 == .00390625 == 1/256
Where it gets a bit tricky is if we ask people to place bets after each successive flip of the coin. For example, starting with no flips, ask players to bet on the likelihood of 8 heads in a row. Next round, bet on the likelihood of 7 heads in a row, knowing the first was a head, etc. What minimal odds should the house give after each flip in order to reliably turn a profit on this game? Does it matter how many players are at the table when it comes to calculating those odds?
It’s about the reference class. The class of allsame sequences has only a few members. The class of … random numbers all over ala 71833791 has a lot of members. So the probability of seeing the former class is tiny.
Think of it in terms of coin tosses. Two heads is less likely than a head and a tail, because there are two sequences, 10, and 01, that map to the class of having one tail and one head. But there is only a single sequence of 00 in the class of all heads.
You can estimate the probability distribution that generates this sequence as 0: 0, 1: 1. This is as far from 0: 0.5, 1: 0.5 (a fair coin toss) as you can get.
Comparing mean and std dev can be used to estimate the distance between two distributions. See also, statistical testing.
Why doesn't entropy or Kolmogorov complexity qualify as a valid metric here? A random process will produce a highly entropic result or something with high Kolmogorov complexity, whereas something biased will produce something with less (like HHHHHHHHH).
“Sequence looks random” is not nonsensical. The authors say your sequence should be indistinguishable from e.g. 12 coin tosses. This would have a uniform distribution.
One approach would be to estimate the probability distribution from the input sequence and calculate the KL-divergence [1] of that to the uniform distribution. This gives one objective measure of randomness. There are many others!
TL;DR: There are definitions of randomness that can be tested against.
You can define terms however you want, and thus devise any measure you want. But that’s not a definition of randomness I recognize. Perhaps entropy. Randomness is not a property of the result; it’s a property of the process used to generate it.
Let’s imagine applying your measure to a whole bunch of sequences, each of length N, with the elements of each actually drawn from a uniform distribution. You measure the randomness of each one. You’ll get a range of KL-divergence results, distributed from, in your interpretation, “very random” to “not that random”. All-H or mostly-H will come up sometimes, the distribution estimator will return a skewed result, and it will get a divergent “score”. But everything came from the same distribution. So we’re now measuring the output of an actually random process and saying “well, it’s usually random but not quite always”
In contrast, let’s try your method on a different population of sequences, where instead of pulling the sequence elements from a distribution, every sequence is a hardcoded copy of HTHTHT… That gives “perfectly random”, even though it was very far from a random process.
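Both halves of this objection can be seen with a few lines of code. The sketch below is one possible reading of the proposed measure: it estimates single-symbol frequencies from the sequence and computes the KL divergence to the uniform distribution, in bits. The function name and the choice of a first-order (per-symbol) estimator are my assumptions, not something the thread specifies.

```python
from collections import Counter
from math import log2

def kl_to_uniform(sequence, alphabet="HT"):
    """KL divergence in bits from the empirical symbol frequencies of
    `sequence` to the uniform distribution over `alphabet`."""
    counts = Counter(sequence)
    n = len(sequence)
    k = len(alphabet)
    divergence = 0.0
    for symbol in alphabet:
        p = counts[symbol] / n
        if p > 0:                      # 0 * log(0) is taken as 0
            divergence += p * log2(p * k)
    return divergence

hardcoded = "HT" * 6        # deterministic HTHTHT... pattern
all_heads = "H" * 12        # can come from a fair coin, rarely

print(kl_to_uniform(hardcoded))   # 0.0: scored as "perfectly random"
print(kl_to_uniform(all_heads))   # 1.0: scored as maximally non-random
```

A score of zero for a hardcoded alternating pattern, and a maximal score for a sequence a fair coin can genuinely produce, shows that this estimator measures symbol balance in the output rather than anything about the generating process. Estimators over longer blocks would catch HTHT, but the same objection applies to whatever pattern they do not look for.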
That’s close to what the study authors are doing here, except with a different definition of “random looking”. It’s measuring a property of a sequence, but it isn’t whether it was randomly generated.
We could debate about whether this is a good measure of “random looking” and there could be lots of alternatives with no objectively best. But that is my point: if I ask “make me something random-looking”, I am only asking “how closely does your measure of post-facto randomness match mine?” An actual random sequence would be, well, random.
But they didn’t measure that. They’d have to measure that by showing a bunch of people each sequence and asking “is this random?” Instead, they threw a complexity formula at it, which measures some specific thing, but not “other people”
Though perhaps there is a body of existing literature showing that their complexity estimator matches people’s assessment of “randomness”? If so, does it include people over 60?
> They’d have to measure that by showing a bunch of people each sequence and asking “is this random?”
Agreed!
> Instead, they threw a complexity formula at it, which measures some specific thing, but not “other people”
Yep, they botched the study.
Nonetheless, the "ask" that you criticized was fine. It explained that the subject should craft a pattern which would appear random to other people, and imo there's nothing wrong with that ask. The mistakes came afterwards.
I feel like a lot of the comments here are written after only taking the test and many are not reading the rest of the article.
The authors of the website are stating that they believe the study is wrong. The below/above 60 answer is showing you it’s incorrect half of the time along with data backing up the claim.
But their data doesn't make sense to me personally...
Only 5% of their dataset is above the age of 60, making their claim that they are getting 50% of their guesses wrong seem like they are calculating it wrong. Surely their cut-off should be at the 95th percentile of the data?
They shouldn't be guessing 'under 60' the same proportion of times as 'over 60', because their population is mostly under 60.
Again though, they are arguing that there is no correlation between randomness and age. This was just a demonstration that when they use randomness to predict age, the results are wrong 50% of the time-- which is precisely in accordance with their hypothesis
Yeah but their guess shouldn't be wrong 50% of the time as again that means that they can’t have picked the 95th percentile result! Because it’s 50:50 I’ll assume that they are assigning people scoring higher than average the “under 60” category - which is obviously incorrect. Otherwise how do they pick the cut off?
To explain with another example - let's say that I have a dataset of 100 people's scores at golf (no handicaps) and I know that 5% of them are pro-players and others are 'advanced amateurs'. Because of this I might take the top 5 scores and guess that they are pros and assign the others the guess of 'advanced amateur'.
Now let's say that there was actually no correlation between people's scores at golf and their 'pro' status - what accuracy would I expect in the above experiment? The answer is actually closer to 90% 'accurate guesses' than 50%! (Although obviously - that's 90% accurate based on random chance).
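The "closer to 90%" figure can be checked with a short calculation. The sketch assumes, as in the example, that scores carry no information about pro status, so the guessed labels are independent of the true ones; the function name is illustrative.

```python
from fractions import Fraction

def expected_accuracy(n_people, n_pro, n_guessed_pro):
    """Expected fraction of correct labels when the guesses are unrelated
    to who is actually a pro (no correlation between score and status)."""
    base_rate = Fraction(n_pro, n_people)
    expected_true_pos = n_guessed_pro * base_rate
    expected_true_neg = (n_people - n_guessed_pro) * (1 - base_rate)
    return (expected_true_pos + expected_true_neg) / n_people

# Guess "pro" for the top 5 of 100 when 5 are really pros.
print(float(expected_accuracy(100, 5, 5)))    # 0.905

# Guess "pro" for the top 50 of 100 instead.
print(float(expected_accuracy(100, 5, 50)))   # 0.5
```

With no real signal, matching the guess rate to the 5% base rate still yields about 90.5% accuracy, while a 50/50 split yields 50%. So an error rate of 50% mostly reflects where the cut-off was placed, not how predictive the score is.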
Now if someone told me they got 50% of the guesses wrong at this task, that implies that they guessed that the top 50% of those golfers were pro rather than picking the top 5% of scores, and I would question the methodology.
This % is similar to the dataset in the webpage - I downloaded it, filtered out exclusions and c4% of the valid responses are 60 or over.
If I inherently pick a small population (i.e. over 60's are c4% in this dataset) and I am guessing wrong 50% of the time, it means that my cut-off is incorrectly calibrated. Their score cut-off should, at worst, be picking the wrong 4% and missing another 4%.
Am I going crazy? It seems logical to me, but to be open, maths isn't my strong point. I just know that if I designed the guessing rule, I would be getting more than 50% (my algorithm would be 'if the user's average score across the three tests is less than -1.5, assign "over 60"', and that would get c95% accurate guesses, albeit it would still not prove anything, and I agree with the authors' overall premise!).
In your golf example, making that guess requires additional knowledge of what "pro" means and its frequency among golfers. The data doesn't know that, just like the randomness data doesn't know that most humans are younger than 65 years old. If you really want to figure out how predictive the data is, you shouldn't include considerations like that in your model. I get what you're saying, but ultimately I don't think their goal was to make the most accurate prediction; they wanted to make one that illustrated their point by basing their guess on the data alone.
The calculation involves knowing the age of the sample population though (if you don’t know the ages of your sample, how do you work out what the cut off is at 60 years?).
If I don’t know how many golfers are pro, I simply cannot estimate if it is 100 golfers that are pro or 0 (unless it’s a real gap in scores). Making an assumption that 50 are pro is no more valid than 0 or 100.
If you take the average score of 100 people and say that you estimate anyone scoring below the average is above 60, you are going to be wrong regardless of if your hypothesis is valid or not.
Putting that up and saying “see, it’s wrong 50% of the time!” doesn’t make sense when your calculation is incorrect.
In order to calculate the cut-off correctly they either need to take the 95th percentile result, or pick a sample where 50% of people are over-60 and 50% are under 60 and take an average of that.
Using a dataset where 95% of people are under 60 and then picking the mean clearly isn’t going to work.
I'd have read it if it weren't white text on a pink background. I'm not going through the trouble of pulling it up in a browser and undoing what they presumably did on purpose. Then to complain that people don't read the whole thing?
I understood my task as convincing another _human_ that it was randomly generated. Since N was low for each of the tasks, I was deliberate about sometimes having repeated values, and not ensuring that every option was picked an equal number of times, since that looks suspiciously algorithmic. Apparently I'm over 60.
As I understood it, their entire point is that younger people are not better at generating random sequences than older people, so guessing someone's age based on their complexity (randomness) score is completely unreliable.
Towards the bottom of the page they said they've only guessed age correctly 51% of the time, which lines up with there being no correlation between age and ability to generate random sequences
My point isn't the age result that I mentioned. (I believe their claim that it's bogus.) It's that the instruction to click "as randomly as possible" is ambiguous so at best they're measuring an average of the behaviours they think they are.
They are not, these are the instructions from the reproduction
> Tap a sequence of 10 dice rolls. Make it look as random as possible; another person should not be able to tell if you made it up or if it was from real dice rolls.
And this is the excerpt from the study they mention
> Click on a number between one and six as randomly as possible to produce the kind of sequence you'd get if you really rolled a die [...]
I made the same mistake as thombles, the new instructions make it sound like the objective is to trick a human. The original clearly states the objective is to be random.
They are not the same objective, as humans are terrible at recognizing randomness.
> so that if another person is shown your sequence of digits from 1 to 6, he/she should not be able to tell whether these numbers were produced by a real die or just “made up” by somebody.
I have a really hard time rationalizing why you would leave that part out of your quote and drew the conclusion you did. The original task was clearly also about creating patterns that a human would recognize as random.
Picking evenly seems to consistently produce higher "randomness" scores than picking unevenly or using an RNG. I wonder how this algorithm would rank random sequences vs shuffled linear sequences.
The fundamental issue with a randomness metric for sequences is that an idealized independent generator will under-perform vs a constrained generator that excludes low scoring sequences.
I got the same answer as you, over 60, the first time as I was also very deliberate then went back and did it again like a 3 year old, jabbed anywhere and got a higher random result. Maybe there is something to the study?
My idea of what random actually looks like has been affected a lot by generating random numbers with a computer. They just don’t actually look that random.
I read an anecdote about the iPod shuffle (hey kids - it was a music player with no screen so you could not choose songs directly) - they initially set it to be genuinely random in the way it chose the next song - people didn’t like it. It didn’t _feel_ random to pick a song you only just listened to again. So they had to make an algorithm that was sort-of-random but with some constraints to make it feel how we expect randomness to be.
The iPod shuffle thing wasn't really about randomness, it was a UX failure. "To shuffle" means a specific thing. If I ask you to shuffle a deck of cards and give it to me so I can draw them one by one, I very much don't expect you to put each card back in the deck and re-shuffle it before every time I draw a new card.
I mean, that makes sense. What I want when putting a music player to "shuffle" is not "give me something unpredictable" -- that's fundamentally what randomness means. What I want is "give me something new". Something new is not something random, it's something _different_ from before, if reasonably possible.
Not just that though. If you have an iPod filled with 20 albums from your favorite artist and 1 album from 5 others, you wouldn’t be happy even with random excluding previous.
Which to a machine may as well be the same thing in either phrasing. You want something different from what you just listened to. To it, anything not 'that song' is different and 'new' potentially if also not 'just listened to' within a certain set amount of songs. Even without that certain set of songs being logged and considered; any picking of a different song from the last is verifiably random.
Think of it all like a deck of cards. Shuffle is apt in that sense. You don't expect to see double aces each time you pick through the shuffled deck of cards, but sometimes you do. Sometimes, you also find double jacks, queens and kings; in a row. Sometimes you don't. That deck could be shuffled by the worlds best trick shufflers. Still gonna get doubles now and then.
True Randomness is not really technically possible. At least, not with our current technologies available; and we have a lot of aces up our sleeves.
The best we can manage for randomness right now, is creating random strings of numbers to serve as the seed for new randomness. At least, if I understand correctly. If I do, then this is why cryptography is so damn important for us in the computational side of things. Network Security requires randomness.
If I understand you correctly, I think you missed my point. You're explaining how with true randomness, you get different stuff most of the time and the same stuff some of the time. That is true. But it's not what people want when they press shuffle. What people want is something _different_, and giving the same song twice is not something different. As another commenter wrote, giving multiple (different) songs after each other from the same album would even be undesirable, even if that could occur perfectly well with random shuffling.
You would expect a shuffled deck of 52 unique cards. Not a deck of three 5 of spades. Likewise with a playlist: if I shuffle a playlist of 52 songs, I want those 52 songs to be played in a random order. Not for a random song to be played each time but a random shuffle of that list.
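The difference between the two behaviours is easy to see in a few lines. This is only a sketch of the two ideas, not how any real music player works; `independent_picks` and `shuffled_order` are made-up names:

```python
import random


def independent_picks(playlist, count, rng):
    """Pick a random song each time, with replacement (repeats are possible)."""
    return [rng.choice(playlist) for _ in range(count)]


def shuffled_order(playlist, rng):
    """Shuffle the whole list once, like a deck of cards (no repeats)."""
    order = list(playlist)
    rng.shuffle(order)
    return order


rng = random.Random(2024)  # fixed seed only so the run is repeatable
playlist = [f"song {n}" for n in range(1, 53)]

picks = independent_picks(playlist, len(playlist), rng)
deck = shuffled_order(playlist, rng)

print("distinct songs with independent picks:", len(set(picks)))
print("distinct songs with a real shuffle:   ", len(set(deck)))
```

The shuffled version always plays all 52 songs exactly once; the independent version almost never does.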
In casual language, random means not "uniformly random", but something more like "without a discernable pattern". Playing a song from the same album is the start of a discernable pattern.
I used to be a game designer, and I worked on a lot of games with randomness mechanics and I analysed a lot of player feedback. How people at large perceive randomness is NOT what randomness is, of course. A task to create something that is random, and a task to create something that people perceive is random are two very different tasks.
Spotify had a blogpost about it back in the day[1]. It was based on prior work that's a bit more general[2]. The basic idea was that you don't really want to randomize, you want to distribute.
I've had to deal with a similar issue for a product. We ended up summarising it in a fuzzy way that people's minds have a notion of 'micro' and 'macro' randomness.
In the case of the coin flip, all heads or all tails is perfectly fine and will happen in macro random, i.e. if you took macro to mean > 1 million rolls, let's say. But in micro random (the 12 or so rolls we experience in real time), if that were to happen we'd feel uncomfortable and immediately assume it was cheated, even if it was a product of true randomness, because macro random suffers the same problems as very large numbers of slices of time.
If you're forced to pick random numbers between 1 and X in your head, pick instead from a wider range of numbers and then modulo X. Your brain will legitimately have no idea what number you're picking.
e.g. for a range 1-6, pick from 100-250 instead and modulo 6 plus 1.
There are of course brand new biases at play (is your new range cleanly divisible by X?) But it's enough to tamp down the original biases you're worried about
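For the 100-250 example, the leftover bias can be counted directly. A small sketch (the helper name `residue_counts` is made up), assuming every number in the wider range is equally likely to be picked, which a human head certainly doesn't guarantee:

```python
from collections import Counter


def residue_counts(low, high, x):
    """Count how many numbers in [low, high] map to each value of n % x + 1."""
    return Counter(n % x + 1 for n in range(low, high + 1))


counts = residue_counts(100, 250, 6)
for face in sorted(counts):
    print(face, counts[face])
```

There are 151 numbers in that range, which is one more than a multiple of 6, so one face gets a single extra way of coming up and the rest are tied.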
I think you'd be better off taking a small pinch of sand, salt, pepper etc., throwing that on a smooth surface, then counting all the grains and then modulo it (just have the number of grains be >> than the range as in your example). This would reduce a lot of inherent biases, although perhaps introduce others.
At what point do we draw the line and say that these methods are sufficiently random? The task at hand is to come up with a sequence you perceive as random based on the numbers themselves, so adding layers like this seems to go against the concept of the experiment entirely.
How is this different from opening up my JavaScript console and doing Math.random() several times?
This whole experiment has more to do with defining what humans perceive as randomness. Of course, different humans are different. Ramanujan and his 1729 taxi, or a cryptographer studying some (apparently) random string, will keep going when others would not. Interestingly, the same is true for source code, or in the physical realm, looking at the innards of a modern car. It looks random to the layman! But of course it isn't at all.
I'm certainly no expert in randomness generators but the notion has been floating around for a while that quantum indeterminacy is the best option. Here's a paper on it:
> "The presented above examples clearly show that classic random number generators may be exposed to various attacks, or may have the so-called backdoors. This justifies the need to develop alternative technologies that could replace the classic generators on a large scale. The most promising, because they have a fundamental justification for the randomness in the formalism of quantum mechanics, are quantum random number generators."
This would mean someone asks you to pick a random (integer) number between 0 and 1. That's just a coin flip, there's different/better methods for that.
Sorry, not following you here. How does 16 and 10 sharing a divisor make the mod trick fail? As long as you pick a base that doesn't match the range you're picking within, you should be good to go.
If someone asks me to "pick a random digit in [0x0, 0xF]", then I'll use base-10 as my start. 284. I have no idea which hex digit this will result in after I mod it by 16. It's 0xC, but I didn't know that going into it.
Oh, wait. I think I do see what you're getting at. I would still know if it's even or odd, because 10 and 16 share 2 as a divisor, right?
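A quick check of that, as a sketch: the hex digit from 284 really is 0xC, and because 2 divides 16 the parity of the number you picked survives the modulo. With an odd modulus such as 15 (chosen here only for contrast) it does not.

```python
picked = 284
digit = picked % 16
print(hex(digit))

# Parity of the picked number always survives "mod 16" ...
parity_leaks_mod_16 = all((n % 16) % 2 == n % 2 for n in range(1000))
# ... but not "mod 15", because 15 is odd.
parity_leaks_mod_15 = all((n % 15) % 2 == n % 2 for n in range(1000))

print(parity_leaks_mod_16, parity_leaks_mod_15)
```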
For any experimental science, the integrity of the experiment (and thus the reliability of the data) is important. For experiments with human subjects, the question is whether the subjects answered the questions in good faith. A sequence like 'HHHHHHHHHH' for the coin toss experiment looks like an answer in bad faith; it is mechanically easy to keep pressing the same button, and a subject is unlikely to think that such a sequence is a likely random sequence. Therefore, the replicating authors are fully justified in eliminating those bad-faith answers. The original paper authors' claim that 'HHHHHHHHHH' is exactly as probable as any other sequence is irrelevant.
This is not a sound approach. You're declaring what humans think random is first, and then throwing out any data that doesn't match your declaration. There is no way to learn anything from this.
I also think 'HHHHHHHHHH' is unlikely to be a good faith response, but if the goal is to actually learn anything instead of merely reinforcing my prior beliefs, it doesn't matter.
You need to find a way to design the experiment that discourages bad faith answers or lets you judge them objectively. Alternatively, if you have some outside knowledge about the 'shape' of bad faith answers for your kind of experiment, you may be able to use that to properly adjust your data.
But 'nah I don't think so' isn't an acceptable reason to throw out data. It's especially egregious to do so when the data is answers that are, at a bare minimum, technically correct.
It seems to me that if we reject a subset of experimental samples because they look like bad data (e.g. extreme outlier caused by sensor malfunction) we are still keeping all the bad data we are unable to recognize as such (e.g. sensor malfunctions producing less extreme data), which introduces a bias.
I probably should have clarified that I was responding to the content of the parent comment rather than the submission itself.
I think this is just the slop of language, but in this case it's obscuring all the important details so excuse me for being a bit pedantic.
Forming and accepting a hypothesis are very different things. You can't just come up with a new hypothesis after looking at some data and then immediately accept it because the data supports it.
It would absolutely be incorrect to look at the original data, form an alternate hypothesis, and then immediately go on to suggest it is an equally good explanation as the original hypothesis.
You don't have to accept the original hypothesis if you think the experiment is flawed, and you're free to propose any hypothesis you want, but that's the limit without new data.
Since two people have come to the same misunderstanding, I must have worded my argument inadequately.
Of course review is not the time to accept or form new hypotheses. Neither I nor the author of this article is suggesting that we should accept this new hypothesis "old people are more likely to give bad faith responses" from the data collected for this study.
But review is the perfect time to look for interesting features in the data that challenge the original hypothesis. In this case, it is very difficult for the original hypothesis to explain why older people are only worse at giving random responses in a very specific way: giving answers that are all 0s or 1s.
Although technically, this would be P-hacking. You aren't meant to change your hypothesis post-facto to fit the data. You'd have to conclude no effect, and then design a separate study to determine if age differences correlate with bad faith answers.
It would be p-hacking if we just took the same data to conclude that old people are more likely to give bad faith responses. That is just a possible explanation for the data being offered to reject the original hypothesis.
At the very least, it is an interesting observation that the entire trend line disappears on removing data points where people guess all same coin toss results.
If you're going to exclude bad faith answers, I think you should exclude all of them. But I don't think you can do that. Is HTHTHTHTHT a bad faith answer? Always or only sometimes? We're trying to infer the test subject's intent from their answer, and that's fundamentally impossible I think.
I think including all answers is a solid approach. If test subjects have bad faith, I think that can be filed under 'less random'. If old test subjects show more bad faith, I think it's not really wrong to say older people are less random. And it does have predictive power.
Arbitrarily (because there is no way to do it objectively) excluding some answers and not others has, I think, a greater risk of skewing the results.
However, the study concluded that a person's ability to produce randomness peaks at 25. An increase in showing bad faith doesn't tell us anything about the ability to produce randomness if desired. Thus, if we accept the bad faith answers as part of the data, the conclusion of the study becomes incorrect, at least in wording.
It’s hard to believe the original authors made their argument in good faith. They probably ran the numbers with the filters and saw they wouldn’t have a paper that way.
George Marsaglia suggested a very simple multiply with carry PRNG you can use in your head to generate better quality random numbers. [1]
1. Select an initial random number between 1 and 58. (This is accomplished by mentally collecting entropy from the environment, e.g., counting a group of objects of unknown quantity)
2. Multiply the least significant digit by 6, add the most significant digit, and use the result as the next state.
3. The second digit of the state is your generated pseudorandom number.
I really like the idea of using this when playing for example rock paper scissors (I'd take the last digit mod 3 and reject any 0's, so the distribution isn't as biased).
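The three steps above can be sketched as follows. This is only my reading of the description in this comment (state in 1..58, output is the last digit of the state), not a reference implementation; the names `next_state` and `mental_digits` are made up.

```python
def next_state(state: int) -> int:
    """One step: 6 times the last digit, plus the leading (tens) digit."""
    low, high = state % 10, state // 10
    return 6 * low + high


def mental_digits(seed: int, count: int) -> list:
    """Run the generator from a seed in 1..58 and collect the output digits."""
    if not 1 <= seed <= 58:
        raise ValueError("seed must be between 1 and 58")
    state, digits = seed, []
    for _ in range(count):
        state = next_state(state)
        digits.append(state % 10)  # the second (last) digit is the output
    return digits


def cycle_length(seed: int) -> int:
    """Number of steps before the state returns to the seed."""
    state, steps = next_state(seed), 1
    while state != seed:
        state, steps = next_state(state), steps + 1
    return steps


print(mental_digits(23, 5))
print(cycle_length(23))
```

Starting from 23 the states go 20, 2, 12, 13, 19, and the generator walks through all 58 states before repeating, which is why the seed range is 1 to 58.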
I believe this is conflating distribution with randomness.
Having played too many games and rolled too many real and pseudorandom dice, i know streaks are the rule, not the exception, and that completely missed rolls are likewise to be expected.
Using this model of randomness, i tried to create sequences that matched. The result is it says i have a 60 year old brain. I did the prompt again, but simply /ensured there were no missed rolls/ and i have an under 60 brain?
In reality, i believe the odds of rolling all 6 sides using 10 dice rolls are relatively low... but in order to be "random" enough that seems like a prerequisite??
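That belief can be checked exactly with inclusion-exclusion. A short sketch, assuming a fair six-sided die and independent rolls (the function name `prob_all_faces` is made up):

```python
from fractions import Fraction
from math import comb


def prob_all_faces(faces: int = 6, rolls: int = 10) -> Fraction:
    """Exact probability that every face shows up at least once."""
    # Inclusion-exclusion over the number k of faces that are missing.
    total = sum(
        (-1) ** k * comb(faces, k) * (faces - k) ** rolls
        for k in range(faces + 1)
    )
    return Fraction(total, faces ** rolls)


p = prob_all_faces(6, 10)
print(p, float(p))
```

It comes out at about 27%, so roughly three out of four real 10-roll sequences are missing at least one face.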
I had the same issue. I generated random numbers and still got over 60 years old. I had to dive deep into it, and it turns out that the original paper is shit.
They ask participants to make sequences that are as random as possible but they use Kolmogorov complexity as a measure, which (surprise) doesn't actually measure randomness but complexity. Random distributions tend to have lower complexity than what humans generate; see [1] Fig. 4. Here are the complexity scores for different sequences if you want to lose more sleep over this: [2]. At least the authors were nice enough to be open about their data, so props I guess.
Maybe the null hypothesis needs to be that older people have more experience of "random" and therefore expect or tolerate more variance before rejecting "chance".
But they say that the effect is relatively small, or that their age guessing is "not as good as we had expected".
I'd be prone to misinterpreting the question.
If somebody asked me if the sequence "HTHTHTHT" is more or less likely to occur than “HTHHTHTT” I'd be confused. Ok, are they rejecting that they're equally likely out of hand, and if so why? At least subconsciously. If forced to suggest one or the other I'd offer the latter even though I know it's not mathematically correct.
Additionally most of the problems I deal with have to do with probability within a continuous run and in a run of 1000 heads/tails and some quick and dirty Monte Carlo I get:
It would be nice if they had a different page where you could see in real time or after you submit a sample of 10 how random their model thought the inputs you were giving were.
Not about randomness - but about curve fitting. It is actually very difficult to verify non-linear effects -- or maybe I should say the opposite. The statistical tools we use to identify non-linearities are prone to be very noisy, so even in the subset of data including the no-variation responses I am quite skeptical that downward increase is real, or just due to variance in the tails of the data.
So a common social science finding is `* graded effects` where you might not by default expect them, and that is the main headline of the paper. I think plateau effects are reasonable in many situations, https://blogs.sas.com/content/iml/2020/12/14/segmented-regre..., but the noisy data itself often won't be able to clearly differentiate between different curve shapes.
I'm surprised to see this comment not echoed by more people. Besides the inadequate sample size of older participants, my immediate response to the variance in the data was that I doubt the slope could be distinguished from zero.
> As for our initial idea to make an age-guessing game, we have guessed right 51% of the time. Pretty much what we had expected .
They guessed I was 'Under 60', which I am, but over 50% of people fit into the category of under 60... so the fact they can guess this with only 50% of accuracy doesn't really feel right?
I think they're saying over or under 60 is completely arbitrary when judging only from 'random' answers. It could be over or under a month, or 1000 years, the result would still be 50/50 because the ability to create a sequence that appears random has no link to age.
GP is saying it's even more than 50/50. Less than half of people are >60.
A more extreme example would be if it guessed: "You are younger than 99 years old."
I also guess that the segment of the population interested in a self-described "digital publication that makes data fun" probably skews even younger than the population. FWIW I was guessed to be over 60 but am not.
Yeah, most of us here got “over 60” because we inputted data with repeated values, because we know real random data has repeated values. I’m not sure if we overdid it or the study has a weird metric of “looks random”.
That way there's no confusion about how to interpret the instructions, it's just people trying to predict the results of an event they perceive to be random and fair.
When it asked me to create a "good" outcome, I didn't really know what it was expecting. If you roll a die 1000 times, the results are probably going to be evenly distributed, but if you only roll it ~12 times, the results could really be anything. Anyone who has played a board game can probably remember a time when it felt like the dice were loaded.
I felt like making the random numbers too perfect would make it obvious that they were picked by a human, so I purposefully picked a lot of duplicate numbers, and at the end it guessed that I was over 60 years old (I'm less than half of that...)
I'm not sure if guessing the result would be any better. Knowing anything about a fair die, you might as well just stick to the same result for all tosses.
However, I approached the exercise in the same way as you did, as an adversarial game. I tried to generate numbers that would trick a human who is tasked with filtering out the series as random. No results repeating struck me as a tell that the series was done by a human, so I included a fair number of repetitions of the same result that's still statistically plausible. Not sure how to work around that in the instructions.
Edit: I think the study might say more about what the player thinks about others' expectations of randomness than about the player's own understanding of randomness.
Speaking about fair dice, I have a vague memory of reading somewhere that as dice get used, dirt builds up in the pips, causing them to be slightly off-balance. I cannot find anything about this online at all, so it might well be complete nonsense made up by whoever told me. I have, however, found that dice are not random, and that 1 is a more likely result than anything else: https://www.insidescience.org/news/dice-rolls-are-not-comple...
> Paradoxically, participants with a scientific background may perform worse at producing random sequences, thanks to a common belief among them that the occurrence of any string is as statistically likely as any other (a bias deriving from their knowledge of classical probability theory), which further justifies controlling for Field of education, simplified as humanities v. science.
Very cool website. Kind of "someone is wrong on the internet" crazy levels of effort! Really I think you can see that there's no trend just by looking at the graph. Always be suspicious of graphs that look like they have no trend by eye but have a solid trend line superimposed.
Would be nice if they defined "complexity" somewhere. I think the sequence lengths are too short to distinguish true random number generators from poor random number generators.
In other words what kind of graph would you see if you used a real coin? Although I guess if you sample over enough people then it doesn't matter.
The way I read "so that another human could not tell" it was not random is that the question is truly asking about luck and probability. I have studied gambling enough to know that runs are the norm and wins are not evenly distributed across the space, I.e the gambler's fallacy of thinking one is 'owed' a certain outcome after it doesn't occur for some time period.
I understand that randomness is not uniform distribution and feel like people who are in similar situations are always going to skew results in some way.
The reason the trend line drops over 60 is that there are fewer people over 60 taking the test and thus, the questionable responses have a bigger influence. If you sample less from that cohort, it will be less likely that your sample is representative.
Yes, while I like the article, it kind of distorts the evaluation in the original paper, where they don't only show the trend line but also its 95% confidence interval (which very obviously widens significantly with age).
After doing it for real, I tried it a few more times with a cryptographically random shuffle. Interestingly, the computer got a worse score on average.
Method:
I started `irb`, and picked like so:
require 'securerandom'
# Coin flip
12.times { p [:heads, :tails].shuffle(random: SecureRandom)[0] }
# Dice roll
10.times { p (1..6).to_a.shuffle(random: SecureRandom)[0] }
# 10 dots
10.times { p (1..9).to_a.shuffle(random: SecureRandom)[0] }
I disconnected from the internet before the scores were submitted so as not to taint the survey. (Ranking calculation happens offline based on a CSV of scores loaded into the browser early on.)
Results:
For: "Your answer got a higher random score than X% of people in the study."
Me: 46%, 71%, 59%, 72%
Computer: 32%, 20%, 47%, 31%
My sample size is obviously too small for anything conclusive. However, I'll admit it makes me a little suspicious that something else is amiss.
The measure of randomness chosen in this particular paper appears to be an approximation of the Kolmogorov-Chaitin complexity adapted for small integer/binary sequences.
This effectively looks at how easy it would theoretically be to compress/describe the data, for instance HHHHHHHHHH would be low complexity as it could be encoded as '10 H's'.
If something is truly random, it shouldn't be possible to encode it due to the pigeon-hole principle.
> If something is truly random, it shouldn't be possible to encode it due to the pigeon-hole principle.
This statement is obviously untrue.
“Random numbers” don’t really exist. The original authors were right about that. Every number/sequence is equally likely to occur. There’s even an XKCD about this [0].
I guess what you mean is: If you have a process that generates sequences randomly, most of those sequences are expected to compress badly.
They are equally likely, yes. But the questions here pertain to whether or not another person would perceive your sequence as random.
It wasn't too long ago there was an article on HN (can't find it now) describing what feels random to people.
Essentially, a sequence feels more random the harder it is to explain.
So, HHHHHHHH doesn't feel random because it's summarized as 8 H's and HTHTHTHT doesn't feel random either because it is HT repeating. Strings that are only really communicable by repeating the string verbatim feel the most random.
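A toy version of "how short is the explanation" can be written down. This sketch only looks for a single repeating unit, which is far cruder than whatever complexity estimate the paper actually uses; `shortest_description` is a made-up name.

```python
def shortest_description(seq: str):
    """Return (unit, repeats) for the shortest unit whose repetition gives seq."""
    n = len(seq)
    if n == 0:
        return "", 0
    for size in range(1, n + 1):
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size], n // size


for seq in ("HHHHHHHH", "HTHTHTHT", "HTHHTHTT"):
    unit, repeats = shortest_description(seq)
    print(f"{seq}: {unit!r} x {repeats}")
```

The first two collapse to a one- or two-letter unit; the third can only be "explained" by spelling it out in full, which matches the intuition that it feels the most random of the three.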
But if this is about perceived randomness shouldn’t people also guess which sequence was created by a human vs sequences created by some algorithm?
I understand that to score randomness you probably would create n-grams of the characters and look if these would be equally distributed. But for such short sequences it feels hard to do. Maybe a statistician can explain this?
For me, using my right thumb on a smartphone seems enough to skew the randomness. Just by doing it again with my index finger (after writing most of this comment), I raised it from being “more random” than 18% to 84%.
'Perceived as random' seems like a pretty junk measure of other humans' efforts at producing randomness. Garbage in, garbage out. Surely analysing this tells you absolutely nothing about anything?
I would understand if the actual measurement is not perceptual but mechanised, i.e. "how small can we compress this stream of random choices using our best known compression methods" or something. (But then a stream of 10 symbols is surely not enough to show you the humans.)
It gets pretty theoretical, but basically it's estimating the Kolmogorov complexity by looking at the size of the Turing machine required to generate a particular string, rather than Shannon entropy or implementations of common compression techniques.
Yes the visual layout and input method is an interesting bias. If you imagine drawing a line between all choices on the number pad (which I do), I realised my answer avoided doubling-back on itself. My 'random' shape looked more like a nicely distributed squiggle, which is less random.
> But if this is about perceived randomness shouldn’t people also guess which sequence was created by a human vs sequences created by some algorithm?
It's about what the test person thinks another human will perceive as random, so there's a layer of indirection there. Whether the guesses you suggest would help the study or not, I really can't say.
> As for our initial idea to make an age-guessing game, we have guessed right 51% of the time. Pretty much what we had expected .
I don’t know what the “guess” was for others. But for me it guessed “are you under 60?” If that’s what it’s doing, guessing above or below 60, then I think it’s amazing they’re only getting 51% right. I would expect that a strategy of ignoring data completely and always guessing under 60 would be significantly better.
By posting this study here on hacker news, you will somehow get a bias in your study. Here, we are usually very technical people. We work with randomness, we studied it, implemented it or proved something about it.
Many have studied it in university and have a different understanding of randomness than non-technical people.
Therefore I would be careful when evaluating this study when a lot of the participants of this study came from here.
A mathematical model sometimes used for shuffling cards is that after the cut, as you hold N cards in one hand and M in the other, the next card to fall has N/(N+M) chance of falling from the hand with N, and M/(N+M) chance from the hand with M. This is used to come up with results like 8 riffle shuffles being needed to randomize a deck of cards.
I sometimes think about how this model gives an extremely small but non-zero chance that you drop all the cards from the left hand before the right and end up with no shuffle at all. Or just one or two cards fall before the other hand. And I'm sure this has actually happened to people!
But if you're counting your shuffles trying to get to 8 to call it randomized, nobody would count that shuffle. They'd redo it.
I don't think anyone submitting HHHHHHHHHH or TTTTTTTTTT does so thinking they've generated a random sequence.
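The riffle model described above can be sketched roughly as follows. The drop rule (N/(N+M) versus M/(N+M)) is taken from the comment; the binomial cut and the names `riffle_shuffle` and `prob_one_hand_falls_first` are my own assumptions for illustration. Under this drop rule every interleaving of the two hands is equally likely, so the "no shuffle at all" outcome has probability 1 / C(N+M, N).

```python
import math
import random

def riffle_shuffle(deck, rng=random):
    """One riffle under the N/(N+M) drop rule. The cut is assumed binomial."""
    cut = sum(rng.random() < 0.5 for _ in deck)  # assumption: binomial cut
    left, right = deck[:cut], deck[cut:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        n, m = len(left) - i, len(right) - j  # cards remaining in each hand
        if rng.random() < n / (n + m):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out

def prob_one_hand_falls_first(n, m):
    """Chance that all n cards of one hand drop before any of the other m."""
    return 1 / math.comb(n + m, n)

print(riffle_shuffle(list(range(10)), random.Random(0)))
print(prob_one_hand_falls_first(26, 26))  # tiny but non-zero
```

A person counting shuffles would redo such a riffle, which is the same kind of filtering as discarding all-heads responses.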
Interesting and fun. I did the test myself and then I wrote a program on OpenBSD to read /dev/urandom and used that in the test. I got similar results to what I picked manually.
So I do not know what they think about the test, I would have expected the utility I wrote to come up with different results than I did.
I was trying to find a mental way to do a true coin toss. Does anyone have ideas for how to truly dig into some randomness? Maybe most would think it's impossible, but aren't we at least better placed to do this than deterministic machines (or maybe we are not - the true free will debate :)?
If you have two people, you can have person A ask person B to “pick a random number”, and use heads for odd, tails for even. Don’t tell person B why you’re asking and my guess is you’ll get relatively random heads/tails. No studies that I am aware of back this up, so people could be biased towards odd/even, but a bias correction could fix that too.
Look around the room for objects in sight. For each object, take its common name and count how many of the letters are "odd" letters acegikmoqsuwy, then mod 2. "Window" -> "wiow" -> 4 -> 0. Each word yields a single bit of very slow, pretty good entropy. Don't do this in the same room twice.
I suspect there's a strong bias in vowelCount % 2. A quick look at English 100 most common [1] has 1's at 73% and 0's at 27%. That would even out with longer words but I wonder how much. Maybe there's something else that's just as easy with less bias?
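For anyone who wants to check the bias on their own word list, here is a small sketch of the "odd letters" scheme from the parent comment. The function names and the sample words are mine and purely illustrative; I have not reproduced the 73% figure.

```python
ODD_LETTERS = set("acegikmoqsuwy")  # letters at odd positions in the alphabet

def word_bit(word):
    """Count the 'odd' letters in the word, then take the count mod 2."""
    return sum(ch in ODD_LETTERS for ch in word.lower()) % 2

def fraction_of_ones(words):
    """Share of words that map to 1; 0.5 would mean no bias."""
    bits = [word_bit(w) for w in words]
    return sum(bits) / len(bits)

print(word_bit("Window"))  # "wiow" has 4 odd letters, so the bit is 0
print(fraction_of_ones(["window", "chair", "lamp", "door"]))  # hypothetical sample
```

Running `fraction_of_ones` over a real frequency list would show how quickly the bias evens out with longer words.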
All sequences are equally probable. But that's not the thing being asked to distinguish. It's whether a human being produced the sequence. Humans choose patterns. Distinguishing between 'random chance' and 'patterned response' is much, much easier statistically.
I haven't looked at the details of the CTM method for calculating complexity (because the math symbols didn't render on mobile) but from what I can tell in the two state coin flip case more state switches in the sequence is more complexity? That's not 'more random'.
Very early on in their explanation of the experiment, I found myself asking "What is random?" I'm sure one of you smart mathematicians has a learned perspective on this, but to my thinking, a whole lot of hairless apes sitting behind their keyboards randomly beating patterns for entertainment is pretty random. So to say that one hairless ape is more random than another in their keyboard pounding is pretty subjective. I guess I'd further posit that a person over 60 might not really give a shit much so their keyboard pounding might be less enthusiastic compared to the 20-something. Is that really measuring randomness or something more like physical dexterity in mouse movement and keyboard pounding?
Very interesting, and I think this part sums up the crux:
“The researchers believe that you can only analyze the raw responses because, statistically, any sequence is equally likely to occur, so where do you draw the line?”
I’d say that, as it is a psychological study, making claims about a human behavior, treating humans as pure random number generators without considering _intent_ is a mistake.
It is entirely possible that older people fill in more “questionable” responses because they can’t be bothered with the study, and that this causes the “decline” in ability as people age.
But we don’t know for sure, because it was never investigated. Thus, the biggest problem is the original study not even bringing this into light, even though it appears the original authors were aware of it.
I think they are being overly kind. The conclusion itself shouldn’t be sensitive to just removing the all-H or all-T answers. Since their trend disappears from removing just those answers, we’re only left with the far more mundane “older people are more likely to write all Hs or Ts”. The true conclusion was hidden by the averaging that goes on when you make a best fit line.
One thing many surveys/studies do is to include "trap" questions (I'm sure there's a real name for 'em) which disqualifies any participants that answer them incorrectly.
- that site links to some other articles in its bibliography at the bottom, and mentions using "algorithmic probability" to approximate complexity: https://complexitycalculator.com/methodology.html
Your strings are more complex if there are fewer Turing machines that produce it, which is then normalized to the average of all strings to become "randomness". Too few or too many ways to create a string means less random, within this category of simple and very small Turing machines (otherwise it'd be impractical to compute).
I have no idea how broadly applicable that metric is though. It seems fairly niche to my complete amateur reading... but AFAICT all Kolmogorov complexity measures are extremely niche, as it's extremely sensitive to what the execution environment is.
But this was still an interesting rabbit hole. Figured I'd share:)
This reminded me of a video I watched recently about the most common form of cheating in Magic: The Gathering: stacking the deck.
The video, by NuxTaku, did a great job of explaining how this method of randomizing the deck actually leads to a distribution which is uniform rather than random and which in turn is effectively cheating.
I haven't played MTG since the early 2000s but the video struck a chord with me because I remembered using this method and not even thinking about it (in my defence, I was a teenager) as being a problem.
So yeah, reading this article I was instantly like "er....yeah...don't ask people to make random sequences because they often have an incorrect idea of what that actually means".
I think you are conflating a little bit randomness and complexity.
Scott Aaronson in "Quantum Computing since Democritus" cites a study that shows that when people are asked to generate a random sequence, the sequence that they write looks _more complex_ than a truly random sequence. For example, when generating a sequence of coin tosses people would try to avoid long sequences of heads or tails, making the probability of those sequences lower than in a truly random sequence.
I decided to try and test this using your game. I generated the results for all three tests using random.org, and ended up somewhere between 10th and 20th percentile of complexity.
> Sure, a probability of having many tails continously is low but how is it not random?
Randomness is never about the output; it is a property of the source.
So what is not random, is that the frequency of people that output repeating sequences is higher than the quotient of that sequence among all possible sequences of the same size.
The issue of the initial paper is that it claims to conclude that people aged > 60 are less random, when they simply output a fully-repeating sequence with a slightly higher rate (which is not surprising taking into consideration that there are much, much fewer of them).
P(all heads | bad faith) ≫ P(any given "random"-looking string | bad faith)
therefore
P(bad faith | all heads) ≫ P(bad faith | any given "random"-looking string)
So if you want to exclude bad faith responses, the best strategy (by Neyman–Pearson, if you want to think of it that way) is to remove "all heads" responses (and similarly for "all tails").
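The inequality above can be made concrete with Bayes' rule. Every number below is a made-up assumption for illustration (the 5% bad-faith prior, the 30% chance that a bad-faith participant types all heads, the 10-flip length); none of it comes from the study.

```python
def posterior_bad_faith(prior, p_seq_given_bad, p_seq_given_good):
    """P(bad faith | sequence) by Bayes' rule."""
    numerator = prior * p_seq_given_bad
    return numerator / (numerator + (1 - prior) * p_seq_given_good)

PRIOR = 0.05            # assumption: 5% of participants answer in bad faith
P_FAIR = 2 ** -10       # chance of any one 10-flip string from a fair coin

# Assumption: bad-faith participants type all heads 30% of the time, and
# rarely type any one specific "random"-looking string.
all_heads = posterior_bad_faith(PRIOR, 0.30, P_FAIR)
random_looking = posterior_bad_faith(PRIOR, 1e-5, P_FAIR)

print(all_heads, random_looking)
```

With these inputs the all-heads posterior is far larger than the other, which is why dropping exactly those responses is the cheapest way to remove bad-faith answers.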
This is a key point in the article. The original researchers did not want to throw out the data that doesn't "seem random"; this group argues in favour of doing so.
> so that if another person is shown your sequence of digits from 1 to 6, he/she should not be able to tell whether these numbers were produced by a real die or just "made up" by somebody
This explanation leads me to think that the decision of what is random in the study is based on human perceptions of randomness, not actual statistical randomness. Although any sequence is equally (un)likely to be rolled, 1111111111 would stand out from the other sequences much more than 3156263441.
No, they're just saying that no person who was following the instructions would produce a sequence of all heads or all tails as a "random" sequence, so they're throwing out those two specific sequences.
Look at the graph of responses. There are a few clear clusters and lines outside of the main, statistically random cluster. Those other ones can be dropped.
Those regression lines are absolutely laughable. Point cloud has line going through it, and beforehand I have no clue where the line will be (other than that it will go through the center, presumably). Even if there is a statistically significant effect, its meaning will be a rounding error.
I sort of knew before going into the experiment the "trick" that most humans perceive randomness wrongly. So I knew the coin flip sequence "hhhhttthhht" is more "random" than "hthhtththhth", which most people would choose.
I guess my knowledge kind of ruins the experiment. Note I chose the first sequence because I know, but I have to battle with my intuition which urges me to pick the second (wrong) option.
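A quick way to put numbers on the two sequences mentioned above is to count how often each one switches sides and how long its longest run is. This is only a sketch with names of my own choosing, not the complexity measure the site uses. A fair coin switches on about half of its flips.

```python
from itertools import groupby

def switch_rate(seq):
    """Fraction of adjacent pairs where the symbol changes."""
    switches = sum(a != b for a, b in zip(seq, seq[1:]))
    return switches / (len(seq) - 1)

def longest_run(seq):
    """Length of the longest block of identical symbols."""
    return max(len(list(group)) for _, group in groupby(seq))

for s in ("hhhhttthhht", "hthhtththhth"):
    print(s, round(switch_rate(s), 2), longest_run(s))
```

The second sequence switches on 8 of its 11 transitions and never runs longer than 2, which is the over-alternation people tend to produce when they try to look random.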
> We believe that there are obvious responses that are candidates for removal that make the data more true. The researchers believe that you can only analyze the raw responses ... filtering any data would be tampering with the results
Interestingly, Millikan was accused of removing observations from his oil drop study to reduce the error in the electron charge (but from like 2% to 0.5%, both far better than previous uncertainties).
The more surprising part to me was that quote about the 2015 study that tried to reproduce 100 studies and managed to replicate 39. That's honestly way better than I expected for psychology.
The data that this study will gather depends hugely on what internet forums it is spread around. I would guess that HNers would have very different results than a randomly-selected sample of the population. I don't see how a study that gathers data via internet virality can possibly credit or discredit a theory like this.
I'd imagine it's hard to prove randomness. Yes, HHHHHHHHHH is just as likely to be chosen randomly as HHTHTHTTTH. I don't remember seeing anything in the article speaking to the statistical significance of the data...though, I'm not a scientist, so maybe it's inferred from the discussion of the results.
Not a scientist, but perhaps one can help me understand: wouldn’t posting this to a forum with many people interested in software add potential for selection bias? I’m guessing that people like this are more inclined to understand what a “random” sequence might look like and therefore skew the results. If not, why not?
> As for our initial idea to make an age-guessing game, we have guessed right 48% of the time.
For me, the guess was just "under 60". This seems pretty coarse (I'm 40, so it was correct). Were they any more granular with other people? Or were they only right 48% of the time, even with this very coarse prediction?
It seems to me if you correct/don't correct for people who didn't follow the directions, what you've actually come up with is a graph of how well people follow directions, which peaks at age 30. I don't feel bad for being 57% random at age 58. 8)
I guess my ability to follow directions is about to nosedive.
This is what surprised me the most. It's admittedly been a long time since I took a stats class, but the trend line hardly looks like a trend. There hardly seems to be any correlation at all on either of the two charts with trend lines.
Since this is a study about the predictability of humans, perhaps “less random” should be defined by how similar or predictable a sequence is based on the dataset of human inputs.
When our concept of random is so precise as to permit the exclusion of all results that do not conform with the image of randomness, is there anything random about it at all?
Confounding factor - I'm using a trackpad, so I'm more likely to click things next to the things I just clicked. My randomness is mitigated by my laziness.
Hm I'm curious about the complexity measurement for the coin tosses. How does it work?
There is suspicious clustering in the plot, at complexity slightly lower and higher than -1. Complexity -1 is also more sparse than seems reasonable from the shape/density of the cloud.
I think that's a common misconception about randomness: it doesn't mean the number has to change with every roll of the dice. On the contrary, the longer the sequence, the more likely it is that large clusters of the same number will occur.
It's maybe not as counter-intuitive as the Birthday paradox, but maybe it already shows how bad our intuition is at grasping randomness. Initially I had hoped for research on that, which is probably very hard. Or not? Couldn't you look at the n-gram distributions of the data and see how "random" they are, i.e. are people avoiding clusters harder than they should?
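The n-gram idea could start as simply as this sketch (function names are mine). For a fair six-sided die, two consecutive rolls match with probability 1/6, so a set of human-made sequences whose repeat rate sits well below that would suggest people are avoiding clusters.

```python
from collections import Counter

def ngram_counts(seq, n):
    """Count every length-n window in the sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def repeat_rate(seq):
    """Fraction of adjacent pairs that show the same symbol twice."""
    pairs = ngram_counts(seq, 2)
    same = sum(count for (a, b), count in pairs.items() if a == b)
    return same / sum(pairs.values())

EXPECTED_DIE_REPEAT_RATE = 1 / 6  # fair die: P(next roll equals previous)

print(repeat_rate("3156263441"), EXPECTED_DIE_REPEAT_RATE)
```

A single 10-digit sequence is far too short to conclude anything; the comparison only becomes meaningful when pooled over many participants.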
It picked me as under 60 though I am actually over. Does it give any more granular age guesses or is it stuck on over 25, under 60, over 60? I may retake this in the morning to see if anything changes for me or for the test.
> As for our initial idea to make an age-guessing game, we have guessed right 51% of the time. Pretty much what we had expected .
Yeah... you thought I was 60. Seems from the comments this is a common thing.
You might want to check your algorithms. But then again, you do say in the end of the results that you need more 60+ year olds to help make this more accurate.
Also, a bone to pick. You claim that people get less random as they get older over 25, with 25 being the peak. I would wonder if maybe that has some correlation with the brain finally fully developing from adolescence into true/full adulthood. (Remember folks, we do call people 'young adults' for a while in their 20's.)
Also, while you make that claim about people as they get older, I still managed to get a coin flip result that was 13% more random than others at my age group of 33. Or something like that. I forget how it was worded exactly off hand this moment without going and checking again in my history. My point here is this.
If randomness declines with age past 25, but my score at age 33 is 13% more random than others in the same age range; then is it truly declining for everyone equally or is it just some people more rapidly than others?
I think this maybe correlates potentially with the findings of the trend disappearing once the non-random data is removed. (The all heads/all tails results.)
Anyways. With all this said, I do agree you need some more participants above the age of 60. Have you considered using Facebook at all?
Did we read the same article? They aren't claiming those things at all. Those are the claims of the original study, which are being disputed by this attempt at reproduction. The writers suspect those claims to be false due to the choice of the original study to not remove likely intentionally non-random data.
I believe the 13% stat you saw means that your score was more random than that of 13% of other participants, so not very random.