Quoting Donald Knuth / The Art of Computer Programming:
> “Many random number generators in use today are not very good. There is a tendency for people to avoid learning anything about such subroutines; quite often we find that some old method that is comparatively unsatisfactory has blindly been passed down from one programmer to another, and today’s users have no understanding of its limitations.”
//
// This class is used to generate a stream of pseudorandom numbers. The class
// uses a 48-bit seed, which is modified using a linear congruential formula.
// (See Donald Knuth, The Art of Computer Programming, Volume 3, Section 3.2.1.)
It's still not entirely clear why this particular algorithm was chosen (it would have been nice if some explanation had been given in this blog post). I suspect the reason is that, in JavaScript, you can only multiply two 16-bit numbers without fear of losing precision (without doing a fair amount of gymnastics), and they wanted something they could easily port to pure JavaScript for use as an optimizer test. As partial support for that theory, the new code has now been moved back almost entirely into the C++ layer.
Then again, the author might be looking at that commit and wondering "Why did we pick that algorithm?" :)
> here was the comment above the previous implementation
That's about a different RNG than the one that Mike Malone's article was about. The one you're pointing to was used on the C++ side of the world (and so doesn't have the concerns of JS numbers); the one the article was about was pure JS.
Part of the V8 update was to unify the RNGs (see the patch history), so now both use the same (better) RNG. On the JS side, the new C++ RNG is now used to fill a buffer of random numbers to amortize the cost of calling into C++ code from JS.
As for why these implementations were chosen: Mike Malone covers the commit history and probable source of the JS implementation fairly well (which is probably all you'll really get since it's been 6 years or so now). The old C++ implementation is relatively recent, with some hints to its choosing here: https://github.com/v8/v8/commit/eb381b9444c6b1ec78414d1c9375...
Inaccessible rather than irrelevant. I have yet to meet an intermediate to advanced programmer who would not be improved by at least trying to get through parts of TAoCP. Sadly, I have also yet to meet more than a handful who have.
Probably the source of the shuffle (though it was not mentioned, its visualisation was used), considering the old implementation had been there for about 6 years (and had been partially busted until a few weeks ago).
> This has been pointed out to us, and having understood the problem and after some research, we decided to reimplement Math.random based on an algorithm called xorshift128+.
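For reference, here is a minimal sketch of a single xorshift128+ step in plain JavaScript, using BigInt to emulate the unsigned 64-bit arithmetic that JS numbers can't represent exactly; the shift constants (23, 17, 26) follow the variant V8 adopted, while the seed values below are arbitrary placeholders.

    const MASK64 = (1n << 64n) - 1n;
    let state0 = 0x0123456789ABCDEFn; // placeholder seeds; must not both be zero
    let state1 = 0xFEDCBA9876543210n;

    function xorshift128plus() {
      let s1 = state0;
      const s0 = state1;
      state0 = s0;
      s1 = (s1 ^ (s1 << 23n)) & MASK64; // keep everything within 64 bits
      s1 ^= s1 >> 17n;
      s1 ^= s0;
      s1 ^= s0 >> 26n;
      state1 = s1;
      return (state0 + state1) & MASK64; // 64 pseudorandom bits per call
    }

    console.log(xorshift128plus().toString(16));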
Do be sure to note the mention that the new algorithm is not considered a cryptographically secure PRNG.
Also, because the specification has very little in the way of requirements in that regard, no matter how good some implementations may be, you should always assume your code may end up running in an environment where Math.random() is no better than the worst generator you can think of.
If you need specific properties in your PRNG then you still need to provide something in place of Math.random().
This is the most important point. I didn't like that "TIFU by using Math.random()" article because it felt like it was deflecting some of the blame by focusing so much on the details of V8's PRNG, when in reality it's 100% his fault for not using a CSPRNG (in a gambling application, no less).
His application was using the RNG to avoid collisions in high performance random ID generation.
Without knowing more about the system, it seems like a good case for a good non-CSPRNG.
This is good to think about! Often when programming games that need certain properties, I'll write my own toy PRNG, but I'll admit I'd love to learn more about the craft.
Would you have any suggested reading material on the topic?
For games, the most important features (other than the features of an RNG that all applications desire, such as good performance, high period, etc.) are being able to seed the state, and save/restore the state (required for save game support if you need seeding and want that to work after they save and restore the game).
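To make that concrete, here is a rough sketch using a mulberry32-style generator as a stand-in (not anything discussed in the article): the whole state is a single 32-bit integer, so seeding it and saving/restoring it alongside a save game is trivial.

    // Entire generator state is one 32-bit integer.
    function makeGameRng(seed) {
      let state = seed | 0;
      return {
        next() {
          state = (state + 0x6D2B79F5) | 0;
          let t = state;
          t = Math.imul(t ^ (t >>> 15), t | 1);
          t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
          return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
        },
        save() { return state; },             // persist next to the save game
        restore(saved) { state = saved | 0; },
      };
    }

    const rng = makeGameRng(12345);
    rng.next();
    const snapshot = rng.save();
    const a = rng.next();
    rng.restore(snapshot);
    console.log(a === rng.next()); // true: restoring replays the same sequence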
For a modern game, is there any particular reason why 'seed = SHA256(seed + "reseed"); return SHA256(seed + "value")' isn't a good enough, performant enough RNG?
I don't know enough about SHA256 to say whether or not it's good enough, I'll assume it is.
Whether it's performant enough strongly depends on the game. However, looking at the implementation of the algorithm on Wikipedia[0], it looks extremely slow by RNG standards. Most RNGs finish in a handful of instructions, but building the 'message schedule array' takes 64 steps, and the 'main compression loop' also takes 64 steps. Once you consider that reducing the output to a range means throwing out some iterations (you need to do this with any RNG to avoid bias), you're looking at a lot of wasted time.
Maybe you can afford to waste that time, but given that there are much simpler and more efficient RNG algorithms, I'm not sure why you wouldn't use one of those.
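To illustrate the "throw out iterations" point, here is a hedged sketch of the usual rejection approach for reducing a raw 32-bit output to a range without modulo bias; rng32 is a hypothetical stand-in for whatever generator you're using.

    // Returns an unbiased integer in [0, bound) given a uniform 32-bit source.
    function uniformInt(rng32, bound) {
      // Largest multiple of bound that fits in 32 bits; values past it would
      // over-represent the low residues, so they get thrown away and redrawn.
      const limit = Math.floor(0x100000000 / bound) * bound;
      let r;
      do {
        r = rng32();
      } while (r >= limit);
      return r % bound;
    }

    // Example with a trivial stand-in source of 32-bit values.
    const rng32 = () => Math.floor(Math.random() * 0x100000000);
    console.log(uniformInt(rng32, 6)); // unbiased roll in 0..5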
As long as your initial seed is sufficiently random, that would be perfectly fine for a game; the bits returned should be unpredictable enough that a pattern shouldn't be spotted (assuming you use a hidden value rather than a fixed string like "reseed" - otherwise someone with a rainbow table for your chosen hash & salt will be able to predict the generator's output). You also know the output for a given starting seed, which is good for testing purposes.
Of course you need to make sure the initial seed is sufficiently random otherwise you'll risk generating the same sequence each time, which could make this a chicken->egg->chicken situation!
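For what it's worth, here is a sketch of that construction in Node.js using the built-in crypto module; the 'reseed'/'value' labels follow the suggestion above, and the way the digest is turned into a float is purely illustrative.

    const crypto = require('crypto');

    function sha256(buf) {
      return crypto.createHash('sha256').update(buf).digest();
    }

    function makeHashRng(initialSeed) {
      let seed = Buffer.from(initialSeed);
      return function next() {
        seed = sha256(Buffer.concat([seed, Buffer.from('reseed')]));       // advance the state
        const value = sha256(Buffer.concat([seed, Buffer.from('value')])); // derive the output
        return value.readUIntBE(0, 6) / 2 ** 48; // 48 bits -> float in [0, 1)
      };
    }

    const rng = makeHashRng('a sufficiently random initial seed goes here');
    console.log(rng(), rng(), rng());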
> > For games, the most important features (other than the features of an RNG that all applications desire, such as good performance, high period, etc.) are being able to seed the state, and save/restore the state (required for save game support if you need seeding and want that to work after they save and restore the game).
Given that restriction, what's wrong with the generator I suggest?
No, it's a pseudorandom generator. It's a key attribute of a cryptographically secure function like SHA256 that its output is indistinguishable from random bits.
> You are just returning a different representation of the seed you already have.
I'm returning a function of the seed, yes. That's the whole point of a seeded PRNG: that seed(x); rng(); rng(); rng() will always return the same sequence of results. This is used for example in procedural generation, to ensure that a dungeon level or map chunk is always generated the same way (a cool hack, then, is to just store the seed instead of the level itself, and rerun the level-generation algorithm with the seed when one wishes to regenerate the level). In such an application one wants the appearance of randomness, but with repeatability.
Which was what the original post asked for.
What I suggested is not secure for generating keys or for other cryptographic purposes.
Yes, if you fed the output of a SHA256 calculation back into the algorithm repeatedly you would get a pseudo-random stream. I think that is what you intended. Your original post was not clear on that and I thought you were just taking a value and returning a mutation of it using some unknown "reseed" and "value".
No, it is a property of all hash functions: if you reduce input of arbitrary length to a few bytes, there will always be short cycles. That is why you should always think twice before feeding the output of one hash function call into another, as the set of possible output values may get smaller and smaller with every repetition.
The Art of Computer Programming, Vol. 2, (c) 1969, by Donald Knuth, has quite an extensive discussion of random numbers, their properties, algorithms, and tests. It's about 160 pages in my edition. It's very readable, and even if you skip all the exercises you will still get a very good grounding.
I don't know how much has changed in his more recent editions, and I don't know if there are better books. But, in general, you can't go wrong by starting with Knuth. He's one of the all-time greats!
All true. And the 'specific properties' you might need might not be cryptographic security. For example, I did some stuff recently in JavaScript with procedural generation that made heavy use of random numbers. I used my own implementation of a Mersenne Twister, rather than Math.random, precisely so I could have predictable long-term and cross-engine repeatability of the random number sequences generated based on the same seed.
So how good does the default random number routine in JavaScript really need to be?
It seems to me that defaulting to a non-CSPRNG these days is a premature optimization: for many purposes a decent CSPRNG (e.g. Fortuna) is fast enough, and avoids all the pitfalls of a non-secure or poorly-random generator.
Maybe it's time to have Math.random and equivalents call a CSPRNG, with a Math.insecurerandom when performance matters?
Are there any use cases for which a good CSPRNG, with a fixed 256-bit seed at process startup from /dev/urandom or equivalent, is not fast enough?
Last time this came up on HN, I measured individual reads from /dev/urandom, and those are fast enough for many large-scale applications, like one per tweet: https://news.ycombinator.com/item?id=10608843
(Obviously there are bad CSPRNGs, but there are also bad non-CS PRNGs. So let's not count those.)
One major issue is that much of the "fast enough" problem is about benchmarks, and possibly even microbenchmarks. When a popular benchmark (or some rando's microbenchmark) really just exercises the RNG, the headline will still be "$FOO IS SLOW AS MOLASSES AND THAT'S BULLSHIT", not "$foo courageously uses a non-garbage RNG by default despite some performance impact in RNG-heavy scenarios". So the primary incentive for language/stdlib developers is to have a fast RNG, not a good one. They probably aren't going to sabotage the RNG entirely, but given the choice between "fast and same as everybody else" and "excellent but not as fast", chances are the first one will win, even if it's crap.
Hash functions are supposed to be fast, though. (If you need something slow like a keystretching function, you iterate a hash function, and you pick the number of iterations based on wall clock time on a representative machine.) BLAKE2 is competitive with MD5 and way, way more secure, for instance: https://blake2.net/
First, that was apparently unclear since I left it unspecified: my mention was about general-purpose hash functions (as you'd use in a hash table, which could easily bottleneck a benchmark without its developer realising it), not specifically cryptographic ones.
Second, so are random number generators and more or less everything else (aside from KDFs). The issue is that none of these are only supposed to be fast; while they should be fast, it's usually more important that they're good. Yet the incentive for language developers is to ignore good (whatever that means for the domain) and focus solely on fast. That is what my comment was about.
Check out Random123, an implementation of counter-based random number generators. It won best-of-conference at the ACM Supercomputing '11 conference.
Basically the idea is that you have a seed which is used as an encryption key, and then you encrypt a counter value to produce an output. Using the next counter value represents the second output from the generator, and so forth. Different seeds produce different output sequences. If you accept that the encryption algorithm works properly, then the output is uncorrelated for a given counter sequence or seed sequence (although seed sequences are slightly less random than counter sequences).
You don't need to store state, and you can seek to an arbitrary position in the PRNG stream in O(1) time simply by recomputing the output for the desired counter value. For non-cryptographic applications, you can derive the seed value from some unique property of your dataset (eg id number of an item being operated upon) and then you don't even need to explicitly store the key value.
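A rough sketch of the counter-based idea in Node.js: the key is the seed, and each output is just AES applied to a counter block (encrypting one block at a time with ECB here, which is what counter mode does under the hood). The key size, counter layout, and float mapping are all illustrative choices, not anything from Random123 itself.

    const crypto = require('crypto');

    function counterRandom(key, counter) {
      const block = Buffer.alloc(16);               // one AES block per counter value
      block.writeBigUInt64BE(BigInt(counter), 8);   // counter in the low 8 bytes
      const cipher = crypto.createCipheriv('aes-128-ecb', key, null);
      cipher.setAutoPadding(false);
      const out = cipher.update(block);             // 16 bytes of keystream for this counter
      return out.readUIntBE(0, 6) / 2 ** 48;        // float in [0, 1)
    }

    const seed = crypto.randomBytes(16);            // the "seed" is just an AES key
    // O(1) seek: any position in the stream is reachable by picking its counter.
    console.log(counterRandom(seed, 0), counterRandom(seed, 1), counterRandom(seed, 123456));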
On AES-NI compatible hardware the cryptographically strong variants outperform even crushable PRNGs like the Mersenne Twister. These are a fantastic choice where you have that acceleration: your computer is built to do it really fast, and this lets you utilize that.
They also offer non-cryptographic but crush-resistant variants that roughly double that performance. These algorithms parallelize really well and those advantages are even more significant on GPUs, where memory is at a premium and cryptographic acceleration is usually not available.
Somewhat off-topic I guess, but aren't PRNG, hashing and (symmetric) encryption in some sense all the same thing?
If you have some magic, keyed mixing function that permutes a block of data without leaking information about the key, it seems like you can implement any of the three operations.
In oversimplified pseudo-code:
    def encrypt(data, key):
        counter = 0
        for block in data:
            yield block ^ mix(counter, key)
            counter += 1

    def generate(key):
        counter = 0
        while True:
            yield mix(counter, key)
            counter += 1

    def hash(data, key):
        state = 0
        for block in data:
            state = block ^ mix(state, key)
        return state
And similarly, if you have any one of those functions you can implement the rest: XOR with an unpredictable stream of numbers is a good encryption method, encrypting a string of zeroes should give you unpredictable random numbers, the hash can be used as the mix function when implementing the others, and so on.
Your keyed hash has the following vulnerability: if I want to create a message with a given hash, I can simply take an arbitrary message, compute target_hash ^ mix(current_hash, key), and append that block.
More fundamentally it's keyed, and finding a way to make this work for an unkeyed hash is somewhat more complicated. But yes, I'm pretty sure that a secure stream cipher and a secure deterministic CSPRNG are basically the same thing.
They're all kind of the same idea. As you note, each has some magic mixing function to get some output that's (hopefully) entirely uncorrelated to your input, but the specifics are a bit different.
When you encrypt something, the output is the same size as the input. The key is not mutated. You have to be able to go the other way if you know the key (for symmetric encryption).
When you hash something, the input is variable but the output length is fixed. Typically this is one-way - you should not be able to reconstruct the input for a given output.
When you have a stateful PRNG you get a fixed-length output but you have some internal state that gets changed with every call. So its inputs are generate(state) and you yield output and newstate. Going the other way is possible in theory but not typical.
The state for PRNGs is usually larger than the amount of output yielded, sometimes by quite a lot. E.g. Mersenne Twister has about 2.5 KB of state but yields only 32 bits of output per iteration. You generally want to store newstate every time, because setting them up is often expensive (MT takes a lot of calls before it's fully initialized and providing good randomness) and iterating through N-1 steps every time can take a very long time in a long-running program.
Usually you want PRNGs to be fast (but strongly random), but encryption and hashing should sometimes be slow (eg to protect passwords in a DB, or as a proof-of-work as in Bitcoin/Hashcash). It all depends on what role you're using them in.
You can sorta compose them into each other - like you can make a stateful PRNG by doing a hash on some internal state and then mixing some of it back into the state (eg like RDRAND). Or by encrypting a counter with a seed to make a PRNG (like Random123). Here be dragons, though - there's nothing guaranteeing your hand-rolled encryption algorithm will actually resist an attack, or whatever. So stay with a known, tested implementation.
Ah, that makes more sense, but I'm also used to cryptographically-secure hashes being used as general-purpose hashes. :) https://www.131002.net/siphash/
It's complicated. For starters, if you're looking for a _secure_ PRNG most practitioners would strongly advise _against_ the system you described[1]. There's a difference between a cryptographically secure algorithm and a secure system. For the overall system to be secure you need to make sure the seed data is secured, which you're bound to fail at unless you really know what you're doing.
Providing a "cryptographically secure but not really secure" alternative implemented in user space alongside a "really secure" alternative that's been thoroughly vetted would likely just add to the confusion. If you _really_ need secure numbers you should be using the really secure one. The really secure one is almost definitely going to require a syscall worth of overhead, at least, so it's _definitely_ going to be slower than a PRNG like xorshift128+ which literally takes a single digit number of CPU cycles.
Now to answer your question: if you're just talking about a CSPRNG algorithm, they're probably fast enough for almost every use case. I think there's still an argument for some things like array shuffling, but those implementations could go out of their way to use the non-CS algorithm as an optimization. Unfortunately, in JavaScript land there aren't many batteries included, so if you switched Math.random() to use a CSPRNG most people would end up using it for trivial stuff like shuffling arrays. If you're talking about a secure system... it's probably still fast enough for most things, but it's less obvious.
The benchmarks I linked to imply that even getting your random numbers straight out of /dev/urandom is fine, even for the "trivial stuff".
On my current machine (Chrome 47, new-ish MacBook Pro), it takes me 4 seconds to run `for (var i = 0; i < 1000000; i++) {Math.random()}`. Each call to Math.random() generates an 8-byte floating point number, and it takes 0.6 seconds to `dd if=/dev/urandom bs=8 count=1000000 of=/dev/null`. If you don't want to buffer, and switch it to a million 8-byte reads, it goes up to 1.5 seconds. I don't think the system call overhead is a problem here.
Eh, I'm not sure where you're getting your numbers. Here's a quick test using node 0.10.40 on my machine:
mmalone$ node
> var f = function() {
... s = Date.now()
... for (var i = 0; i < 1000000; i++) {Math.random()}
... return Date.now() - s
... }
undefined
> f()
3
> f()
1
> f()
1
>
So I'm getting a couple milliseconds for 1,000,000 numbers. I'm also toying around with a PRNG package in Go currently so I have some numbers there too:
Go's crypto source pulls from urandom. This is more like what I'd expect in terms of performance. The difference is orders of magnitude, but we're still talking about microseconds so I think your argument is still valid. A proper CSPRNG is not "as fast" but it is probably "fast enough" for most use cases.
Probably me not actually knowing what I'm doing. :) I was testing in the Chrome JS console, and if I define your function, it's way faster than the standalone for loop.
But yeah, I think my underlying argument is roughly that if you do a naive / obvious shuffle of a list in JS using Math.floor(Math.random() * list.length), given the inherent overhead of manipulating a JS list, you aren't going to notice the overhead of a proper CSPRNG. I'm not totally sure if it's true, but I think it is likely to be true.
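For instance, here is a sketch of a drop-in Math.random() replacement built on crypto.getRandomValues (the browser API), plus a Fisher-Yates shuffle driven by it, which you could use to check whether the CSPRNG overhead is actually noticeable.

    // 53 random bits from the browser CSPRNG, mapped to [0, 1) like Math.random().
    function secureRandom() {
      const buf = new Uint32Array(2);
      crypto.getRandomValues(buf);
      return (buf[0] * 0x200000 + (buf[1] >>> 11)) / 0x20000000000000;
    }

    // Fisher-Yates shuffle using the CSPRNG-backed source above.
    function shuffle(list) {
      for (let i = list.length - 1; i > 0; i--) {
        const j = Math.floor(secureRandom() * (i + 1));
        [list[i], list[j]] = [list[j], list[i]];
      }
      return list;
    }

    console.log(shuffle([1, 2, 3, 4, 5]));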
"good CSPRNG, with a fixed 256-bit seed at process startup from /dev/urandom" does describe a secure system. It doesn't take a syscall, and it's not fake-secure.
The failures come when you try to seed in userspace. Not when you use urandom to seed a standard algorithm, with no customizations.
To be fair, the failures in that article are from userspace CSPRNGs that got even that part wrong. Debian's problem was commenting out the lines that seeded the RNG from /dev/urandom, and Android's problem was developers that used the RNG unseeded. The trouble wasn't re-seeding - they weren't seeding in the first place.
However, if we are going to decide as an industry that we need a userspace CSPRNG, it's not that hard of a job to write a single, high-quality, cross-platform implementation of a seeded-once CSPRNG that introduces no vulnerabilities beyond the kernel CSPRNG.
You could do that, but the code to do something like create an instance of ChaCha20, fill the parameters from urandom, and pull numbers from it is a handful of trivial lines.
It's not just the initial seeding, you also need correct reseeding, you need to make sure the state doesn't leak somehow, and you need to make sure you're actually getting the seed data securely. If you're doing something that's crypto related there really are a lot of things you can screw up in user space. Using urandom directly is your best bet. It's not fake secure, it's just non-trivial to get right.
Incorrect. You may be thinking of the "/dev/random vs. /dev/urandom" argument, which says it's stupid to _block_ on reseeding based on some arbitrary heuristic (and which I agree with). However, that doesn't mean reseeding is unnecessary. Even a CSPRNG cycles and you need to reseed before it does so or the wheels start to come off. Generators become less random looking even before cycling. The general rule is that you shouldn't use more than sqrt(cycle length) from a single generator.
If you use a secure n-bit block cipher in counter mode, for instance (generally considered a "good CSPRNG") then the generated numbers start to become distinguishable from random after 2^(n/2) generated blocks due to lack of collisions. This is a weak statement, but the point is any proofs that assume randomness (likely all proofs of hardness of crypto reversal) no longer hold. If you don't reseed before this point then you're doing it wrong. Other desirable characteristics of CSPRNGs, like forward secrecy, may require periodic (or frequent) reseeding depending on algorithm.
Best practice for a source of secure "randomness" is to use an entropy accumulator that finds as much "true entropy" as it can and constantly mixes that into a pool (while protecting against injection attacks), then to use a good CSPRNG algorithm to "stretch" the available entropy since true entropy is a scarce resource. That's what urandom and OpenSSL do, for instance (see Yarrow/Fortuna for a good design).
A lot of thought has been put into the design of urandom. Pulling one random seed from urandom and stretching it forever in user-space is _not_ equivalent. You could probably design something in user-space that _is_ as secure, but it's not trivial.
If you're using 384 bits of state as with ChaCha20, you do not need to reseed for quality reasons.
Adding in more is nice of the kernel but not necessary. If the pool doesn't already have enough entropy to last the lifetime of the computer, I would prefer not to trust it yet.
Oh, one more thought since I didn't really address state leak. Let's pretend we're in a C-like language that is not memory safe. You're not reseeding and you don't have forward secrecy. Someone finds a buffer overflow and manages to get your CSPRNG state. Now they can recover the sequence of randomness you've handed out in the past (maybe they're access tokens, for instance). It's not a trivial attack, but it's plausible.
If you don't like the buffer overflow example, let's make it a misconfigured website that leaks the state in an error message or stack trace served to an end user.
The simpler design is probably to just fill a pool of entropy from urandom in user-space and give out numbers from that pool, thus amortizing the syscall overhead. Even that would be not-quite-as-good-as-urandom (still more that could go wrong), but it's probably as close as you can easily get.
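In Node.js terms such a pool is only a few lines; crypto.randomBytes pulls from the kernel CSPRNG, and the pool size below is an arbitrary choice.

    const crypto = require('crypto');

    const POOL_BYTES = 65536;
    let pool = crypto.randomBytes(POOL_BYTES);
    let offset = 0;

    function pooledRandom() {
      if (offset + 6 > pool.length) {           // refill when the pool runs dry
        pool = crypto.randomBytes(POOL_BYTES);
        offset = 0;
      }
      const value = pool.readUIntBE(offset, 6) / 2 ** 48; // 48 bits -> [0, 1)
      offset += 6;
      return value;
    }

    console.log(pooledRandom(), pooledRandom());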
Yes please. It's pretty silly and dangerous to have the most accessible random number generator not be cryptographically strong. It's really not that much of a cost performance wise.
Safe defaults, make the developer go out of their way to use something less safe if they really know what they're doing.
I'm the average Javascript programmer. I'm writing some web app and I need to do some cryptography. Let's see, there's Math.random and Crypto.getRandomValues. Let's go with Math.random, for shiggles. I'm probably copy-pasting from Stack Overflow.
Will renaming a function or two fix my bad behaviour?
The answer to "Should I use Math.random or X" is neither! I should be using a cryptography library like SJCL or tweetnacl-js:
Probably not to TLS, but TLS is typically solving a different problem than the one people are trying to solve using JS crypto.
One valid use, I think, could be around ensuring you don't pollute your backend with user data to avoid liability.
For example, if you wanted to use a distributed database in your backend and needed guarantees that when you delete data it is deleted everywhere at once, you could use JS crypto to do client-side encryption of the data and only have to delete the key to render the distributed data inert.
When you (or the recipient) don't want the recipient to be able to read the data, but only to store/transfer it somewhere.
In some cases it's impossible or impractical to create a direct connection between two computers: you must use some kind of relay. If you don't want the relay to comprehend the data then you can use client-side crypto on the payload. TLS still matters because it prevents third parties from peeking at the communication.
Imagine a secure chat app where you want to guarantee that only you and your friend can read the communications, but want to be able to send offline messages to each other that get delivered upon connection.
Well, in his web app example above, it's not - but these days more and more clientside apps (which remain on the client - e.g. mobile apps built with React Native) are implemented fully in JS.
Also, installed browser extensions are download-once and may need cryptographic functionality.
I can name exactly one. Cyph (cyph.com) accomplishes secure E2E encrypted messaging by pinning a bootstrap in the browser that validates and executes cryptographically signed packages.
Right, well, Ptacek and Kaminsky have hashed and rehashed this argument over and over ad nauseam in a different context. I don't think the community will ever agree on this.
>"Please keep in mind, if you find areas of improvement in V8 and Chrome, even ones that—like this one—do not directly affect spec compliance, stability, or security, please file an issue on our bug tracker."
Worst idea ever: this not-a-real-bug got a correction in just a few days without even being in the bug tracker, while there are real bugs stalled for years in the tracker. Writing a blog post and making a lot of noise on the internet works way better than using the bug tracker.
This RNG problem had a bug report with hundreds of comments where a v8 dev downplayed its importance and insisted it shouldn't be fixed. So, bad example.
Where is that? There have been other issues where this has happened (like the infamous trig optimization bug), but I'm not aware of one for Math.random().
It's also worth noting that a good number of people (like [1]) outside of the V8 team were also skeptical, because of the lack of specification for the RNG, the speed downsides to switching, and the availability of a CSPRNG in the browser and Node. You can argue that they're wrong, but that's not immediately clear. Bug reports sometimes become the site of those arguments, and sometimes even the right arguments lose a round and have to come back when the right people are paying attention. Working as intended.
Regardless, the blog post or bug report dichotomy is a false one anyways, as obviously you can write a blog post and file a bug (as Mike later said he meant to). It's not like he was assailing the V8 team or something. The post was written a little dramatically, but he was engaged with the team in the followup bug (filed by a V8 dev) and elsewhere.
> this not-a-real-bug got a correction in just a few days without even being in the bug tracker, while there are real bugs stalled for years in the tracker
That's survivorship bias talking. How many blog posts and attempted noise never make any mark?
Better idea: file a bug and write the blog post. Yes, you won't always get all your issues looked at, but that's true of all major open source projects.
This article[0] hints that Microsoft's Edge browser might also be using MWC1616, or something else that has a lot of the same limitations. Hopefully they jump on the xorshift128+ ship.
The article actually hints that Firefox (not Chrome) and Edge used the same RNG. The article says "LCG imported from Java decades ago", but actually Java(OpenJDK)'s java.util.Random still uses the same RNG today, and it's not that surprising Edge also used it.
The generator (LCG with multiplier 0x5DEECE66D) is not bad. Its main problem is that it has only 48 bits of state.
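For the curious, the recurrence java.util.Random documents looks like this; BigInt is used because the 48-bit state isn't safe to multiply in a plain JS double.

    const MULTIPLIER = 0x5DEECE66Dn;
    const INCREMENT = 0xBn;
    const MASK48 = (1n << 48n) - 1n;          // only 48 bits of state

    let seed = (42n ^ MULTIPLIER) & MASK48;   // java.util.Random scrambles the initial seed this way

    function next32() {
      seed = (seed * MULTIPLIER + INCREMENT) & MASK48;
      return Number(seed >> 16n);             // the top 32 of the 48 state bits
    }

    console.log(next32(), next32(), next32());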
I'd think that having a "secure" random number generator isn't that important of a deal given the fact that all code runs client-side anyway (so why the need for cryptographic security?).
That's actually not always true. Any application running on Node.js is using the V8 engine as its server-side backend, and might thus be affected. There are also a handful of applications that do client-side encryption. (MEGA comes to mind; someone here should chime in with a better example.) Just because the use case is uncommon doesn't make it invalid.
So in one application I have, I use Math.random() as a fallback for generating UUIDs when window.crypto isn't available. I am less concerned with security and more concerned with collisions between different clients. Making Math.random() more uniform helps with that, though only marginally, since browsers that shipped the fix also support window.crypto.
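A hedged sketch of that fallback pattern (hypothetical helpers, not the commenter's actual code): prefer window.crypto when it exists, otherwise fall back to Math.random for the random bits of a v4 UUID.

    function randomBytes16() {
      const bytes = new Uint8Array(16);
      if (typeof crypto !== 'undefined' && crypto.getRandomValues) {
        crypto.getRandomValues(bytes);   // preferred: real CSPRNG
      } else {
        for (let i = 0; i < 16; i++) bytes[i] = Math.floor(Math.random() * 256); // fallback
      }
      return bytes;
    }

    function uuidV4() {
      const b = randomBytes16();
      b[6] = (b[6] & 0x0f) | 0x40; // version 4
      b[8] = (b[8] & 0x3f) | 0x80; // RFC 4122 variant
      const hex = Array.from(b, x => x.toString(16).padStart(2, '0')).join('');
      return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
    }

    console.log(uuidV4());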
For people concerned with cross-browser reproducibility as well as resuming a PRNG between sessions (e.g., to reproduce random sequences in replays or multiplayer), check out arbit, an NPM package I made. It performs close to Math.random and uses floats for state internally for max resolution (i.e., length and number of unique sequences):
Is there an explanation anywhere of what technical or other criteria they used to pick xorshift128+? I haven’t seen any from the handful of blog posts, etc. I’ve seen about the change. “[...] having understood the problem and after some research, we decided [...]” is hardly a persuasive analysis.
Were any professional experts on PRNGs asked for advice?
The first one looks more random to me? It has more runs. There was some article (can't remember the source) about generating random coin flips - the way to tell the difference between a real physical coin-flip sequence and a computer-generated one is that the real sequence will have a lot of runs, like 10 heads in a row. A computer-generated sequence will attempt to "even out" the spectrum. The two images presented in that article looked similar to these but were reversed: true randomness looked more like a pattern, while the computer-generated sequence looked more like even noise.
TL;DR long strings of repeated results are a sign of true randomness. Am I misinterpreting the relationship between that and this article?
That's true in general, but note that the runs should show up in 2 dimensions in the image. The second one appears clumpy, which is exactly what it should look like. People tend to avoid runs, which would make it look more uniform. The first one has an obvious bias because it doesn't look the same in the x direction as in the y.
Randomness looks like white noise. You can't distinguish computer randomness (provided that it's a proper PRNG) from physical coin flips. Computers don't "even out spectrum". The longer the coin flip run, the less likely it is to happen, e.g. you would need on average 2046 coin throws to get 10 heads in a row.
TL;DR probability theory
(If you have free time and want to have some fun, throw a coin and draw a picture recording the throws. I once generated a password by throwing a coin more than a hundred times — each bit taking more than one throw to avoid bias — https://en.wikipedia.org/wiki/Fair_coin#Fair_results_from_a_...)
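The "more than one throw per bit" part is the classic von Neumann trick from that link: flip twice, keep unequal pairs, discard equal ones. A quick sketch, where flipBiasedCoin is any hypothetical 0/1 source.

    // Unequal pairs (HT/TH) are equally likely regardless of the coin's bias,
    // so each one yields a fair bit; equal pairs (HH/TT) are discarded.
    function fairBit(flipBiasedCoin) {
      for (;;) {
        const a = flipBiasedCoin();
        const b = flipBiasedCoin();
        if (a !== b) return a;
      }
    }

    // Example: a coin that lands heads (1) 70% of the time still yields fair bits.
    const biasedCoin = () => (Math.random() < 0.7 ? 1 : 0);
    console.log(Array.from({ length: 16 }, () => fairBit(biasedCoin)).join(''));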
Not "on purpose", of course. But with a bad PRNG, the probability of 10 heads in a row is lower compared to a proper PRNG so in a sense the bad one is "evening out spectrum"
Yeah, it can also be higher. Or lower. Or bad PRNG can give the same number every time. Flipping a biased coin can also produce bad randomness. I'm not going to mention every edge case in my comment ;-)
Computers are trying not to produce results different from fair coin flips.
Those lines are showing up because the old algorithm can't produce all possible values in the 0-1 range. 32 bits of internal state simply aren't enough for that.
The problem is highlighted by the images in this article:
This kind of stuff seems scary to me. One JavaScript engine decides to use this algorithm, another that. This type of change could lead to a higher potential for bugs and unexpected behaviour that the average developer just can't figure out when, say, using Firefox or Chrome for testing.
When algorithms are getting tinkered with behind the scenes, this leads me to believe there's still way too much churn in the JS space.
There are applications in creative coding and gaming where it's critical to have a reproducible random sequence that gives the same results on all platforms.
Edit: Yes, I know the intent was obviously to refer to a pseudorandom sequence, but that's not what was said. It's also not what Math.random promises. It promises random-seeming output, nothing more, nothing less. It did not promise a cross-platform, reliably reproducible stream of pseudorandom numbers, and it would be an error to abuse a black-box "random" API by treating it as a fully specified pseudorandom stream.
rand(3) isn't really the OS's concern; it's part of POSIX and the C standard, which indeed don't specify an algorithm, but could. Java specifies exactly how java.util.Random should be implemented (an LCG from TAOCP with a 48-bit seed), which on the one hand means it's well understood and stable across machines, and on the other hand means it's kinda crap and can't ever be upgraded.
Yep and with the new SplittableRandom they seem to have opted for a more subjective definition with a guaranteed minimum cycle length and passing marks on Diehard. Seems like a happy middle ground.