Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

A shuffle is type of permutation. There is room to disagree on the constraints on the type of permutations allowed and how they are made. Nevertheless, I 100% agree that sampling with replacement is not a shuffle.


While I agree with you, as soon as the semantics of “random” vs “shuffle” enter the conversation, lay people are lost.

To me “shuffle” is a good metaphor because a shuffled deck of cards works a specific way (you’d be very surprised to draw the same card twice in a row!)

But these things are implemented by programmers who sometimes start with implementation (“random”) and work back to user experience. And, for a specific type of technical person, “with replacement” is exactly what they’d expect.


If you let programmers do randomness you're in a world of pain.

On the whole programmers given a source of random bytes and told to pick any of 227 songs at random using this data will take one byte, compute byte % 227 and then be astonished that now 29 of the songs are twice as likely as the others to be chosen†.

In a class of fifty my guess is you're lucky if one person asks whether the random bytes are cheap (and so they should just throw away any that aren't < 227) showing they know what "random" means and all the rest will at least attempt that naive solution even if some of them try it out and realise it's not good enough.

† As a bonus in some languages expect some solutions to never pick the first song, or never pick the last song.


My favorite example of RNG misuse resulting in sampling bias is the general approach that looks like `arr.sort(() => Math.random() - 0.5)`.

> you're lucky if one person asks whether the random bytes are cheap (and so they should just throw away any that aren't < 227)

If you can't deal with the 10% overhead from rejection sampling (assuming your random bytes are uniform), I guess you could try mushing that entropy back into the rest of your bytestream, but yuck.


Wow, that's an abusive ordering function. Presumably this is a thing people might write in... Javascript? And I'm guessing Javascript has to put up with them doing this and they get a coherent result, maybe it's even shuffled, because eh, it worked in one browser so we're stuck with it.

In Rust this abuse would either "work" or panic telling you that er, that's not a coherent ordering so you need to stop doing that. Not certain whether the panic can only arise in debug builds (or whether it would detect this particular abuse, it's not specified whether you will panic only that you might if you don't provide a coherent ordering).

In C++ this is Undefined Behaviour and there's a fair chance you just introduced an RCE vulnerability into your codebase.


It is shuffled badly, and not at all uniformly. And heavily implementation-dependent.

https://stackoverflow.com/questions/962802/is-it-correct-to-...

An example of this out in the wild: https://www.robweir.com/blog/2010/02/microsoft-random-browse...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: