Is there an easy way to adapt the algorithm to get the cards drawn in a random order? The article says "it’s easy to randomize the order after the fact" but if we can't store them in memory then that's a no-go.
You can draw the cards in a random order by feeding an accumulator (counter) through a block cipher and using the output as an index. The fixed output size of a block cipher does entail extra work filtering out numbers that fall outside the range you'd like, just as you would with an LFSR. (For a range that is an exact power of 2, you could use an LFSR or some ECB-mode block ciphers directly, as if they were format-preserving, but that is a coincidence of the sizes.)
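To make the filtering concrete, here's a minimal sketch of the idea (sometimes called cycle-walking), with a toy 8-bit affine bijection standing in for a real block cipher; the names and constants are illustrative, not from any post here:

```javascript
// Toy 8-bit bijection standing in for a block cipher: an odd multiplier
// makes x -> (x * 167 + 13) mod 256 invertible. NOT cryptographic.
function toyPermutation8(x) {
    return (x * 167 + 13) & 0xff;
}

// Cycle-walking: if the output lands outside [0, deckSize), re-encrypt it
// until it lands inside. For counters in [0, deckSize) this yields a
// permutation of [0, deckSize), since the walk follows the cipher's cycles.
function drawIndex(counter, deckSize) {
    let x = toyPermutation8(counter);
    while (x >= deckSize) x = toyPermutation8(x);
    return x;
}
```

The loop always terminates: the cycle containing `counter` contains at least one value below `deckSize` (namely `counter` itself).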
You can produce an exact random permutation with a Feistel network over any range; here it selects every number between 0 and 2^64 − 1 exactly once, with no appreciable space requirements.
procedure feistel(input) {
    result = input;
    h = 2**32;
    for (let i = 0; i < 8; i++) {
        l = (input / h)|0;
        r = input % h;
        temp = r;
        fn = (r + i) % h;
        r = (l + fn) % h;
        l = temp;
        result = h * l + r;
    }
    return result;
}
Good eye, `input` should be replaced with `result` when splitting the halves within the loop.
Here's a working implementation:
function feistel(ctr, lr, ll) {
    let result = ctr;
    for (let i = 0; i < 8; i++) {
        let l = (result / ll)|0;
        let r = result % ll;
        let temp = r;
        // alter round fn's multiplier & increment accordingly.
        let fn = (r * 16807 + i) % lr;
        r = (l + fn) % lr;
        l = temp;
        result = lr * l + r;
    }
    return result;
}
let rsize = 8;
let outputs = new Set();
for (let i = 0; i < rsize * rsize; i++) outputs.add(feistel(i, rsize, rsize));
for (let i = 0; i < rsize * rsize; i++) if (!outputs.has(i)) console.log('failure: ', i);
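Because each round only adds the round function mod `lr`, the network is invertible, which is what makes it a permutation. Here's a sketch of the inverse (repeating the forward pass so the snippet stands alone, and assuming the square `lr === ll` case exercised above):

```javascript
// Forward pass, repeated from above for self-containment.
function feistel(ctr, lr, ll) {
    let result = ctr;
    for (let i = 0; i < 8; i++) {
        let l = (result / ll)|0;
        let r = result % ll;
        let temp = r;
        let fn = (r * 16807 + i) % lr;
        r = (l + fn) % lr;
        l = temp;
        result = lr * l + r;
    }
    return result;
}

// Inverse: run the rounds in reverse order, subtracting the round
// function instead of adding it. (Illustrative; assumes lr === ll.)
function feistelInverse(out, lr, ll) {
    let result = out;
    for (let i = 7; i >= 0; i--) {
        let l = (result / lr)|0;        // this was the old right half
        let r = result % lr;
        let fn = (l * 16807 + i) % lr;  // same round function as forward
        let oldL = ((r - fn) % lr + lr) % lr;
        result = ll * oldL + l;
    }
    return result;
}

// feistelInverse(feistel(i, 8, 8), 8, 8) === i for every i in 0..63
```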
There are kinds of PRNGs that are designed to emit every possible output value in their range. Some of them will emit every possible value exactly once before repeating.
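The classic example of the latter is a full-period LCG. A small illustrative sketch (not from the code under discussion): with a power-of-two modulus, an odd increment, and a multiplier congruent to 1 mod 4, the Hull–Dobell conditions hold and every state appears exactly once per period:

```javascript
// Full-period LCG over 0..255: m = 256, a = 5 (a - 1 divisible by 4),
// c = 3 (odd, so coprime to m). Hull-Dobell guarantees period 256,
// i.e. every 8-bit value is emitted exactly once before a repeat.
function lcgNext(state) {
    return (state * 5 + 3) & 0xff;
}
```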
If you look at his code, the first function seems to be calling a method that claims to do the latter. I think his documentation may be out of sync.
I understand "you can't use any kind of data structure to remember which cards you've already picked among 2^64 of them" as "N cards don't fit in memory". Or is the relevant distinction memory vs disk, as it's feasible to randomize on-disk data?
The problem asks to find k cards (integers) out of n, in time O(k) and O(1) additional space — this means additional to the k required to store the output. So while holding n items wouldn't fit in memory, it is implicit that k would. But we still want to avoid Ω(k log k) time, or more than a constant amount of additional space.
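Under those constraints, one sketch of the shape of a solution (toy constants and hypothetical names, with an 8-bit bijection standing in for a keyed permutation over a domain at least n wide): take the first k outputs of a permutation of [0, n), shrunk to range by cycle-walking. Each draw is O(1) expected time and no per-card state is kept:

```javascript
// Toy bijection over 0..255 standing in for a keyed permutation; the odd
// multiplier makes it invertible mod 256. NOT cryptographic.
function perm256(x) {
    return (x * 77 + 31) & 0xff;
}

// Draw k distinct cards out of n (n <= 256 for this toy) in O(k) time and
// O(1) space beyond the output: cycle-walk perm256 down to [0, n).
function drawK(k, n) {
    const out = [];
    for (let i = 0; i < k; i++) {
        let x = perm256(i);
        while (x >= n) x = perm256(x); // skip values outside the deck
        out.push(x);
    }
    return out;
}
```

For a real 2^64-sized deck you'd swap `perm256` for a keyed 64-bit permutation such as the Feistel network above.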
Not sure this qualifies as a uniform distribution, even if you spread the N picks across the M items. The selection has to be predictable, so that each subsequent batch of N excludes all the previous ones. That's presuming the ‘user’ might iterate over all M items in total, even if you pick chunks of N internally.