Perhaps I'm missing something here, but where's the guarantee that these ID's are unique? As far as I can tell, an ID is made up of a timestamp and a random number, generated by the JavaScript enginge - and typically seeded with the current time.
Wouldn't that give relatively good chances that for a system with many users, two users would actually generate the same ID?
Best-case scenario (for them) is that the random data is "truly" random. 72 bits of the ID are sourced from Math.random(), which means you need to start worrying about collisions when clients attempt to generate 2^36 events with the same timestamp. That's about 70 billion, so it's not something that would keep me up at night.
What's harder to quantify is the role that sub-optimal random number generators would play in this. Implementation of Math.random is left to the browser, but you could imagine a simple LCG with just a 64-bit long as its state, initialized with the current time. Having only 64 bits of state is the limiting factor here, both for properly filling out the 72 bits of entropy (you can't) and for 2 users to have the same internal state in their LCG.
I don't understand this sentence: "One caveat to the randomness is that in order to preserve chronological ordering if a client creates multiple push IDs in the same millisecond, we just ‘increment’ the random bits by one."
The ID consists of a timestamp to millisecond precision and a random value. Since the timestamp only offers millisecond precision, two IDs generated in the same millisecond could be out of order depending on the random value because the timestamps will be the same. In order to prevent that, if two IDs are generated in the same millisecond, instead of generating a new random value, they take the random value from last time and add 1. This ensures that even though the timestamp of the second ID is the same as the first one, it will sort after the first one.
Yeah, you absolutely can't trust client-generated timestamps. We have a table tracking metadata on customer-uploaded photos (dimensions, file size, color profile, &c), including the camera's timestamp. According to their cameras, our customers have uploaded pics taken before the Buddha sat beneath the Bodhi tree, and after the Khitomer Accord was signed.
Fortunately, we also store an upload timestamp, and our servers all sync to local NTP.
> "This guarantees chronological ordering, since the ServerValue.TIMESTAMP will be resolved on the Firebase server to be an accurate timestamp."
To be sure, this can't possibly guarantee chronological ordering. But it can get you fairly close and perhaps almost predictable chronological ordering...
If at any level there is a clock which can skew and is being adjusted with a stochastic process, there is jitter, and you can get out of order. The solution is simply to apply corrective factors in a continuous manor -- slow down or speed up time. That's what the client adjustment should have been doing all along, and then you would never need the ServerValue.TIMESTAMP hack.
If Firebase is relying on a ntp utility, I'm not sure but I would assume that they are smart enough to slow or speedup time and not jerk to a new setting. Or at least they would have a flag to enable that functionality.
Edit: Distributed auto-incrementing counter is not that hard. A field for time, a field for ID. I don't see a reason to conflate the two. I guess it makes object instantiating easier, since you don't have to fetch an ID in any sense. 64 vs 128 bits is not a big difference. So in the end it's a good design decision and is a net positive for them.
Seems way easier than synchronizing a clock in the browser and one on the server, and you don't have to try to account for network latency between browser and server.
> If at any level there is a clock which can skew and is being adjusted with a stochastic process, there is jitter, and you can get out of order
My understanding is that Linux has support for a monotonic clock. As long as its a single box resolving the times this shouldn't be a hard problem.
Could you please elaborate on time speed up or slow down? Or if you could refer me to some reading material? It's an interesting problem and I'd love to deep dive into it.
This has always been my favorite way to generate IDs, though I typically do it with a 64bit integer and a custom epoch.
Being able to store keys as 64bit integers is much easier to work with, and also much more efficient than b64 encoding. In the odd case you're working with a language that can't handle numbers (ahem… javascript), you can always encode it as a string.
Yes, if you need unique, and generated on the server (not remotely) then also include getmypid() and the IP address (or name) of the webserver (if you have more than one).
Then use the option for more entropy in case the function is called more than one in a milliseconds.
So in total:
uniqid(gethostname().getmypid(), true);
Will be unique if you give sane names to your machines.
Wouldn't that give relatively good chances that for a system with many users, two users would actually generate the same ID?