A brief history of the UUID (2017) (segment.com)
113 points by tosh on Feb 3, 2019 | 21 comments



Comments from the first post in 2017: https://news.ycombinator.com/item?id=14508413


Reading through this, I kept thinking that ULIDs[1] give the same benefits described, with wider adoption/support.

Luckily it looks like the author has already written up his thoughts on the differences[2].

[1] https://github.com/ulid/spec [2] https://github.com/segmentio/ksuid/issues/8


ULID is pretty much lying though:

> UUID v1/v2 is impractical in many environments, as it requires access to a unique, stable MAC address

RFC 4122 Section 4.1.6 "Node"

> For systems with no IEEE address, a randomly or pseudo-randomly generated value may be used; see Section 4.5. The multicast bit must be set in such addresses, in order that they will never conflict with addresses obtained from network cards.

There is no requirement of "a unique, stable MAC address" in UUIDv1, and most UUID APIs should allow overriding the node (and probably clock_seq) fields.
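As a rough illustration of the point above, here is a minimal sketch using Python's standard `uuid` module, whose `uuid1()` accepts explicit `node` and `clock_seq` arguments. The helper names `random_node` and `uuid1_without_mac` are made up for this example.

```python
import secrets
import uuid


def random_node() -> int:
    """Return a random 48-bit node ID with the multicast bit set (RFC 4122, Section 4.5)."""
    node = secrets.randbits(48)
    # The multicast bit is the least significant bit of the first octet,
    # which is bit 40 when the node is held as a 48-bit integer.
    return node | (1 << 40)


def uuid1_without_mac() -> uuid.UUID:
    """Build a version 1 UUID that never touches a hardware MAC address."""
    # clock_seq is a 14-bit field, so draw 14 random bits for it.
    return uuid.uuid1(node=random_node(), clock_seq=secrets.randbits(14))


if __name__ == "__main__":
    print(uuid1_without_mac())
```

A long-running process would presumably generate the node once and reuse it, rather than drawing a new one per ID as this sketch does.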

> Canonically encoded as a 26 character string, as opposed to the 36 character UUID

> Uses Crockford's base32 for better efficiency and readability (5 bits per character)

> Case insensitive

> No special characters (URL safe)

You could just encode your UUID in base32…
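For what it's worth, a sketch of what that could look like in Python: the 128-bit UUID value is written as 26 characters of Crockford's base32 alphabet (26 × 5 = 130 bits, so the top two bits are always zero). The function names are hypothetical, and this is not claimed to be byte-compatible with any particular ULID library.

```python
import uuid

# Crockford's base32 alphabet: digits plus letters, excluding I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def uuid_to_base32(u: uuid.UUID) -> str:
    """Encode a UUID as a fixed-width, 26-character Crockford base32 string."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[n & 0x1F])  # take the lowest 5 bits
        n >>= 5
    return "".join(reversed(chars))


def base32_to_uuid(s: str) -> uuid.UUID:
    """Decode a 26-character Crockford base32 string back into a UUID."""
    n = 0
    for ch in s.upper():  # accept lowercase input as well
        n = (n << 5) | CROCKFORD.index(ch)
    return uuid.UUID(int=n)


if __name__ == "__main__":
    original = uuid.uuid4()
    encoded = uuid_to_base32(original)
    print(original, "->", encoded)
    assert base32_to_uuid(encoded) == original
```

Because the output is fixed-width and the alphabet is in ascending ASCII order, the encoded strings sort in the same order as the underlying 128-bit values.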

> correctly detects and handles the same millisecond

I mean, that's worse than UUIDv1 by 3 orders of magnitude.

The lexical ordering is not a lie at least, so there's that.


Here's the Google cache until the server regains its bearings:

https://webcache.googleusercontent.com/search?q=cache:xWcDCg...


We changed from UUID-4 to ZID (https://github.com/zidplan/zid) because it's faster and easier for many of our typical projects, including ones with distributed computing and concurrent computing.

ZID is a secure random number represented as lowercase hex. No embedded timestamp, no MAC address, no reserved character, etc. ZID-64 uses 64 bits, ZID-128 uses 128 bits, same as a UUID, etc.

KSUID describes a hybrid ID approach i.e. the ID is a hybrid of a timestamp as a string and random bits as a string. Our projects use a similar approach, creating a timestamp and ZID (which is more flexible than a KSUID) or if we want embedded time sortability then we use a ULID.


> ZID is a secure random number represented as lowercase hex. […] ZID-128 uses 128 bits, same as a UUID, etc.

So… A UUIDv4?


ZID comparison with UUIDv4:

1. ZID specifies secure random number generation. UUIDv4 does not. Thus ZID is useful in higher-security areas such as creating a unique ID that functions as a password, or bearer token, or proof of knowledge, etc.

2. ZID specifies that it can be as many bits as you want in multiples of 8, and a notation suffix that says the bit count e.g. "ZID-128" means ZID with 128 bits. UUID can only be 128 bits. Thus ZID is more flexible e.g. ZID-64 is a good fit for 64-bit systems, ZID-256 is good for fulfilling requirements for 256 bits of randomness, etc. This notation suffix is akin to the SHA-2 family, which has SHA-256, SHA-384, SHA-512, etc.

3. ZID specifies lowercase for hexadecimal string representation. UUID does not specify lowercase or uppercase. Thus ZID is more-specific; ZID parsing is one step easier/faster/clearer; ZID string comparison uses exact character matching rather than case-insensitive matching. Thus ZID skips entire areas of UUID bugs that we see in practice, such as one UUID system that emits lowercase, one UUID system that emits uppercase, and an integration system that needs to do string comparisons.

4. ZID is always random. UUID has multiple algorithms, as you point out. In practice we have seen the UUID multiple algorithms cause confusion and bugs e.g. when a spec says "UUID" and the implementation uses a UUIDv4 yet the spec's intent was a UUIDv1, or vice versa. Thus ZID makes it easier to write a better spec.

5. ZID subsections all satisfy proof of randomness e.g. computational statistical analysis. UUIDv4 does not, because UUIDv4 uses 6 fixed bits to indicate the algorithm. Thus ZID is easier and faster to prove as random, both as a whole and also as any subsection such as by subsampling.
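To make the list above concrete, here is a minimal sketch of an ID with those properties (secure random source, bit count in multiples of 8, lowercase hex) using Python's standard `secrets` module. The function name `zid` is an assumption for illustration and is not the zid project's actual API.

```python
import secrets


def zid(bits: int = 128) -> str:
    """Return `bits` of secure randomness as a lowercase hex string."""
    if bits <= 0 or bits % 8 != 0:
        raise ValueError("bit count must be a positive multiple of 8")
    # secrets.token_hex draws from the OS CSPRNG and emits lowercase hex,
    # two characters per byte.
    return secrets.token_hex(bits // 8)


if __name__ == "__main__":
    print("ZID-64: ", zid(64))
    print("ZID-128:", zid(128))
    print("ZID-256:", zid(256))
```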


Not quite - a UUIDv4 has 121-122 random bits, plus four bits indicating it's version 4, plus 2-3 bits indicating its "variant" (basically its canonical endianness).

If you want UUID compatibility, six hard-coded bits plus 122 secure random bits gets you a real UUIDv4.
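A sketch of that construction in Python, assuming the RFC 4122 variant (two variant bits `10`, four version bits `0100`). The function name is made up for this example. Note that CPython's own `uuid.uuid4()` already draws from `os.urandom`, so this is mainly to show where the six fixed bits live.

```python
import secrets
import uuid


def uuid4_from_secure_random() -> uuid.UUID:
    """Build a version 4 UUID from 128 secure random bits, overwriting 6 of them."""
    n = secrets.randbits(128)
    # Version: the high nibble of time_hi_and_version (bits 76-79) becomes 0100.
    n &= ~(0xF << 76)
    n |= 0x4 << 76
    # Variant: the top two bits of clock_seq_hi_and_reserved (bits 62-63) become 10.
    n &= ~(0x3 << 62)
    n |= 0x2 << 62
    return uuid.UUID(int=n)


if __name__ == "__main__":
    u = uuid4_from_secure_random()
    print(u, u.version, u.variant)
```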


It's remarkable how much influence Domain/OS and Apollo had on later computing and how few people actually remember them. I have an HP 425t here with a Domain keyboard port, but after someone upgraded it to a PA-RISC 715, the keyboard port is no longer connected to anything internally. Somehow this seems metaphorical.

I also remember their computer graphics division. "Fair Play" made the rounds at a lot of CGI festivals around that time.


I guess the NCA/NCS rpc stuff explains why UUIDs are so pervasive on Windows, since DCE/RPC was based on NCA, and MSRPC is based on DCE/RPC.


I don't understand the desire to store timestamp information into a UUID. Why not just add an extra timestamp field to your data? That seems like such a simpler solution than embedding it into your UUID. I would go further and argue that embedding anything but randomness into your UUID is a bad idea that you will pay for in the future.


> "I don't understand the desire to store timestamp information into a UUID"

One reason is to be able to provide sortability with respect to what is often a surrogate key attribute, as listed in the introduction:

> "It borrows core ideas from the ubiquitous UUID standard, adding time-based ordering and more friendly representation formats."

You can find additional motivations in the "Time is on our side" section:

https://segment.com/blog/a-brief-history-of-the-uuid/#time-i...

> "In Cassandra, TimeUUIDs are sortable by timestamp, quite useful when needing to roughly order by time."

While you may not agree with the reasons, I think they are understandable.
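To illustrate the sortability argument, here is a rough sketch of a time-prefixed 128-bit ID in Python. The layout (48-bit millisecond timestamp followed by 80 random bits) is an assumption borrowed from ULID-style designs, not KSUID's or Cassandra's actual format, and `time_prefixed_id` is a hypothetical name.

```python
import secrets
import time
from typing import Optional


def time_prefixed_id(ms: Optional[int] = None) -> str:
    """Return a 32-char lowercase hex ID: 48-bit ms timestamp, then 80 random bits."""
    if ms is None:
        ms = time.time_ns() // 1_000_000
    value = ((ms & ((1 << 48) - 1)) << 80) | secrets.randbits(80)
    # Fixed-width hex keeps string order identical to numeric order.
    return f"{value:032x}"


if __name__ == "__main__":
    ids = [time_prefixed_id(ms) for ms in (3000, 1000, 2000)]
    # Plain string sorting recovers creation order (to millisecond granularity).
    print(sorted(ids))
```

IDs created within the same millisecond fall back to the order of their random bits, so this only gives rough ordering, not a strict sequence.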


> embedding anything but randomness into your UUID is a bad idea that you will pay for in the future

It's not that simple. Depending on your case there could be plenty of reasons to avoid randomness altogether. Randomness doesn't guarantee uniqueness within the system, randomness is slow, randomness gives you a false sense of security that you could accidentally rely on, etc.


Very well said.

In looking for an id scheme that focuses on unpredictability and readability, rather than on encoding metadata and sortability, I feel I've finally found what will work well: https://github.com/ai/nanoid

And others seem to agree (though unimportantly on a different platform): https://www.npmtrends.com/nanoid-vs-shortid-vs-cuid-vs-slugi...

(Found through: https://github.com/grantcarthew/awesome-unique-id)


A timestamp in the UUID makes sense if the IDs are generated by one computing node. Even if the nodes in a cluster are off by a nanosecond, we lose the accuracy.


Timestamps in UUID values shouldn't be (and generally aren't) used for coordination between nodes (where such precision and accuracy would be important): they're used for rough sorting and partitioning of values.

Indeed, node-generated timestamps should never be used for coordination regardless of whether they're encoded in UUIDs or not.


Credit where credit is due: Apollo Computer founder Paul Leach dreamed up and implemented the UID concept, and later took it to Microsoft.


What a strange article. No, networked computing was not invented by Apollo and indeed, I like how the author describes the first UUID as having been based on prior UUIDs. I feel dumber after reading this.


Did you read it, though?

It absolutely does not say that Apollo invented network computing, it just says it was one of the companies at that time working in that field.

Of course there were unique identifiers before the first UUID standard was defined, and the author gives examples.

Acknowledging precursors, following the threads of how a particular implementation or standard developed, is the only intelligent way to read up on its history. The dumb thing would be to read into this things the author simply never said or implied.


I think the clearly wrong statement here is "Workstations were really the first networked computers."


Can you provide indications as to more correct information or sources, as you obviously have useful information for us here?



