I just read this over and it's a very dated, Windows-centric view. Several glaring errors:

- glosses over the difference between UCS-2 and UTF-16
- no mention of surrogate pairs for UTF-16 (it assumes only ~65k code points)
- says UTF-8 can be up to 6 bytes (no, it can't; this was proposed but never standardized)
- claims ASCII standardization dates to the 8088 (it's much older)
- mentions UTF-7 (don't)
- no mention that wchar_t changes size depending on the platform
- no mention of Han unification, shaping, or normalization
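For anyone fuzzy on the surrogate-pair point, here's a minimal Python sketch (to_surrogate_pair is just an illustrative name, not something from the article or any library) of how a code point above U+FFFF maps to a UTF-16 surrogate pair:

    # Code points above U+FFFF don't fit in one 16-bit unit,
    # so UTF-16 splits them across a high and a low surrogate.
    def to_surrogate_pair(cp):
        assert 0x10000 <= cp <= 0x10FFFF
        cp -= 0x10000
        high = 0xD800 + (cp >> 10)    # top 10 bits
        low = 0xDC00 + (cp & 0x3FF)   # bottom 10 bits
        return high, low

    # U+1F600 -> (0xd83d, 0xde00), matching '\U0001F600'.encode('utf-16-be')
    print([hex(u) for u in to_surrogate_pair(0x1F600)])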
UTF-8 was originally designed to handle code points up to 31 bits, using as many as 6 octets. It wasn't until RFC 3629 in 2003 that the code point range was restricted to U+10FFFF, so that 4 octets are always sufficient.
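To make that concrete, a rough Python sketch of sequence lengths under the original scheme (utf8_len_original is a made-up helper, not a real API):

    # The original 1993 design used up to 6 octets to cover 31-bit values;
    # RFC 3629 (2003) capped the range at U+10FFFF, so 4 octets always suffice.
    def utf8_len_original(cp):
        for limit, n in [(0x7F, 1), (0x7FF, 2), (0xFFFF, 3),
                         (0x1FFFFF, 4), (0x3FFFFFF, 5), (0x7FFFFFFF, 6)]:
            if cp <= limit:
                return n
        raise ValueError("beyond 31 bits")

    print(utf8_len_original(0x10FFFF))    # 4 -- today's maximum code point
    print(utf8_len_original(0x7FFFFFFF))  # 6 -- legal in the original scheme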
Kinda half-sad it didn't make it. It would have been cool to be able to "see" behind the curtain of UTF strings. As it is now, you can only paste a UTF string into a UTF-aware environment, and you also need the correct fonts, etc.
It would have been cool to be able to incrementally upgrade legacy environments to use UTF via UTF-7. Unaware parts would just have displayed the encoding. String lengths would have sort of worked.
(All of these things would of course have come with horrible drawbacks, so in that alternative universe I might have been cursing that we got UTF-7...)
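For what it's worth, you can still peek behind that curtain today with Python's standard utf-7 codec (the exact escaped form below is just what one encoder happens to produce):

    # UTF-7 output is pure ASCII: non-ASCII runs become base64 of the UTF-16
    # code units, so a Unicode-unaware tool would show the escape instead of
    # mojibake.
    data = "café".encode("utf-7")
    print(data)                    # something like b'caf+AOk-'
    print(data.decode("utf-7"))    # café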
Sure, but there is no way this should be used as a reference in 2019. It was wrong even in 2003 when it was written - Unicode 3.0 from 1999 defined the maximum number of code points, surrogate pairs, and code points above U+FFFF.
His single most important point still rings true, though: "It does not make sense to have a string without knowing what encoding it uses."
https://www.joelonsoftware.com/2003/10/08/the-absolute-minim...
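A quick Python illustration of why that holds (nothing here is from the article itself): the same bytes are valid in several encodings and decode to completely different text.

    # Without out-of-band knowledge of the encoding, these two octets are
    # ambiguous: they decode cleanly either way.
    raw = b"\xc3\xa9"
    print(raw.decode("utf-8"))      # é
    print(raw.decode("latin-1"))    # Ã©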