Google: Unicode conquers ASCII on the Web

pmjordan · on May 9, 2008

One disadvantage Unicode has over ASCII, though, is that it takes at least twice as much memory to store a Roman alphabet character because Unicode uses more bytes to enumerate its vastly larger range of alphabetic symbols.

LIES. A decent (though brief) article about increasing Unicode use on the web, and then they write this in their last paragraph. They even mention UTF-8!

Here's the original source of the data:

http://googleblog.blogspot.com/2008/05/moving-to-unicode-51....

aston · on May 9, 2008

Clarification on parent: One of the best parts about UTF-8 is that ASCII text is also valid UTF-8-encoded text.
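
A quick way to check this yourself, as a minimal Python sketch (the sample string and variable names are just illustrative): pure ASCII text produces the same bytes under both encodings, one byte per character, while a fixed two-byte encoding such as UTF-16 is where the "twice as much memory" figure would apply.

```python
# Illustrative sample: any string made only of ASCII characters works.
text = "Hello, world"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# ASCII text is byte-for-byte identical when encoded as UTF-8.
same_bytes = ascii_bytes == utf8_bytes

# One byte per Roman-alphabet character in UTF-8.
utf8_len = len(utf8_bytes)

# UTF-16 (little-endian, no byte-order mark) uses two bytes per character here.
utf16_len = len(text.encode("utf-16-le"))

# The ASCII bytes also decode cleanly as UTF-8.
round_trip = ascii_bytes.decode("utf-8")

print(same_bytes, utf8_len, utf16_len, round_trip)
```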

pmjordan · on May 9, 2008

Sorry, I kind of assumed that the HN crowd would know this. (Judging by the comments, most do; maybe the silent, lurking masses don't, I don't know.) I've read a few too many articles with glaring mistakes today.

If you're reading this, are even vaguely technically inclined, and don't know what we're talking about, please read:

http://www.joelonsoftware.com/articles/Unicode.html

http://en.wikipedia.org/wiki/UTF-8

mullr · on May 9, 2008

This is so often gotten wrong by very smart people (in my experience) that I think we should consider it a design defect.

mullr · on May 9, 2008

To be clear: the Unicode vs. specific encoding terminology is my gripe. The "UTF-8 is ASCII at the bottom" property is quite cool.
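
For anyone unsure what the terminology distinction is, here is a rough Python sketch (the character and names are chosen only for illustration): Unicode assigns a character a code point, which is just a number, and each encoding turns that same number into a different byte sequence.

```python
# Illustrative character: LATIN SMALL LETTER E WITH ACUTE, code point U+00E9.
ch = "\u00e9"

# The Unicode code point is a number, independent of any byte layout.
code_point = ord(ch)

# The same code point serialized by three different encodings
# (big-endian variants, so no byte-order mark is prepended).
encoded = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, raw in encoded.items():
    print(f"U+{code_point:04X} as {name}: {raw.hex()} ({len(raw)} bytes)")
```

So "Unicode takes N bytes per character" is not a meaningful statement until you name the encoding.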

pkrumins · on May 10, 2008

is all wrong!