
That's not a great metaphor though, because there are many types of cars, but only one type of Unicode. 'Unicode' is not a generalization for character encodings the way 'car' is a generalization for car models. I think this is exactly the point the article author is trying to make.

A better analogy would be color pixels. Consider a pure-red pixel; there is only one particular shade of red a pixel can have and be pure-red. However, there are multiple ways to represent that color: RGB, HSV, HSL, YUV, CMYK, etc. These are all encodings of the same color. None of them /are/ pure-red, but they all /represent/ pure-red.

Similarly, the byte sequences within a UTF-8, UTF-16, or UTF-32 encoded string (1 to 4 bytes each, depending on the encoding) aren't Unicode characters themselves; they represent individual Unicode characters. There is only one A in Unicode, but there are multiple ways to encode that A in a stream of bytes.
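
To make that concrete, here is a rough Python sketch (my own illustration, assuming Python 3.8+ for the `hex(' ')` separator; the variable names are made up). It encodes the single code point U+0041 with a few of the standard codecs and checks that every byte sequence decodes back to the same character:

```python
# One code point, several byte-level representations.
ch = "\u0041"  # LATIN CAPITAL LETTER A

# Explicit-endian codec names are used so no byte-order mark is prepended.
encodings = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
encoded = {name: ch.encode(name) for name in encodings}

for name, raw in encoded.items():
    # Print the codec name next to the bytes it produced, in hex.
    print(f"{name:10} {raw.hex(' ')}")

# Decoding each byte sequence with its own codec yields the same code point.
decoded = {raw.decode(name) for name, raw in encoded.items()}
print(decoded)
```

The byte sequences differ in length and order, yet none of them "is" the A; each one only stands for it under a particular encoding.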




Well, there are at least a half-dozen valid unaccented capital A characters in Unicode, but there is only one LATIN CAPITAL LETTER A, which sits at U+0041.
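
A quick way to see a few of them, sketched in Python with the standard `unicodedata` module (the particular code points listed are just the ones I picked as examples, not an exhaustive list):

```python
import unicodedata

# A handful of distinct code points that all render as an unaccented capital A.
lookalikes = ["\u0041", "\u0391", "\u0410", "\uFF21", "\U0001D400"]

# Map each code point (in U+XXXX notation) to its official Unicode name.
names = {f"U+{ord(c):04X}": unicodedata.name(c) for c in lookalikes}

for code_point, name in names.items():
    print(code_point, name)
```

Only the first entry is named exactly LATIN CAPITAL LETTER A; the others are the Greek, Cyrillic, fullwidth, and mathematical bold letters that happen to look the same.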

You are correct that there are many ways to encode U+0041 in a stream of bytes.



