
UTF-16 has all of the disadvantages of UTF-8 and none of the advantages, and originally comes from the UCS-2 era where they thought, "64k characters are enough for everyone!" Unfortunately, all of Windows uses it, so we as an industry are stuck with it.



You mean that it is not ASCII compatible?

Also, Mac OS X, Java, .NET, Qt, ICU... there is a lot of support for UTF-16, and for reasons other than backwards compatibility. Processing UTF-16 is easier in many situations.


Processing UTF-16 would only be easier if you have a valid byte-order mark and/or know the endianness in advance and you can guarantee that no surrogate pairs exist. Otherwise the same pitfalls exist as when processing UTF-8, plus the endianness issue and the possibility of UTF-16 programmers not knowing about surrogate pairs.
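
To make the two pitfalls above concrete, here is a rough Python sketch. The helper names `utf16_code_units` and `count_code_points` are made up for illustration and are not from any library; it only assumes the standard `utf-16-le` / `utf-16-be` codecs.

```python
import struct

def utf16_code_units(text, byteorder="little"):
    """Return the 16-bit code units of text (hypothetical helper)."""
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    data = text.encode(codec)
    prefix = "<" if byteorder == "little" else ">"
    return list(struct.unpack(prefix + "H" * (len(data) // 2), data))

def count_code_points(units):
    """Count code points, treating a high+low surrogate pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count

# "a" followed by U+1F600, which lies outside the Basic Multilingual Plane.
sample = "a\U0001F600"
units = utf16_code_units(sample)

# Three code units but only two code points: code that assumes
# one unit per character (the UCS-2 assumption) miscounts here.
print([hex(u) for u in units], count_code_points(units))

# Same text, different byte order: the bytes differ, so a decoder needs
# a byte-order mark or out-of-band knowledge of the endianness.
print("a".encode("utf-16-le"), "a".encode("utf-16-be"))
```

Indexing or slicing by code unit has the same problem as counting: it can split a surrogate pair in half, much like slicing UTF-8 in the middle of a multi-byte sequence.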


What you're describing is UCS-2, not UTF-16. That's why the latter is frustrating to deal with.


UTF-16 is not ASCII-compatible, if that's what you were asking.
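
A quick way to see the incompatibility, as a small Python sketch using only the built-in codecs (the variable names are just for illustration):

```python
text = "hello"  # pure ASCII input

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # explicit byte order, no BOM

# UTF-8 encodes ASCII text to the exact same bytes as ASCII.
print(utf8_bytes == ascii_bytes)

# UTF-16 uses two bytes per code unit, so every ASCII character
# picks up a zero byte and the result is twice as long.
print(utf16_bytes, len(utf16_bytes))

# Those embedded NUL bytes are what trips up tools and C-style
# string handling that expect ASCII-compatible input.
print(b"\x00" in utf16_bytes)
```

With the generic `utf-16` codec Python also prepends a byte-order mark, which moves the output even further from plain ASCII.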



