
>So there's a motivation to store simpler strings using 1 byte per char.

What advantage would this have over UTF-7, especially since the upper 128 characters wouldn't match their Unicode values?




> What advantage would this have over UTF-7, especially since the upper 128 characters wouldn't match their Unicode values?

(I'm going to assume you mean UTF-8 here rather than UTF-7, since UTF-7 is not really useful for anything; it's just a way to pack Unicode into 7-bit ASCII characters.)

Fixed-width string encodings like Latin-1 let you index directly to a particular character (code point) within a string without having to iterate from the beginning of the string.
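To make the indexing difference concrete, here is a rough sketch, not how any particular engine does it. Both function names are made up for illustration. With one byte per character the lookup is a plain array access, while with UTF-8 you have to scan from the start and skip continuation bytes (those of the form 10xxxxxx).

```javascript
// Hypothetical helper: in a fixed-width 1-byte encoding, the i-th character
// is simply the i-th byte, so lookup is O(1).
function latin1CodePointAt(bytes, i) {
  return bytes[i];
}

// Hypothetical helper: in UTF-8, find the byte offset of the i-th code point
// by scanning from the start. Continuation bytes look like 10xxxxxx, so any
// byte that does not match that pattern starts a new code point. O(n).
function utf8OffsetOfCodePoint(bytes, i) {
  let seen = 0;
  for (let off = 0; off < bytes.length; off++) {
    if ((bytes[off] & 0xC0) !== 0x80) {
      if (seen === i) return off;
      seen++;
    }
  }
  return -1; // fewer than i + 1 code points in the input
}

const latin1Bytes = new Uint8Array([0x68, 0xE9, 0x6C, 0x6C, 0x6F]); // "héllo" in Latin-1
const utf8Bytes = new TextEncoder().encode("héllo");                // "é" takes 2 bytes here

const latin1Third = latin1CodePointAt(latin1Bytes, 2);  // direct access
const utf8ThirdOffset = utf8OffsetOfCodePoint(utf8Bytes, 2); // found by scanning
```

In the UTF-8 case the third character no longer sits at byte offset 2, because the "é" before it occupies two bytes.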

JavaScript was originally specified in terms of UCS-2, a 16-bit fixed-width encoding that was commonly used at the time in both Windows and Java. However, there are more than 64k characters in all the world's languages, so it eventually evolved to UTF-16, which allows for wide characters encoded as two 16-bit code units (a surrogate pair).

However, because of this history, indexing into a JavaScript string gives you the 16-bit code unit, which may be only part of a wide character. A string's length is defined in terms of 16-bit code units, but iterating over a string gives you full characters.
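A small illustration of that mismatch, using only standard string methods. The sample string is my own choice: U+1F600 is outside the 16-bit range, so it is stored as two code units.

```javascript
// "a", then U+1F600 (stored as a surrogate pair), then "b".
const s = "a\u{1F600}b";

const lengthInCodeUnits = s.length;     // counts 16-bit code units, not characters
const secondUnit = s.charCodeAt(1);     // only the high surrogate of the pair
const thirdUnit = s.charCodeAt(2);      // only the low surrogate of the pair
const fullCodePoint = s.codePointAt(1); // combines the pair starting at index 1
const codePoints = [...s];              // the string iterator yields whole code points
```

So `s.length` and `[...s].length` disagree for this string, and `s[1]` on its own is half a character.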

Using Latin-1 as an optimisation allows JavaScript to preserve the same semantics around indexing and length. While it does require translating 8-bit Latin-1 character codes to 16-bit code units, this can be done very quickly through a lookup table. This would not be possible with UTF-8 since it is not fixed width.

EDIT: A lookup table may not be required. I was confused by new TextDecoder('latin1') actually using windows-1252.
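For what it's worth, here is a sketch of why no table is needed. `latin1ToString` is a hypothetical helper, not an engine API. Since true Latin-1 byte values 0-255 coincide with the first 256 Unicode code points, widening is just zero-extension. The `TextDecoder` line is there for contrast: per the WHATWG Encoding Standard the label 'latin1' maps to windows-1252, so a byte like 0x80 is expected to come out as U+20AC (the euro sign) rather than U+0080.

```javascript
// Hypothetical helper: widen Latin-1 bytes to a JavaScript string.
// Each byte value (0..255) is used directly as the code unit, no lookup table.
function latin1ToString(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// "caf", then 0xE9 (é in both encodings), then 0x80 (where they differ).
const bytes = new Uint8Array([0x63, 0x61, 0x66, 0xE9, 0x80]);

const widened = latin1ToString(bytes);

// Assumption: the runtime follows the Encoding Standard's label mapping,
// in which case this decodes as windows-1252 rather than ISO-8859-1.
const decoded = new TextDecoder("latin1").decode(bytes);
```

Comparing `widened.charCodeAt(4)` with `decoded.charCodeAt(4)` shows the difference that caused the confusion.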

More modern languages just use UTF-8 everywhere because it uses less space on average, and UTF-16 doesn't save you from having to deal with wide characters anyway.


Latin1 does match the Unicode values (0-255).



