
Others have already answered why surrogate pairs are irrelevant (and not UCS), but I think it's worth saying what the probable actual reason for the 5- and 6-byte variants was. Remember that UCS and Unicode were still two separate things at that point: Unicode was supposed to be 16-bit (it was expanded later, which caused the whole surrogates mess), while UCS was supposed to be 31-bit. I assume the 5- and 6-byte variants were there for UCS, back before it got merged with Unicode.
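To make the arithmetic concrete, here is a rough sketch, assuming the variants in question are the 5- and 6-byte sequences of the original UTF-8 layout (lead byte with n one-bits followed by a zero, then 6 payload bits per continuation byte). The function names are mine, purely for illustration. It suggests that 6 bytes is exactly what a 31-bit code space needs, while a 16-bit one fits in 3:

```python
def payload_bits(n: int) -> int:
    """Payload bits carried by an n-byte sequence in the original
    (up to 6 bytes) UTF-8 layout. Hypothetical helper for illustration."""
    if not 1 <= n <= 6:
        raise ValueError("original UTF-8 sequences are 1 to 6 bytes long")
    if n == 1:
        return 7  # 0xxxxxxx
    # Lead byte: n one-bits, one zero bit, then (7 - n) payload bits.
    # Each of the (n - 1) continuation bytes (10xxxxxx) carries 6 bits.
    return (7 - n) + 6 * (n - 1)


def bytes_needed(code_point: int) -> int:
    """Shortest sequence length that can hold code_point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for n in range(1, 7):
        if code_point < (1 << payload_bits(n)):
            return n
    raise ValueError("value does not fit in 31 bits")


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "bytes ->", payload_bits(n), "bits")
    print(bytes_needed(0xFFFF))      # top of a 16-bit code space
    print(bytes_needed(0x7FFFFFFF))  # top of a 31-bit code space
```

So the 5- and 6-byte forms only matter for values above 21 bits, which a 16-bit Unicode would never have produced but a 31-bit UCS could.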


