
Yeah, everyone seems to agree that Han unification was a mistake, but I think its impact has also been exaggerated. The latest IVD contains 12,192 additional variants [1] for the original CJK Unified Ideographs block (currently 20,992 characters). They cover most variants of common interest, so if Han unification hadn't happened and we had assigned them in advance, we could actually have fit everything into the BMP! [2] Of course, this would also have meant that further assignment would be almost impossible.

[1] https://unicode.org/ivd/data/2022-09-13/

[2] Unicode 2.0 had 38,885 assigned characters (almost all in the BMP) and ~6,200 otherwise unallocable code points (mostly because they had been private use characters since 1.0). 38,885 + 6,200 + 12,192 = 57,277 < 2^16 = 65,536. A careful block reassignment would have been required, though.
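To make the footnote's budget easy to check, here is a minimal sketch of the same arithmetic. The three counts are taken directly from the comment and are not independently verified here; the constant names are made up for illustration.

```python
# Figures quoted in the comment above; treated as given, not re-derived.
UNICODE_2_0_ASSIGNED = 38_885  # characters assigned in Unicode 2.0
UNALLOCABLE = 6_200            # approximate count of code points that could not be reused
IVD_VARIANTS = 12_192          # additional variants in the 2022-09-13 IVD

BMP_SIZE = 2 ** 16             # the BMP holds 65,536 code points

total = UNICODE_2_0_ASSIGNED + UNALLOCABLE + IVD_VARIANTS
headroom = BMP_SIZE - total    # code points that would remain unassigned

print(f"needed: {total}, BMP size: {BMP_SIZE}, headroom: {headroom}")
```

Under these numbers the headroom is only about 8,000 code points, which is why further assignment in such a layout would have been nearly impossible.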




IVS would work out if everyone switched to always using it by default, but currently it's only used for special cases where the desired glyph shape diverges from the language default (the "bridge-shaped high" cases) rather than as the default, and I don't see a perfect IVS world happening.
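For anyone who hasn't seen one, an IVS is just a base ideograph followed by a variation selector from U+E0100..U+E01EF. A rough sketch of what that looks like at the code point level; the choice of U+845B plus U+E0100 is only an illustrative pairing (I'm assuming it's a registered sequence), and strip_ivs is a hypothetical helper, not a library function.

```python
# Variation Selectors Supplement (VS17..VS256), the range used by IVSes.
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF

def is_ivs_selector(ch: str) -> bool:
    """Return True if ch is a variation selector usable in an IVS."""
    return IVS_FIRST <= ord(ch) <= IVS_LAST

def strip_ivs(text: str) -> str:
    """Drop IVS selectors, leaving the bare (unified) ideographs."""
    return "".join(ch for ch in text if not is_ivs_selector(ch))

base = "\u845b"                  # a unified ideograph
with_ivs = base + "\U000E0100"   # same ideograph plus VS17 (assumed registered pairing)

print(len(base), len(with_ivs))  # one code point vs. two
print(strip_ivs(with_ivs) == base)
```

Text without the selector falls back to whatever shape the font or language default picks, which is exactly the "only used for special cases" situation described above.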

I mean, it's clearly tied to language, so I think the only viable options are to either split the languages into separate tables in those reserved-for-unknown-reasons planes, or add a font selector instruction like ":flag_tw: :right_arrow: 今天是晴天 :left_arrow:". Or, if there's a WWIII and one or more of zh-Hans or ja-JP becomes obsolete, it won't be an important issue anymore.



