
Yup. A long time ago, while working on a project with some particularly gnarly Unicode issues, I got in the habit of thinking in terms of grapheme clusters instead of code points (or "characters", for whatever definition of "character" one wishes to use), and it has served me very well. Combining characters pop up in the most interesting places, often where and when you least expect them! ٩(•̃̾●̮̮̃̾•̃̾)۶

Ruby's unicode_utils gem has a nice implementation of the standard grapheme cluster segmentation algorithm, and Python's wrapper around ICU works quite well. Go's concept of runes is certainly an improvement, but it doesn't handle combining characters out of the box...
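
To make the code point vs. grapheme cluster distinction concrete, here is a rough Python sketch that uses only the standard library. It is a deliberate simplification, not the standard segmentation algorithm (UAX #29): it just glues every code point in a Mark category (Mn, Mc, Me) onto the code point before it. The helper name `approx_clusters` is made up for illustration; for real work you'd want a proper implementation such as ICU's.

```python
import unicodedata


def approx_clusters(text):
    """Group each code point with the combining marks that follow it.

    Simplified approximation only: a mark (general category M*) is
    attached to the preceding cluster. Full UAX #29 segmentation also
    covers Hangul syllables, ZWJ sequences, regional indicators, etc.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and is_mark:
            clusters[-1] += ch      # extend the previous cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


word = "noe\u0308l"                  # 'e' followed by COMBINING DIAERESIS
print(len(word))                     # 5 code points
print(len(approx_clusters(word)))    # 4 user-perceived characters
```

Anything that indexes, truncates, or reverses the string by code point can split the "ë" in half; working on the cluster list avoids that for this simple case.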




> Combining characters pop up in the most interesting places, often where and when you least expect them! ٩(•̃̾●̮̮̃̾•̃̾)۶

The good news is that Unicode 8 will make them way more frequent (alternate emoji skin colors are specified via modifier characters that combine with the preceding emoji), much as Unicode 6 made astral characters way more "in your face" by standardising emoji in the SMP.
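
A small Python sketch of what that looks like in practice, assuming a thumbs-up emoji followed by one of the Unicode 8 skin tone modifiers (the helper `count_units` is a made-up name for illustration). One visible glyph is two code points, and because both sit outside the BMP, four UTF-16 code units:

```python
import unicodedata


def count_units(text):
    """Return (code points, UTF-16 code units) for a string."""
    code_points = len(text)
    # Each UTF-16 code unit is 2 bytes; astral code points take two units.
    utf16_units = len(text.encode("utf-16-le")) // 2
    return code_points, utf16_units


# U+1F44D followed by U+1F3FD (a skin tone modifier), both in the SMP.
thumbs = "\U0001F44D\U0001F3FD"

print(count_units(thumbs))                 # (2, 4)
for ch in thumbs:
    # Fall back to a placeholder if this Python's Unicode data lacks the name.
    print(hex(ord(ch)), unicodedata.name(ch, "<unnamed>"))
```

So code that assumes "one UTF-16 unit per character" or even "one code point per character" gets this wrong twice over.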



