
I also hit this issue when saving Wikipedia pages to a database, since some languages use these 4-byte UTF-8 characters.

AFAICR, there are performance reasons behind this. Western text does not "need" the 4-byte representation, so MySQL's utf8 handling could be faster without it.

I was also shocked to learn this while using MySQL 5.3 (where utf8mb4 is missing), which led me to upgrade to the 5.5 alpha.

From https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8...

"The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters.

As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplemental characters"
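
To make the three-byte versus four-byte distinction concrete, here is a rough Python sketch that checks whether a string contains any character outside the BMP, i.e. one that takes four bytes in UTF-8 and so would not fit in a column using MySQL's 3-byte utf8. The function names `utf8_width` and `needs_utf8mb4` are made up for illustration and are not part of MySQL or any driver.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the BMP (code point above U+FFFF).

    Such characters encode to four bytes in UTF-8, which MySQL's
    3-byte utf8 character set cannot represent.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    # Latin letter, accented letter, BMP CJK, supplementary CJK, emoji
    samples = ["a", "\u00e9", "\u4e2d", "\U00020000", "\U0001F600"]
    for ch in samples:
        print(f"U+{ord(ch):04X}: {utf8_width(ch)} byte(s), "
              f"needs utf8mb4: {needs_utf8mb4(ch)}")
```

Running a check like this over text before inserting it is one way to see whether a table would need utf8mb4. What actually happens to such characters in a 3-byte utf8 column depends on the server version and SQL mode, so treat this only as a pre-flight test.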

PS. I have migrated to MariaDB. Take this chance to move away from Oracle MySQL!



