Hacker News

Last time I checked, even though the web front end counts characters, the Twitter back end counts bytes (as does, I believe, SMS). So Unicode doesn't actually save anything (for Twitter).
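To illustrate the distinction being drawn here (an illustrative sketch in Python -- the actual limits Twitter enforced varied over time):

```python
# A character-count check and a byte-count check disagree
# as soon as the text contains non-ASCII characters.
tweet = "I \u2764 caf\u00e9s"  # "I ❤ cafés"

char_count = len(tweet)                   # code points: 9
byte_count = len(tweet.encode("utf-8"))   # UTF-8 bytes: 12 (❤ is 3 bytes, é is 2)

print(char_count, byte_count)
```

A front end enforcing a 140-character limit and a back end enforcing a 140-byte limit will accept and reject different sets of messages.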



Conveniently, it does save characters on the website at entry time, since the site only checks by character count. Their backend doesn't validate by bytes for tweets submitted via the website either.

Their API has fluctuated between checking bytes and characters over time -- I think right now it checks bytes. SMS does it by bytes.


Their API has also done both simultaneously at various times, mogrifying tweets as they transition between queues/memcache.

SMS doesn't actually use bytes natively -- it's 160 7-bit characters packed into 140 bytes. As is their way, Twitter fucks this up: they use the 20 spare characters for the "username: " prefix, but limit usernames to 15 characters -- 3 are completely wasted! Why not allow usernames to be 18 characters?
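The arithmetic behind the 160/140 split and the three wasted characters, sketched in Python (the GSM 7-bit default alphabet packs each character into 7 bits, so 160 septets fit exactly into 140 octets; the prefix overhead follows the numbers above):

```python
SMS_PAYLOAD_BYTES = 140
BITS_PER_CHAR = 7  # GSM 7-bit default alphabet

max_chars = SMS_PAYLOAD_BYTES * 8 // BITS_PER_CHAR  # 1120 bits / 7 = 160 chars
spare = max_chars - 140                             # 20 chars beyond a tweet

# "username: " prefix: up to 15 chars of name plus ": " uses 17 of the 20.
prefix_chars = 15 + len(": ")
wasted = spare - prefix_chars                       # 3 characters unused

print(max_chars, spare, wasted)
```

An 18-character username limit (18 + 2 = 20) would have used the spare capacity exactly.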

They've historically fucked up plenty of other SMS encoding details, like sending & escaped as &amp; and murdering Unicode in weird ways -- always truncating the message at an arbitrary tier instead of validating/refusing it up front.
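Truncating a UTF-8 message at an arbitrary byte boundary is one way Unicode gets murdered: a multi-byte sequence can be cut in half. A minimal sketch of the failure mode and one safe alternative (hypothetical example, not Twitter's actual code):

```python
msg = "caf\u00e9"          # é is 2 bytes in UTF-8
raw = msg.encode("utf-8")  # 5 bytes: b'caf\xc3\xa9'

# Naive byte truncation splits é in half, yielding a replacement char:
broken = raw[:4].decode("utf-8", errors="replace")  # 'caf\ufffd'

# Safer: drop the incomplete trailing sequence instead of emitting garbage.
safe = raw[:4].decode("utf-8", errors="ignore")     # 'caf'

print(broken, safe)
```

Validating or refusing the over-long message up front, as the comment suggests, avoids the problem entirely.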



