Hacker News

It's unwise to lazily adopt a silver bullet without understanding the context and thinking through the consequences. I could just as well say that if you are not using XML, with the encoding specified, to encode everything everywhere, then you are doing it wrong, and that you should get all your data into XML as soon as possible. Of course that sounds ludicrous.



XML is just one data storage and exchange format among many, with no particularly interesting properties and no compelling reason to use it. UTF-8 is the only encoding that's ASCII compatible, widely accepted/expected, and can represent any text you'll ever encounter.

I can come up with half a dozen reasons to use something other than XML for data storage. I've yet to hear anyone give me a compelling reason to use something other than UTF-8 for encoding strings. Just because what I said is absurd when you replace UTF-8 with XML doesn't mean the original was absurd.


UTF-8 is not efficient for random access.

I don't have a problem with UTF-8. I have a problem with the silver-bullet attitude of advocating one approach for all cases without thought. That's just intellectually lazy.
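
To make the random-access point concrete, here is a minimal Python sketch. The helper name nth_code_point_utf8 and the sample string are made up for illustration. Because UTF-8 code points take one to four bytes, finding the n-th one means scanning from the start and counting lead bytes:

```python
def nth_code_point_utf8(data: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of UTF-8 bytes by a linear scan."""
    if n < 0:
        raise IndexError(n)
    count = -1      # index of the code point whose lead byte was seen last
    start = None    # byte offset where the n-th code point begins
    for i, b in enumerate(data):
        if b & 0xC0 != 0x80:        # not a continuation byte, so a new code point starts here
            count += 1
            if count == n:
                start = i
            elif count == n + 1:    # reached the next code point: the n-th one ends here
                return data[start:i].decode("utf-8")
    if start is not None:           # the n-th code point was the last one in the buffer
        return data[start:].decode("utf-8")
    raise IndexError(n)


sample = "naïve 日本語"            # hypothetical sample text
encoded = sample.encode("utf-8")
print(len(sample), len(encoded))   # code point count vs. byte count
print(nth_code_point_utf8(encoded, 6))
```

The cost grows with n, which is the inefficiency being described. Whether indexing by code point is something a program actually needs is a separate question.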


No encoding that can handle all the necessary languages will be efficient for random access.

I'm not saying don't think about it. But once you think about it, I think there's really only one sane conclusion to reach.


Never say never. UTF-32 handles them just fine.
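
A rough Python sketch of what that buys you, assuming little-endian UTF-32 without a byte order mark. The helper name nth_code_point_utf32 and the sample string are made up for illustration. Every code point occupies exactly four bytes, so the n-th one sits at byte offset 4 * n:

```python
def nth_code_point_utf32(data: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of UTF-32-LE bytes in constant time."""
    offset = 4 * n                      # fixed width: four bytes per code point
    if n < 0 or offset + 4 > len(data):
        raise IndexError(n)
    return data[offset:offset + 4].decode("utf-32-le")


sample = "naïve 日本語"                 # hypothetical sample text
encoded = sample.encode("utf-32-le")    # "utf-32-le" writes no byte order mark
print(len(sample), len(encoded))        # the byte count is four times the code point count
print(nth_code_point_utf32(encoded, 6))
```

The lookup is a single slice, at the cost of four bytes for every code point, ASCII included.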


Precomposed versus decomposed accents? Jamo versus precomposed Hangul characters? The Unicode code point is rarely a useful thing to know about on its own, and code which assumes that one code point equals one "character", for whatever definition of a character is in use, is likely to work poorly even with UTF-32.
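
A small Python sketch of both cases, using the standard unicodedata module. The variable names and the sample characters (é and 한) are chosen here for illustration:

```python
import unicodedata

# Precomposed versus decomposed accents: both render as "é"
precomposed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"      # "e" followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                  # the raw strings differ
print(len(precomposed), len(decomposed))                          # code point counts differ
print(unicodedata.normalize("NFC", decomposed) == precomposed)    # equal after normalization

# Precomposed Hangul syllable versus its conjoining jamo
syllable = "\ud55c"                                # HANGUL SYLLABLE HAN
jamo = unicodedata.normalize("NFD", syllable)      # canonical decomposition into jamo
print(len(syllable), len(jamo))

# Fixed-width code units do not help: index 0 of the decomposed forms
# is only part of what a reader would call one character.
print(len(decomposed.encode("utf-32-le")), len(jamo.encode("utf-32-le")))
```

Indexing by code point into the decomposed forms returns a bare "e" or a single jamo, so constant-time indexing in UTF-32 still doesn't give constant-time access to user-perceived characters.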



