Hacker News

I live in the USA. We get labels in English, French and Spanish so that products can be sold in Canada and Mexico. The English labeling is almost always visibly shorter than the French and Spanish. So I hypothesize that English would compress less: it already says the same thing in fewer characters, which leaves less redundancy for a compressor to squeeze out.
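One way to check that hypothesis is to compress the same text in each language and compare ratios. This is a rough sketch, not a real measurement: `compression_ratio` is a name I made up, the label strings are hypothetical stand-ins I typed from memory of what such labels say, and zlib is just one convenient compressor.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by original size, both counted in UTF-8 bytes."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical parallel label text (an assumption, not real data).
# Strings this short are dominated by zlib's fixed overhead, so swap in a
# large parallel corpus before reading anything into the numbers.
samples = {
    "en": "Keep out of reach of children. ",
    "fr": "Tenir hors de portée des enfants. ",
    "es": "Mantener fuera del alcance de los niños. ",
}

for lang, text in samples.items():
    raw_len = len(text.encode("utf-8"))
    print(lang, raw_len, round(compression_ratio(text), 3))
```

A lower ratio means the text compressed more. If the hypothesis holds, English should show the highest ratio on a big enough corpus of the same content.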

My conjecture is that artificial languages will be more compressible because they haven't had time to get honed down, the way English lost "thee" and "thou", that personal mode of address. Esperanto and Loglan are completely regular, which natural languages are not, and thus have a lot of cases where the regularity doesn't matter: they haven't had time to lose the mostly-unused features.

For better or for worse, compression of text only uses the surface form, because that's the level compression works on: letters or bytes or some other unit. You can't compress meaning. Meaning doesn't exist per se in the surface form: colorless green ideas sleep furiously, after all. That is, you can use perfectly sensible words and letters and even legitimate syntax, and still create strings devoid of meaning. A document consisting of perfectly spelled words and legitimate syntax, yet without meaning, like the colorless green ideas sentence, should compress about as well as ordinary text with the same orthographic and syntactical validity.
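A quick way to see that the compressor only looks at the surface: run readable text through ROT13, which makes it meaningless to a reader but keeps every repetition pattern in place, then compress both. This is a sketch under my own assumptions: `compressed_size` is a made-up helper, the sample sentence is arbitrary, and zlib stands in for "a compressor".

```python
import codecs
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib produces for the UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


# Arbitrary sample text (an assumption), repeated so there is something to compress.
sentence = "The English labeling is almost always shorter than the French and Spanish. "
readable = sentence * 50

# ROT13 swaps each letter for another one: unreadable, but the same
# statistical structure (same repeats, same letter-frequency profile).
scrambled = codecs.encode(readable, "rot13")

print(len(readable), compressed_size(readable), compressed_size(scrambled))
```

The two compressed sizes should come out close to each other, since the compressor has no idea one version means something and the other doesn't.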



