The intermediary language is not English. But as I understand it, the training set consists of pairs of texts in various languages, and the vast majority of pairs have English as one of the two languages. As a result, the intermediate representation can sometimes be very hard to tell apart from English.

For example, translating "рубанок" ("plane", in the sense of the carpenter's tool) from Russian to Polish used to produce "samolot" ("airplane") in Google Translate until sometime earlier this year, because in the intermediate representation "plane" was ambiguous, just as it is in English. It looks like that particular case is fixed now, which is at least progress! Maybe they've been adding more non-English text pairs...
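
To make the failure mode concrete, here is a toy sketch in Python. It is not how Google Translate works: the real system uses a learned intermediate representation, not English strings, and the dictionaries below are hypothetical data I made up for illustration ("strug" as the Polish word for the carpenter's tool is my own addition, not something from the example above). It only shows how two distinct source words that collapse onto one ambiguous pivot entry become indistinguishable on the way out.

```python
# Toy lexicons: hypothetical data for illustration only.
# Two different Russian words map to the same English-like pivot token.
RU_TO_PIVOT = {
    "рубанок": "plane",   # carpenter's tool
    "самолёт": "plane",   # aircraft
}

# The pivot side has a single entry per token, so it keeps only the
# most common sense of "plane".
PIVOT_TO_PL = {
    "plane": "samolot",   # aircraft
}

# A direct Russian -> Polish lexicon has no such collision.
RU_TO_PL = {
    "рубанок": "strug",   # carpenter's tool (assumed Polish equivalent)
    "самолёт": "samolot", # aircraft
}


def translate_via_pivot(word: str) -> str:
    """Translate through the ambiguous pivot; the sense is lost in the middle."""
    return PIVOT_TO_PL[RU_TO_PIVOT[word]]


def translate_direct(word: str) -> str:
    """Translate with a direct pair lexicon; the sense is preserved."""
    return RU_TO_PL[word]


if __name__ == "__main__":
    for word in ("рубанок", "самолёт"):
        print(word, "pivot:", translate_via_pivot(word),
              "direct:", translate_direct(word))
```

With the pivot, both Russian words come out as "samolot"; the direct lexicon keeps them apart, which is the effect that more non-English training pairs would presumably have.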



That's almost certainly accurate. The metalanguage / interlingua isn't English, but because it is based on A <-> English and B <-> English training, it is all but certainly influenced by English grammar, words, and idioms in ways that direct A <-> B training would not produce.



