
I think you missed something.

Previously, machine translation systems had to be trained on a bilingual (parallel) corpus, that is, the same set of sentences in both languages, e.g. English and French. These corpora are pretty hard to come by and expensive to produce.

The paper describes a technique that uses two monolingual corpora instead, i.e. one set of sentences in English and a different, unrelated set of sentences in French. Those are way easier to find.
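
To make the difference concrete, here's a toy sketch of the two kinds of data. The sentences and the `is_aligned` helper are made up for illustration; none of this comes from the paper.

```python
# Parallel (bilingual) corpus: every English sentence is paired
# with its French translation. Hypothetical example data.
parallel_corpus = [
    ("The cat sleeps.", "Le chat dort."),
    ("I like coffee.", "J'aime le café."),
]

# Two monolingual corpora: unrelated sentences, different sizes,
# and no alignment between the two lists. Hypothetical example data.
english_corpus = ["It rained all day.", "The train was late.", "She reads a lot."]
french_corpus = ["Le marché ouvre à huit heures.", "Il fait froid ce matin."]


def is_aligned(corpus):
    """Return True if the corpus is a list of (source, target) sentence pairs."""
    return all(isinstance(item, tuple) and len(item) == 2 for item in corpus)


print(is_aligned(parallel_corpus))  # pairs of translations
print(is_aligned(english_corpus))   # plain sentences, nothing to pair with
```

The first structure needs a human translator for every row; the second only needs text that already exists in each language.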

It's far from just a definitional trick.




>> These corpora are pretty hard to come by and expensive to produce.

Actually, there are a lot of texts translated into several languages by qualified translators, for political reasons: the European Commission's websites, and perhaps the official websites of some other countries.

You can have a look at Linguee [0], which uses this to provide translation suggestions.

[0] https://www.linguee.com/



