Previously, a machine translation system had to be trained on a bilingual (parallel) corpus, that is, a corpus containing the same set of sentences in, e.g., English and French. These corpora are pretty hard to come by and expensive to produce.
The paper describes a technique that uses two monolingual corpora instead, i.e. one set of sentences in English and a different, unrelated set of sentences in French. Those are way easier to find.
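To make the difference concrete, here is a toy sketch of the two kinds of data. The variable names and sentences are made up for illustration and are not taken from the paper.

```python
# Parallel (bilingual) corpus: every English sentence is paired
# with its French translation, so the data is aligned line by line.
parallel_corpus = [
    ("The cat sleeps.", "Le chat dort."),
    ("I like tea.", "J'aime le thé."),
]

# Two monolingual corpora: independent sentence collections.
# They need not have the same size, and no sentence in one
# is a translation of a sentence in the other.
english_corpus = ["The weather is nice today.", "She reads a book."]
french_corpus = ["Le train est en retard.", "Nous habitons à Lyon.", "Il pleut."]
```

The first structure needs a translator for every pair; the second only needs text that already exists in each language.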
>> These corpora are pretty hard to come by and expensive to produce.
Actually, there are a lot of texts translated into several languages by qualified translators, for political reasons: the European Commission's websites, and perhaps some other countries' official websites.
You can have a look at Linguee [0], which uses these texts to provide translation suggestions:
It's far from just a definitional trick.