While this is great, I'd like to give a shout-out to DeepL Translator [1]. I'm not affiliated with them, but I like to recommend them to people who want to step out of the Google ecosystem. I have been using DeepL for about a year now, mostly for NL<->EN and DE<->EN. So far, I never felt that the translation is off, or terrible, and if not better in some cases, it's as good as Google Translator.
> So far, I never felt that the translation is off, or terrible, and if not better in some cases, it's as good as Google Translator.
In my experience Deepl is consistently, without fail, considerably better than Google Translate. I basically use Google now only for more exotic language pairs, or full-page translations.
I've found that sometimes what Google does for these is translate from Language A > English, and then English > Language B, which leads to bizarre results.
I have noticed that too. When translating "libre" from French to Azerbaijani, it gave the Azerbaijani word for free as in free of charge, rather than free as in freedom. That is an ambiguity that is mostly limited to English, maybe some other languages, but definitely neither French nor Azerbaijani.
From reports, Google Translate has effectively created its own internal (AI GD ML) metalanguage, which can interpolate between languages it's not been specifically trained on. E.g., with Japanese <-> English and Korean <-> English, Google Translate can manage Japanese <-> Korean, without being specifically trained to do so.
So yes, there's an intermediary language. But it's not English.
The intermediary language is not English, but due to the way the training set is constructed (pairs of texts in various languages, with the vast majority of pairs having English as one of the languages, is my understanding) it can be very hard to tell apart from English sometimes.
For example, translating "рубанок" ("plane", in the sense of the carpenter's tool) from Russian to Polish used to produce "samolot" ("airplane") in Google translate up until sometime earlier this year, because in the intermediate representation "plane" was ambiguous just like it is in English. It looks like that particular bit is fixed now, which is at least progress! Maybe they've been adding more non-English text pairs...
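The "plane" failure above can be reproduced with a toy sketch. This is only an illustration of how a pivot that merges two senses loses information; the dictionaries and function names are invented here and say nothing about how Google Translate is actually built.

```python
# Toy dictionaries, invented for illustration; not real translation-system data.
RU_TO_EN = {"рубанок": "plane", "самолёт": "plane"}      # both senses collapse to one English word
EN_TO_PL = {"plane": "samolot"}                          # the pivot can only pick one sense
RU_TO_PL = {"рубанок": "strug", "самолёт": "samolot"}    # a direct pair keeps the senses apart


def translate_via_pivot(word, src_to_pivot, pivot_to_dst):
    """Translate through an intermediate language, losing any sense the pivot merges."""
    pivot_word = src_to_pivot[word]
    return pivot_to_dst[pivot_word]


def translate_direct(word, src_to_dst):
    """Translate with a dictionary built directly for the language pair."""
    return src_to_dst[word]


via_pivot = translate_via_pivot("рубанок", RU_TO_EN, EN_TO_PL)  # the airplane word
direct = translate_direct("рубанок", RU_TO_PL)                  # the carpenter's tool
```

Once both Russian words map to the same pivot entry, nothing downstream can tell them apart, which matches the behaviour described even if the real intermediate representation is not literally English.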
That's almost certainly accurate. The metalanguage / interlingua isn't English, but being based on A <-> English and B <-> English training, is all but certainly influenced by English grammar, words, and idioms, in ways that direct A <->B training would not be.
It seems to be 'fixed' now, but once I was trying to translate from Hindi to Nepali (which are actually closely-related languages, think Italian and Spanish) with a simple sentence along the lines 'Ram came', where 'Ram' is a common Hindu name (effectively: 'John'), written in Devanagari with a long vowel: राम (rām).
And I gave the Hindi input in devanagari (राम आ गया), but still the Nepali translation ended up being the equivalent of 'the sheep came' (भेडा आयो), so somewhere along the line it seemed to be treating the name राम (rām) as equivalent to the English string 'ram' and translating accordingly.
So if the intermediate language isn't English, it certainly has some English-like properties....
Supposedly, but it behaves suspiciously like English in practice, perhaps because of the input data (lots of texts originally in English then translated to many languages and fed in)
Since we're complaining about Google Translate, I'd like to mention how ridiculous their "verified translation" system is. It works by throwing automatic translations at people who, in their majority, have never studied English, and expecting them to tell whether it's right or wrong, but what happens is that most just confirm whatever they get as being correct. As a result, at least for Portuguese, many of them, if not most, are just plain wrong.
Considering Translate is such an important product, I can't fathom why they just don't hire a single linguist (or just anyone who isn't completely clueless, really) per language to register decent translations, or at least import them from a real dictionary...
Well, the more general problem about asking people visiting Google Translate to verify if a translation from A->B is correct is that generally people visiting Google Translate didn't know how to translate A->B or weren't very sure.
How many people on Google Translate actually are able to reasonably verify translations? Relatively few - and those qualified few who might poke at Translate out of curiosity are just as likely not to feel inclined to offer free labour to Google.
> Considering Translate is such an important product, I can't fathom why they just don't hire a single linguist (or just anyone who isn't completely clueless, really) per language to register decent translations, or at least import them from a real dictionary...
Perhaps professional translators rather than linguists. I imagine they have some linguists on the project, but they're likely to be more NLP-type linguists.
The difficulty is that they're interested not just in word-level meaning/translation-accuracy, but also phrase- and clause-level accuracy, and those are really large (i.e. theoretically infinite) spaces.
I've heard that the Spanish<->English translations aren't too bad.
Sometimes I have seen English words injected verbatim/untranslated in the middle of a phrase, when asked to translate between two non-English languages.
This is pretty standard, even in human (non-automated) translation.
One example is technical/service documentation for a heavy machinery company I worked for. The tech writers were based in Germany, but O&M Manuals were required in Japanese for sale in Japan. Those docs were translated using English as a "pivot language". Usually dictated by pricing (fewer German+Japanese translators = much higher cost).
But one advantage of automated translations is that there shouldn't have to be a pivot language.
And in any case, for German->Japanese, going through English probably has a lower cost. But for Hindi->Nepali, you'll lose a lot of information as Hindi and Nepali are closely related and similar not only in terms of correspondences between vocabulary items, but also grammatical structures, which is effectively 'thrown away' if there's an English, or close-enough-to-English-to-effectively-be-English intermediate translation language. (Not to mention the inefficiencies of the equivalent of sending a package from Delhi to Kathmandu via London.....)
I think in the case of automated translations it's a function of training data and confidence rather than cost. If you don't have a corpus of translation data for the source/target language combination to draw from, you're essentially forced into a pivot model.
For Spanish, at least, it definitely isn't better than Google.
And recently everything I had to look up while reading Gabriel García Márquez, DeepL didn't know. After enough failures I gave up and returned to Google Translate for the remainder of the book.
Google may have a wider range of words, but DeepL is definitely much better in grammar and idioms, which Google tends to translate literally.
Moreover, Google often provides a single target word; whereas DeepL allows you to select from a range of synonyms clicking on a word, and will adjust the sentence accordingly to use the new word. When Google gets the context wrong and provides the wrong meaning for the translation, DeepL's capability to translate with a different meaning is invaluable.
same here. Admittedly it supports just a few language pairs but the translation quality is consistently and considerably better than Google Translate and other major offerings.
During a C1 German course I took, I tried writing an essay in English, running it through DeepL, and then submitting the result to the teacher. The only manual step was choosing the correct alternative from the list of words DeepL gives you.
The teacher said that it was amazing and that many native students she had couldn't write that well.
This seems really quite variable. The second sentence I tried DE->EN ended up with an awkward and confusing literal translation of a phrase that Google Translate handled well.
I find the deepl translations in the available languages very good, much better than google translate. Unfortunately, the selection of languages is (still) very limited.
It looks useful but the lack of non-European languages makes me slightly suspicious, I wonder if their approach generalises to Arabic, Chinese, Japanese, etc.
It depends on which part of their approach you focus on. I'd expect their machine translation model to be sufficiently general to support basically any language given enough training data. Yes, the languages you listed have some edge cases, but so do languages they already support fine. For example, the lack of spaces separating words in Chinese and Japanese can be handled by the same word-piece segmentation they need for German compound words.
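To make the segmentation point concrete, here is a minimal sketch of greedy longest-match splitting. The vocabularies are hand-written for the example (real word-piece or BPE vocabularies are learned from data), the `segment` helper is hypothetical, and none of this is a claim about DeepL's actual tokenizer.

```python
def segment(text, vocab):
    """Greedy longest-match-first split of `text` into pieces found in `vocab`.

    Characters that start no known piece are emitted on their own.
    """
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            pieces.append(text[i])
            i += 1
    return pieces


# Hand-written toy vocabularies (assumptions for illustration only).
german_vocab = {"donau", "dampf", "schiff", "fahrt"}
chinese_vocab = {"机器", "翻译", "很", "好"}

german_pieces = segment("donaudampfschifffahrt", german_vocab)
chinese_pieces = segment("机器翻译很好", chinese_vocab)
```

The same routine handles a German compound and an unspaced Chinese sentence, because neither case relies on whitespace to find word boundaries.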
The bigger problem is likely to be lack of training data. Unless they have a pile of cash to pay professional translators to produce a parallel corpus, the alternative is to scrape translations from the internet. Basically, crawl the same site multiple times with different Accept-Language headers and try to align the results. Crucially, this depends on an existing ecosystem of bilingual websites with high-quality human translations.
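A rough sketch of the crawling idea, under loud assumptions: the URL is a placeholder, `build_requests` and `align_by_position` are hypothetical helpers, and the alignment step is deliberately naive (real pipelines use proper sentence alignment). The requests are only constructed, not sent.

```python
from urllib.request import Request


def build_requests(url, languages):
    """Prepare one request per language, differing only in the Accept-Language header."""
    return {
        lang: Request(url, headers={"Accept-Language": lang})
        for lang in languages
    }


def align_by_position(paragraphs_a, paragraphs_b):
    """Naive alignment: pair the n-th paragraph of each version.

    Returns an empty list when the paragraph counts differ, since positional
    pairing is then unlikely to be trustworthy.
    """
    if len(paragraphs_a) != len(paragraphs_b):
        return []
    return list(zip(paragraphs_a, paragraphs_b))


# Placeholder URL; a real crawler would fetch each request and extract paragraphs.
requests_by_lang = build_requests("https://example.com/", ["en", "de"])
pairs = align_by_position(["Hello", "Goodbye"], ["Hallo", "Auf Wiedersehen"])
```

Whether a site honours `Accept-Language` at all, and whether its translations are human-made, is exactly the ecosystem dependency described above.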
According to DeepL's website they're a spin-off of Linguee, who provide a search service for exactly that kind of parallel data. So before DeepL starts supporting any given language pair, you should expect it to appear in Linguee first. https://en.wikipedia.org/wiki/Linguee
Edit: It took me a while to figure out how to select a different language on https://linguee.com (Their UI seems broken using mobile Firefox Preview.) Appending /english-chinese and /english-japanese to the URL shows that they already support those two, and the alignment of translations appears reasonable to me. No /english-arabic, though.
The European Union, the Swiss Confederation, Belgium, Canada and other multilingual states, the European Patent Office and many international organizations provide a huge corpus of professionally translated documents and reports for major European languages. Not so much for Japanese.
If you know of a good source for large piles of docs that have been accurately and naturally translated between English, Chinese and Japanese, I'm sure DeepL would be interested. As another poster pointed out, any deep learning NLP project boils down to quantity and quality of data. I'm assuming adversarial approaches don't work well in this context, but I'm not very familiar with NLP research.
Right but that's kind of the essence of machine translation. You won't always have high quality parallel training data for all languages so you have to find a way to thrive with low quality data.
Google has clearly made it their mission to solve that problem, and I'd say they've been rather successful.
I lived in Germany for a bit and Deepl is what everyone recommended over Google for professional translation. Google is great to figure out how to ask for your schnitzel at the shop, but Deepl is for when you want to make a deal.
I’m not sure about European English, but at least in American, it’s not normal to say “I live in Germany since a bit”. If you don’t live there anymore, you could say “I lived in Germany for a bit”. If you still live there, you could say “I have lived in Germany for a bit”
Based on some (limited) experience with people for whom German is L1 who are speaking English I suspect (without knowing any German myself) that this is a typical formation that a native German speaker would use when intending to form a "for a bit" phrase in English. tl;dr, I think it was a joke.
I have used it for translating several academic writeups, including proposals and papers, from English to German. It works like a charm. It also translates official German letters from banks and the government into English pretty well.
OK, some observations. It looks like they used the UN official documents as a part of their corpus, so it translates regular news from Russian into English almost perfectly. I was actually stunned how good the translation was.
But once you step away from it, quality goes down. I tried translating random pieces of Russian literature and it makes obvious mistakes. It can't even manage the structure of sentences, never mind word choice.
Translations from English are also bad. For example, it translated "I never felt that the translation is off" as "I never felt that the translation is turned off".
At the firm where I did my internship, we had to use German in every communication. I used DeepL to check my e-mails or help me write speeches for presentations, etc. (not a native speaker). The translator is wonderful!
I'll be blown away when it does near-perfect JP/EN translation (which it doesn't even seem to support). No machine translation has ever been close to being remotely good when it comes to JP/EN, including the ones developed by GAFAM.
I have used it for 2 years, mainly for French->English and occasionally for English->French. I love it. It understands idioms very well and proposes excellent translations. Even for translating single words, it is far better than Google.
As far as I've seen that doesn't support translating web pages other than copy-pasting the text you want to translate... Unless I'm missing something. So HN, tell me, how do I get to that functionality, if it's there?
I was hoping to use them for an application, but their API pricing is orders of magnitude higher than Google's and Microsoft's. I guess they must be focused on the web application primarily.
Sometimes after writing a text I decide to throw it into DeepL for shits and giggles and the translations are pretty much as good as native every time.
[1]: https://www.deepl.com/translator