"It's clear that GPT-4 and Claude are skilled translators" on what basis? What m...

benbreen · on Oct 3, 2023

Well, that's just it - I use Google Translate all the time to translate historical texts, and for whatever reason, GPT-4 and Claude both work better. Since I deal with texts that feature archaic orthography like the long s (ſ) the main advantage over Google Translate is that LLMs can make educated guesses about what a word should be. But even in terms of pure translation ability — assuming all orthographic issues have been corrected — the LLMs do a better job in the languages I've tested and which I can read (early modern Portuguese, Spanish and French, plus Latin).

The post by David Bell which I linked to gets into this for French - I agree with him that ChatGPT (I guess he was using GPT 3.5) has a tendency to "overtranslate." But it is super impressive as a translator overall IMO: https://davidabell.substack.com/p/playing-around-with-machin...

og_kalu · on Oct 3, 2023

>on what basis? What makes a predictor LLM better at translating Latin than a system trained specifically for translation?

They just are. Sure it sounds a bit strange if you've never thought about it but they are.

>I'm sure they can do a decent job but it's weird to me that someone would leap to GPT-style tech despite its known tendency to hallucinate/make stuff up instead of translation-oriented tools like DeepL or Google Translate

1. They don't just potentially do a decent job. For a couple dozen languages, GPT-4 is by far the best translator you can get your hands on. Google, Deepl are not as good.

2. Tasks like summarization and translation have very low hallucination rates. Not something to be particularly worried about with languages that have sufficient presence in training.

>I can't imagine there are vast swaths of Latin in GPT's training set.

Doesn't matter. There is incredible generalization for predict the next token models as far as proficiency is concerned. a model trained on 500b tokens on English and 50b tokens of french will not speak french like a model trained on only 50b tokens of french but much much better.

https://arxiv.org/abs/2108.13349

It also doesn't need to see translation pairs for every language in its corpus to learn how to translate that language pair(but this is the case for traditional models too)

KaiserPro · on Oct 4, 2023

> a model trained on 500b tokens on English and 50b tokens of french will not speak french like a model trained on only 50b tokens of french but much much better.

Thats because french and english are reasonably similar, and share the same context.

Whilst latin and english are distantly related (latin is more related to french) they do not share the same cultural context.

Which version of latin are you translating? medieval?

Whilst its fun to do, and it has its place. There needs to be massive caveats about accuracy.

og_kalu · on Oct 4, 2023

It doesn't matter whether it's French or Korean. That was just an illustration.

lukebitts · on Oct 3, 2023

I haven't attempted latin translations, but anything from my native language to english and back has been 100% perfect, miles better than anything google translate can do

marginalia_nu · on Oct 3, 2023

Latin is tricky though. Google translate is notoriously bad at latin grammar, much worse than most living languages.

Not exactly sure why, maybe small corpus, maybe because it's a pro-drop language without fixed word order and an very complex set of conjugation rules.

duskwuff · on Oct 4, 2023

> Not exactly sure why, maybe small corpus

That's probably most of the problem. The entire corpus of ancient Latin is on the order of 9-10M words. If you printed it at the same density as a typical English novel, it wouldn't even fill a bookcase.

marginalia_nu · on Oct 4, 2023

Why would the corpus need to be limited to antique latin? The language was in use for much longer than antiquity. You've got everything from Thomas Aquinas to Isaac Newton to pad with.

KaiserPro · on Oct 4, 2023

Because they are almost different languages "modern" latin is quite different to "roman" latin.

marginalia_nu · on Oct 4, 2023

Even roman latin changed during the millennium it was around though.

phire · on Oct 4, 2023

For some important context, The "Attention is all you need" paper that established the transformer architecture that most LLMs use, is a paper about explicitly machine translation.

It the idea of using transformers for non-translation tasks was only briefly explored at the end of the paper. So it really shouldn't be surprising that LLMs are still good at translating.

Yes, the hallucinations are less than ideal, but the extra freedom is part of what makes their translation abilities so good when they do get it right. And it's not look google translate is completely free of "hallucination" type issues. It's well known that dedicated machine translation models will assume (aka hallucinate) genders when going from non-gendered to gendered languages.

vore · on Oct 3, 2023

I don't think they're arguing that an LLM is better at translation that an actual translator, just that they are pretty good at it. DeepL and Google Translate definitely also make things up though, so I don't think that's a good comparison...

mjn · on Oct 3, 2023

> DeepL and Google Translate definitely also make things up

I think what they make up is different, but this is a good point. They have a particularly odd tendency to either do something like autocorrect where it wasn't appropriate (translate a different word that is similar in spelling to the requested word), or to make up false friends, doing something like transliterate + then autocorrect in the target language.

One example, which I blogged about 5 years ago but is still mistranslated, is the word "ribbit" (what a frog does): https://www.kmjn.org/notes/google_translates_ribbit.html

In 2018, if you translated it to Greek with Google Translate, it gave you κουνέλι (kouneli), which is Greek for rabbit. A word that is one letter away from ribbit but not close to a similar meaning. When I tried it just now, it translates it to ραβδί (rabdi), which means stick and is completely unrelated to the correct answer, but I guess starts with similar letters as ribbit?

20after4 · on Oct 4, 2023

Google search has a horrible tendency to do the same thing to my search terms. Autocorrect is (usually) great when typing on a touch screen but it's horrible when it decides it knows what I mean better than I do.

svat · on Oct 4, 2023

It's a great question. But note that Google Translate is also trained on "predict the missing token": https://blog.research.google/2022/05/24-new-languages-google... / https://arxiv.org/abs/2205.03983 (search the blog post around “Surprisingly, this simple procedure produces high quality zero-shot translations.”)

This was in May 2022, as part of Google Translate adding support for several low-resource languages (including Sanskrit). I was already very surprised that simply training on predicting tokens does translation so well — then a few months later ChatGPT came out, trained (roughly) the same way and doing a lot of things besides translation.

imchillyb · on Oct 4, 2023

> What makes a predictor LLM better at translating Latin than a system trained specifically for translation?

Contextual awareness that is baked into the models. Large Language Models are at their core transformation engines. For the operation of transformative text there must be awareness of context. This alone makes LLMs great candidates for translation tasks.

anigbrowl · on Oct 4, 2023

I think it's probably great at Latin and Greek for reasons that should be obvious (plenty of public domain raw material, vast reams of scholarship dating back centuries). It's less good with some other languages, eg some Japanese companies have decided to train their own models due to dissatisfaction with ChatGPT's shortcomings.

gorbypark · on Oct 4, 2023

Slightly unrelated, since each model is trained and tunes for specific task(s), but the original transformer architecture and paper was built with translation in mind. The original performance tests were language translation benchmarks.

j16sdiz · on Oct 4, 2023

LLM based translates use/add contextual information.

It just choose better word when the original is ambiguous.

Hallucinating in translation task is quite low (much lower than creative, fact finding or information retrieval task)

ogogmad · on Oct 3, 2023

Translation-specialised models like Google Translate don't actually understand what they're translating. But models like GPT do. This fact is intuitive and easy for anyone to test.