Hacker News
Translating Latin demonology manuals with GPT-4 and Claude (resobscura.substack.com)
170 points by benbreen on Oct 3, 2023 | 113 comments



Everyone is excited about LLM abilities to help with language learning, while completely ignoring the fact that for most people LLMs will make the learning unneeded. There will be fewer experts in the field, and therefore we will lose the part of language and foreign literature understanding not captured by statistical models. Which is a huge part (subtle contexts in poetry, etc)


You aren't wrong, but this has been a dilemma with every new technology. The camera had that effect, modern metalworking had that effect, even tractors had that effect.

It's definitely a problem we should be talking about, but we can't go back in time or remain frozen, the genie never goes back in the bottle. We have to move forward towards the future while salvaging the parts of the past we want to bring with us.


We need efficiency when we want to maximize comfort and minimize labor.

But nothing forbids people to pursue less efficient endeavors during their free time. There are people maintaining old cars and locomotives. There are people gardening or woodworking "inefficiently" for their own pleasure.

What we remove is the need to force people to work on these fields. Whether we abandon them altogether depends solely on our culture.


> But nothing forbids people to pursue less efficient endeavors during their free time.

Assuming they have free time.


This line of thought goes back to Socrates and his supposed views on writing. (That it weakens the memory)


The major difference is that in the event that there's a catastrophic event, we won't be able to build the tech we need because the intermediate steps will be lost.

It is also how we know there hasn't been a civilization more advanced than us that left without a trace. All the oil that was easy to extract has been extracted.


I don't think that follows; it is easy to imagine an alternative history without all the hissy fits about nuclear fission, or with real investment into nuclear fusion starting in the 70s.


It's not just oil, it's also coal and other integral resources that are now difficult to extract because we are at the upper level of the S-curve and efficiency has dropped.


Well, coal is the resource that's gone, but all the metals - iron, silver, aluminum - aren't depleted (they don't disappear when we use them) and are much more accessible than they were before, because we dug them out from deep underground, refined them (especially for aluminum) and concentrated them in various places. If civilization were to disappear, our scrapyards would be better mining locations than anything the Romans had.


"What if something happens to all the books"


> Raytheon has called in retired engineers to teach its employees how to build the Stinger missiles heavily used by Ukraine’s military—using blueprints drawn up during the Carter administration.

https://www.defenseone.com/business/2023/06/raytheon-calls-r...

And of course, lots of resources have been depleted.


As someone who loves writing in 3 languages (French and German being my native languages, English the third), playing with Claude (and to a lesser degree GPT-4) has actually made me play with and investigate the nuances of languages so much more. It does a great job of course doing stylistic transformations, which on their own are always stilted, but are great inspiration. But it also does a phenomenal job at explaining the nuances between the languages, say when I want to explain why a certain German phrasing “feels” different to me.

Certainly, seeing the number of people never learning the language of the country they emigrated to, this is a problem we already have (and in my case, never moving to a country where I don’t speak the language).

I think humans are going to continue nerding out with language just as much as they ever have; I really do think it’s an innate drive, and LLMs are a mind-blowing tool for doing just that.


> Which is a huge part (subtle contexts in poetry, etc)

Certainly, but from what I remember of my GCSE English Literature back at the turn of the millennium, my fellow students and I didn't understand most of that subtlety even when it was famous poets in our native language.

Shakespeare may be unsurprising in this regard given the age (why eye of newt and leg of toad? Some say common names of herbs, others that it's just some amusingly vulgar items), but we were also just as oblivious to the lived experience of being gassed in the trenches as per Dulce et Decorum Est or a cavalry charge as per The Charge of the Light Brigade in English as we would have been if this had been a second language.


I'm not sure about "unneeded". Important motivations why people learn a foreign language are because they want to speak with native speakers in that language without an intermediate (because they move to that country, or because they have a partner speaking that language), or because they find that language interesting/beautiful, or because they want to read or listen to original sources. Machine translation doesn't remove any of those motivations.


> while completely ignoring the fact that for most people LLMs will make the learning unneeded

Good luck using an LLM to talk to strangers in a bar in a foreign language.


This is literally the functionality from the recent OpenAI demo


Yeah, I remember a similar demo, 8 years ago for Skype:

https://youtu.be/G87pHe6mP0I?si=s-BjpVGApB15ciSl

I'm curious to see the next demo, 8 years from now.


The voice chat works pretty great in other languages as well (seriously, what's the point of ragging on a feature you haven't used?), so I'm not sure what the point of this is. There are countless more demos that were not in fact just a demo.


Excuse me, but, what feature are you talking about and how do you know I haven't used it?


The new hands-off voice chat that OpenAI is rolling out to users. I know because your snide comment makes no sense if you have (it works). And I especially know now because people who have actually used a feature they rag on say so after the call-out, rather than turning the question around on the accuser.


I'm sorry, but I asked a clarifying question: which feature do you mean? My earlier comment mentions both Skype and ChatGPT. Which one did you mean I haven't used, and how would you know I haven't, except for the guesswork in your latest comment?

Also I did not realise I was being accused of something? What am I being accused of?


This comment,

>Good luck using an LLM to talk to strangers in a bar in a foreign language.

Followed by,

>This is literally the functionality from the recent OpenAI demo

You respond with,

>Yeah, I remember a similar demo, 8 years ago for Skype: https://youtu.be/G87pHe6mP0I?si=s-BjpVGApB15ciSl

>I'm curious to see the next demo, 8 years from now.

You're going to pretend this isn't a snide comment but a clarifying question. Really? Why are you being intentionally obtuse?


I'm sorry, have I insulted you? Why are you calling me obtuse?


Homie you're the only one talking about Skype. How many times do they need to say "OpenAI" before that clicks?


A cool aspect here is that the LLM can take over the conversation if you don't know what to say/are bad at smalltalk.


Most Latin texts have been translated at least once, more than 100 years ago, while most ancient Arabic texts have never been translated. This is an old problem.

I see AI as a savior here, especially for reconstructing old languages of which we have only a small amount of text preserved.


Which ancient Arabic texts do you mean?

Most ancient texts (e.g. Egyptian papyri) are expected to be of too little use to even scan, let alone translate.


This Dan Brown/William Gibson crossover sucks my soul right out of the petabyte SSD I bought in a dark alley.


I've used ChatGPT 3.5 (not 4, too expensive) to translate most of the Latin writings of Jerome, Ambrose, and Ambrosiaster (from Migne's Patrologia Latina) - the translations have been put in this repo in the public domain:

https://github.com/HistoricalChristianFaith/Writings-Databas...

Some takeaways:

- ChatGPT did an excellent job with about 3 sentences max at a time. Exceeding 3 sentences would often cause it to truncate the response (e.g. translating 3ish of 5 sentences, or hallucinating more).

- ChatGPT would originally return the translation, sometimes randomly prefixed with a variant of "The translation is" and sometimes wrapped in quotes, other times not. Using the function interface to ChatGPT eliminated this problem.

- When it comes to quotations from Bible verses, ChatGPT sometimes "embellished" (not sure what else to call it). E.g. if part of Ephesians 2:7 is quoted in Latin, in the English ChatGPT would sometimes insert Ephesians 2:7-8 in full.
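For what it's worth, the ≤3-sentence batching is easy to automate. Here's a rough sketch (the sentence-splitting regex and the function name are mine, not something from the repo; real editions would need smarter splitting around abbreviations):

```python
import re

def chunk_sentences(text, max_sentences=3):
    """Split `text` into batches of at most `max_sentences` sentences,
    keeping each translation request short enough to avoid truncation."""
    # Naive split on sentence-ending punctuation followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]

batches = chunk_sentences("One. Two. Three. Four. Five.")
# batches -> ["One. Two. Three.", "Four. Five."]
```

Each batch then goes out as its own API call, which also makes retries cheap when a response comes back malformed.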


I don't really understand the value in posting these kinds of takeaways about GPT-3.5 here. GPT-4 is significantly better, and improved models are coming. There's just not a lot of point to benchmarking 3.5 when likely every issue you've pointed out is solved by 4.


Average cost to translate an entire work in that repo via GPT3.5 = ~$2

Average cost to translate an entire work in that repo via GPT4 = ~$40

For a side-project, one is feasible to scale, and one ain't.


Could also try Claude-2, as the OP did.


This is really fun. Google Translate is incompetent at Latin, and I'm informed that so far ChatGPT still makes errors of grammar and word choice when generating Latin.

This experiment helps show we can use GPT-4/Claude to parse and summarize Latin, but doesn't yet show that we can rely on them to the level of a human expert.

I'm confident we'll get there pretty soon - and then will be able to rely on LLMs to generate Comprehensible Input and thereby greatly accelerate language learning.


You can include in the prompt a requirement to highlight sections the LLM was not sure about/needs to be verified.


Wouldn't that work just as well as including in the prompt a requirement for it to not make any mistakes?


With some LLMs, emphasizing the possibility and appropriateness of saying "I don't know" has reduced the frequency of hallucinations.


If you assume the LLM can gauge its “confidence” in the last n tokens it generated, which seems within the realm of reason (from a layman’s perspective), then I would think this idea would work better the significant majority of the time. It’s providing an additional dimension of context related to the output (which we’re assuming is sound, or at least not entirely nonsensical), which alone seems like enough of a justification to do this. It’s unclear (to me, at least) exactly what effect adding a “no mistake” requirement to the prompt would have on the LLM’s output; I could see it skipping ranges of tokens that it’s unsure about, which seems less preferable to having it provide a best guess and make clear that it’s only a guess, but I could also certainly see it operating as it otherwise would have without the “no mistake” instruction, giving the same dubious output to a user that may now have an unwarranted increase in confidence in the LLM’s output.

I’ve spent a decent amount of free time doing what feels like coercing, tricking, or otherwise manipulating GPT-4 and Llama2 into doing my bidding - with my bidding being mostly toy ideas for little tools to make random small tasks easier and one or two more interesting ideas that are fun to mess around with, but would probably require some medical-grade antianxiety meds to even consider using in a real production setting (ie a universal ORM.) Even though I’m not developing (or I guess we now call it prompt engineering) in a rigorous or serious way, I’ve found that making the LLM _actively_ reconsider and validate its output works very well, with the effectiveness seeming to be a rough function of “how actively” you trick it into doing so. Giving a list of “be sure to consider these things” at the end of your prompt often works, but also very often doesn’t; adding another step to the process you’re asking them to perform comprised of subtasks that map to the list of gotchas, but reframed as actions you are requiring them to perform, is often the remedy for cases where the simple suggestion list isn’t enough, and is basically a more active variant of the same idea as providing the gotcha list. Dialing it up a bit more, requiring them to provide an update after they complete each subtask to confirm they indeed performed it and to provide a summary of what they found makes their retrospective assessment even more actively involved, and has been a pretty damn reliable trick for ironing out kinks and known failure modes in my prompts.

All that being said, I think the simple fact that you’re now actively requiring them to reflect on their confidence in their output, and therefore the correctness of their output, may lead to this idea improving the quality of output/results as an unintended side effect that would alone make it worth doing.


I think that mostly depends on getting more high quality Latin into the training set, but I'm guessing the new amount of that being generated/discovered is relatively small. Then again, new techniques for training models could prove me wrong.


This is the result when I asked GPT-4 to write "a new, original passage in latin which is inspired by Del Rio's style. Make the Latin elegant and lucid, with witty turns of phrase," then asked Claude to translate it back to English. I am no Latin expert, but I know enough to say that this is clearly lucid, makes sense -- though overwrought -- and has no glaring errors. I will ask my classicist friends to weigh in on the matter of style though.

GPT-4 improvising: "Quemadmodum divini philosophi docent, mysteria naturae introrsus latitant, neque enim sine causa Democritus 'naturam rerum' dixit esse clausam in occulto. Verum, si Angelorum bonorum opera ut in Psalmis celebratur, 'Benedicite, omnes Angeli Domini, Domino,' quo pacto maleficae entitates—daemones scilicet—invisibiles nec tamen inactiones sunt? Mihi consentaneum videtur quod Tertullianus et Ambrosius prodiderunt, angeli esse nuntios divinos, sed daemonas esse nuntios corruptionis. Ne confundamus! Nam verbum 'daemon' Graecis non semper malum significat, sed Romanis—o mores!—semper sinistram portendit.

In huius argumenti speculo, miror qua de causa daemones semper in tenebris vel in locis desertis, vel in nocturna silentia se abscondant. An non, ut Seneca loquitur, 'magna vis veritatis est, quae se per se defendet'? Atqui daemones a veritate abhorrent, quasi nox a lucem fugiat.

Ergo, si volumus a daemonum insidiis nosmetipsos protegere, debemus, ut Virgilius canit, 'forsan et haec olim meminisse juvabit,' non oblivisci quod lux aeterna sit telum acerrimum contra tenebras. Haec ita dicam, et iam concludo: sicut angeli ad caelum pertinent, sic daemones ad abyssum—et utrique suum locum noverunt. Caveamus igitur ne, dum caelum quaerimus, abyssum inveniamus."

Claude translation: "As the divine philosophers teach, the mysteries of nature lie hidden within, for not without reason did Democritus call nature 'enclosed in the dark'. Yet, if by the work of good Angels as celebrated in the Psalms, 'Bless the Lord, all you Angels of the Lord,' in what manner are evil entities—namely demons—invisible yet not inactive? It seems fitting to me what Tertullian and Ambrose handed down, that angels are divine messengers, but demons are messengers of corruption. Let us not confuse the two! For the word 'daemon' does not always signify something evil for the Greeks, but for the Romans—oh the times!—it always portends something sinister. In the mirror of this argument, I wonder for what reason demons always hide themselves in shadows or deserted places or the silences of the night. Does not, as Seneca says, 'truth have great power, which will defend itself'? Yet demons shrink from the truth, as the night flees from the light. Therefore, if we wish to protect ourselves from the snares of demons, we must, as Virgil sings, 'perhaps one day it will please us to remember these things,' not forget that eternal light is the sharpest weapon against the darkness. I will say this much, and now conclude: just as angels belong to heaven, so demons belong to the abyss—and both know their proper places. Let us beware, therefore, lest while seeking heaven we find the abyss."


This is very impressive!

Problems I see in the Latin:

(1) The sentence beginning "verum, si Angelorum" loses the thread in the middle (as you can also see in the English translation). The if-clause (protasis) has no main verb, while there is no then-clause (apodosis) at all.

(2) I think "inactiones" is hallucinated. This word isn't found in Wiktionary, Lewis & Short's Latin Dictionary, or Souter's Glossary of Later Latin. If it existed, it also wouldn't be an adjective meaning "inactive".

(3) In the indirect discourse governed by "prodiderunt", the subject should also be accusative ("angelos esse nuntios divinos").

(4) Since "confundo" is generally transitive, it's a bit questionable to say "ne confundamus" without an object like "eos", although I'm not positive it's not idiomatic.

(5) I would think "sinistrum" rather than "sinistram" if the intention is to refer to "something sinister" rather than "the left-hand side".

(6) "nox a lucem fugiat" is wrong (the preposition "a" always governs the ablative, not the accusative), and should probably be either "nox lucem fugiat" (less likely to me "nox a luce fugiat"). For rhetorical parallelism with "a veritate" it would definitely be "a luce".

(7) The orthography is also inconsistent in writing "iam" but then quoting the Aeneid as saying "juvabit" (if you write consonantal i as j, you would expect to do it consistently everywhere).

Again, it's very impressive. That's a lot fewer mistakes than I would make writing a Latin text of this length!


This is awesome, thank you! Or should I say gratias tibi :)


> Or should I say gratias tibi

You might want to get a verb in there too.


Not necessarily, agere is implied in colloquial usage.


The embedding method that nearly all LLMs use puts them at a severe disadvantage because they can't 'see' the spelling of common words. That makes it hard to infer things like 'past tense words end with an e'.

With small modifications, the exact characters could be exposed to the model, in addition to the current tokens, but it would require a full retraining, which would cost $$$$$$$$.
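To illustrate with a made-up vocabulary (the merges below are hypothetical; real BPE vocabularies are learned from data): a greedy longest-match tokenizer turns words into opaque integer IDs, so two morphologically related word forms can end up sharing no visible structure at all.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands
# of such merges from data.
vocab = {"amav": 101, "it": 102, "amat": 103}

def tokenize(word, vocab):
    """Greedy longest-match segmentation, BPE-style."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return tokens

# "amavit" (perfect tense) and "amat" (present tense) share a stem on
# paper, but the model only ever sees disjoint integer IDs:
print(tokenize("amavit", vocab))  # [101, 102]
print(tokenize("amat", vocab))    # [103]
```

From the model's side of the embedding table there is no hint that 101 and 103 spell nearly the same word.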


You remind me of the ELMo architecture.

https://paperswithcode.com/method/elmo


So, next week on HF?


You get really good results if you prompt it with: “You’re an expert in Latin translation”.


"Many people equate the word "daemon" with the word "demon", implying some kind of satanic connection between UNIX and the underworld. This is an egregious misunderstanding. "Daemon" is actually a much older form of "demon"; daemons have no particular bias towards good or evil, but rather serve to help define a person's character or personality. The ancient Greeks' concept of a "personal daemon" was similar to the modern concept of a "guardian angel"—eudaemonia is the state of being helped or protected by a kindly spirit. As a rule, UNIX systems seem to be infested with both daemons and demons."

That naming convention might turn out to be more prescient than people thought. Can't wait until my Catholic school education pays off and I chant at my computer in Latin.


Linux actually got the "demons" right: those manifest as usually hidden activity in someone's brain, and in special circumstances can take control over the entire system (e.g. with a deadlock if it's a neutral demon, or by other means if it's malicious). Those Greek daemons, in contrast, never possess or control anyone: they may inspire, but only if the subject is consciously seeking such inspiration ("the gates must be opened from within").


Latin demonology manuals were about GPT-4 and Claude in the first place.


"It's clear that GPT-4 and Claude are skilled translators" on what basis? What makes a predictor LLM better at translating Latin than a system trained specifically for translation?

I'm sure they can do a decent job but it's weird to me that someone would leap to GPT-style tech despite its known tendency to hallucinate/make stuff up instead of translation-oriented tools like DeepL or Google Translate (I say this as someone who despises both of those tools due to their quality issues)

I can't imagine there are vast swaths of Latin in GPT's training set.


Well, that's just it - I use Google Translate all the time to translate historical texts, and for whatever reason, GPT-4 and Claude both work better. Since I deal with texts that feature archaic orthography like the long s (ſ) the main advantage over Google Translate is that LLMs can make educated guesses about what a word should be. But even in terms of pure translation ability — assuming all orthographic issues have been corrected — the LLMs do a better job in the languages I've tested and which I can read (early modern Portuguese, Spanish and French, plus Latin).

The post by David Bell which I linked to gets into this for French - I agree with him that ChatGPT (I guess he was using GPT 3.5) has a tendency to "overtranslate." But it is super impressive as a translator overall IMO: https://davidabell.substack.com/p/playing-around-with-machin...


>on what basis? What makes a predictor LLM better at translating Latin than a system trained specifically for translation?

They just are. Sure it sounds a bit strange if you've never thought about it but they are.

>I'm sure they can do a decent job but it's weird to me that someone would leap to GPT-style tech despite its known tendency to hallucinate/make stuff up instead of translation-oriented tools like DeepL or Google Translate

1. They don't just potentially do a decent job. For a couple dozen languages, GPT-4 is by far the best translator you can get your hands on. Google and DeepL are not as good.

2. Tasks like summarization and translation have very low hallucination rates. Not something to be particularly worried about with languages that have sufficient presence in training.

>I can't imagine there are vast swaths of Latin in GPT's training set.

Doesn't matter. There is incredible generalization for predict the next token models as far as proficiency is concerned. a model trained on 500b tokens on English and 50b tokens of french will not speak french like a model trained on only 50b tokens of french but much much better.

https://arxiv.org/abs/2108.13349

It also doesn't need to see translation pairs for every language in its corpus to learn how to translate that language pair (but this is the case for traditional models too)


> a model trained on 500b tokens on English and 50b tokens of french will not speak french like a model trained on only 50b tokens of french but much much better.

That's because French and English are reasonably similar, and share the same context.

Whilst Latin and English are distantly related (Latin is more related to French), they do not share the same cultural context.

Which version of Latin are you translating? Medieval?

Whilst it's fun to do, and it has its place, there need to be massive caveats about accuracy.


It doesn't matter whether it's French or Korean. That was just an illustration.


I haven't attempted latin translations, but anything from my native language to english and back has been 100% perfect, miles better than anything google translate can do


Latin is tricky though. Google translate is notoriously bad at latin grammar, much worse than most living languages.

Not exactly sure why, maybe small corpus, maybe because it's a pro-drop language without fixed word order and with a very complex set of conjugation rules.


> Not exactly sure why, maybe small corpus

That's probably most of the problem. The entire corpus of ancient Latin is on the order of 9-10M words. If you printed it at the same density as a typical English novel, it wouldn't even fill a bookcase.


Why would the corpus need to be limited to antique latin? The language was in use for much longer than antiquity. You've got everything from Thomas Aquinas to Isaac Newton to pad with.


Because they are almost different languages: "modern" Latin is quite different from "Roman" Latin.


Even roman latin changed during the millennium it was around though.


For some important context, the "Attention Is All You Need" paper that established the transformer architecture most LLMs use is explicitly a paper about machine translation.

The idea of using transformers for non-translation tasks was only briefly explored at the end of the paper, so it really shouldn't be surprising that LLMs are still good at translating.

Yes, the hallucinations are less than ideal, but the extra freedom is part of what makes their translation abilities so good when they do get it right. And it's not like Google Translate is completely free of "hallucination"-type issues. It's well known that dedicated machine translation models will assume (aka hallucinate) genders when going from non-gendered to gendered languages.


I don't think they're arguing that an LLM is better at translation than an actual translator, just that they are pretty good at it. DeepL and Google Translate definitely also make things up though, so I don't think that's a good comparison...


> DeepL and Google Translate definitely also make things up

I think what they make up is different, but this is a good point. They have a particularly odd tendency to either do something like autocorrect where it wasn't appropriate (translate a different word that is similar in spelling to the requested word), or to make up false friends, doing something like transliterate + then autocorrect in the target language.

One example, which I blogged about 5 years ago but is still mistranslated, is the word "ribbit" (what a frog does): https://www.kmjn.org/notes/google_translates_ribbit.html

In 2018, if you translated it to Greek with Google Translate, it gave you κουνέλι (kouneli), which is Greek for rabbit. A word that is one letter away from ribbit but not close to a similar meaning. When I tried it just now, it translates it to ραβδί (rabdi), which means stick and is completely unrelated to the correct answer, but I guess starts with similar letters as ribbit?


Google search has a horrible tendency to do the same thing to my search terms. Autocorrect is (usually) great when typing on a touch screen but it's horrible when it decides it knows what I mean better than I do.


It's a great question. But note that Google Translate is also trained on "predict the missing token": https://blog.research.google/2022/05/24-new-languages-google... / https://arxiv.org/abs/2205.03983 (search the blog post around “Surprisingly, this simple procedure produces high quality zero-shot translations.”)

This was in May 2022, as part of Google Translate adding support for several low-resource languages (including Sanskrit). I was already very surprised that simply training on predicting tokens does translation so well — then a few months later ChatGPT came out, trained (roughly) the same way and doing a lot of things besides translation.


> What makes a predictor LLM better at translating Latin than a system trained specifically for translation?

Contextual awareness that is baked into the models. Large language models are at their core transformation engines, and transforming text requires awareness of context. This alone makes LLMs great candidates for translation tasks.


I think it's probably great at Latin and Greek for reasons that should be obvious (plenty of public domain raw material, vast reams of scholarship dating back centuries). It's less good with some other languages, eg some Japanese companies have decided to train their own models due to dissatisfaction with ChatGPT's shortcomings.


Slightly unrelated, since each model is trained and tuned for specific task(s), but the original transformer architecture and paper were built with translation in mind. The original performance tests were language translation benchmarks.


LLM-based translators use/add contextual information.

They simply choose the better word when the original is ambiguous.

Hallucination in translation tasks is quite rare (much rarer than in creative, fact-finding, or information-retrieval tasks)


Translation-specialised models like Google Translate don't actually understand what they're translating. But models like GPT do. This fact is intuitive and easy for anyone to test.


I tried using BingGPT to translate simple Chinese text from screenshots. The results were complete hallucinations, different each time for the same screenshot.

I wouldn't trust these translations at all.


That’s a completely different test. You’re using the vision multimodal ability to decipher Chinese script, essentially adding an OCR step to the process, and it’s not good at OCR of Chinese script.

Try feeding it actual Chinese characters. From what I understand, it’s somewhat competent.


No no, it hallucinated. It didn't get the OCR wrong.


Image input in Bing basically can't handle non English text. Has nothing to do with its Chinese translation ability, which is great.


I envision future hacking will be like whispering magic poems and spells (aka prompts) to AI systems. I know about prompt injection; however, this would raise things to a new level :)


Coming soon, layoffs in medieval history departments.


Why?

You think scholars are mostly translating?


If only. There are far too many documents to read, transcribe and translate compared to the number of people paid to do that sort of thing!


Gpt4 does an ok job translating texts that aren't complex, but if you read the original and its translation side by side, you'll see that gpt4 still makes dumb mistakes every few sentences, hallucinates stuff when it runs into cryptic words it's not familiar with, and sometimes omits important passages. Gpt4 is like a very productive, but clueless newbie.


> gpt4 still makes dumb mistakes every few sentences, hallucinates stuff when it runs into cryptic words it's not familiar with, and sometimes omits important passages

This, but in all domains where you are an expert it becomes apparent that GPT-4 makes stupid mistakes, and makes them frequently. I can't seriously see it replacing humans soon; it is far from having four 9's of reliability.

It probably needs 99.99% accuracy to work alone, unsupervised by humans, because each decision step incurs the error rate again, so the chance of an error-free run decays exponentially with sequence length. Coupled with the "forward-only" thinking pattern of LLMs, which doesn't allow backtracking and planning, this error rate kills autonomy.

Recently, GPT-4-Vision was found to have poor OCR accuracy. Smart but stupid, same story.
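The compounding argument is just a power law. A quick back-of-the-envelope check (assuming, simplistically, independent per-step success probability p over n steps):

```python
def chain_success(p, n):
    """Probability an n-step chain completes with no errors,
    assuming each step independently succeeds with probability p."""
    return p ** n

print(round(chain_success(0.99, 100), 3))    # ~0.366: 99% accuracy dies fast
print(round(chain_success(0.9999, 100), 3))  # ~0.990: four 9's mostly survives
```

So a 99%-accurate agent fails most 100-step tasks, while four 9's still finishes 99% of them; the independence assumption is mine, and real errors likely correlate.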


And a demon summoning is one of the worst imaginable places to have dumb mistakes in the instructions and incantations.


I always figured our AI overlords would kill us in a Terminator kind of way, not in a "No! You must not read from the book!" sort of way.

https://www.youtube.com/watch?v=E0DIsPBczcI


Digital Devil Saga indeed


I always wondered why these texts are so difficult to interpret... why certain symbols, like the crow, have ambivalent meanings. In some cultures the crow is evil, while in others it's benevolent.

GPT4 to the rescue, let's see what'll happen if everyone has the means to summon demons, curse others and the like.


What could possibly go wrong.


I've been working on a language-learning app, and gpt-4 has made things doable that didn't seem to be doable without it. For example, translating to lesser known languages like Cree. Online you can find a dictionary of words, or phrases, but I wasn't able to find a translator for arbitrary statements. It's not listed on google translate, nor yandex, et al. But, it's possible with gpt.

My strategy in automated translation is to translate, and then translate back to English. That way we can be fairly sure the translation is accurate. Of course if this app "has wings" I would open source the translations to allow corrections, and/or hire native translators for languages I don't speak (which is the vast majority of languages).

One last tip: I was able to make a pretty good automated system for this. What I did was spell out a monstrosity of a system-prompt which ensures a few things. One, it will always give me the most descriptive romanization for non-ASCII languages. Two, it will give me output that is essentially .csv data. Three, if it encounters a made-up language like Pirate, it will try (and do a pretty good job). The rest is just parsing my final translation file to find prompt/language pairs which aren't saved yet and piping that to the monstrous prompt which queries for translations.

Then you can run a separate program for validation. See if it passes my game of telephone test.
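A minimal sketch of that telephone-test validator. The two callables stand in for whatever model or API actually does the translation, and the one-entry phrase table (using "tânisi" from this thread) is just a toy stand-in. Note a perfect round-trip score is necessary but not sufficient evidence that the forward translation is good:

```python
import difflib

# Translate, translate back, and compare the result to the original
# ("game of telephone"). Returns a similarity ratio in [0, 1].
def round_trip_score(text, to_target, to_english):
    back = to_english(to_target(text))
    return difflib.SequenceMatcher(None, text.lower(), back.lower()).ratio()

# Toy stand-in translators: a one-entry phrase table in both directions.
phrase_table = {"hello": "tânisi"}
reverse_table = {v: k for k, v in phrase_table.items()}

score = round_trip_score(
    "hello",
    lambda t: phrase_table.get(t, t),
    lambda t: reverse_table.get(t, t),
)
print(score)  # 1.0 for a perfect round trip
```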

This way I and others could learn common phrases in languages which are hard to otherwise access. It even works for dead languages/dialects/character-voices, etc. For me, it's pretty amazing actually. Please note that I never said perfect. But it's pretty damn close actually.

All in all this workflow allows what would have been unimaginable even two years ago.

In doing tests today I found it interesting/useful to note how gpt-4 is "thinking" about translating the word "settings". Here are its thoughts. This is not the exact system prompt I'm using for my app.

  system_prompt = "You are an expert in translating lesser-known languages. When translating you will include both the native writing system, and the romanization into the latin alphabet. When you romanize text you always include any accents or pronunciation marks."

  user_prompt = "Translate the following into Cree. [Hello, goodbye, settings]"

  # Output
  In Cree language, your words will be translated into:

  1. Hello - ᑌᔭᔭᑎ, romanized as "Tānsi"
  2. Goodbye - ᐊᔭᙱᐂᒥᑎ, romanized as "Ayāwāw"

  "Settings" is a bit more complex, because it implies technological context that doesn't necessarily have a direct equivalent in Cree. However, a possible option is:

  3. Settings - ᓂᐹᕗᓂᑕᐚᓇᐠ, romanized as "Nipāvunitawānāk", which might refer to "adjustments".


I really hope you check with native Cree speakers before you publish a system like this. Smaller language communities are already being hurt by low-quality generated text. See also https://news.ycombinator.com/item?id=35300603 for an experience of GPT with the language Northern Saami.

Also, back-translation was never a good evaluation method. Say I don't know English, and I make a new system that calls Google Translate word for word. I back-translate the Norwegian input "den sykkelen kunne jeg gjerne hatt" and my word-for-word system gives "it the bike could I happily hat". Now I evaluate that whole sentence by putting it into Google Translate and presto I get "den sykkelen kunne jeg gjerne hatt" – can I conclude that word-for-word Google Translate is perfect? Or even worse, if you use the same system in both directions, anything that looks like a bijection if you squint will do (reverse then reverse, transliteration into Canadian syllabics then back, encrypt/decrypt, map each word to a number, etc.).
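To make that failure mode concrete: any invertible transform passes a round-trip check perfectly while translating nothing at all. Rot13 here is just the simplest example of such a bijection:

```python
import codecs

# Rot13 is a bijection on letters: "translating" and then
# "back-translating" recovers the input exactly, even though no
# actual translation has happened.
def bogus_translate(text):
    return codecs.encode(text, "rot13")

def bogus_back_translate(text):
    return codecs.decode(text, "rot13")

original = "den sykkelen kunne jeg gjerne hatt"
assert bogus_back_translate(bogus_translate(original)) == original
print("round trip: perfect; translation quality: nonexistent")
```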

If you have a bilingual corpus of Cree text, then you can get some automated measures (some popular ones listed at https://wiki.apertium.org/wiki/Evaluation ), but it would have to be a corpus you didn't find online, since otherwise GPT-4 has it in its training data and the numbers will not be valid for sentences that are not in that corpus.


At this point any output is considered experimental and would definitely need human intervention to achieve any kind of quality.


I think there's some hallucination creeping in here!

(1) This HN discussion is the only Google search result for each of these syllabics strings.

(2) I tried using https://syllabics.atlas-ling.ca/ to transliterate these to Roman letters, and none of these was transliterated in the same way as the GPT-4 output (although the third one is somewhat close).

(3) I searched and found that "hello" in Cree is likely written ᑖᓂᓯ (not ᑌᔭᔭᑎ), while correctly romanized as "tān[i]si".

Your approach is clever, but I think the language model is still ultimately overconfident (and confused) here somehow.


Yeah, for Cree it is definitely more suspect than trustworthy. Another thing I noticed was that on another attempt I actually received different translations, so.. it's hard to say how this is going to be refined to be usable, or if it indeed is at all.

And wow, yes we are all alone on google results for those strings.

EDIT 1: Another thought occurs to me: if it's getting the transliteration right, and not the syllabics, maybe I separate the tasks and go English -> transliteration -> syllabics. I will have to see if that approach works better.

Another idea might be to use that syllabics site to bring it from transliteration -> syllabic. I noticed that they were correct if translated there.
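A toy sketch of that transliteration -> syllabics step as a greedy longest-match lookup. Only the handful of syllable mappings needed for "tânisi" (per the corrections upthread) are included; a real table would need the full Cree syllabary:

```python
# Tiny illustrative syllable table; a real one covers the whole
# Canadian Aboriginal Syllabics repertoire for Cree.
SYLLABLES = {"tâ": "ᑖ", "ta": "ᑕ", "ni": "ᓂ", "si": "ᓯ"}

def to_syllabics(romanized: str) -> str:
    out, i = [], 0
    while i < len(romanized):
        for length in (2, 1):  # prefer the longest match
            chunk = romanized[i:i + length]
            if chunk in SYLLABLES:
                out.append(SYLLABLES[chunk])
                i += length
                break
        else:
            out.append(romanized[i])  # pass unknown characters through
            i += 1
    return "".join(out)

print(to_syllabics("tânisi"))  # ᑖᓂᓯ
```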

EDIT 2: By updating the system prompt I was able to get it to translate properly. I had to remind it to be correct!

  You are an expert in translating Cree. When translating you will include both the native writing system, and the romanization into the latin alphabet. When you romanize text you always include any accents or pronunciation marks. You use syllabics properly and in the modern usage

  Hello - ᑕᓂᓯ (Tânisi)
  Goodbye - ᐅᑲᕆ (Okaawii)
  Settings - ᐅᑌᕁ ᐟ (Otēw with Roman orthography)


> I had to remind it to be correct!

It's so funny to encounter the effects of language models producing the highest-probability completions of a prompt, and how those aren't necessarily the same as the most correct completions.

I also saw something like this with people asking GPT models to write poetry, and they wrote mediocre poetry. But then when asked to write good poetry, they wrote better poetry!


Yeah, I found that for that kind of use case you really wanted to remind it. You could even say things like,

  written beautifully with an intricate sense of wordplay
  in the style of [multiple good poets] 
If you're in the chat interface you could even do:

  that was really great! But I want you to write it better!


Why would you use an AI for something you know nothing about? That's, like, the one use case that will not work and results in unchecked hallucinations.


As an inherently flawed starting point, versus no starting point


There are a lot of ways to translate, for example using people who actually have the job of translating languages. There are places that will connect you to one. There are many ways to do this, and they're not "type it in and burn the energy of 100 suns to get a wrong but confident result".


This is such a cool example. "Adjustments" makes so much sense, doesn't have the very modern "settings" implication of a device UI/machinery


Yeah. Hopefully it actually makes sense!


have you had any luck with old Akkadian scripts?


I lost three hours reading through the rest of this wonderful blog.


Sure, give an AI whose alignment constraints flip off at the wrong prompt the arcane rites to summon demons. That will end well...


It makes a few mistakes still, and that makes it "just a tool"? Ok, how about in a few weeks when it stops making mistakes?


Compilers, calculators, hammers, spreadsheet software, watches don’t make mistakes either, and they are “just tools”.


Real demons prefer Latin.


Oh no, more demons summoned! AI ruins the world.


Institutional review board time. We've already been warned that a computer merely enumerating the names of God can end existence. Then surely a computer can also summon ancient demons.


The author has no sense of self preservation!

> Demonology is a topic I’ve been interested in ever since I read Stuart Clark’s brilliant Thinking with Demons. However, it’s also one I find intimidating to do real research in, since so many of the sources involved are written in a highly erudite form of Latin.

Should be:

> since I don't want to end up with a creature from the netherworld in my study.


I mean, my computer has Legion written right on the front of it, so this just makes sense.


From what I've gathered from Catholic exorcists, demons adopt different personas and shouldn't be trusted about anything they say. The only questions the exorcist asks are those pertaining to the case, all in the interest of breaking the demons' claim and expelling them, in the name of Christ, the stronger man from the parable. As the Lord says, Satan is a liar and a murderer from the start, and when he lies, he speaks his native tongue. What I'm saying is, keeping a database of demons makes little sense.


Slightly off topic but this is hilarious. We are already crafting "chants" and "spells" for LLM, i.e. prompt engineering, now we are teaching it demonology too? Some priest from the middle ages would have a heart attack.

Now I know what the AI apocalypse would look like. GPT-42 would summon hordes of demons from the pit of Hell to bring about the end of days. Who needs all those pesky nuclear codes when you can call upon Satan?


Technically the origins of the Gnostic demiurge as a concept comes remarkably close to the modern trend of AI.

A lot of the imagery was inspired by the Orphic Phanes, a dual gendered being of creation and light that gestated in the cosmic egg of the universe.

There were groups that combined the idea of an eventual creator with naturalism such that you had the idea of an original dual gendered primordial 'man' which then brought forth a 'son of man' that ended up creating us in the images of that original.

They allegedly patterned this on the Phrygian mysteries of a spontaneous original being that was likened to a tumor springing into existence.

They even tied this all into concepts around the dependence of the soul on the body, suggesting an original physical world and a non-physical replica created by the demiurge where death was functionally escapable.

The later influence of Neoplatonism flips it all back to a perfect form original and corrupted physical incarnation - the version of Gnosticism most well known today.

But for a tiny slice of time in antiquity you had people talking about the ideas of a naturally occurring man in a physical world giving rise to a creator of light which then recreated the physical original man in a non-physical twin of the cosmos in order to effectively allow them to escape death.

Which is a pretty wild set of beliefs for the time, but less so in an age where we are moving towards humanity bringing forth AI literally in light (optoelectronics is a likely next hardware shift), creating digital twins of the world around us, and increasingly making that world inhospitable to our continued existence.

Some of the nuances of what they were thinking about are pretty wild to consider in modern contexts.


Last year I partnered with a friend's company to try to build a general AI management platform, and I've always felt that this stuff is very occult in nature. With all the buzz around new LLMs, I leaned in really hard on the occult imagery for the design part of it. The project didn't continue, but I'm going to keep that in my pocket and try to reform it.

I put the design docs up just for posterity.

[1] project tombstone: https://orphanim.ai

[2] Design and Style Guide https://docs.google.com/document/d/1JsEeok9NmRGDAF3o0AX7h9oz...


I remember the moment I first saw ChatGPT. "Finally I can translate all my Latin demonology manuals"



