Extremely likely, especially with the increasing ability of LLMs to decode unknown languages. Then the test would be for us to produce these sounds and see if the whales respond as expected.
You could train an LLM on all existing whale sounds, get it to “listen” to live whales and respond with what it “thinks” it should, then do human analysis of the results, maybe find one shred of meaning, rinse and repeat.
That's literally impossible. Imagine trying to learn Japanese by talking to a Japanese man on the phone, with neither of you being able to understand or see each other or what you're each doing. Without shared context communication is impossible. Best case, you and the Japanese man would create a new context and a new shared language that would be neither English nor Japanese that would allow you to communicate about whatever ideas fit through the phone line. Maybe words like "sound", "word", "stop", etc.
Impossible in a single step, but perhaps "impossible" is too strong a word given the possibilities that arise once you consider how the statistics of words, or sounds, are connected. If you can work out the statistical correlations between groups of sounds, you can start to get an idea of how they are interrelated. That's a stepping stone on the path to understanding.
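A toy sketch of what that correlation step could look like, assuming the recordings have already been segmented into discrete sound units (the segmentation is itself the hard, unsolved part); pointwise mutual information between co-occurring units is about the simplest such measure:

```python
from collections import Counter
import math

def cooccurrence_pmi(sequences, window=3):
    """Rough pointwise mutual information between sound units that occur
    within `window` positions of each other, across many sequences."""
    unit_counts = Counter()
    pair_counts = Counter()
    total = 0
    for seq in sequences:
        unit_counts.update(seq)
        total += len(seq)
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 1 + window]:
                pair_counts[tuple(sorted((a, b)))] += 1

    n_pairs = sum(pair_counts.values())
    pmi = {}
    for (a, b), c in pair_counts.items():
        p_ab = c / n_pairs
        p_a = unit_counts[a] / total
        p_b = unit_counts[b] / total
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi

# Hypothetical segmented "codas": each sequence is a list of unit labels.
codas = [["A", "B", "B", "C"], ["A", "B", "C"], ["D", "A", "B"]]
print(sorted(cooccurrence_pmi(codas).items(), key=lambda kv: -kv[1])[:5])
```

Pairs with high PMI show up together far more often than chance, which tells you nothing about meaning yet, but it does tell you which units to pay attention to.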
>the statistical correlation between groups of sounds
That assumes that the speaker is similar to the person correlating the sounds. For example, if you had statistical data for utterances of English sounds in the context of Magic the Gathering tournaments, and you tried to decipher the speech of a Swahili electrical engineer talking about transistors, you could very well decipher something that's seemingly coherent but entirely incorrect.
It would be an overgeneralization to assume that whales speak about things in the same statistical patterns that humans do.
If you scroll down, the very first step they describe is for collecting datasets of existing translations. They aren't translating even unknown human languages, let alone completely alien ones.
I dunno, sometimes the language might be contextual, and utterances might not be understood without taking into account the context of what is occurring, or the speaker. Yes, I know human language can be subject to these variables too. Anyhow, it's all speculation, and the dream of talking to animals is surely exciting.
I've heard that "da kine" in Hawai'i Creole English historically was, and still may be, used exactly in situations where the speakers share plenty of context, allowing them to figure out what it denotes, but leaving listeners largely unenlightened.
In a language such as Thai, pronouns are left out in most cases, and only added when you need to disambiguate. No plurals either, requiring you to add this information with extra words when it matters. But nobody forces you to communicate effectively, or use Oxford commas.
> Imagine if they are communicating using a lot of pronouns.
That's fine. The idea is to record them with a lot of metadata in situ: what is going on with the whales (are they feeding? are they traveling? are they in a new location or somewhere they have been for a while? how many whales are there?) and also their surroundings (sea state, surface weather, position and activity of boats, prey animals, etc.).
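Purely as an illustration of what that in-situ metadata might look like attached to each clip (all field names are made up):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WhaleRecording:
    """One audio clip plus the behavioural and environmental context
    observed while it was captured. All fields are hypothetical."""
    audio_path: str
    timestamp: str                      # ISO 8601
    location: tuple[float, float]       # latitude, longitude
    group_size: int                     # how many whales are present
    behaviour: str                      # e.g. "feeding", "traveling", "resting"
    residency: str                      # "new location" vs "resident for days"
    sea_state: int                      # Beaufort scale
    surface_weather: str
    nearby_vessels: int
    prey_observed: Optional[str] = None
    notes: str = ""

clip = WhaleRecording(
    audio_path="clips/2023-08-01_0412.wav",
    timestamp="2023-08-01T04:12:00Z",
    location=(-15.2, -145.7),
    group_size=6,
    behaviour="feeding",
    residency="resident for days",
    sea_state=2,
    surface_weather="overcast",
    nearby_vessels=1,
    prey_observed="squid",
)
```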
You would need some way to convert the whale LLM to human language though. Otherwise you would just be making a pre-trained GPT-4 for whales. One option would be to label data according to induced reactions in whales to whale-language completions (i.e., let the LLM complete whale language and use the reactions to try to induce some understanding). But it feels unlikely we would get further than providing a ChatGPT for whales that only they can understand.
You wouldn't necessarily need that. You don't actually need translated text for every single language pair an LLM will learn to translate.
i.e. train an LLM on English, French, and Spanish data, where the only parallel text is English-French. Can this LLM still translate to and from Spanish? Yeah.
You still have a bridge, and each of those languages is not just from the same species but the same language family. If there's English to French and French to Spanish, there's a semantic relationship between English and Spanish.
There exists no bridge to whale, any more than there is to aliens from Alpha Centauri.
Common concepts are common; what species the language comes from is not as relevant as you think. Text and image space, two entirely different modalities, are so related in high-dimensional space that you can translate between them with just a simple linear projection layer.
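That "simple linear projection" is roughly the trick LLaVA-style models use between a vision encoder and a language model. A hedged PyTorch sketch of the idea, with placeholder names and dimensions, and an audio encoder standing in for the vision one:

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Map embeddings from one modality's space (e.g. a whale-sound
    encoder's outputs) into another's (e.g. a text LLM's hidden size)."""
    def __init__(self, src_dim: int, tgt_dim: int):
        super().__init__()
        self.proj = nn.Linear(src_dim, tgt_dim)

    def forward(self, src_embeddings: torch.Tensor) -> torch.Tensor:
        return self.proj(src_embeddings)

# Hypothetical dimensions: a 512-d sound encoder feeding a 4096-d LLM.
projector = ModalityProjector(src_dim=512, tgt_dim=4096)
sound_embeddings = torch.randn(8, 512)   # batch of 8 encoded sound units
llm_ready = projector(sound_embeddings)  # shape: (8, 4096)
```

The projector is only as good as the paired data you train it on, which is exactly the thing we don't have for whales.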
My guess: train a generative model to predict whale sounds, based on recordings of real ones, and hope that the resulting latent space will map to that of a human-trained LLM. We'd need a stupidly large amount of recordings of whale songs, a tokenization scheme, and a few already-translated sounds/phrases to serve as starting points for mapping the latent spaces.
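If we really did have a few translated anchor pairs, the classic way to align two embedding spaces is a linear map fit on those anchors (orthogonal Procrustes, as used in cross-lingual word-embedding work). A sketch, assuming both sets of embeddings already exist and the anchor pairings are known:

```python
import numpy as np

def fit_orthogonal_map(src_anchors: np.ndarray, tgt_anchors: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: find orthogonal W minimizing ||src @ W - tgt||.
    src_anchors, tgt_anchors: (n_anchors, dim) embeddings of the same concepts
    in the whale-model space and the human-LLM space respectively."""
    u, _, vt = np.linalg.svd(src_anchors.T @ tgt_anchors)
    return u @ vt

# Toy example with made-up 4-d embeddings for 3 anchor concepts.
whale_space = np.random.randn(3, 4)
human_space = np.random.randn(3, 4)
W = fit_orthogonal_map(whale_space, human_space)
mapped = whale_space @ W   # whale embeddings expressed in the human LLM's space
```

With only a handful of anchors this is badly underdetermined, but it shows why even a few translated phrases would be worth so much.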
Exactly. Also, I think an alternative to LLM that is more generally trained towards identifying large linguistic patterns across a language could be cross referenced with the aforementioned more standard llm to at least point to some possible meanings, patterns, etc
We'd need contextual tracking of what the whales are actually doing/communicating to match to the songs. An LLM would be excellent at finding any correlated patterns between the language and actions, and then mapping those to similar English concepts, but that all requires the behavioral data too. Cameras strapped to whales maybe?
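Even before reaching for an LLM, the crudest version of that correlation search is just conditional frequencies of sound units given the observed behaviour. A toy sketch with made-up labels:

```python
from collections import Counter, defaultdict

# Hypothetical (sound-unit sequence, observed behaviour) pairs from tagged whales.
observations = [
    (["A", "B", "C"], "feeding"),
    (["A", "B"], "feeding"),
    (["D", "E"], "traveling"),
    (["A", "D"], "traveling"),
]

behaviour_given_unit = defaultdict(Counter)
for units, behaviour in observations:
    for u in set(units):
        behaviour_given_unit[u][behaviour] += 1

for unit, counts in behaviour_given_unit.items():
    total = sum(counts.values())
    probs = {b: round(c / total, 2) for b, c in counts.items()}
    print(unit, probs)   # e.g. A -> {'feeding': 0.67, 'traveling': 0.33}
```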
Would just need a way to tokenize, then use predictions to map back to some positive interaction symbol. Something like: we think a certain phrasing means "food-fish-100m-down", and whales respond consistently to that.