Exactly. Also, I think an alternative to an LLM, a model trained more generally to identify large-scale linguistic patterns across a language, could be cross-referenced with the more standard LLM mentioned above to at least point to some possible meanings, patterns, etc.
We'd need contextual tracking of what the whales are actually doing/communicating to match against the songs. An LLM would be excellent at finding correlated patterns between the language and the actions, and then mapping those to similar English concepts, but all of that requires the behavioral data too. Cameras strapped to whales, maybe?
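
To make the matching idea a bit more concrete, here's a rough sketch of the simplest version of it: score how strongly each song pattern co-occurs with each observed behavior using pointwise mutual information (PMI). This is plain counting, not an LLM, and everything in it is made up for illustration: the `observations` data, the pattern labels "A"/"B", the behavior names, and the `pmi_table` helper are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: (song_pattern, observed_behavior) pairs, e.g. from
# a recording lined up with camera footage. Not real whale data.
observations = [
    ("A", "feeding"), ("A", "feeding"), ("A", "diving"),
    ("B", "diving"), ("B", "diving"), ("B", "feeding"),
]

def pmi_table(pairs):
    """Return PMI (in bits) for every (song_pattern, behavior) pair seen."""
    pair_counts = Counter(pairs)
    song_counts = Counter(song for song, _ in pairs)
    behavior_counts = Counter(behavior for _, behavior in pairs)
    total = len(pairs)

    table = {}
    for (song, behavior), n in pair_counts.items():
        p_joint = n / total
        p_song = song_counts[song] / total
        p_behavior = behavior_counts[behavior] / total
        # Positive: the pair shows up together more often than chance.
        table[(song, behavior)] = math.log2(p_joint / (p_song * p_behavior))
    return table

scores = pmi_table(observations)
for (song, behavior), score in sorted(scores.items()):
    print(f"{song} / {behavior}: {score:+.2f}")
```

A positive score just means that pattern and behavior turn up together more often than chance would predict. It says nothing about meaning by itself, which is why the behavioral data matters so much.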