The World's Most Efficient Languages (2016) (theatlantic.com)
148 points by Jtsummers on Dec 1, 2018 | 98 comments



> But does this mean a different way of experiencing life?

What is noticeable, at least for me, is a different approach to naming things. E.g. in German technical things are usually named by what they are or do (descriptive), while in English many things are just named "randomly" after e.g. people or who/where it was first used. Consider: Elastizitätsmodul vs. Young's modulus; Steilkegel vs NMTB taper; Tellerfeder vs Belleville washer; ...


A lot of things in English are also named as to what they do, but they are disguised because we tend to use Greek and Latin. I remember when I was learning German I found it amusing that the Germans call a television "ein Fernseher" -- literally a far-seer. But then I thought about what television means! It's tele (far) + vision!


The German "ein Fernseher" is likely just a literal translation of the English "television", since it is a 20th-century technological invention.


This. Trying to categorically distinguish between English and German words gets more difficult when a lot of modern terminology just gets shipped over as loan words.


But what makes German a little different is that they translate the loan words using their own roots. No doubt Fernseher is taken from television, but compare it to Spanish "televisión" -- there they literally took the word.


Some loan words are, some aren't. Certainly a lot of them do get properly translated, and it's always fun to see what sticks.


The level of humour I experience from spoken Spanish (especially in day-to-day conversation) is on another level compared to English. This is solely based on some words/phrases sounding way funnier in Spanish than in English. I can't think of any examples, but generally a quick 2-3 word phrase in Spanish will carry some (intended) funniness, whereas in English the joke is in the meaning of the words rather than in their sound and delivery.

My English comprehension level is comparable to native speakers, so it's not because I'm missing any jokes. This is not to say I don't enjoy comedy in English, in fact, I seem to enjoy it as much as anyone else.


I was thinking about this as well, words and phrases carrying certain emotions etc. But it might be just that as a non-native speaker, I am not attuned to it so it's harder to pick up on. I think it's definitely there though.


> But does this mean a different way of experiencing life?

It does mean "paying attention to different things in life". Culture and language are yin and yang; each shapes the other. For example, sometimes the same word means very different things, and sometimes there are plenty of words/constructs for nuances that a foreigner might find hard to tell apart. Whether the former means "we don't care about that", or just the opposite, "we know that so well that it's obvious", I don't know.


I think it's the general tendency of English to be "hieroglyphic", like Chinese. English spelling teaches this approach early on: some logic is there, but you can't rely on it and still have to memorize, not infer. Then memorizing names, instead of logically inferring / producing them, becomes natural.


But conversely, we have X-rays vs Röntgenstrahlung.


Neither of those are especially descriptive. X tells you no more than Röntgen does.


When programming, I name all my functions after myself ;)


This reminds me of someone who named all of his variables after vegetables... I wouldn't want to inherit that product


Actually I use this in one particular case: in Android there is an event callback "onActivityResult" and its first parameter is an integer. In all the demos online you see an official-looking name like SDCARD_WRITE_PERMISSION, and when noobs copy-paste it and try to compile, they can't find which library defines it. So when I use BANANA in demo code, it's a little more obvious that they have to define it themselves. But MY_SDCARD_WRITE_PERMISSION also works.

Banana makes it obvious that it's not official, and which two places it connects.


Unless it's a salad app. `toss(lettuce, tomatoes)`


Had a consultant developer once who named his functions like doMagic() & doStuff()


A legitimately very talented developer that I worked with showed me some of his first code at the company that contained `function post_document_load_do_shit` and `var turd`


More of a young male thing than anything to do with ability.


I do this if it's not production code. It's memorable, and it's even semantic in some cases.


In C++, many things carry the STD prefix.


fortunately it's std not STD. The first looks like nothing in particular, the latter looks like S&M kaomoji.


sexpresso_std


> In a Native American language of California called Atsugewi (now extinct), if a tree was burned and we found the ashes in a creek afterward, we would have said that soot w’oqhputíc’ta into the creek. W’oqhputíc’ta is a conglomeration of bits that mean “it moved like dirt, in a falling fashion, into liquid, and for real.” In English, we would just say “flowed.”

I realized something like that in English when translating. The word "slay" means, according to a basic dictionary, to kill (violently). But to an English speaker it means much more. I wouldn't say that a school shooter slayed the children, rather shot. The Boston bomber didn't slay the runners, but rather bombed. I would consider slaying to be an act done by a sword. But, in the case of a terrorist stabbing people, I would call that a stabbing, not slaying. I would say that slaying is only appropriate if the sword attack was going across a body and not directly into, like with stabbing.


"Slay" is a funny word. I think of it mostly having a connotation of a just person killing something evil. So I could slay a dragon, but not a chicken.

It has connotations of heroism.


Chickens can be proper evil buggers. Slaying is just the word you are looking for.


"And lo, it was an evil chicken before my eyes. And as he was evil, I slew him."


Thank you! Remember they are, in essence, dinosaurs. Evil little dinosaurs with mad, beady eyes!


IDK, many serial killers are known as The Foo Bar Slayer.


Maybe it's actually that it's archaic and, therefore, used only for grandiose things? Serial killers are usually given "cool" names.

I don't know exactly what the connotation is, but it's real.


It almost becomes fully inverted in the Legend of Zelda games. I think this behavior is there in the whole series, but in Ocarina of Time in particular, if you kill a chicken, there's a dramatic cutscene of the chicken dying which foreshadows the chicken's brethren registering your evil. They then chase and attack you. You can't fight the horde back, so they eventually kill you unless you get indoors before that occurs.


Minor correction: When you attempt to kill a cucco (chicken) after a few hits it'll crow for the backup ones which kill you. The first one never dies.

https://www.youtube.com/watch?v=SFRI3byJgYA


"The enemy has been neutralized"


English has lots of pairs of words that both mean the same thing. Examples: pork and swine, beef and cow, mutton and sheep, battle and fight, fraternity and brother, justice and right, elevate and lift.

The words pork, beef, mutton, battle, fraternity, justice, elevate are all French, and were used by the Norman aristocracy after 1066. They all connote sophistication and nobility. The words swine, cow, sheep, fight, brother, right, lift are all Anglo-Saxon, and were used by the powerless peasants after 1066. They all connote earthiness and familiarity.

There is a fun podcast "The History of English" that covers a lot of this stuff. If you liked the OP, you'll like the podcast.


You mention beef and cow but missed steak and here is where it gets really freaky!

Beef is derived from boeuf. Beefsteak is English. Biftek is French and derived from beefsteak. So the French word is derived from English which was, in turn, French.

A quick search shows that the etymology of steak seems to be Norse. So we have a good old mishmash here. Anyway, the important point is that England invented the beefsteak and exported the concept to France. However they do seem to have accepted the baton and done a pretty good job with it ever since.


The word 'boulevard' is similarly freaky. The Dutch (and others) took it from the French, who initially took it from the Dutch (bolwerc/bulwark).


In Italian we have "bistecca", which is a slab of red meat, not only from beef but also from pork.


But those words don't mean the same thing.


The roots for the modern word slay go very far back with a basic meaning of 'to strike' or 'to beat.' So shooting and bombing are definitely in that range, since they involve a bullet or shrapnel or a concussive force striking and killing. The reason it seems inappropriate is because it is archaic.

P.S. The better past tense is slew, not slayed.


Apparently "slay" connotes a lot more to you than it does to me.

I do find it an odd (but probably not wrong) word to use for a bombing murder, but the rest seem not at all strange to me.


The article briefly touches on it but doesn't really expand on "the", which is every bit as weird to outsiders as German genders or Kabardian "I mean it".

So, "the father" is something like "Not just any father, but a particular father, and we can assume that we both know which particular father we're talking about."

Next time anyone says "language X is so weird, they have an expression for Y!" think about how to explain "the" to, say, speakers of Mandarin. :)


There are equivalent articles in Chinese.

a -> yige "一个"

the -> zhege "这个"/ nage "那个"

And of course words like "my", "our" which are used in place of articles are there as well.

You might not need the articles when the context is very clear (which causes problems for Chinese speakers when they try to learn English), but the concept is there. I believe it's the same in Japanese. Can't comment on other languages though.


I'm sure there are Chinese words that can serve the same purpose. The point is that they aren't mandatory.

So, to an English learner coming from these languages, a sentence like "I was cleaning attic and found book" is perfectly understandable. Exactly what attic and what book we're talking about should be apparent from context.

On the other hand, in actual English, "I was cleaning the attic and found a book" portrays a different picture from "I was cleaning an attic and found the book." Use one when the other is warranted, and it sounds weird and confusing.


But then we have "I was cleaning house and found Kevin." too.

Both "a Kevin" and "the Kevin" would feel weird, especially "the Kevin". One might do it on purpose as a sort of joke, implying that Kevin is an ordinary inanimate object.

More oddness is that "cleaning up" doesn't involve an "up".


"You'll never guess who I saw?"

"Who?"

"Elvis"

"Elvis? The Elvis or an Elvis?"

"The Elvis"

"The" doesn't imply someone is inanimate, it just means that they are a specific one.


Interestingly, Latin also lacked both "a" and "the", whereas their equivalents appear in Romance languages more than in English.


A consequence of the Romance languages simplifying/removing declensions, I suspect.


Don’t those mean “this” and “that”? “This”, “that”, and “the” are similar because they’re all determiners that express definiteness (whereas “a” is a determiner that expresses indefiniteness), but they mean something clearly different in English.


FWIW, in Italian, and I think other Romance languages, definite articles come from the words used for 'this', although they evolved into something distinct.


这个 / これ mean more "this" whereas 那个 / それ or あれ mean more "that" (with a finer distinction in the Japanese). Their meanings are relative to the speaker - there is no absolute definite article. (I'll say too, even these relative specifiers were seldom used in classical Chinese.)


> I believe it's the same in Japanese. Can't comment on other languages though.

English is quite varied; takes similarly to eliding subjects, at times.


In languages without definite articles you can always use a word like "this" or "that" when clarification is needed.


Der Vater oder ein Vater? German has a definite article and an indefinite article just like English.

Now, I don't generally have to worry about gender (per se) in most of my nouns but it is bloody hard remembering whether a table is masculine (DE) or feminine (FR, IT).


To complicate things, in Italian "table" exists both as "tavolo" (masculine) and "tavola" (feminine). The feminine version is more often used for "plank" than "table", except in some specific expressions like "andare a tavola" (let's sit at the table to eat). No idea why.


There was a Reddit thread about something like this and some bilingual people said the language they speak changes their personalities to a degree.

For example, someone raised speaking one language (say, French in France) who then learnt English and moved to the US might find that he is more assertive or direct when speaking English than when speaking his mother tongue.


A more reasonable explanation than "language influences personality" is just the change of context. You speak different languages in different contexts.


You do, but, when you're in a context where you're speaking a language, you're also thinking in that language. I wouldn't be surprised if that has an influence on one's thought processes.

Concrete example: I have less of a tendency toward hyperbole when I'm speaking my 2nd language. The reason why is simple: I don't know many hyperbolic idioms in that language, so I lack the ability to express myself hyperbolically. But I think it goes a step further: Lacking the vocabulary, I have a hard time even having hyperbolic thoughts when I'm thinking in that language.

(That said, I don't want to suggest that this is somehow an aspect of the language itself. It seems more likely that native speakers have about the same range of expression as native speakers of any other language, and I'm just lacking it because of my limited familiarity.)


This author makes the case that a language may be inherently better or worse at expressing certain notions.

https://lithub.com/on-living-and-thinking-in-two-languages-a...


I always wonder if our language can influence our actual behavior. For example, think of all the different ways you feel ‘hungry’. Well, what do you do when you’re hungry, you ‘eat’.

Is it possible that your decision to ‘eat’ becomes independent from the impetus and a goal of its own? To the point where you’re now preparing food to ‘eat’ rather than to precisely address the sensation/discomfort that led you to that decision to begin with?

In other words, is it possible that languages that allow you to more or less precisely describe/categorize something affect how we think about it and ultimately act in response?


Something I find interesting is that even the manner of talking (speed, intonation, volume) can be extremely different between languages, to the point that it isn't readily recognizable as the same person talking.


No mention of Ithkuil, a conlang notable for attempting to maximize efficiency of expressiveness.

https://en.wikipedia.org/wiki/Ithkuil


Leibniz also dreamed of something similar: https://en.wikipedia.org/wiki/Characteristica_universalis


After reading about the Circassian language in the article, I can't dismiss Ithkuil again.

No wonder some Russians are attracted to the conlang...

Give me Toki Pona though!


More on Whorfianism and why it's (mostly) wrong, if this article piqued your interest: https://plato.stanford.edu/entries/linguistics/#Who


"Languages differ, not in what they can express, but in what they must express."

(Roman Jakobson, I think)


IMVHO the most efficient language is the one all involved parties in a communication speak, understand and can use to express as clearly as possible anything they want.

So IMO German is the most efficient one for native German speakers among themselves, French for the French, etc. What we really need is an international, artificial language, as simple as possible for anyone in the world, to be used ONLY for docs and international communication alongside each person's native one, so that no country gains an advantage over the others.


> Other languages occupy still other places on the linguistic axis of “busyness,” from prolix to laconic, and it’s surprising what a language can do without. In Mandarin Chinese, a way of saying “The father said ‘Come here!’” is “Fùqīn shuō ‘Guò lái zhè lǐ!’”

The writer seems to have copy pasted the phrase into Google translate. The phrase ‘Guò lái zhè lǐ!’ is simply incorrect grammar in mandarin. I'm unsure of how accurate the rest of the piece is.


For non-speakers, ‘Guò lái zhè lǐ!’ translates to "Come here to here", which is definitely a bit awkward. Usually `Guò lái` is sufficient since "here" is implied, similar to how you can simply say "come" in English.


Guò lái, zhè lǐ

Is fairly common in daily life.


Something that hasn't been brought up yet is sign language.

In New Zealand Sign Language, the sign language that I know and use, it is possible to reduce "go over to the other side of the room, pick up that suitcase, and bring it back over here to me" down to simply pointing at the suitcase, imitating picking the suitcase up by its handle, and then pointing downwards in front of oneself, all whilst looking at the interlocutor, possibly with eyebrows raised towards the end of the sentence.

Seen in person, it's a very efficient way to express not only a command but also which object is desired without having to actually name it, where it should go, and who should carry it out; the eyebrows might also imply that it is a request.

That said, it can be cumbersome and inefficient in many use cases; I would rather read English text when trying to quickly find a piece of information in an article than rewind or fast-forward through a video of somebody signing the article. In a real-time visual modality, though, it's very efficient.


McWhorter has a fascinating podcast on Slate.com (sans the #$&*() Slate ads). Also a couple of great popular science books on language.


All languages are complex in some way or other.

Japanese is rather simple in many ways (e.g., number of sounds, syllabic alphabets) but very complex in others (Kanji, counters, and so on).

Spanish too has few sounds, a Latin alphabet, simple accent rules, there's generally only one way to write any word you hear pronounced... but it has thousands of irregular verbs, four regular kinds of verb irregularities, and irregularly irregular verbs.

English has more sounds, its spelling is a mess, it has no diacritical marks so you just have to know how written words are to be pronounced, and it has developed and imported hundreds of thousands of words, but its verb conjugation is simpler than that of almost every major European language.

A language without verb tenses can still express complex time relationships -- it's just done differently than in a language with verb tenses.


Seems to me that there's room for improvement in this article. For example, the author does not try to compress any written language samples. If you wanted to find a "most efficient" language, you'd compress samples, maybe normalize them by the uncompressed length because it's hard to get the same thing written in different languages. My guess would be that natural languages would compress quite a bit, with English close to the least compressible. Artificial languages like Ithkuil and FORTRAN would be much more compressible, giving the lie to Dembski's assertion that "natural" information is more likely to be incompressible.

EDIT: I think English would be least compressible. But I suspect that all written languages would compress quite a bit.
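A minimal sketch of the compression experiment proposed above, assuming Python's standard `zlib` (DEFLATE) as the compressor and UTF-8 as the byte encoding; both are stand-ins chosen for illustration, not something the comment specifies. The names `compression_ratio` and `rank_by_compressibility` are hypothetical. The ratio measures redundancy in the byte encoding, so scripts whose characters take several UTF-8 bytes, and very short samples dominated by zlib's fixed header and checksum, will skew the numbers.

```python
import zlib


def compression_ratio(text: str, level: int = 9) -> float:
    """Compressed size divided by raw UTF-8 size.

    Lower values mean the text contains more redundancy that DEFLATE can remove.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot measure an empty sample")
    return len(zlib.compress(raw, level)) / len(raw)


def rank_by_compressibility(samples: dict[str, str]) -> list[tuple[str, float]]:
    """Sort labelled samples from most to least compressible.

    The samples should be parallel translations of the same passage;
    otherwise differences in content swamp differences between languages.
    """
    ratios = [(label, compression_ratio(text)) for label, text in samples.items()]
    return sorted(ratios, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Placeholder inputs for illustration only; substitute real parallel texts.
    demo = {
        "repetitive": "come here " * 50,
        "varied": "The father said: come here, bring the book from the attic.",
    }
    for label, ratio in rank_by_compressibility(demo):
        print(f"{label}: {ratio:.3f}")
```

Using the same compressor and the same encoding for every sample keeps the comparison fair, but it still only sees surface form, which is exactly the objection raised in the reply below.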


Why do you think English would be least compressible? Is that based on conjecture or have you investigated this? Why would artificial language be more compressible? That seems completely orthogonal to me (by definition, an artificial language can be designed with whatever properties you choose). Fortran may be more compressible due to its limited set of keywords, but it's my impression that Ithkuil is by design more information dense and thus harder to compress than English.

The most efficient language is the least compressible language only in a narrow and arbitrary sense of efficient. There are many considerations such as what is efficient for the speaker, the hearer, redundancy to noise, efficiency with respect to particular purposes, etc. We can assume that natural languages will generally make a good trade-off across these factors, and searching for the most efficient language in one particular narrow sense is not very useful. Moreover, compression of text focuses only on surface form, completely ignoring the dimension of meaning.


I live in the USA. We get labels in English, French and Spanish so that products can be sold in Canada and Mexico. The English labeling is almost always visibly shorter than the French and Spanish. So I hypothesize that English would compress less.

My conjecture is that artificial languages will be more compressible because they haven't had time to get honed down, the way English lost "thee" and "thou", that personal mode of address. Esperanto and Loglan are completely regular, which natural languages are not, and thus have a lot of regularity in cases where it doesn't matter: they haven't had time to shed their mostly-unused features.

For better or for worse, compression of text only uses the surface form to compress, because that's the level that compression works on: letters or bytes or some other unit. You can't compress meaning. Meaning doesn't exist per se: colorless green ideas sleep furiously, after all. That is, you can use perfectly sensible words and letters and even legitimate syntax, and still create strings devoid of meaning. A document consisting of perfectly spelled words and legitimate syntax, yet without meaning like the colorless green ideas sentence, will compress identically to ordinary text with the same orthographic and syntactic validity.


> If you wanted to find a "most efficient" language, you'd compress samples, maybe normalize them with the uncompressed length because it's hard to get the same thing written in different languages.

It's been generally done, just by determining how many syllables are necessary to express an idea; it also turns out to be pretty easy to measure IIRC, because people express ideas at a constant rate across cultures, so if you speak a less efficient language, you speak faster, and if you speak a more efficient language, you speak slower.

Romance languages aren't very efficient; English is pretty efficient, and again IIRC, the most efficient common language is Vietnamese, which can express ideas in about 2/3 of the verbiage that English needs.


How does Vietnamese manage to eliminate that much redundancy? What parts of language have they done away with?


You could compress the written language in UTF-8 or the IPA version of the languages words and potentially get very different results. Also, languages tend to be spoken at about the same bitrate even though they may have different numbers of bits per syllable. Spanish and Japanese, limited to pure vowel sounds and neither tending to pile up the consonants, tend to have less information per syllable but be spoken rapidly. English, with its diverse vowels and relatively consonant-rich syllables, and Chinese, with its medium number of phonemes but added layer of tones, tend to be spoken more slowly.
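The constant-bitrate observation above amounts to a simple identity. The symbols are introduced here for illustration and no measured values are assumed: a language that carries fewer bits per syllable has to be spoken proportionally faster to reach the same rate.

```latex
% r: syllables per second, h: average bits of information per syllable,
% R: information rate in bits per second (symbols introduced for illustration)
R = r \cdot h
% If two languages A and B are spoken at roughly the same information rate:
R_A \approx R_B \quad\Longrightarrow\quad \frac{r_A}{r_B} \approx \frac{h_B}{h_A}
```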


The author of the article would have been wise to include this information.

I did find "Clustering by Compression", https://arxiv.org/abs/cs/0312044

The emphasis in this paper was not on "efficiency", but rather on phylogenetic grouping. The authors used the "Universal Declaration of Human Rights", which apparently has 52 translations, as the subject text. See Fig 13 and section 5.2 for details. Instead of just compressing and normalizing, they developed a normalized compression distance: compress the concatenation of two texts, subtract the smaller of the two texts' individually compressed sizes, and divide by the larger.

I think that using the same encoding for all the languages and some kind of normalization would wash out differences due to encodings.
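A minimal sketch of a normalized compression distance, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length. It assumes `zlib` as a stand-in compressor and UTF-8 encoding; the paper's own compressor choice may differ, and the names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib (DEFLATE) compression of data."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Close to 0 means the texts share most of their structure;
    close to 1 means compressing them together saves almost nothing.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

One design caveat: DEFLATE only looks back 32 KiB, so for texts longer than that the concatenation stops benefiting from shared structure; a compressor with a larger window (for example the standard-library `lzma`) would suit full-length translations better. Computing `ncd` for every pair of translations gives a distance matrix that can then be fed to any hierarchical clustering routine.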


Umberto Eco's "The Search for the Perfect Language" comes to mind

https://www.wiley.com/en-us/The+Search+for+the+Perfect+Langu...


"Moreover, anyone who has sampled Chinese, or Persian, or Finnish, knows that a language can get along just fine with the same word for “he” and “she.”"

English (at least in en_GB) can coerce they into a gender neutral form of he or she. Sometimes it will need a bit of sentence re-arrangement and other tricks but will still be natural.

I was chatting to my aunt and they said: "Ooh, you are awful." My dad looked at me and said "crack on". In the first sentence they is substituted directly for she and in the second sentence he is dropped entirely from: "and he said".

I'm sure other languages have similar tricks. In this case I don't think it is even breaking the rules.


Depends on whose rules and to what extent those rules are being enforced.

I think 'they' as a gender-neutral pronoun is widely accepted in speech and may have greater acceptance in writing than once it had thanks to modern conceptions about inclusiveness in language, especially by those who challenge/have challenged the notion of gender as a binary.

For me, that use of 'they' shall remain incorrect in writing (by virtue of introducing ambiguity relating to number) except in reported speech and prose that intends to invoke a casual, perhaps intimate, feeling in the reader; but perfectly acceptable in speech itself.

That makes it different to the other languages where gendered pronouns don't exist at all, or are so limited in actual use that they might as well not exist.


> I think 'they' as a gender-neutral pronoun is widely accepted in speech and may have greater acceptance in writing than once it had

Well, less than it had before the Victorian effort to forcibly re-engineer English in the image of Latin, but more than it had at the height of influence of that effort, whose influence is still waning.

> For me, that use of 'they' shall remain incorrect in writing (by virtue of introducing ambiguity relating to number)

The ambiguity-of-number ship pretty much sailed with “thou”.


"It is unclear which of the generals passed intelligence to the enemy, but their action had grave consequences for the war effort." How would you rewrite this sentence to be grammatically correct?


What I find fascinating is that NLP tools today can learn to translate between two languages even without a dictionary or a parallel corpus. That says something about the relations that exist between languages.

https://code.fb.com/ai-research/unsupervised-machine-transla...


Try using that thing against Bahasa Indonesia, the fastest language I'm aware of, and let's see how intelligent it really is.


As a speaker of both Bahasa Indonesia and Mandarin, I'd say Mandarin is faster.



This article made me wonder about not just the density of information in words but time. Just as with compression with computers, you have to balance compression ratio with the speed to compress and decompress. So a slightly more verbose language might actually convey more information per unit time because it's easier to understand and speak.


This sounds a lot like what people say about computer code. Write it to be understood, not to be "clever" with a minimum of lines of code.


0. Take all concepts, modifiers and verbs

1. Make a histogram of their usage

2. Huffman encode them based on what sounds/inflections/etc. are easiest to make and most distinct

3. Add an ECC syllable to each word or phrase

4. Set straightforward, simple semantic rules so that it's mechanical, obvious and unambiguous

Relish in the glory of the next Esperanto.
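Step 2 of the recipe above can be sketched with a standard binary Huffman construction, under a simplifying assumption: the code alphabet here is just the bits 0 and 1 with equal cost, whereas the comment imagines an alphabet of sounds weighted by how easy and distinct they are, which would call for an n-ary or unequal-letter-cost variant. `huffman_code` and the toy word list are hypothetical, and steps 3 and 4 are not covered.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix-free binary code where frequent symbols get shorter codewords."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs a non-empty codeword.
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 respectively.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


if __name__ == "__main__":
    # Toy "concept histogram" (step 1), purely illustrative.
    words = "eat water eat go eat water see".split()
    codes = huffman_code(Counter(words))
    for word, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
        print(word, code)
```

The most frequent concept never gets a longer codeword than a rarer one, which is the property the recipe relies on; the prefix-free structure also means words can be parsed without separators, though a single mispronounced sound desynchronizes everything after it, which is presumably what step 3's ECC syllable is meant to guard against.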


How would this be translated into Riau?

"Sorry, Charlie. Starkist doesn't want tuna with good taste, they want tuna that tastes good."


In general, you can expect that anything grammatically encoded in one language can be expressed through periphrasis in a language that does not encode that distinction.

For instance, Khwe has five past tense time distinctions that are grammatically encoded in verbal suffixes:

    /-hã/: the recent and immediate past
    /-tà/: the morning of the day of the utterance
    /-ǁ'òm̀/: a "remote past" before today
    /-tĩ/: another "remote past", for a few days or weeks ago
    /-hĩ/: a "far remote past", used to refer to things that happened years ago or even in mythical times.

Does this mean English cannot be used to distinguish events that happened a few days ago vs. a few years ago? Obviously not: we just resort to using expressions like "a few years ago", since we don't have word-parts (morphemes) that encode this variable.

The example you give seems complicated, but this is actually mostly a matter of lexicon rather than grammar: it is an accident of English's history that "taste" can be a noun meaning 'aesthetic preference' and it can also be a verb meaning 'having a flavor'. The two noun-phrases in that sentence are "tuna with good taste" (a noun with a prepositional modifier) and "tuna that tastes good" (a noun with a restrictive relative clause modifier). Riau Indonesian definitely has relative clauses, and I think it also must allow modification of nouns by prepositional phrases, so the only interesting difference between English and Riau Indonesian for this sentence would be the fact that "taste" would (most likely) no longer be the same word in the two phrases.


Yes, thank you.

> the only interesting difference between English and Riau Indonesian for this sentence would be the fact that "taste" would (most likely) no longer be the same word in the two phrases.

Which is I guess my point: puns don't usually translate. So is the underlying reality changed? (I used to wonder about this briefly at times when I was studying Russian in college.)


It’s been an hour and no one’s provided a translation, so I’ll take a stab at it:

Maaf, Charlie. Starkist enggak mau ikan tongkol yang canggih, tapi mau ikan tongkol yang enak.

I’m not a native speaker so what I wrote above is probably riddled with errors, but it won’t be long before a native speaker or someone with a doctorate in the Riau dialect of Indonesian comes along and corrects it. This is the Internet after all :)


Yeah, this is not one of the Riau dialects; it is in THE high-prestige Riau-Johor dialect that became Indonesian, though.

In my everyday language, modern Jakartan: Sori Charlie, Starkist gak mau tongkol yang dipilih-pilih, tapi tongkol enak aja. (Roughly: "Sorry Charlie, Starkist doesn't want choosy tuna, just tasty tuna.")

[How do you translate "with good taste"?]


Curiosity got the better of me. Google translate, detect language. "Sorry, Charlie. Starkist doesn't want sophisticated tuna, but wants good tuna." Not bad.


Linguistic relativity



