A Simple Structure Unites All Human Languages (nautil.us)
167 points by dnetesn on Sept 20, 2019 | 53 comments



Steven Pinker's book ‘The Language Instinct’ talks about pretty much everything a layman would want to know about the basics of language, in layman's terms. It covers syntax, Chomskian grammars, prescriptive vs. descriptive grammar, the history and relations of languages, the Sapir-Whorf hypothesis, the biological and evolutionary basis of language, language acquisition in children, and the differences between human languages and animals' attempts (spoiler: Pinker doesn't give any credit to claims that animals learn language). A big plus of the book is that it uses Chomsky's theories extensively but explains them without the need for specialized knowledge (it explicitly mentions how Chomsky is unreadable without linguistic training). All in all, highly recommended. The book even works well as an audiobook.

Also: in preparation for diving into Kafka's books, I learned about a peculiar feature of his style:

> Kafka often made extensive use of a characteristic particular to the German language which permits long sentences that sometimes can span an entire page. Kafka's sentences then deliver an unexpected impact just before the full stop—this being the finalizing meaning and focus. This is due to the construction of subordinate clauses in German which require that the verb be positioned at the end of the sentence.

>> “Als Gregor Samsa eines Morgens aus unruhigen Träumen erwachte, fand er sich in seinem Bett zu einem ungeheuren Ungeziefer verwandelt.” (original)

> “As Gregor Samsa one morning from restless dreams awoke, found he himself in his bed into an enormous vermin transformed.”

There's a neat picture illustrating the difference in the order of the parse tree: https://en.m.wikipedia.org/wiki/Franz_Kafka_bibliography#Eng...


Thanks for the book recommendation and the link.

From the Wikipedia article:

> German also lacks an informal language register

Can someone provide more insight into what this refers to? There are definitely less formal or less technical-sounding word variants in German, and of course duzen/siezen to add another level of formality, so I'm not sure what this could refer to.


The other comment is not quite right: a ‘language register’ is not a dialect, even though apparently the whole classification is difficult and imprecise due to the nature of languages as a continuum. A ‘language register’ means a variant, a choice of words, that is used in specific situations or settings: https://en.wikipedia.org/wiki/Register_(sociolinguistics)

So, afaiu an ‘informal register’ would be something like brospeak or language spoken at home and among friends, contrasted to that spoken with strangers and at work. But I don't know what the situation is in German. With English and Russian, every generation and each subculture invents their own slang just to differentiate themselves―can't imagine how any country would avoid developing informal language, considering the existence of Oktoberfest.


> what this refers to

I'd assume it's about the dialects, which are so different from the "official" language that Jerome K. Jerome had a joke about it in one of his books, more than a century ago:

"Germany being separated so many centuries into a dozen principalities, is unfortunate in possessing a variety of dialects. Germans from Posen wishful to converse with men of Wurtemburg, have to talk as often as not in French or English; and young ladies who have received an expensive education in Westphalia surprise and disappoint their parents by being unable to understand a word said to them in Mechlenberg."

"Modern times" and technologies suppress the dialects and with each generation the portion of the local-specific dialects is being lost.


In German, speaking in different registers actually changes the verb grammatically: Kannst du vs. Können Sie vs. Könnten Sie.

Could you (low register, to someone of equal or lower status) vs. could you (medium register, to someone of higher status) vs. might you (highest register, to someone of higher status). In English I had to switch to a different verb, rather than change it grammatically. In German these are all the same verb with a different rule applied.


Fascinating. I always felt there was something special about his sentence structure but could never quite put my finger on it. Other than their length, that is.


Reminds me of how Yoda speaks in Star Wars.


I previously thought that Yoda's speech is an emulation of the anastrophe frequently seen in poetry. However, as the posted article notes, OSV order is hardly ever seen in natural languages.


> “As Gregor Samsa one morning from restless dreams awoke, found he himself in his bed into an enormous vermin transformed.”

You know, this actually makes sense. I think that after a couple of hundred pages, this would just seem like an easy-to-read, natural alternative word ordering.

Then, if the author started substituting German words here and there, starting with the obvious ones, then the ones English words are derived from, and so on... before you know it, you'd be reading German!


That chart is very neat. Thank you.


I know that Chomsky knows about inflectional morphology and so I'm sure that his theory does try to account for it, but I was frustrated that all of the examples here were only about word order. The author said

> Word order is, of course, far more complex than I’ve shown here. There are languages with very free word order, and even within languages there are many intriguing complexities. However, this idea, that Merge can both combine bits of language, and reuse them, gives us a unified understanding of how the grammar of human languages works.

But none of the examples in this simplified account even gestured at noun case, or at the prospect of expressing subject (or agent) with verb conjugation, or at feature agreement.

Is there a straightforward way to understand why Chomsky thinks that this approach addresses those phenomena?


Many linguists believe the same set of principles acts on sub-word units, a theory called Distributed Morphology or DM [0].

"'Syntactic Hierarchical Structure All the Way Down' entails that elements within syntax and within morphology enter into the same types of constituent structures (such as can be diagrammed through binary branching trees). DM is piece-based in the sense that the elements of both syntax and of morphology are understood as discrete instead of as (the results of) morphophonological processes."[1]

So a rather simplified way to think about it is that each word is a little mini-tree and Merge operations create a branching structure between its constituent parts. Each of those words is then part of the larger sentence-level tree. The important thing is that Merge is acting on the units at both levels.

Similarly, Merge can act on structures that are the output of previous Merges, allowing you to have a verb, in a particular conjugation (its sub-word structure), that selects for a particular type of supra-word structure (a tree) that's headed by a noun with some set of features, eg a particular case.

Another point that is kind of alluded to in the article is that you can create movement from one part of a tree to another with Merge. In previous theories of syntax, there always needed to be both something like Merge and a special "move" operation. But Merge simplifies things quite a bit in that regard.

[0] https://en.wikipedia.org/wiki/Distributed_morphology

[1] https://www.ling.upenn.edu/~rnoyer/dm/#how%20DM%20is%20diffe...


Whoa, that's amazing!

Can you explain a little more about how something like an agreement rule would be analyzed in this framework?

I suppose I didn't quite understand your "that selects for a particular type of supra-word structure (a tree) that's headed by a noun with some set of features". Is this sort of akin to a type system in programming? Like the verb is only willing to bind with a subject noun phrase whose head has a particular feature?


In this framework, you can think about a morpheme as being a tuple of features. Like you said, it is sort of akin to a type system, where passing the wrong type to a function won't work. A morpheme will select for a feature or set of features from whatever it's merged with, and won't merge with something it doesn't agree with.

I think using a language which doesn't care much about word order will be illustrative here, so let's use Latin:

puella videt canem

The girl sees the dog

We can break this down into:

puell - a vid - et can - em

girl - NOM.S.FEM sees - S.3 dog - ACC.S.FEM

So 'puella' is the tuple of features [+NOM, +S, +FEM], 'videt' is [+PRES, +ACTIVE, +S, +3], etc.

Here, we want to do a Merge with 'puella' and 'videt': we say that 'puella' selects for the features +NOM, +3, +S (nominative, third person, and singular) in its verb, but doesn't care about the others. It can still agree with its verb if the verb is passive or in the past tense. But if a verb is conjugated in a way that violates the features it selects for (eg the verb is conjugated as first person plural), 'puella' won't merge with it.

As you said, a phrase level structure will have the features of its constituent parts bubble up to it. So once we've done the first Merge with 'puella' and 'videt', our structure is now selecting for a noun phrase that has the feature +ACC. Because 'canem' meets this requirement, we can get the final Merge necessary for our finished sentence.

{ { puella, videt }, canem }

Note that this account still works if we change the order of the sentence to any configuration; we just need to reorder the merges.
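
For the programmers in the thread, here's a minimal sketch of that feature-selection story in Python. The feature names, the selection lists, and the way requirements are discharged one at a time are simplifying assumptions made for illustration, not how Distributed Morphology is actually formalized.

    class Item:
        """A lexical item or phrase: a surface form, the features it carries,
        and an ordered list of feature sets it still needs to select for."""
        def __init__(self, form, features, selects=()):
            self.form = form
            self.features = frozenset(features)
            self.selects = list(selects)
        def __repr__(self):
            return self.form

    def merge(head, sister):
        """Merge two items iff the head's next selectional requirement is
        satisfied by the sister's features; the head's features and its
        remaining requirements project to the new node."""
        if not head.selects or not head.selects[0] <= sister.features:
            raise ValueError("%r does not select %r" % (head, sister))
        return Item("{ %s, %s }" % (head, sister), head.features, head.selects[1:])

    puella = Item("puella", {"NOM", "S", "FEM"},
                  selects=[{"3", "S"}, {"ACC"}])  # agree with the verb, then take an object
    videt  = Item("videt",  {"PRES", "ACTIVE", "S", "3"})
    canem  = Item("canem",  {"ACC", "S"})

    print(merge(merge(puella, videt), canem))     # { { puella, videt }, canem }
    # merge(puella, canem) raises: canem lacks the verb-agreement features.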


Thanks, that's really neat (and since I know Latin, the analysis made perfect sense to me).

I suppose that you could, for example, account for the different conjugations and declensions by saying that they are also features of noun and verb stems that have to agree with endings that want to bind with them, right? Like "vid-" and "-et" is not just "sees - S.3" but also something like "see [+2conj]" and "S.3 [+2conj]" allowing them to bind with each other, where "-at" might be "S.3 [+1conj]" so it could bind with "am-" being "love [+1conj]", while "-et" doesn't bind with "am-" (except when interpreted as a different lexical item that adds [+subjunctive] to a [+1conj] stem?).

My next question is whether there are tools to facilitate writing parsers with this framework because it makes me want to write a Latin parser and see how well it does (and maybe how many formal syntactic ambiguities exist in Latin texts that we might not even notice most of the time).


Yep, but as soon as you go beyond very simple subject/verb/object phrases, the approach may become complex to apply in practice. A couple of (known) tricky Latin examples (JFYI):

mala mala mala sunt bona

Soli soli soli


Unfortunately that's true, which is why linguistics is a field of study with its own journals, rather than something that can be summarized neatly in the space of an HN comment :P

This model really can account for quite complex language data though. For example, check out this account of auxiliary verbs in Basque: https://www.academia.edu/3112898/A_Distributed_Morphology_An...

Speaking to your examples: "mala mala mala sunt bona" isn't particularly difficult to analyze this way; you just need to realize that the "mala"s are different words (kind of like the famous English "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo" sentence). If I remember the proverb correctly, it means "apples (mala) are good (sunt bona) for a painful jaw (mala mala)".

You need an analysis that allows adjectives to Merge with nouns iff they match in case, gender and number, which lets us create the noun phrase "mala mala" in the instrumental ablative. Then you need a way to have the case, gender and number of the subject bubble up to the top of the phrase it will make with an auxiliary, so that the adjective after the auxiliary is feature-restricted to that case, gender and number. Once the elements of the auxiliary verb phrase have Merged, you get:

{{ mala, sunt }, bona }

Finally you have a rule that allows auxiliary verb phrases to Merge with noun phrases headed by an ablative. If you want the first "mala" to be the subject, then re-Merge it with the whole sentence so far, which in effect moves it to the top of the tree, leaving a trace in its original position.

I'm not sure what the second example means. My best guess is that it's the dative singular of 'sol', a matching masculine dative singular of 'solus' and a genitive singular of 'solum', so something like "for the only sun of the land". If that's correct, you need our previously used rule for Merging adjectives iff they match the noun in case, gender and number. Then you can add an additional rule that genitive nouns can be Merged with noun phrases (without any feature selection needing to take place) to form a new noun phrase.

Hopefully that shows that Merge and feature selection as mechanisms can be used outside of toy models, to actually account for real data.
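
If it helps, here's the adjective-noun agreement rule from that analysis as a tiny Python sketch. The feature bundles for the different readings of "mala" follow the glosses above and are illustrative guesses, not an authoritative parse.

    AGREEMENT = ("case", "gender", "number")

    def agrees(a, b):
        """True iff two items match in case, gender and number."""
        return all(a[k] == b[k] for k in AGREEMENT)

    def merge_adj_noun(adj, noun):
        """Merge an adjective with a noun iff they agree; the noun is the
        head, so its features project to the resulting phrase."""
        if not agrees(adj, noun):
            raise ValueError("no agreement, no Merge")
        phrase = dict(noun)
        phrase["form"] = "{ %s, %s }" % (adj["form"], noun["form"])
        return phrase

    mala_bad   = {"form": "mala", "case": "ABL", "gender": "FEM",  "number": "S"}   # 'painful'
    mala_jaw   = {"form": "mala", "case": "ABL", "gender": "FEM",  "number": "S"}   # 'jaw'
    mala_fruit = {"form": "mala", "case": "NOM", "gender": "NEUT", "number": "PL"}  # 'apples'

    print(merge_adj_noun(mala_bad, mala_jaw)["form"])   # { mala, mala } -- "a painful jaw"
    # merge_adj_noun(mala_bad, mala_fruit) raises: case, gender and number all differ.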


Your translations/guesses are correct. "Mala mala mala sunt bona" is afaik an invented phrase, not entirely unlike "I Vitelli dei romani sono belli" (which is bilingual Latin/Italian: in Italian it means "The calves of the Romans are beautiful", but in Latin it means "Go, Vitellio, at the sound of the Roman war god"), used to trick or have some fun with Latin students, while "Soli soli soli" was a phrase sometimes inscribed on sundials.

Anyway, yes, Merge and features can work just fine outside of "toy models"; the note was about how they soon become complex.


This is still wrong. Nouns and verbs can change irregularly depending on each other.

It's like people look only at their own language and try to pretend that same structure must be universal.


From the examples that I saw in this thread and on Wikipedia, it seems like this framework does try to account for this complexity by positing more specific rules that require those changes to happen in order for the Merge to be allowed. Since I just heard about this for the first time, I'm not sure I can justify the claim that it succeeds for every aspect of every language, but it looks like the people promoting this approach are aware of quite a bit of the possible complexity of language.

Can you suggest an example of a linguistic phenomenon that you think this framework can't deal with?


I'm not a linguist and accordingly haven't read Chomsky proper, but let me take a stab from what I know:

Chomsky isn't concerned with just the grammar of written or spoken languages―instead he shows that these grammars map to a universal mental grammar. Different physical languages deal in different ways with constructs of the mental grammar: English expresses the same concepts and relations that other languages express with inflections, but uses helper words to do it. Correspondingly, where you build a grammar tree from words in English, you build it from roots and affixes in inflectional languages. There are languages in which a single word with a bunch of affixes equates to an English sentence.

Regarding “expressing subject (or agent) with verb conjugation”: this should just translate straightforwardly to an implied subject. Even incomplete sentences in whatever language have their implied subjects, verbs and objects―usually deduced from previous speech. Pinker in ‘The Language Instinct’ has a great example of how casual speech is very compact compared to legalese writing where you have to explicitly write out everything in anticipation of ambiguities and adversarial reading.

Chomsky also has a concept of ‘trace’, which is an ‘invisible’ word in a sentence that refers to a previously mentioned word. E.g. in “the spoon that I'm eating soup with,” there's a hidden member: “the spoon that I'm eating soup with <trace>,” and the trace refers to the spoon, helping to build the grammatical tree. I think this concept is hinted at in the article with Gaelic “caught boy caught fish.” Afaiu the ‘trace’ is different from sub-word and implied entities, but it helps to illustrate how the mental grammar isn't the same as the written one.


> Regarding “expressing subject (or agent) with verb conjugation”: this should just translate straightforwardly to an implied subject.

Yes, you can express any set as a tree (in fact as many possible trees), but if different possible tree representations of that set are equally valid, then the tree structure isn't really there; it's just an arbitrary interpretation imposed on the data.

If a sentence A B C can be reordered to any permutation, and some parts can be dropped without changing the meaning - then what lets you decide that the proper representation is this tree:

       .
     /   \
    A     .
         / \
        B   C
and not this:

          .
        /   \
       .     C
      / \      
     A   B    
And if any of these is as valid as any other - why insist that the underlying structure is even a tree?

Another hint that these kinds of interpretations are wrong is that most successful NLP software typically uses word vectors, which ignore the tree-like structure and just treat the data as an unordered set :)

To understand a sentence, it is more important to know all the words in it than to know how they are nested. Even for analytical languages like English :)


Felt the same way. I'm not an expert on language by any stretch, but even I know the role of a noun in Latin is determined by its case rather than its position.

Also is 'Merge' a thing? Or has the author just blessed his concept into proper noun-hood? It's not really explained.


For sure. Merge is the most basic primitive for creating metadata. If you are familiar with RDF, which uses Subject-Predicate-Object triples, that's more similar to Chomsky's original X-bar theory, which calls these triples Head-Complement-Specifier. The difference is that Merge sees the triple as a derivative of the merge primitive and X-bar sees the triple as the primitive. Both are about the fundamental nature of metadata, or data linkage/relationships, and how meaning is derived. Both can be applied mostly successfully down the layers of the language stack: semantics -> syntax -> morphology -> phonology -> phonetics.



The merge theory is ugly and opportunistic. Chomskian linguistics began as an attempt at using formal-language machinery (context-free phrase grammars mostly) for describing natural languages. It kinda worked with some modifications in the form of Government & Binding theory, but Chomsky wanted it to be not only a decent formalism, but also a valid theory of language in the mind. Therefore, in the 1990s he said: wait, let's get rid of all the complex stuff (deep structure, surface structure, layers of derivation) and restate everything in terms of one and a half operations (merge and move; move is a kind of merge, therefore 1.5). It _looked_ like it greatly simplified the analysis of languages, but then it turned out that to actually build syntactic parse trees for even quite simple sentences and somehow explain grammatical case assignment, for instance, one needs to augment the basic system with all kinds of hideous bells and whistles (arbitrary "feature checking", literally dozens of "functional projections", weird sequences of merge & move operations, etc.). Contemporary research in computational syntax truly and firmly abandoned contemporary Chomskian linguistics (aka the minimalist program) and is much closer to more traditional types of formal syntax theorising such as HPSG, simply because minimalism is an incoherent unformalisable incomprehensible mess. The title of this article is just a hoax.

A good overview: http://langsci-press.org/catalog/book/255

Edit: grammar


> Other animals, even extremely intelligent close evolutionary relatives like bonobos and chimpanzees, treat sequences of words as sequences, not as hierarchies. The same is true for modern artificial intelligences based on deep learning.

That's only true if you're thinking of LSTMs; stacked CNNs, tree-LSTMs, graph neural net pooling and attention layers can all do hierarchical aggregation. Hierarchical representations have been at the centre of many papers. There's even hierarchical reinforcement learning for describing complex actions as composed of simpler actions.

And trees are not good enough to represent language. Graphs would fit better, because some leaf nodes in the tree resolve or refer to nodes on other branches (e.g. when you say "he" referring to the word "John" present in another place in the same text).

http://www.arxiv-sanity.com/search?q=hierarchical


Kind of light, but it makes me think of the methodology I adopted for learning Japanese, which was to basically ignore the ordering of things. Don't get me wrong; I speak the words in the right order. It's just that I didn't try to memorize the ordering of things while learning the language. Because, as I think the article points out, your brain is pretty good at figuring that out without trying.

The most helpful thing I think you can do while studying language, other than placing yourself in real world scenarios where you use it, is to just read (or make up) example dialogues. Words by themselves aren't that helpful because they're often out of context. Sentences are better, but even they benefit from being embedded in a larger structure such as a dialogue or paragraph. I guess the point is, the more context, the better. Or, as the article would say, the more merging the better.


Example from the article in Polish:

to drink wine = "pić wino"

I drink wine = "piję wino" or "ja piję wino" or "piję ja wino" or "wino piję ja" or "wino ja piję" or "ja wino piję"

boy caught fish = "chłopiec złapał rybę" or "rybę złapał chłopiec" or ...

There are many languages in which word order doesn't matter; it's the changes to the words that encode the role of each word in the sentence, and you can divide the sentence into many alternative hierarchies (and certainly not all of them are strictly binary - some parts of the sentence are ternary or even more complicated, and imposing binary structure on them is artificial and misleading).

I think this theory is very lacking in predictive power, says very little about the supposed "universal" language, and still isn't really universal, as there are all sorts of exceptions.

Natural languages don't follow formal grammars strictly, especially not such a simplistic one.


For those wanting these ideas described in a more mathematical "language", [0] and [1] are a good start. If you are interested in studying objects and how they compose, you are probably interested in category theory.

[0] https://golem.ph.utexas.edu/category/2018/02/linguistics_usi...

[1] https://arxiv.org/pdf/1809.05923.pdf


Wow! Thanks a lot. This is a very fascinating application of category theory.


An interesting article on a possible counter-example: https://www.newyorker.com/magazine/2007/04/16/the-interprete...


This is a widely cited counter-example, but I think most linguists consider the case closed on whether this is actually a counter-example to the Chomskyian program. Andrew Nevins and David Pesetsky have an extremely convincing rebuttal of Everett's claim.

http://semantics.uchicago.edu/kennedy/classes/s07/myths/nevi...

https://www.academia.edu/3112859/Evidence_and_argumentation_...


In ‘The Language Instinct,’ it's explained why Chomskian mental grammar is likely to be universal despite isolated counter-examples: pidgin dialects have crappy grammar and are bad at recursion, but children who begin by learning a pidgin from adults soon develop it into a creole with full-fledged grammar that has all the expected features. This happens even when the children speak no other language, e.g. deaf children who learned pieces of a sign language from hearing adults.


This article only talks about the structure of writing in languages, which is a pretty different thing from "all human languages".

The definition of language (according to Google) is human communication.

For there to be communication, you need a recipient. Which means that mere written words have no meaning without someone reading and interpreting them.

You can analyze syntax and structure all you want, but meaning depends on people's interpretations, which are subjective and depend on multiple other factors.

For instance, if I'm angry, I'll read a text message or an email and interpret it in a completely different way than if I'm calm. The meaning I interpret will also depend on who sent me the message. Neither of these things is captured in the syntax of the messages.


That's a puzzling take. I see where you went with it, but I'm inclined to think of writing as already including the recipient. Such as when I write notes for myself. Often, I'm keen to do this in instances where I find a thought worthwhile but also quite subtle or nuanced and therefore forgettable. Sometimes, when I reread these notes quite a while later, I'm surprised at what was occurring to me at the time and happy, sad, amused, or even befuddled at what was going on in my head at that moment. My rereadings are also colored by the context shifts you mention.

But in those instances where I didn't succeed in conveying my meaning in a way I can reconnect with later, I don't think I'd consider that note not language, but rather a use of language that failed. It's still language. Even if its communication value is less than desired, it's a message built with the same tools as the messages that do work and pass your communication test.

The fact that the intended future reader is there in the mind of the writer makes me think of it as language even before it's read, if it ever is. Otherwise, what is a written message that isn't read? It exists, but as what? How would you characterize it?


Thank you for your very thoughtful and insightful reply.

> what is a written message that isn't read?

Imagine an ancient civilization that left written symbols, but the people are long gone and there's no one who knows how to interpret them anymore.

During the time the symbols were not being seen or interpreted by anyone, what would you call them?

But more important than that: it's not that the symbols can't mean anything, but rather that the meaning will be assigned by the reader when they read them (not just by the syntax of the symbols, which is what the article seemed to imply). And that meaning can be very, very different from what the writer intended it to be.

What I'm basically saying is that meaning/interpretation of communication/messages is fluid/dynamic. It depends on the writer, the symbols, the reader and a lot of context. It is not fully contained or captured just by the symbols in which we express it.

Using your comment as an example, your "rereadings are also colored by the context shifts".


Don't miss the best part, in Randy Morris' comment:

Question I the depth this analysis of. Paring rule variability or recursive process, or randomized association efficiency? Arbitrary hierarchy inherent world model categories captures actor, act, actee of. New Guinea highlands reference I languages of number large (day before yesterday), exploit possibilities almost all categorical where.


Merge is cons!


This is exactly correct! And the head is equivalent to car, the complement is equivalent to cdr, and the specifier is equivalent to cddr.


But cons operates sequentially. I thought the point about merge is that it's not sequential?


I'm not sure what you mean about Merge being non-sequential. All Merge does is take two elements α and β, then return a binary branching structure with α as the head, like so:

{α, β}

Both α and β can be either some atomic unit, eg a word or a morpheme, or the previous output of Merge (a tree).

This is their first example rendered as cons:

(cons I (cons drink wine))

As merge:

{ I, { drink, wine } }

As a tree:

Edit: I can't get the tree to look even remotely correct in HN formatting, but you get the idea. It's in the article.


It's a thread running through the article that the big thing about Merge is that it is hierarchical and not sequential. E.g., see the following passage from the article:

>> [Merge] applies to discrete units of language (words or their parts). It combines these, not sequentially, but hierarchically.

Or the paragraph at the start where human language abilities are contrasted to bonobos' and chimpanzees' and deep learning models' sequential processing; etc.

My knowledge of Lisp is rusty, but as far as I remember, cons is a list operator that joins the head to the tail of a list (like the "|" in Prolog). So it imposes an order - on a sequence. Apologies if I misremember this.


Ah I see what you mean. I think the author of that article is making the point that Merge is about creating trees, not appending words together in a more traditional sense. But S-expressions can be used to express trees, so when you're operating on trees notated as S-expressions, adding a parent node to two expressions looks like you're concatenating them.

The author is just trying to say that Merge does to a tree the same thing we're describing cons doing to S-expressions, and noting that it creates hierarchy. E.g. a new Merge says: I've created a new top-level node that is a parent for the two inputs. A second Merge says: I've made the first input c-command both inputs of my first Merge and created a new top-level node.
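
A toy rendering of that point, with plain Python tuples standing in for S-expressions; the movement example at the end is my own made-up illustration of internal Merge, not something from the article.

    def merge(a, b):
        # Like (cons a b): create a new parent node over the two inputs.
        return (a, b)

    vp = merge("drink", "wine")   # { drink, wine }
    s  = merge("I", vp)           # { I, { drink, wine } }
    print(s)                      # ('I', ('drink', 'wine'))

    # Internal Merge ("move"): re-merge an element that is already inside the
    # structure with the structure itself. The re-merged copy sits at the new
    # top node, c-commanding everything below; its original position is read
    # as a trace.
    moved = merge("wine", s)
    print(moved)                  # ('wine', ('I', ('drink', 'wine')))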


But, if I don't misremember this also, cons creates an ordered pair, no?


As far as I understand it, cons just creates pairs (which can be interpreted as lists). It doesn't impose order.


Unordered would be if cons(a, b) == cons(b, a) which is not the case


Lisp cons?


Yes


If you assume that all meanings can be represented by a parse tree, then the claim here is straightforward: "Merge" is a constructor for a binary tree node, and the claim is that Merge can construct any binary tree. I think the trickier question is whether parse trees really do capture everything about a sentence... my guess is not.


Well, a binary tree with unordered children.


How was "Merge" proposed in the early 1990s? Hierarchical structures and compositional semantics predate that by decades.

If the proposition was Merge as a "universal" operation, then I'd say there is no evidence that our brains implement such an operation, and that whatever recursion we do have has a very shallow stack, which is domain dependent. That makes such an operation a meaningless abstraction, not suitable for explaining anything about human behavior whatsoever.


Open-ended recursion is not quite universal. Piraha lacks arbitrary clausal embedding.



