I know that Chomsky knows about inflectional morphology and so I'm sure that his theory does try to account for it, but I was frustrated that all of the examples here were only about word order. The author said
> Word order is, of course, far more complex than I’ve shown here. There are languages with very free word order, and even within languages there are many intriguing complexities. However, this idea, that Merge can both combine bits of language, and reuse them, gives us a unified understanding of how the grammar of human languages works.
But none of the examples in this simplified account even gestured at noun case, or at the prospect of expressing subject (or agent) with verb conjugation, or at feature agreement.
Is there a straightforward way to understand why Chomsky thinks that this approach addresses those phenomena?
Many linguists believe that the same set of principles acts on sub-word units, a theory called Distributed Morphology, or DM [0]:
"'Syntactic Hierarchical Structure All the Way Down' entails that elements within syntax and within morphology enter into the same types of constituent structures (such as can be diagrammed through binary branching trees). DM is piece-based in the sense that the elements of both syntax and of morphology are understood as discrete instead of as (the results of) morphophonological processes."[1]
So a rather simplified way to think about it is that each word is a little mini-tree, and Merge operations create a branching structure between its constituent parts. Each of those words is then part of the larger sentence-level tree. The important thing is that Merge is acting on the units at both levels.
Similarly, Merge can act on structures that are the output of previous Merges, allowing you to have a verb, in a particular conjugation (its sub-word structure), that selects for a particular type of supra-word structure (a tree) that's headed by a noun with some set of features, e.g. a particular case.
Another point that is kind of alluded to in the article is that you can create movement from one part of a tree to another with Merge. In previous theories of syntax, there always needed to be both something like Merge and a special "move" operation. But Merge simplifies things quite a bit in that regard.
Can you explain a little more about how something like an agreement rule would be analyzed in this framework?
I suppose I didn't quite understand your "that selects for a particular type of supra-word structure (a tree) that's headed by a noun with some set of features". Is this sort of akin to a type system in programming? Like the verb is only willing to bind with a subject noun phrase whose head has a particular feature?
In this framework, you can think about a morpheme as being a tuple of features. Like you said, it is sort of akin to a type system, where passing the wrong type to a function won't work. A morpheme will select for a feature or set of features from whatever it's merged with, and won't merge with something it doesn't agree with.
I think using a language which doesn't care much about word order will be illustrative here, so let's use Latin:
puella videt canem
The girl sees the dog
We can break this down into:
puell-a   vid-et   can-em
girl-NOM.S.FEM   sees-S.3   dog-ACC.S.FEM
So 'puella' is the tuple of features [+NOM, +S, +FEM], 'videt' is [+PRES, +ACTIVE, +S, +3], etc.
Here, we want to do a Merge with 'puella' and 'videt': we say that 'puella' selects for the features +NOM, +3, +S (nominative, third person, and singular) in its verb, but doesn't care about the others. It can still agree with its verb if the verb is passive or in the past tense. But if a verb is conjugated in a way that violates the features it selects for (e.g. the verb is conjugated as first person plural), 'puella' won't merge with it.
As you said, a phrase level structure will have the features of its constituent parts bubble up to it. So once we've done the first Merge with 'puella' and 'videt', our structure is now selecting for a noun phrase that has the feature +ACC. Because 'canem' meets this requirement, we can get the final Merge necessary for our finished sentence.
{ { puella, videt }, canem }
Note that this account still works if we change the order of the sentence to any configuration; we just need to reorder the Merges.
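If it helps to see it mechanically, here's a toy Python sketch of this derivation. The dict representation and the merge() helper are improvised, not standard minimalist notation, and I've put +NOM on 'puella' itself rather than in its selection, since the verb carries no case feature:

```python
# Toy feature-checking Merge: invented representation, not standard
# minimalist notation.

def merge(a, b):
    """Merge a with b iff a's next selection is satisfied by b's
    features. The head's (a's) own features project to the result,
    and any still-unchecked selections from either side percolate up
    (a crude stand-in for features "bubbling up")."""
    if not a["selects"]:
        raise ValueError("a has nothing left to select for")
    wanted = a["selects"][0]
    if not wanted <= b["features"]:
        raise ValueError(f"feature clash: need {wanted}, got {b['features']}")
    return {"features": a["features"],
            "selects": a["selects"][1:] + b["selects"],
            "parts": (a, b)}

# 'puella' carries +NOM itself and selects a 3rd person singular verb;
# transitive 'videt' selects an accusative object.
puella = {"features": {"+NOM", "+S", "+FEM", "+3"}, "selects": [{"+3", "+S"}]}
videt  = {"features": {"+PRES", "+ACTIVE", "+S", "+3"}, "selects": [{"+ACC"}]}
canem  = {"features": {"+ACC", "+S"}, "selects": []}

vp = merge(puella, videt)  # ok: videt is +3 and +S
s  = merge(vp, canem)      # ok: canem is +ACC
# s["parts"] mirrors the structure { { puella, videt }, canem }

# A first person plural verb clashes, as described above:
videmus = {"features": {"+PRES", "+ACTIVE", "+PL", "+1"}, "selects": [{"+ACC"}]}
# merge(puella, videmus)  -> ValueError: feature clash
```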
Thanks, that's really neat (and since I know Latin, the analysis made perfect sense to me).
I suppose that you could, for example, account for the different conjugations and declensions by saying that they are also features of noun and verb stems that have to agree with endings that want to bind with them, right? Like "vid-" and "-et" is not just "sees - S.3" but also something like "see [+2conj]" and "S.3 [+2conj]" allowing them to bind with each other, where "-at" might be "S.3 [+1conj]" so it could bind with "am-" being "love [+1conj]", while "-et" doesn't bind with "am-" (except when interpreted as a different lexical item that adds [+subjunctive] to a [+1conj] stem?).
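If I've got that right, the same merge() sketch from above works below the word level too, something like this (the conjugation-class features are my own made-up labels, and in a fuller model the ending's person/number features would also project up to the finished word):

```python
# Stems select endings of the matching conjugation class, reusing
# merge() from the earlier sketch.

vid_ = {"features": {"+V", "+2conj"}, "selects": [{"+2conj"}]}  # 'see'
am_  = {"features": {"+V", "+1conj"}, "selects": [{"+1conj"}]}  # 'love'
_et  = {"features": {"+S.3", "+2conj"}, "selects": []}
_at  = {"features": {"+S.3", "+1conj"}, "selects": []}

videt_word = merge(vid_, _et)  # ok: classes match
amat_word  = merge(am_, _at)   # ok
# merge(am_, _et) -> ValueError: the subjunctive 'amet' would need a
# separate lexical entry for "-et" carrying [+1conj, +subjunctive]
```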
My next question is whether there are tools to facilitate writing parsers with this framework because it makes me want to write a Latin parser and see how well it does (and maybe how many formal syntactic ambiguities exist in Latin texts that we might not even notice most of the time).
Yep, but as soon as you go beyond very simple subject/verb/object phrases, the approach may become complex to apply in practice. A couple of (well-known) tricky Latin examples, JFYI: "mala mala mala sunt bona" and "soli soli soli".
Unfortunately that's true, which is why linguistics is a field of study with its own journals, rather than something that can be summarized neatly in the space of an HN comment :P
Speaking to your examples: "mala mala mala sunt bona" isn't particularly difficult to analyze this way, you just need to realize that the "mala"s are different words (kind of like the famous English "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo" sentence). If I remember the proverb correctly, it means "apples (mala) are good (sunt bona) for a painful jaw (mala mala)".
You need an analysis that allows adjectives to Merge with nouns iff they match in case, gender and number, which lets us create a noun phrase "mala mala" in the instrumental ablative. Then you need a way to have the case, gender and number of the subject bubble to the top of the phrase it will make with an auxiliary, so that the adjective after the auxiliary is feature-restricted to that case, gender and number. Once the elements of the auxiliary verb phrase have Merged, you get:
{{ mala, sunt }, bona }
Finally you have a rule that allows auxiliary verb phrases to Merge with noun phrases headed by an ablative. If you want the first "mala" to be the subject, then re-Merge it with the whole sentence so far, which in effect moves it to the top of the tree, leaving a trace in its original position.
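In throwaway Python again, with plain tuples and glosses as labels (the explicit feature checking from the earlier sketch is elided here):

```python
# Back-of-the-envelope sketch of the "mala mala mala sunt bona"
# derivation; glosses as labels, feature checking glossed over.

apples = ("mala", "NOM.PL.NEUT")     # 'apples' (malum)
jaw    = ("mala", "ABL.SG.FEM")      # 'jaw' (mala)
bad    = ("mala", "ABL.SG.FEM.adj")  # 'painful' (malus)

abl_np = (bad, jaw)                  # "for a painful jaw": adjective and
                                     # noun match in case, gender, number
aux_vp = ((apples, "sunt"), "bona")  # { { mala, sunt }, bona }: the
                                     # subject's features restrict "bona"
clause = (aux_vp, abl_np)            # aux verb phrase Merges with the
                                     # ablative noun phrase

# Internal Merge (movement): remerge the subject with the whole
# clause; the lower copy is interpreted as the trace.
sentence = (apples, clause)
```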
I'm not sure what the second example means. My best guess is that it's the dative singular of 'sol', a matching masculine dative singular of 'solus' and a genitive singular of 'solum', so something like "for the only sun of the land". If that's correct, you need our previously used rule for Merging adjectives iff they match the noun in case, gender and number. Then you can add an additional rule that genitive nouns can be Merged with noun phrases (without any feature selection needing to take place) to form a new noun phrase.
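And the second example in the same tuple notation, if that guess is right:

```python
# "soli soli soli" under the guess above.
only = ("soli", "DAT.SG.MASC.adj")  # 'only' (solus)
sun  = ("soli", "DAT.SG.MASC")      # 'sun' (sol)
land = ("soli", "GEN.SG.NEUT")      # 'of the land' (solum)

np   = (only, sun)  # adjective and noun match in case, gender, number
full = (np, land)   # genitive nouns Merge with noun phrases freely
```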
Hopefully that shows that Merge and feature selection as mechanisms can be used outside of toy models, to actually account for real data.
Your translations/guesses are correct. "mala mala mala sunt bona" is afaik an invented phrase, not entirely unlike "I Vitelli dei romani sono belli" (which is bilingual Latin/Italian: in Italian it means "The calves of the Romans are beautiful", but in Latin "Go, Vitellius, at the sound of the Roman war god"), made to trick Latin students or have some fun at their expense, while "Soli soli soli" was a phrase sometimes inscribed on sundials.
Anyway, yes, Merge and feature selection can work just fine outside of "toy models"; the note was about how they soon become complex.
From the examples that I saw in this thread and on Wikipedia, it seems like this framework does try to account for this complexity by positing more specific rules that require features to agree in order for the Merge to be allowed. Since I just heard about this for the first time, I'm not sure I can justify the claim that it succeeds for every aspect of every language, but it looks like the people promoting this approach are aware of quite a bit of the possible complexity of language.
Can you suggest another example of a linguistic phenomenon that you think this framework can't deal with?
I'm not a linguist and accordingly haven't read Chomsky proper, but let me take a stab from what I know:
Chomsky isn't concerned with just the grammar of written or spoken languages―instead he shows that these grammars map to a universal mental grammar. Different physical languages deal in different ways with constructs of the mental grammar: English expresses the same concepts and relations that inflectional languages encode as inflections, but uses helper words to do it. Correspondingly, where you build a grammar tree from words in English, you build it from roots and affixes in inflectional languages. There are languages in which a single word with a bunch of affixes equates to an English sentence.
Regarding “expressing subject (or agent) with verb conjugation”: this should just translate straightforwardly to an implied subject. Even incomplete sentences in whatever language have their implied subjects, verbs and objects―usually deduced from previous speech. Pinker in ‘The Language Instinct’ has a great example of how casual speech is very compact compared to legalese writing where you have to explicitly write out everything in anticipation of ambiguities and adversarial reading.
Chomsky also has a concept of ‘trace’ which is an ‘invisible’ word in a sentence and refers to a previously mentioned word. E.g. in “the spoon that I'm eating soup with,” there's a hidden member: “the spoon that I'm eating soup with <trace>,” and the trace refers to the spoon, helping to build the grammatical tree. I think this concept is hinted at in the article with Gaelic “caught boy caught fish.” Afaiu the ‘trace’ is different from sub-word and implied entities, but it helps to illustrate how the mental grammar isn't the same as written one.
> Regarding “expressing subject (or agent) with verb conjugation”: this should just translate straightforwardly to an implied subject.
Yes, you can express any set as a tree (in fact, as many possible trees), but if different possible tree representations of that set are equally valid, then the tree structure isn't really there; it's just an arbitrary interpretation imposed on the data.
If a sentence A B C can be reordered to any permutation, and some parts can be dropped without changing the meaning, then what lets you decide that the proper representation is this tree:
```
    .
   / \
  A   .
     / \
    B   C
```
and not this:
```
      .
     / \
    .   C
   / \
  A   B
```
And if any of these is as valid as any other, why insist that the underlying structure is even a tree?
Another hint that these kinds of interpretations are wrong is that most successful NLP software typically uses word vectors, which ignore the tree-like structure and just treat the data as an unordered set :)
To understand a sentence, it is more important to know all the words in it than to know how they are nested. Even for analytic languages like English :)
Felt the same way. I'm not an expert on language by any stretch, but even I know the role of a noun in Latin is determined by its case rather than its position.
Also, is 'Merge' a thing? Or has the author just blessed his concept into proper noun-hood? It's not really explained.
For sure. Merge is the most basic primitive for creating metadata. If you are familiar with RDF, which uses Subject-Predicate-Object triples, that's more similar to Chomsky's original X-bar theory, which calls these triples Head-Complement-Specifier. The difference is that Merge sees the triple as a derivative of the merge primitive and X-bar sees the triple as the primitive.
Both are about the fundamental nature of metadata or data linkage/relationships and how meaning is derived. Both can be applied mostly successfully down the layers of the language stack - semantics -> syntax -> morphology -> phonology -> phonetics.
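A quick sketch of that difference in Python (the type and function names are mine, purely for illustration):

```python
# X-bar treats the triple as the primitive; bare Merge derives the
# same configuration from two applications of one binary operation.

from dataclasses import dataclass

@dataclass
class XBar:                   # the triple is the primitive
    head: str
    complement: str
    specifier: str

def merge(a, b):              # the pair is the primitive;
    return frozenset({a, b})  # Merge is just unordered set formation

# X-bar: one primitive object
xp = XBar(head="eat", complement="apples", specifier="she")

# Merge: the same triple emerges from two merges
head_comp = merge("eat", "apples")  # head + complement
xp2 = merge("she", head_comp)       # specifier + {head, complement}
```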