
Once you go beyond Markov chains and into vector models (Transformers or even LSTMs), next --token-- syllable prediction can capture grammar.
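
For concreteness, here is a minimal sketch of what "next-token prediction capturing grammar" looks like in practice. It assumes the HuggingFace transformers library and the small public GPT-2 checkpoint (my own choices for illustration, not anything claimed above); it just lets you check whether the model's next-token distribution after a plural subject with an intervening singular noun favors the agreeing verb form ("are") over the attractor-matching one ("is"):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    # Plural head noun ("keys") with a singular "attractor" ("cabinet") in between.
    prefix = "The keys to the old cabinet"
    inputs = tokenizer(prefix, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    next_token_probs = torch.softmax(logits[0, -1], dim=-1)

    # Compare the probability of the agreeing vs. non-agreeing verb form.
    for verb in [" are", " is"]:
        token_id = tokenizer.encode(verb)[0]
        print(f"P({verb!r}) = {next_token_probs[token_id].item():.4f}")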



That some aspects of grammar can be captured by statistical analysis was never really in dispute. The OP is slightly confused about the hierarchy part. Chomsky never said that you couldn't discover by statistical analysis that languages have a hierarchical structure. He rather said that a baby, based on the data it has available, would not be able to determine statistically that certain rules of grammar are defined over this structure rather than over the linear sequence of words.


I mean that, contrary to Chomsky, advanced statistical models certainly are able to capture all grammar - at least, no worse than ordinary humans do.

It feels as though Chomsky’s understanding of ML stopped at the level of Markov chains.


Don't waste your time arguing with Chomsky supporters; it's a cult. They keep wringing non-falsifiable theories out of lengthy hallucinations, but it all reveals itself to be just Trek physics later on.

He's not just problematic in politics, he is in linguistics too; the BS is just harder to notice. I suppose his compiler theories were legit, and the pioneering spirit that led to the establishment of multiple fields of research might be too, but the leading theories he created are just as dubious as the first man-made shed built on a newly discovered island would be.

And the problem is not just that early speculations in a novel field are often wrong, it's that his supporters don't care. They'll regurgitate those scientific theories(tm) ad infinitum and waste resources for the whole of humanity. So don't bother trying to fix them and make them motivated.


First, Chomsky (and I) were talking about language acquisition in children where there aren't billions of examples, so it's completely irrelevant if some other system can do something, the question is how do humans do it.

Second, there isn't any evidence that LLMs have captured grammar rules in any meaningful sense, just as they can't do addition or any other recursive computation.


Is there any work demonstrating this? For example, how do statistical models capture adjunct/argument asymmetries in extraction?


Attention is one of the core parts of the transformer architecture, so I would be surprised if transformers had any trouble understanding this asymmetry.

Could you provide a testable hypothesis? I would be happy to test it on GPT4.


Sure, here are a couple of examples of ECP violations removing ambiguities.

1a. How often did you tell John that he should take out the trash?

b. How often did you tell John why he should take out the trash?

(1a) can either be a question about frequency of telling or frequency of trash disposal, whereas (1b) can only be a question about frequency of telling. I asked GPT-4 to explain how each sentence was ambiguous and it seemed to entirely miss the embedded readings (the ones about frequency of trash disposal) for both sentences, while finding some other ambiguities that were spurious (such as suggesting erroneously that (1b) could be a question about how many different reasons you gave John in a single instance).

Similarly, (2a) has both a de re and a de dicto reading, whereas (2b) has only a de re reading:

2a. How many books did Bill say that Mary should read?

b. How many books did Bill explain why Mary should read?

That is, (2a) can be asked either in a scenario where Bill has said "read 10 books!" or in a scenario where Bill has said "read Book A, Book B and Book C!" without necessarily counting the books himself. (2b), on the other hand, only has the second kind of interpretation. I've had mixed results with GPT-4 in this case (depending on exact choices of vocabulary, etc.), but it certainly makes some mistakes. For example, it says that (2b) can mean "John explained the reason for a certain number of books that Mary bought".

As the sibling comment points out, it would not show very much if GPT-4 did correctly determine these ambiguities, as it has had access to much more data than a child. You would also need to show that the same statistical techniques would work when applied to a realistic dataset.
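
For anyone who wants to reproduce this kind of probe, here is a rough sketch of running the four example sentences through GPT-4 programmatically. It assumes the OpenAI Python SDK (v1.x interface) and an API key in the environment; the prompt wording is mine, not the exact phrasing used above:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    sentences = [
        "How often did you tell John that he should take out the trash?",
        "How often did you tell John why he should take out the trash?",
        "How many books did Bill say that Mary should read?",
        "How many books did Bill explain why Mary should read?",
    ]

    for s in sentences:
        response = client.chat.completions.create(
            model="gpt-4",
            temperature=0,
            messages=[{
                "role": "user",
                "content": f'List every interpretation of "{s}" and say which '
                           "readings are unavailable to native English speakers.",
            }],
        )
        print(s)
        print(response.choices[0].message.content)
        print("---")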


Thank you for providing these examples.

I asked GPT4o, and it has no trouble with understanding 1a: https://chatgpt.com/share/daea469f-d823-45e2-9d6b-f6bea82a26...

As a side note, my instinctive reading is the telling-frequency one. Sure, one can make a garden-path sentence, but (to my own ESL eyes and ears) it would be more straightforward to say, "Did you tell John how often he should take out the trash?"

2b does not feel right on its own (and I am not an AI). I can understand it, but it feels like reverse engineering rather than reading a normal sentence.


The whole issue is the difference between (1a) and (1b), not whether the AI can understand one of the sentences under some of its available interpretations. Indeed, with GPT-4o, I get the same result as you for (1a), but also a description of a spurious parallel ambiguity in (1b). Part of the trouble here is the inconsistency of results depending on the exact phrasing of the question and random variation in GPT-4's responses. I wouldn't be surprised if it sometimes gives the right answers, but I don't think it does so reliably.

(2b) is what's known in classical terms as a 'subjacency violation', so yes, it sounds imperfect. Nonetheless, native speakers agree on which interpretations it can and cannot have. GPT-4 does not have the same capacity, as far as I've been able to determine. You sometimes have to be a little creative with scenario construction for sentences like (2b) to click.

"Ok, So Bill explained why Mary should read War and Peace, then he explained why she should read The 39 Steps, and then he explained why she should read some other titles that I can't remember. I wonder just how many books Bill explained why Mary should read."

But try constructing a scenario for the other interpretation and you'll find that it's still just as bad.
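
On the reliability point: one way to quantify "sometimes right, but not reliably" is to re-ask the same question many times at nonzero temperature and count how often the embedded (trash-disposal) reading is acknowledged. A crude sketch, again assuming the OpenAI Python SDK (v1.x) and an API key; the keyword check at the end is purely illustrative and would need tightening for real use:

    from openai import OpenAI

    client = OpenAI()
    question = (
        'Is "How often did you tell John that he should take out the trash?" '
        "ambiguous? If so, list the possible readings."
    )

    N, hits = 20, 0
    for _ in range(N):
        r = client.chat.completions.create(
            model="gpt-4",
            temperature=1.0,
            messages=[{"role": "user", "content": question}],
        )
        text = r.choices[0].message.content.lower()
        # Very rough proxy for "mentions the frequency-of-trash-disposal reading".
        if "take out the trash" in text and "how often he" in text:
            hits += 1

    print(f"embedded reading mentioned in {hits}/{N} samples")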


Well, prompting is not a nice-to-have addition to an LLM; it is a necessary thing to do.

> Nonetheless, native speakers agree on which interpretations it can and cannot have. GPT-4 does not have the same capacity, as far as I've been able to determine.

This one is an expectation, not a factual statement. A factual statement would be "95% of English native speakers with a college degree" or so. Among less educated speakers, the numbers could be depressingly low, even for much more straightforward tasks.

Then, the question is how a given ML model fares against real data, not against some platonic ideal.


We can see similar results with much simpler examples where the judgments are incontrovertible. Here's a classic pair from Chomsky:

a. John expects to present himself.

b. Who does John expect to present himself?

In (a) 'himself' refers to 'John' whereas in (b) it refers to whichever person is the answer to the question. How does GPT-4o fare?

> Who does "himself" refer to in the sentence "John expects to present himself"?

> In the sentence "John expects to present himself," "himself" refers to John. It means that John is the one who expects to present himself.

> Who does "himself" refer to in the sentence "Who does John expect to present himself?"

> "Himself" refers to John. In this sentence, John is the one who is expected to present himself.

The statistical model is confused by the superficial similarity between (a) and (b), just as Chomsky predicted decades ago.


Well, again, I'm not sure what your prompts are, but here we go:

https://chatgpt.com/c/7ead1985-83be-4dce-9fd7-29aafb248f01

> The statistical model is confused by the superficial similarity between (a) and (b), just as Chomsky predicted decades ago.

Well, WE are statistical models as well. So any overly broad claim that natural language cannot be understood by ANY statistical model is falsified the moment it is spoken (or written), unless you go into Penrose-style Neoplatonism.

---

As to your doubt about whether I can find a convincing (counter)example: sure, any benchmark of human vs. model performance is an empirical verification. And yes, some artificial models may struggle with some tasks. For example, on the Winograd schema there is a leap between GPT-3 and GPT-4: https://d-kz.medium.com/evaluating-gpt-3-and-gpt-4-on-the-wi....
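
As a concrete illustration of how Winograd-style items can be scored without any prompting at all, here is a sketch using the common substitute-and-compare LM-scoring recipe with a small GPT-2 checkpoint (my choice of model and recipe, not something taken from the linked post):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def sentence_logprob(text: str) -> float:
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)
        # out.loss is the mean negative log-likelihood per predicted token.
        return -out.loss.item() * (ids.size(1) - 1)

    template = "The trophy doesn't fit in the brown suitcase because {} is too big."
    for candidate in ["the trophy", "the suitcase"]:
        print(candidate, round(sentence_logprob(template.format(candidate)), 2))
    # Whichever substitution scores higher is the model's answer; for this classic
    # item the intended referent is "the trophy".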

What this is (to me) is defining, ex cathedra, what the English language is. The actual natural language people use is full of utterances that are not correct (yet are easily understandable) and of "correct forms" that are rarely used and, when they are, misunderstood by everyone save those who put in conscious effort (linguists, lawyers, etc.).

For any discussion, it is essential to know whether we are working with a real (and highly probabilistic) natural language or with one of its concrete models (i.e., an abstraction).


My prompts are in my comment. Your ChatGPT link sends me to a login screen.

Humans (or English speakers at least) aren’t confused at all by the pair of sentences in my last comment. If you’re just going to try to deny the plain facts about how these sentences are (and aren’t) interpreted by English speakers then that’s really just a kind of grammatical flat-Earthism. The judgments at issue aren’t remotely subtle.

> Well, WE are statistical models as well.

This is begging the question. Chomsky would deny this.


So, I think that we differ at a fundamental level.

While you prefer to work in a Neoplatonic world of ideas, I prefer empirical facts and the conviction that all models are approximations.

English grammar is not fixed per se; it evolves with region (have you ever been to Singapore?) and time. Your judgment (or Chomsky's, or anyone's), however well founded, is not a fact. It is an opinion up for experimental scrutiny.

I don't say that your examples are incorrect. Still, one should measure the percentage of correct (or consistent) answers from humans against particular models. Otherwise, it might be maths, it might be philosophy, it might be art, but it is not (empirical) science.
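
To make that concrete, the comparison being asked for is just per-item human judgments vs. model judgments, reduced to percent agreement. A tiny sketch with made-up placeholder data (the labels and numbers are illustrative only):

    from collections import Counter

    # item id -> answers from several human annotators (placeholder data)
    human_judgments = {
        "1a": ["ambiguous", "ambiguous", "ambiguous", "telling-only"],
        "1b": ["telling-only", "telling-only", "telling-only", "telling-only"],
    }
    model_judgments = {"1a": "ambiguous", "1b": "ambiguous"}

    def majority(answers):
        return Counter(answers).most_common(1)[0][0]

    agree = sum(model_judgments[i] == majority(a) for i, a in human_judgments.items())
    print(f"model agrees with the human majority on {agree}/{len(human_judgments)} items")

    # How consistent are the humans themselves?
    for item, answers in human_judgments.items():
        share = answers.count(majority(answers)) / len(answers)
        print(f"{item}: {share:.0%} of annotators give the majority answer")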


As far as I can see, your issue with Chomsky has nothing to do with the performance of modern LLMs. You just reject all the data that generative linguists take to be crucially informative as to the grammatical structure of natural languages. You would hold the same view if LLMs had never been invented. So it is really a common case of AI and cognitive science talking entirely at cross purposes.

> English grammar is not fixed per se; it evolves with region (have you ever been to Singapore?) and time.

Sure, but this is not the case for the examples I gave. There aren’t dialects of English where (b) has the interpretation that GPT-4o thinks it can have. It’s no use trying to muddy the empirical waters in the case of completely clear judgments about what English sentences can and can’t mean.


There is no example or standard that would satisfy you. Any failing example can be added to the training set in the next version, and even if it couldn't, it wouldn't matter, because you could find a person somewhere who would also fail it.



