On Chomsky and the Two Cultures of Statistical Learning

brockf · on May 27, 2011

Chomsky's one paragraph quote at the beginning of this article is more clear and thoughtful than the rest of this. I feel the author's missing the point.

In the case of language, observing and reporting statistical probabilities in written/spoken language output does very little to explain the cognitive systems used in acquiring and using language. Even one statistical anomaly serves to show that statistical learning is NOT the entire picture when it comes to language development.

There was another article on HN a while back that had another great quote from Chomsky that does well to illustrate what I feel is his main point here: "Fooling people into mistaking a submarine for a whale doesn't show that submarines really swim; nor does it fail to establish the fact". Creating a computer that can produce millions of grammatical utterances does little to show that we understand language systems. Now, if a computer could - like humans - learn to produce infinite, novel, contextual, and meaningful grammatical utterances, that's a different story. But that story will take a lot more than statistical learning to write.

moultano · on May 27, 2011

Chomsky is just appealing to our own biases. We don't want to be statistical approximation machines, so that makes it easy to dismiss attempts to mimic us with statistical approximation machines.

However, the preponderance of evidence* so far suggests that we are just statistical processing machines. Hence why Chomsky seems way off the mark.

*We know that various layers in the visual and auditory systems basically just compute ICA, and we know that the brain is incredibly plastic. Large areas can be removed and the remainder will compensate. That makes it seem likely that all neurons compute something like ICA (or at least that degrades to ICA when confronted with visual or auditory input.)

romaniv · on May 27, 2011

> Chomsky is just appealing to our own biases. We don't want to be statistical approximation machines

Do we? Intellectual cynicism that makes people publicly reduce humans and humanity to something mechanical and predictable is probably the most popular attitude I see online. This doesn't mean it's wrong in all cases, but surely it's not something exceptional.

ryanklee · on May 27, 2011

The kind of intellectual cynicism you describe is often a prerogative and mode of those highly educated in the sciences, and therefore of a very small minority of humans generally, who otherwise in my experience do tend to think of themselves as agents free of statistical determinations and as having wills and minds rather than cogs of any sort.

Helianthus · on May 27, 2011

this statement is exactly the attitude described by romaniv.

wicknicks · on May 28, 2011

Statistical learning as a field benefits humans most when it augments our actions. I think instead of comparing humans and statistical algorithms to see who fares better, we should focus on how the two can blend together, and help each other out. As the author points out, all the success stories are largely man-machine collaborations (imagine search engines without user data and inputs).

norvig · on May 28, 2011

Thanks for the comment, brockf. I'm sorry the essay didn't make sense to you. Let me try again on a few points.

Are you saying that one statistical error in a probabilistic model makes the entire model wrong? Then you'd equally have to say that one logical error in a categorical model makes it equally wrong. And manifestly, there are many logical errors in all grammars. So I'm not sure what your point is here.

I'm interested to know: I quoted Chomsky: "That's a notion of [scientific] success that's very novel. I don't know of anything like it in the history of science." Do you agree with him? If so, do you judge all the Science and Cell articles as not being about accurately modeling the world and only about providing insight? Or do you think Chomsky meant something else by that?

I understand that there are two goals, accurately representing the world, and finding satisfactorily simple explanations. I think Chomsky has gone too far in ignoring the first, but I acknowledge that both are part of science. I further think that statistical/probabilistic models of language are better for both goals. This is obvious to me after working on the problem for 30 years, so maybe it is hard for me to explain why. I think Manning, Pereira, Abney, and Lappin/Shieber do a good job of it. Also, I don't see how a system that successfully learns language could be anything other than statistical and probabilistic. I agree it is a long ways away ...

-Peter Norvig

foldr · on May 28, 2011

>I further think that statistical/probabilistic models of language are better for both goals.

Could you give some concrete examples? As a linguist, I don't see that statistical models are currently giving us much insight in those areas where current syntactic theory does give some insight. So for example, we don't seem to have learned much about relative clauses, ergativity, passivization, etc. etc. through these models. On the whole, statistical methods seem very much complementary to traditional syntactic theory. This seems to be Chomsky's view also:

"A quite separate question is whether various characterizations of the entities and processes of language, and steps in acquisition, might involve statistical analysis and procedural algorithms. That they do was taken for granted in the earliest work in generative grammar, for example, in my Logical Structure of Linguistic Theory (LSLT, Chomsky 1955). I assumed that identification of chunked word-like elements in phonologically analyzed strings was based on analysis of transitional probabilities — which, surprisingly, turns out to be false, as Thomas Gambell and Charles Yang discovered, unless a simple UG prosodic principle is presupposed. LSLT also proposed methods to assign chunked elements to categories, some with an information-theoretic flavor; hand calculations in that pre-computer age had suggestive results in very simple cases, but to my knowledge, the topic has not been further pursued."

Anyway, if you want to pursue this critique of Chomsky further, I'd recommend a bit more background reading. This article gives a fuller explanation of the views he was outlining at the conference: http://www.tilburguniversity.edu/research/institutes-and-res...

>Or do you think Chomsky meant something else by that?

He presumably means what he said, namely that merely creating accurate models of phenomena has never been the end goal of science. You acknowledge this yourself when you say that you take both modeling and explanation to be part of science.

tel · on May 28, 2011

What about the middle ground of structured probabilistic/statistical models? By introducing strong assumptions and prior information you create models that still have great flexibility, but have meaningful parameters which can be interpreted theoretically. These appear to me to solve both Chomsky's apparent non-interpretive model complaint and the technical problem of training a model with a large number of parameters.

On one end of the continuum, n-gram models for large n with infinite training data estimate the empirical distribution of language and thus are the best you can possibly do. On the other end, rule based grammars directly transcribe intelligible "rules" of language generation and comprehension. Both ends are clearly fraught with problems.

In the middle we have topic models, recursive grammars, decision trees, and various ad-hoc smoothing methods, each of which both allows for more tractable training and introduces more meaning to the parameters of the trained model.

I feel like effort here provides (somewhat unsatisfactory) answers to both criticisms. I think it's fair to say that probabilistic/statistical models deserve more attention in a lot of fields in order to overcome a history of neglect, however.

losvedir · on May 27, 2011

>In the case of language, observing and reporting statistical probabilities in written/spoken language output does very little to explain the cognitive systems used in acquiring and using language.

Unless, of course, those cognitive systems are nothing more than some statistical probabilistic mechanism. I don't know anything about the field, but the article was interesting to me in that it seemed to at least partly argue that. I know, for me at least, I'll frequently produce a sentence and then repeat it to myself a few times to see if it "sounds right." Now, I don't know what is happening to determine that, but perhaps I'm comparing it to some statistical probabilistic model I have in my head?

> Even one statistical anomaly serves to show that statistical learning is NOT the entire picture when it comes to language development.

1) Does it? Maybe it shows the specific statistical probabilistic model in question is wrong. Consider, as Chomsky did, a model which predicts zero probability for a novel sentence. Clearly, as you say, one anomalous novel sentence is all it takes to disprove such a model. But what about other models which can handle them? The "anomaly" may not be an anomaly anymore.

2) Do you have some anomaly in mind which shows statistical probabilistic models don't work?
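To make point 1 concrete, here is a minimal sketch of the difference between a model that assigns zero probability to a novel sentence and one that doesn't. The toy corpus, the bigram model, and the add-one smoothing are all my own assumptions for illustration; nothing here is taken from the article.

```python
from collections import Counter

# Toy corpus; the sentences and the add-k smoothing choice are
# assumptions made for illustration only.
corpus = [
    "the dog ate my homework",
    "the dog ate my lunch",
    "my dog ate the homework",
]

def bigrams(sentence):
    """Adjacent word pairs, with start and end markers added."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return list(zip(words, words[1:]))

bigram_counts = Counter(b for s in corpus for b in bigrams(s))
context_counts = Counter(prev for s in corpus for prev, _ in bigrams(s))
vocab = {w for s in corpus for b in bigrams(s) for w in b}

def sentence_prob(sentence, smoothing=0.0):
    """Product of bigram probabilities, with optional add-k smoothing."""
    prob = 1.0
    for prev, word in bigrams(sentence):
        numerator = bigram_counts[(prev, word)] + smoothing
        denominator = context_counts[prev] + smoothing * len(vocab)
        if denominator == 0:
            return 0.0  # unseen context and no smoothing
        prob *= numerator / denominator
    return prob

novel = "my lunch ate the dog"  # never appears in the corpus
print(sentence_prob(novel))                 # unsmoothed: exactly zero
print(sentence_prob(novel, smoothing=1.0))  # add-one: small but nonzero
```

The unsmoothed model is refuted by a single novel sentence; the smoothed one just rates it as unlikely, so the same sentence is no longer an "anomaly" for it.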

-----

The article was very interesting to me, but I don't know anything about the field. I guess my main question boils down to: Is it possible that language acquisition and production is nothing more inside our heads than a simple statistical probabilistic model?

jimbokun · on May 27, 2011

"Now, I don't know what is happening to determine that, but perhaps I'm comparing it to some statistical probabilistic model I have in my head?"

I had a non-native Japanese teacher once who, when asked a question on proper Japanese usage, would often stop for a second, clearly playing the sentence or phrase over again in his head, and say "no, they don't really say that" or "yes, they do say it that way."

Clearly, he was using his extensive experience listening to Japanese over many years to determine grammaticality, so at least a statistical model, if not conclusively a probabilistic one.

romanows · on May 27, 2011

A simple statistical model is probably not the only thing human infants are using when they learn language. Linguists make a pretty good case that there must be some structure in-place for infants to acquire language robustly, quickly, and with the kinds of noisy input (overheard speech) they have to work with.

It's not my field so I can't give examples off the top of my head, but the argument involves rapid acquisition of syntax and near-complete absence of errors that you'd expect to see in a simple statistical model.

wisty · on May 28, 2011

Exactly. Almost everyone can identify recursive grammar (except people in a small South American tribe who speak a non-recursive language).

You don't need a raw MC to assess the likelihood of "The DOG ate my homework", "My WASHING MACHINE ate my homework", and "MY LEGALLY ate my homework". You need P(WASHING MACHINE = NOUN), and P("My NOUN ate my homework").
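A rough sketch of that factorization, assuming the score of a sentence is just P(phrase = NOUN) times the probability of the template. Every number below is invented for illustration.

```python
# All probabilities are made up; only the factorization
# P(phrase = NOUN) * P("My NOUN ate my homework") comes from the text above.
p_is_noun = {
    "dog": 0.99,
    "washing machine": 0.98,
    "legally": 0.001,
}
p_template = 0.01  # hypothetical P("My NOUN ate my homework")

def score(phrase):
    """Score 'My <phrase> ate my homework' via the class-based factorization."""
    return p_is_noun.get(phrase, 0.0) * p_template

for phrase in p_is_noun:
    print(f"My {phrase} ate my homework: {score(phrase):.6f}")
```

Note that this rates the washing machine almost as highly as the dog, because part of speech is the only thing it knows about.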

But that's not right. You could also have P(WASHING MACHINE = NOUN THAT CAN EAT STUFF). Or maybe P(EAT = HUMOROUS TERM FOR DESTROYED), P(HUMOROUS SENTENCE), P(WASHING MACHINE = OBJECT THAT CAN DESTROY HOMEWORK).

Anyway, it's really bloody hard to put it all together. But that's what humans do. I'd imagine that we store it in our short term memory, then make a few quick parses of it, under varying assumptions, and keep the ones that are most consistent.

In reality, Chomsky is fighting the same sort of battle that happened when Newton and Leibnitz were around (no, not the battle between Newton and ... the rest of the world really). OK, you have gravity. But what causes it? Why? It's an interesting question, but not necessarily one that will lead anywhere.

true_religion · on May 28, 2011

In the case of "my washing machine ate my homework" and other non-standard expressions, many people will be confused by making the obvious associations. It's only when the new rules are explained to them that they come to understand what was meant.

The failure of a machine to understand a sentence given a set of rules may simply mean that it needs to be taught new rules.

brockf · on May 27, 2011

If that was true, then why did humans evolve to speak at all? Why, if speech is simply a reaction to statistics we are tracking and behaviours that have been rewarded, would the first utterances have been made? And how do we make completely novel utterances that attempt to express our otherwise abstract thoughts?

pygy_ · on May 27, 2011

> Why, if speech is simply a reaction to statistics we are tracking and behaviours that have been rewarded, would the first utterances have been made?

Why not? Look at it from the bottom up:

Communication is a fundament of life, from intra-cellular to inter-cellular to inter-organism interactions (another fundament is the ability to keep oneself in a low entropy state, at the expense of the rest of the world).

Human speech is an evolution of mammal communication. It grew up in complexity, from grunts and other basic noises, along with our way of living, up to what we have now.

> And how do we make completely novel utterances that attempt to express our otherwise abstract thoughts?

Speech is a big collage. New is either the result of

* a recombination of the sub-parts of past speech

* the definition of a new word in terms of older words, or sometimes arbitrarily (for proper nouns).

Nothing fancy AFAICT.

brockf · on May 27, 2011

There's a big difference between "grunts and basic noises" and language. Or at least, that's my opinion. In this same line, I don't believe dogs/monkeys/birds/bees have language, despite the ability to communicate.

This view is just too simplistic to hold its weight when you really look at the intricacies of language and its evolutionary history which, by the way, I would suggest comes from manual gesture and not grunting.

pygy_ · on May 27, 2011

> There's a big difference between "grunts and basic noises" and language. Or at least, that's my opinion. In this same line, I don't believe dogs/monkeys/birds/bees have language, despite the ability to communicate. This view is just too simplistic to hold its weight when you really look at the intricacies of language and its evolutionary history which, by the way, I would suggest comes from manual gesture and not grunting.

Mu![1]

But you're probably right about gestures.

Wild chimps have a vocabulary of about 66 signs. We can also observe tribes with languages more primitive than ours (no pronouns, for example). But there's a missing link of several millions of years of evolution between both.

What are the (known) intricacies of the evolution of our ability to communicate?

There's no definitive proof for the statistical argument, but a growing amount of (neuro)scientific evidence points to it. What's (are) your alternative hypothese(s)?

[1] http://en.wikipedia.org/wiki/Mu_(negative)

[2] http://www.ncbi.nlm.nih.gov/pubmed/21533821

brockf · on May 27, 2011

This is where the debates really begin :)

I think that most people who believe in some form of the motor theory of speech perception will also believe that speech evolved from manual gesture.

Others scoff at the motor theory. In fact, I'd say I'm in the minority by bringing it up with any regularity.

If the question is what is "known" about the evolution of our ability to communicate, I wouldn't have much to point you towards. Most is theory based on modern evidence, somewhat like armchair psychology. Other people point to our ability to integrate non-verbal gestures into our comprehension, activation of our motor cortex prior to semantic/phonetic network activation when disambiguating difficult speech sounds, our ability to synthesize visual/auditory sources of information when the visual information relates to speech gestures (mouth/tongue movements), etc.

pygy_ · on May 27, 2011

What's the link between the motor theory of speech perception and your criticism of losvedir's post?

Aren't these issues completely orthogonal?

scott_s · on May 27, 2011

> Why, if speech is simply a reaction to statistics we are tracking and behaviours that have been rewarded, would the first utterances have been made?

That criticism can be lobbed at all abilities that we claim came about due to evolution - which, to be clear, is all of them. The statistical model would be the mechanism, but it wouldn't be the reason why it evolved. That answer is relatively boring, and is the same one as all evolutionary processes: it appeared randomly from mutation, and it provided benefit to those that had it.

jimbokun · on May 27, 2011

"That answer is relatively boring, and is the same one as all evolutionary processes: it appeared randomly from mutation, and it provided benefit to those that had it."

Not just boring, but a totally banal and useless answer.

What kinds of mutations? In what sequence? How did it provide a survival benefit? What were the earlier forms of language like, and how did they become the languages spoken today?

Just saying "evolution did it" is about as informative as saying "God did it."

scott_s · on May 27, 2011

Excellent questions! That I hope someone will investigate. But brockf seemed skeptical that it was even possible for there to be an evolutionary process that produced humans with a statistical-learning-engine in their brains for language. Which I find curious, since - and this is my point - the same can be said for everything that is a result of evolutionary processes. That is, his complaint has nothing to do with language and statistical processes. The same complaint could be lobbed at eyes.

brockf · on May 27, 2011

Just to clarify my position (as it is misunderstood above): I believe it is one of the most important factors in acquiring language. 100%. However, I personally believe that it's a domain-general tool exploited by a domain-specific language module adhering to evolved instincts in language acquisition.

scott_s · on May 27, 2011

And why can't that domain-general tool be some kind of statistical machine? I ask this because I don't see why what you said is incompatible with it - in fact, I agree with what you said - but I suspect that the mechanism is probably statistical in nature.

brockf · on May 28, 2011

Right, I think we're in agreement then. Statistical learning is the domain-general tool. Or one of them.

It's the domain specific tools that are up in the air, though, and what I'm interested in.

PaulHoule · on May 27, 2011

There's no doubt that Noam Chomsky founded a paradigm of academic activity. Linguists can generate an unlimited number of papers and monographs by finding problems and proposing intellectually convincing solutions.

From an engineering standpoint, however, Chomsky's view of grammar has been remarkably barren when it comes to machine processing of natural language. It's made a major contribution to artificial languages but despite a lot of effort it hasn't added much performance to what can be done with statistical methods.

I'd agree that a hidden Markov model that does POS tagging with high accuracy doesn't provide an intellectually satisfying model for "how language works", but you don't need to have a model for "how language works" in order to use it.
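For anyone who hasn't seen one, here is a minimal sketch of what such a tagger does: Viterbi decoding over a tiny hidden Markov model. The tag set, vocabulary, and every probability are made up for illustration; a real tagger would estimate them from a tagged corpus.

```python
import math

# Toy HMM; every tag, word, and probability here is invented for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9, "my": 0.1},
    "NOUN": {"dog": 0.5, "homework": 0.4, "ate": 0.1},
    "VERB": {"ate": 0.8, "dog": 0.2},
}
UNSEEN = 1e-6  # floor for word/tag pairs missing from EMIT

def log_emit(tag, word):
    """Log emission probability, floored for unseen word/tag pairs."""
    return math.log(EMIT[tag].get(word, UNSEEN))

def viterbi(words):
    """Return the most probable tag sequence for words under the toy HMM."""
    # best[t] = (log probability of the best path ending in tag t, that path)
    best = {t: (math.log(START[t]) + log_emit(t, words[0]), [t]) for t in TAGS}
    for word in words[1:]:
        step = {}
        for t in TAGS:
            # Pick the best previous tag to transition from.
            score, path = max(
                (best[p][0] + math.log(TRANS[p][t]), best[p][1]) for p in TAGS
            )
            step[t] = (score + log_emit(t, word), path + [t])
        best = step
    return max(best.values())[1]

print(viterbi("the dog ate my homework".split()))
```

The tables of numbers are the entire "theory" here, which is the point: it tags the sentence correctly without saying anything about why.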

anghyflawn · on May 27, 2011

I feel there is excessive emphasis on "what Chomsky said" and "what Chomsky did". Norvig chooses to point out that the principles and parameters framework is, let's say, imperfect, but, well, Chomsky would agree. Moreover, if you zoom out and stop obsessing about quotes from "Syntactic Structures", you will realize that a lot of the work that's being done in theoretical linguistics is not quite as barren. Yes, statistical methods for (say) anaphora resolution can be extremely efficient, but basically very few people had thought about anaphoric relations in any systematic way before generative linguistics came around.

Moreover, rule-based NLP approaches also have their place, and they are often the direct result of theoretical advances. A case in point is the modelling of morphophonology (which is necessary for spell checking, dictionaries and text generation for morphologically complex languages): many successful approaches are those based on finite-state machines, which could not have happened without Johnson and later Koskenniemi using them to formalize the rule-based approach pioneered by Halle and (yes) Chomsky (well, not quite, but this is still the point of reference for rule-based phonology).

(I am a theoretical phonologist, but my colleagues who do actual NLP work of this type tell me that statistical methods aren't that great for the sort of work they do.)

brockf · on May 27, 2011

I absolutely agree. I don't care if the computer that automatically transcribes my voicemail understands speech either - I just want it to be correct.

That technology is impressive to the utmost degree, I just don't think it should be the endgoal of modern science.

jerf · on May 27, 2011

There is no guarantee that the Kolmogorov complexity of any interesting system will fit inside the rational parts of our heads. There pretty much is a guarantee that we will not be able to fully understand our own brains, using our brains; the part of our brain that can understand things is just dwarfed by the size of the rest of it. (We really do quite a lot with not very many free neurons.) Even if there is a generative theory that can explain human speech in less bits than a direct lookup table, there's no guarantee we can find it, and the null hypothesis must be that we won't because there is no such theory.

We should look for it, but we should not expect to find it.

stcredzero · on May 28, 2011

> I'd agree that a hidden Markov model that does POS tagging with high accuracy doesn't provide an intellectually satisfying model for "how language works", but you don't need to have a model for "how language works" in order to use it.

I'd agree that an adhoc equation that fits observed core sample data doesn't provide an intellectually satisfying model for "how sedimentation works", but you don't need to have a model for "how sedimentation works" in order to use it.

That's actually from something I worked on a long time ago. People actually use adhoc models of sedimentation and the formation of sedimentary rock for practical purposes. I also think most people suspect that we'd learn something valuable by figuring out the underlying reason why the data fits the particular description.

lkozma · on May 27, 2011

I think Norvig acknowledges the point you are making here, namely that the statistical approach does not explain the cognitive systems behind language. However (if I understand correctly) he implies that those systems might be too complex to be adequately explained, let alone emulated and we can achieve more by observing them as black boxes, analyzing their outputs, i.e. language as it is used.

"if a computer could - like humans - learn to produce infinite, novel, contextual, and meaningful grammatical utterances"

To perfectly achieve this goal, you might have to simulate 4 billion years of evolution under the same conditions as it happened on Earth, and a few thousand years of cultural evolution as it led to our languages and our cultural context. Language is incredibly complex and changing, many of its details might be incidental, i.e. results of random events, so it seems unreasonable to pretend that we can deduce it all from some elegant first principles. At least that is my reading of Norvig's argument.

pealco · on May 27, 2011

> I think Norvig acknowledges the point you are making here, namely that the statistical approach does not explain the cognitive systems behind language.

If that is the case, then the argument that Norvig is making is irrelevant to the argument Chomsky is making. Chomsky simply makes the point that statistical accounts lack explanatory adequacy. As someone who has worked closely with many of his students and who has received extensive training on his scientific program, I can say with confidence that Chomsky would have no objection whatsoever about the usefulness of statistical approaches to linguistic engineering problems. The results speak for themselves. He would go on to say, however, that how well a statistical approach solves a linguistic engineering problem is irrelevant to the question of how humans do what they do.

The answer to the question may well be statistically grounded. That is a valid hypothesis and a logical possibility which should be taken seriously. However, it is incumbent on the proponents of such an answer to provide evidence that it is what humans are doing. Here are some examples of the kinds of evidence necessary:

* evidence that humans are capable of performing the kinds of computations that the statistical approach requires,

* evidence that the statistical approach works with the relatively limited amount of data that a human receives,

* evidence that the statistical approach fails in ways that humans fail

How well a statistical approach succeeds at an engineering task is not an item on this list, simply, again, because engineering tasks are irrelevant to what humans actually do.

Let me specifically say that statistical approaches are not, from the start, ruled out as potential candidates for the algorithms underlying human language. It's just that a case has to be made for them using the right kind of evidence.

Finally, I'll reiterate what others have pointed out: from a scientific perspective, that something is hard to explain doesn't mean that we shouldn't try. And, those that have given up (as you suggest Norvig has) shouldn't fault those who haven't for calling them out on it.

brockf · on May 27, 2011

Since when has science been reasonable? :)

In situations like this, I tend to speak in theoretical absolutes. A computer that "could - like humans - learn to produce infinite, novel, contextual, and meaningful grammatical utterances" isn't even on the timeline right now, but it's the theoretical goal in showing that we understand language acquisition (ontogenetic development), evolution (phylogenetic development), and production.

Just because that goal seems unattainable doesn't, to me, mean that we need to aim any lower. Now, this is premised on my belief that mimicking phenomena with statistical learning is not as intellectually satisfying as understanding the underlying cognitive systems, but that's not believed by everyone.

lkozma · on May 27, 2011

I agree that learning to produce and interpret varied utterances is a worthy goal, but the fact is that (far as it still is today) lowly statistical methods have gotten us closer to this goal than the other, Chomskyan approach. It could be a situation where aiming lower lets you shoot higher.

anghyflawn · on May 27, 2011

This is a fundamental misunderstanding of what modern generative linguistics is all about (to be fair, it is extremely widespread). The aim of this branch of science is expressly not to "learn to produce and interpret varied utterances" (called E-language in the jargon), but to understand the cognitive processes behind the production and interpretation of utterances (called I-language). Now you may agree or disagree with the methods and assumptions used in the pursuit of this goal, but it is patently unfair to accuse the field of failing to do something it never set out to do.

equark · on May 27, 2011

You provide no evidence for the last statement: "that story will take a lot more than statistical learning to write."

The existing evidence overwhelmingly suggests that a computer that can "learn to produce infinite, novel, contextual, and meaningful grammatical utterances" will be based on probabilistic models. In fact it's hard to imagine how it could possibly be otherwise.

The computer is observing noisy sensory input and is trying to make inferences about how to communicate with some future reader. Mathematically, there is only one way to write this problem: probability. It's true that the learned model may have amazing structure to it, but this will almost certainly be learned via probabilistic models rather than being hand-coded by some future Chomsky.

That fact does not imply we will understand language systems or the human mind. The Chomsky route may be better suited for that task.

brockf · on May 27, 2011

Where is the existing evidence? And what evidence in modern science can possibly look into the future and make a prediction about something that right now is so far beyond our grasp?

Statistical learning explains a lot. I'm a huge fan of it. Skinner's Behaviourism also explained a lot in psychology. But, just as in the case of behaviourism, I fear that statistical learning has/will hit a wall at which its explanations become futile and overly simplified.

My personal belief is that, at that wall, we'll see that human language instincts and evolved language-specific mechanisms will be what we are looking for.

scott_s · on May 27, 2011

Consider catching a ball. We know how to design a robot that will catch a ball: it will be the hardware for moving an "arm" and a "hand" for the catching, as well as computer hardware and some software for the logic. The software will solve differential equations in order to predict where the ball will be, and when to move the "arm" and "hand" to the correct spot in order to catch it.
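As a rough sketch of the kind of thing that software does, here is a numerical solution of the equations of motion for a thrown ball. I'm assuming no air drag, simple explicit Euler steps, and made-up launch values; a real robot would also need sensing and control on top of this.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def predict_landing(x, y, vx, vy, dt=1e-4):
    """Integrate dx/dt = vx, dy/dt = vy, dvy/dt = -G until the ball drops below y = 0.

    Returns the predicted landing position x and the flight time t.
    """
    t = 0.0
    while y >= 0.0:
        x += vx * dt
        y += vy * dt
        vy -= G * dt
        t += dt
    return x, t

def analytic_landing(x, y, vx, vy):
    """Closed-form answer for the same drag-free problem, for comparison."""
    t = (vy + math.sqrt(vy * vy + 2 * G * y)) / G
    return x + vx * t, t

# Hypothetical throw: released 1 m up, moving 3 m/s forward and 5 m/s upward.
print(predict_landing(0.0, 1.0, 3.0, 5.0))
print(analytic_landing(0.0, 1.0, 3.0, 5.0))
```

The numerical and closed-form answers agree closely, so the "hand" knows where and when to be.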

No one, as far as I know, argues that humans actually solve differential equations in their head when they catch a ball. They just... catch it. Perhaps with some failed attempts along the way, but as a part of growing up, we learned basic eye-hand coordination.

The notion that syntax and grammar as we have formalized it exist in our brains is the same as saying that differential equations exist in our brains. I find it much more likely that we innately have rough models for syntax, grammar, mechanical movement and object trajectories, but that it takes significant trial-and-error for us to tune those models until the point of competence. I think these models have to be at least partly statistical - otherwise, we wouldn't need to learn anything - and that while our formalisms may be nice approximations of what we do in our brains, I see no reason why they have to be exactly it.

burgerbrain · on May 27, 2011

"No one, as far as I know, argues that humans actually solve differential equations in their head when they catch a ball."

I would argue that we actually find approximate solutions to differential equations with what is likely best described as an analog computer.

So essentially, humans and computers as they are currently programmed do it the same way. The only difference is that computers do it better.

scott_s · on May 27, 2011

By actually I meant solve them in the same way that you and I solve them; analytically, using our formalisms. Rather, I'm proposing that our brains are using some statistical model that gives results pretty damn close to what the analytical answers would be. And that something similar is true for syntax and grammar.

burgerbrain · on May 28, 2011

Eh, a computer is a computer.

amouat · on May 27, 2011

I don't think Norvig was arguing Chomsky was completely wrong about what he said, more that statistical models are a hell of a lot more important than Chomsky implies.

Looking at the statistics and evidence is of great importance in trying to form models and answers to the "why" questions. Although mimicking a bee dance may not mean we understand it, it does provide a basis for founding and comparing theories.

sitkack · on May 29, 2011

When is the pretending so good that it ceases to be pretending? How much Mocking does the Mocking Bird need to partake in before it is the Creating Mocking Bird?

What Norvig didn't seem to get was the difference between understanding and a highly sophisticated pretender. Gut Level vs Self Aware intelligence. Both are valid forms of intelligence but only one is a valid form of understanding.

I think statistical methods are a form of intelligence that is highly mechanical and could never achieve human-level cognition (i.e. fart jokes). But I could be wrong; I usually am more than half the time.

anigbrowl · on May 28, 2011

This [Chomsky, not you] is just a warmed-over restatement of Searle's Chinese Room argument against AI. And it's a bullshit argument, for a reason I can state in two words: Turing test.

madamepsychosis · on May 27, 2011

Who's to say that the human brain doesn't learn by statistical analysis? The human brain collects data, forms hypotheses and tests them. Just like a statistical machine.

andrewcooke · on May 27, 2011

just because you only understand one side of an argument doesn't mean everyone else is an idiot.

in some sense this is the same argument as searle's chinese room. that's sufficiently well known and debated that it's fair to say that neither side can be dismissed as simply "missing the point".

brockf · on May 27, 2011

Throughout history, there have been many "well known and debated" arguments that have proven to be idiotic. See: the shape of the Earth.

This isn't one of them, though. And I didn't mean to imply that anyone was an idiot.

Instead of critiquing how I said something, or that I said something at all, can you tell me what I am missing? I'm obviously missing something - Norvig is no idiot.

andrewcooke · on May 27, 2011

the problem is whether or not there is any way to "ground" meaning. for physics, the "unreasonable effectiveness of mathematics" might suggest that there are simple "meanings" that underlie physical "laws".

but there's nothing to say that the same is true for intelligence or language. maybe the brain is nothing more than a particularly flexible "neural net", which statistical methods are modelling quite well. in that case, "intelligence" is not qualitatively different from "a good simulation of intelligence".

the same problem occurs in free will - does it "really" exist? if we're just (mechanical, predictable, although highly complex) machines then it is difficult to imagine how it can. yet the intuition is that there is clearly some meaning to the idea of a "free agent".

these are hard questions. people don't know the answers. instead we look to what daniel dennett calls "intuition pumps" (see his book "elbow room" on the free will problem) - simple parallels that "feel right". from those, we use intuition to argue in one direction or another. but the problem with that approach is that it depends on what you choose as a "hint".

some advance is being made through experiment. imaging of neural activity in the brain, for example, or the recent discovery that people who believe they have free will behave differently to those that don't.

searle's chinese room argument - http://plato.stanford.edu/entries/chinese-room/

unreasonable effectiveness of mathematics - http://www.jstor.org/pss/2321982 http://plato.stanford.edu/entries/mathematics-explanation/

i can't find the free will behaviour result that was in the news about a week ago. but that's perhaps less related to this anyway.

and i doubt that everyone who thought the earth was flat was idiotic, frankly. just because something is obvious now doesn't mean it was a stupid question, or easy to answer, when first raised.

borism · on May 27, 2011

maybe the brain is nothing more than a particularly flexible "neural net"

now that I found quite funny :)

(yeah, most likely it is just that)

Jun8 · on May 27, 2011

This is not a new debate. Within Linguistics there has been a continuous push against statistical NLP models. Read the introduction of Manning's book; even he seems to be defensive about NLP.

Chomsky is a colossus; his achievements are well-known. However, in many disciplines there comes a point where the pioneers who paved the way become the very impediment to new ideas. His emphasis on Semantics has warped the minds of many generations of researchers (as have some of his other ideas on universal grammar).

I experienced this first hand: my advisor, Prof. Raskin, a great researcher on semantics, nevertheless thought that statistical approaches were not the way to go. Sadly, in many Linguistics departments people are just not equipped with the statistical tools necessary to have a basic understanding of what's being done in the NLP field. So NLP is generally taught under CS, EE, or CompE.

adavies42 · on May 27, 2011

i saw someone once compare chomsky to freud, as a foundational figure whose discipline can't/couldn't progress during his lifetime.

ordinary · on May 28, 2011

Einstein would be another example.

foldr · on May 28, 2011

>His emphasis on Semantics

What are you talking about? Chomsky has always been highly critical of formal semantics.

christianpbrink · on May 27, 2011

"If Chomsky had focused on the other side, interpretation, as Claude Shannon did, he may have changed his tune. In interpretation (such as speech recognition) the listener receives a noisy, ambiguous signal and needs to decide which of many possible intended messages is most likely. Thus, it is obvious that this is inherently a probabilistic problem, as was recognized early on by all researchers in speech recognition..."

This is the money shot, especially since speakers are aware of the interpretive activity of listeners, and effective speakers play constantly on the ambiguities in their statements: structural (i.e. grammatical) ambiguities as well as semantic ambiguities. Listeners in turn are aware of speakers' awareness of this. There is, effectively, an infinity of mutual awarenesses of structural ambiguities in any instance of communication.

I think most technologists and (especially) businesspeople see this intuitively. I think many academics do not. Not sure how to articulate what I mean but I think I am saying something non-trivial about academics and their perspective on language.
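The quoted point about interpretation (pick the intended message that is most likely given a noisy signal) can be sketched as a toy noisy-channel decoder. This is only an illustration under stated assumptions: the candidate words, the prior in `PRIOR`, and the per-character error model are invented here and do not come from the essay or from any real speech recognizer.

```python
# Toy noisy-channel decoder. All messages and numbers are invented.

# Hypothetical prior P(message): how likely the speaker is to intend each word.
PRIOR = {"ship": 0.5, "shop": 0.4, "chip": 0.1}


def likelihood(signal, message, eps=0.1):
    """P(signal | message): each character is corrupted independently
    with probability eps, landing uniformly on one of 25 other letters."""
    if len(signal) != len(message):
        return 0.0
    p = 1.0
    for s, m in zip(signal, message):
        p *= (1 - eps) if s == m else eps / 25
    return p


def decode(signal, prior=PRIOR, eps=0.1):
    """Return the most probable message and the full posterior (Bayes' rule)."""
    scores = {m: prior[m] * likelihood(signal, m, eps) for m in prior}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("signal is impossible under this toy model")
    posterior = {m: score / total for m, score in scores.items()}
    return max(posterior, key=posterior.get), posterior


best, posterior = decode("shjp")
print(best, posterior)
```

The garbled signal "shjp" is one character away from both "ship" and "shop", so the channel model alone cannot separate them; the prior breaks the tie.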

cma · on May 27, 2011

Freeman Dyson earlier this year on this type of ambiguity as expressed in the drum language of the Democratic Republic of Congo:

http://www.nybooks.com/articles/archives/2011/mar/10/how-we-...

bluekeybox · on May 27, 2011

> I think I am saying something non-trivial about academics and their perspective on language.

I have become convinced that there is a strain of thought, one that is especially pervasive in academia, which believes that knowledge/meaning is something irreducible and almost mystical. It probably has to do with the fact that people who fetishize knowledge as something incredibly worthwhile for its own sake end up being overrepresented in academia. Those who are a bit more cynical/nihilistic tend to go into finance or start their own companies.

The old advice "do not make any gods to be alongside me" is still relevant except for the "alongside me" part, which probably only has any meaning if you consider yourself religious. I have a feeling that many academicians, especially the old-school ones, idolize knowledge to the extent of ascribing to it god-like powers even if said knowledge has little relevance for anything practical.

CWuestefeld · on May 27, 2011

Server's down. Here's a cached link: http://webcache.googleusercontent.com/search?q=cache:http%3A...

EDIT: stop giving me upvotes. I've got 11 points now for nothing more than a link. I don't deserve them. Stupid hidden points...

jng · on May 27, 2011

People upvote so that the link to the cached copy will be at the top.

norvig · on May 27, 2011

Sorry about the intermittent access. My hosting service provides me with sufficient bandwidth, but only provides a version of Apache that forks a new process for every GET, and thus runs out of processes and denies access to a portion of visitors when I get slashdotted/redditted/hacker-newsified. If anyone can suggest a more reasonable hosting service, let me know. -Peter Norvig

alphamerik · on May 28, 2011

[cough] I hear Google has pretty good bandwidth and scaling. Ever try App Engine? [/cough]

PaulHoule · on May 27, 2011

It's funny. Lately I've been working with NLP systems and in the last few years there are a few really good parts-of-speech taggers that are about 99% accurate. All the ones I know of are based on hidden markov models, which definitely would disappoint Chomsky.

Part of the trouble w/ Chomsky is that real language doesn't draw a clear line between syntax and semantics. Even though an HMM doesn't correctly model the nested structures that are common in natural language, it makes up for it by encoding semantic information.
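For readers who haven't seen one, here is a minimal sketch of how an HMM tagger picks tags, using the Viterbi algorithm. Everything in it is a toy assumption: the three tags, the vocabulary, and all the probabilities are made up for illustration, whereas a real tagger would estimate them from an annotated corpus.

```python
# Toy HMM part-of-speech tagger. All probabilities are invented.

TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}        # P(first tag)
TRANS = {                                             # P(next tag | tag)
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {                                              # P(word | tag)
    "DET": {"the": 1.0},
    "NOUN": {"dogs": 0.5, "walk": 0.2, "park": 0.3},
    "VERB": {"dogs": 0.1, "walk": 0.6, "park": 0.3},
}


def viterbi(words, tags=TAGS, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable tag sequence for a non-empty word list."""
    # best[i][t]: probability of the best tag path ending in tag t at word i
    best = [{t: start[t] * emit[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] * trans[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit[t].get(words[i], 0.0)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


print(viterbi(["the", "dogs", "walk"]))
print(viterbi(["the", "walk"]))
```

In this toy model "walk" is more likely as a verb in isolation, but after "the" the transition probabilities pull it to NOUN. The model only ever looks one tag back, which is why it cannot represent nested structure.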

sharmajai · on May 27, 2011

Another trouble is that human beings are innately probabilistic when it comes to language. A sentence written or spoken by humans does not have to be grammatically correct to convey its meaning, and does not always follow the strict rules that Chomsky talks about.

It's not the language that defines how we communicate; it's how we communicate that defines the language.

But I also disagree with Peter when he says the why is not important. It is this why, the understanding of the matter, that separates us from machines like Watson, since our sole purpose in life is not to win at a game but to play and enjoy the game and, most importantly, to "reuse the understanding" gained in some other facet of life, a feat that I believe no machine is capable of.

foldr · on May 28, 2011

>It's funny. Lately I've been working with NLP systems and in the last few years there are a few really good parts-of-speech taggers that are about 99% accurate. All the ones I know of are based on hidden markov models, which definitely would disappoint Chomsky.

No, it wouldn't disappoint him at all. In fact, one of his earliest works in linguistics discussed how transition probabilities could be used for chunking and categorization. (See http://www.tilburguniversity.edu/research/institutes-and-res... ) It's not as if Chomsky ever presented part of speech tagging as a poverty of the stimulus argument.

wccrawford · on May 27, 2011

"O'Reilly is correct that these questions can only be addressed by mythmaking, religion or philosophy, not by science."

... My jaw is on the floor. It drives me nuts when people go from 'We can't explain that yet' to 'The only explanation is God.'

The tides are incredibly complex when you insist on 'why' all the way back to the beginning of the universe. Everything is!

torstein · on May 27, 2011

>He doesn't care how the tides work, tell him why they work. Why is the moon at the right distance to provide a gentle tide, and exert a stabilizing effect on earth's axis of rotation, thus protecting life here? Why does gravity work the way it does? Why does anything at all exist rather than not exist? O'Reilly is correct that these questions can only be addressed by mythmaking, religion or philosophy, not by science.

Science doesn't really aim to answer the 'why'-questions, but rather the 'how'-questions. The scientific method boils down to falsifying hypotheses, and it's a lot easier with 'how does the tide work?' than 'why does the tide work (the way it does)?'.

Science can't say anything about 'Why does anything at all exist rather than not exist?', because there is no way to test any of the answers. So it's left to mythology, religion or philosophy to answer.

T-hawk · on May 27, 2011

> Why is the moon at the right distance to provide a gentle tide, and exert a stabilizing effect on earth's axis of rotation, thus protecting life here?

A possible answer to this stems from the anthropic principle. We evolved in a place with a moon because the moon helped us evolve. We don't see a moonless sky because complex life such as us would not have developed without it. A stable rotation and gentle tide are conducive to the evolution of complex organisms; tides were instrumental in getting life out of the seas and onto land.

"Why is the sun the way it is?" can be answered similarly. A smaller star has too small a habitable zone where liquid water can exist. A larger star would have burned out sooner than the 4.5 billion years it took to develop sapient life. A double star has a much smaller set of stable planetary orbits. That the sun is an appropriate star for our life on earth is not divine providence or an enormously unlikely coincidence; it's the result of a universe-wide scenario of statistical multiple endpoints.

borism · on May 27, 2011

it's the result of a universe-wide scenario of statistical multiple endpoints

totally agreed with you up to that point, which I have a hard time understanding.

so you say universe is kind of fractal and we happen to be in the right place on that fractal, where all the ingredients come together?

T-hawk · on May 27, 2011

Yes, but there's a causal relationship that I think you're not quite expressing. We are where we are because here is where all the ingredients came together.

alphamerik · on May 27, 2011

This is a good example of why (how?) language is so weird. Maybe I am just satiated, but for an inquisitive mind, to me "Why is the moon in the sky?" and "How is the moon in the sky?" parse out to be semantically equivalent. Science (astronomy) does try to explain how (why?) we exist and under what circumstances the universe came into existence (if it did).

wccrawford · on May 27, 2011

They are actually quite different.

'How' asks about the current state of things and how they are possible.

'Why' asks about the past, and how the current state of things came to be.

Depending on interpretation of the question, 'why' can be a lot more philosophical than 'how'.

mbateman · on May 27, 2011

I interpret this sort of question not as asking for a further step in a causal chain, but rather as demanding a teleological explanation where none is available.

While I disagree with almost everything Chomsky says about everything, and I think it was meant to be somewhat sympathetic, it's really unfair to propose an affinity between Chomsky and O'Reilly in this manner. What the hell.

Equally unfair is Norvig calling Chomsky a mystic for his invocation of Plato. Chomsky is a rationalist, not a mystic.

jimbokun · on May 27, 2011

"... My jaw is on the floor. It drives me nuts when people go from 'We can't explain that yet' to 'The only explanation is God.'"

Philosophy does not imply God.

And it drives me nuts when people don't see that many important questions cannot even be expressed within the framework of science.

pygy_ · on May 27, 2011

That anything at all exists has to be accepted as a given.

"God" (in its various versions and revisions around the globe) is just an anthropomorphized version of the previous proposition.

pygy_ · on May 27, 2011

I mean "God" in this context, of course.

People tend to map a lot of other stuff onto it.

ovi256 · on May 29, 2011

>That anything at all exists has to be accepted as a given.

Boy do I have news for you!

http://en.wikipedia.org/wiki/Solipsism

pygy_ · on May 31, 2011

This view is compatible with solipsism.

Assuming solipsism (however absurd it may be), both me and the universe do exist.

Wuzzy · on May 27, 2011

What science often does is respond to a "why" question by analyzing the phenomenon and presenting its causes in some lower-level terms. But, from a certain viewpoint, that is not a satisfactory answer.

Take physics, for example. It can tell you why some objects behave the way they do by telling you there are certain particles, interacting forces, etc. In this way you can explain, say, the photoelectric effect.

But it isn't really an answer to the "why" question, is it? It just pushes the question one level lower. Why are there such and such particles and forces? Why the constants? The very nature of these answers is descriptive. It is a description of how the world works, not why it works that way.

Maybe asking "why" in this ultimate manner is an ill-posed question - but that's not the deal here. It just doesn't seem that science in its current form, unlike religion or philosophy, could ever even attempt to answer it.

Don't get me wrong, I'm strongly atheistic myself, but there are some inherent limitations of scientific exploration and clarification with respect to the answers it can provide.

paganel · on May 27, 2011

> My jaw is on the floor. It drives me nuts when people go from 'We can't explain that yet' to 'The only explanation is God.'

I don't think anyone mentioned God, and, to be fair, the problem of what exactly constitutes reality and what would be the best ways to imitate it is quite complex. We, as a species, have been trying to find the rational answer to this problem for at least 2,500 years (since the pre-Socratics), but as far as I know we haven't come to any definitive answer; we don't even know if there is such an answer.

alphamerik · on May 27, 2011

I think the whole point of that part of the article is that the only answer that could satisfy O'Reilly and his viewers is that of religion, and Norvig says Chomsky has a philosophy "(some would say religious belief)", i.e. some unscientific belief that "language should be simple and understandable". Calling that a religious viewpoint is balderdash, in my opinion.

There are several ways one could model language, from a top down purely statistical approach that Norvig likes, something in the middle which Chomsky proposes, to a bottom up neural model of chemical interactions. There are advantages and disadvantages to each method for many different reasons.

paganel · on May 27, 2011

> There are several ways one could model language, from a top down purely statistical approach that Norvig likes, something in the middle which Chomsky proposes, to a bottom up neural model of chemical interactions.

Yeah, I was just trying to take a step back (and maybe I was too OT, I agree), but at some point we should start asking ourselves more fundamental questions. Anyway, this discussion is way over my head. I'm just glad that HN users think there's an answer for everything; it's as if Gödel or Kant had never written anything in their entire lives.

ugh · on May 27, 2011

And why exactly are myths, religions or philosophy better at answering questions than science? What’s the justification for that claim?

paganel · on May 27, 2011

> And why exactly are myths, religions or philosophy better at answering questions than science? What’s the justification for that claim?

Science assumes that there is a "justification" (for language, the universe, you name it). A true philosopher always begins by asking himself if there is such a thing as "justification". Just think about it: science has helped us put a man on the Moon and create the Tsar bomb, but it isn't able to answer Epimenides's "All Cretans are liars" paradox, a 2,500-year-old problem.

ugh · on May 27, 2011

What’s there to answer? It’s a paradox.

paganel · on May 27, 2011

> It’s a paradox.

You take as a given that paradoxes are or at least should be "un-answerable". This Wikipedia page (http://en.wikipedia.org/wiki/Liar_paradox#Possible_resolutio...) mentions a list of guys that tried to answer it, but anyway, this problem is way over my head, I should go back to writing Drupal modules before I also start rambling about Solipsism (http://en.wikipedia.org/wiki/Solipsism#Gorgias_.28of_Leontin...) :)

ugh · on May 27, 2011

If you can find a solution it’s no longer a paradox. Paradoxes are defined† as un-answerable. Whoever created the paradox made a mistake and merely created something that looks like a paradox.

“Paradox” is just a word. There is nothing special about something being a paradox (except that paradoxes are cool to think about).

†That’s at least a common definition. That’s where I’m getting such crazy ideas like “paradoxes have no answer.” You are free to define “paradox” some other way. Be assured that it was never my intention to claim that the definition I used is in some sense the true definition. Definitions are all about communication (you need to agree on definitions in order to be able to talk to each other), not tools for finding the truth.

paganel · on May 27, 2011

> Whoever created the paradox made a mistake

"Mistake" means there is a "right answer", a point of reference. What is that? Otherwise, your use of the concept "mistake" doesn't make sense.

> Paradoxes are defined as un-answerable

Yeah, of course we can define things ad nauseam. That doesn't make "reality" any more "real"; it just helps us send men to the Moon, cure diseases or build atomic bombs.

ugh · on May 27, 2011

If whoever created the paradox had the goal of creating a paradox (as commonly defined) and ends up with something that has an answer (i.e. with something that is not a paradox as commonly defined) that person has failed to achieve her or his goal of creating a paradox. It is likely that the reason for this failure is a mistake the person made while creating the paradox. Other explanations for such a failure are also possible.

That’s the ultra verbose version of that sentence. I can crank the verbosity up quite a bit still but I would rather not want to.

The point of reference you are asking about is the goal of creating a paradox. That was sort of implied, but you seem to be quite a fan of verbosity. It is, of course, possible that someone – for example – just stumbles upon something that looks like a paradox. The mistake would then be the identification as a paradox.

I’m not really sure what you are trying to tell me with your last point. You started trying to define paradoxes some other way than they are commonly defined. (Quote: “You take as a given that paradoxes are or at least should be ‘un-answerable’.” – thereby implying that according to your definition of “paradox”, the same can have answers.) I wouldn’t have brought definitions up otherwise.

I agree with you that merely defining doesn’t tell you much (maybe nothing) about the nature of reality and said as much. Definitions are for communication, not a tool for finding truth. Those tools are the meat of science, not definitions.

lurker19 · on May 27, 2011

"God" did not appear in the quotation.

Some questions are metaphysical, not because they are complex, but because they are ill-posed and not subject to falsifiable experimentation or observation.

BTW, a lot of real-world phenomena, like some large events of history or sociology or macroeconomics, fall into the same category of scientific unapproachability, due to practical limitations of our civilization and any plausible future civilization.

T_S_ · on May 27, 2011

The handshake example was illuminating. Three "equivalent" theories:

Theory A: Closed form formula function.

Theory B: "Algorithm". Still a function.

Theory C: Memoized function (constant time!)

According to the article "nobody" likes C, especially the article's Chomsky straw man. If one had a procedure to convert C to A, then this whole issue would become hairsplitting. Such a procedure would aim to convert a memoized function back into a form that uses more symbols from a mathematical language. A good criterion of success would be the description length of the resulting procedure in the preferred language. One reason this could be useful to science is that once you identify a value that is useful in many theories it becomes part of the language. Making it available to the next problem may speed up the search for a "good" description of the next phenomenon. Identical procedures that appeared in various algorithms might acquire a special name. One such value might be called "pi", another "foldr" and so on.

Of course there may be many good descriptions, just as there are many languages. Also, the example could be extended to statistical modeling situations by adding room for error terms in the suitability criteria.

So if you have a general procedure to convert a table into a definition, you can make money and science at the same time!
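To make the three theories concrete, here is a rough sketch, assuming the handshake example is the usual count of handshakes when each of n people shakes hands with every other person once. The function names, the table size, and the quadratic-fit helper are my own illustration, and the "convert C to A" step only works because I assume up front that the answer is a quadratic in n.

```python
from fractions import Fraction


def handshakes_closed_form(n):
    """Theory A: closed-form formula."""
    return n * (n - 1) // 2


def handshakes_algorithm(n):
    """Theory B: count every unordered pair explicitly."""
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 1
    return count


# Theory C: a memoized table. Lookups are constant time, but it knows
# nothing outside the range it was filled for (0..100 here, an arbitrary choice).
HANDSHAKE_TABLE = {n: handshakes_algorithm(n) for n in range(101)}


def handshakes_table(n):
    return HANDSHAKE_TABLE[n]


def fit_quadratic(table):
    """Recover (a, b, c) with f(n) = a*n^2 + b*n + c from table entries 0, 1, 2.
    Assumes the table really is quadratic; that assumption is the whole trick."""
    f0, f1, f2 = (Fraction(table[k]) for k in (0, 1, 2))
    a = (f2 - 2 * f1 + f0) / 2
    b = f1 - f0 - a
    return a, b, f0


a, b, c = fit_quadratic(HANDSHAKE_TABLE)
print(a, b, c)
```

The fitted coefficients give n^2/2 - n/2, which is Theory A again, and the short formula is a far smaller description than the 101-entry table.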

stcredzero · on May 28, 2011

My conclusion is that 100% of these articles are more about "accurately modeling the world" than they are about "providing insight," although they all have some theoretical insight component as well.

Before you can figure out why, you have to make sure you can accurately characterize the what. So there's a lot of science that is focused on coming up with a descriptive tool like an adhoc curve, before the underlying principles are discovered.

I think Chomsky is afraid that statistical models will cause people to stop looking for the underlying principles.

sethg · on May 27, 2011

This essay made me think: Lojban (http://www.lojban.org/tiki/la+lojban.+mo), among constructed languages, is the categorical language par excellence. Every word has a well-defined range of meaning; the grammar can be parsed by the same kinds of parsers used for programming languages; potential sources of ambiguity, like plural references, associativity of modifiers, and negation, have been rigorously (or tediously, depending on how you roll) nailed down.

Can there be such a thing as a conlang that demonstrates the ideal statistical grammar and semantics? (“All the words in this list are 60% likely to be used as nouns and 40% likely to be used as verbs....” But in the absence of a pre-existing linguistic community, how could you get students of the language to use them in the right proportions?)

cma · on May 27, 2011

Chomsky's April 8th lecture at Carleton University on language had several thoughts on machine translation:

http://www.youtube.com/watch?v=XbjVMq0k3uc

(I think it even had the same bee-dance example)

double-z · on May 27, 2011

The commentary has nothing to do with what Chomsky proposed. The author defines success as "being successful at accomplishing a task". That has nothing to do with science. Full stop.

macmac · on May 27, 2011

Are Norvig's comments on the "I before E except after C" rule really valid? Why would one use a corpus for analysis of the rule, and not a dictionary? It appears to me that "CIE" (P(CIE) = 0.0014) is more common than "CEI" (P(CEI) = 0.0005) because the words that contain the "exception" "CIE" are used more frequently in the corpus than the words that follow the rule "CEI". Once you know the limited number of exceptions (in the dictionary sense) the rule appears to preserve its relevance.
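The corpus-versus-dictionary distinction is token frequency versus type frequency, and a toy count shows how the two can disagree. The word list below is invented purely for illustration; it is not Norvig's corpus and says nothing about real English frequencies.

```python
from collections import Counter


def count_pattern_words(words, patterns=("cie", "cei")):
    """Count how many of the given words contain each letter pattern."""
    counts = Counter()
    for word in words:
        for pattern in patterns:
            if pattern in word.lower():
                counts[pattern] += 1
    return counts


# Invented toy "corpus": a few common CIE words repeated, plus several
# CEI words that each occur once.
corpus = (
    ["science"] * 4
    + ["society"] * 2
    + ["receive", "deceive", "ceiling"]
)

token_counts = count_pattern_words(corpus)       # corpus view: every occurrence
type_counts = count_pattern_words(set(corpus))   # dictionary view: distinct words

print("tokens:", dict(token_counts))
print("types: ", dict(type_counts))
```

In this made-up sample CIE wins on tokens (6 to 3) while CEI wins on types (3 to 2), which is the situation I suspect is behind the published numbers. Whether real English behaves this way would need an actual dictionary count.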

jimbokun · on May 27, 2011

I suppose the most useful corpus for this rule would be spelling tests.

noahlt · on May 27, 2011

Strangely appropriate is today's XKCD: http://xkcd.com/904/

kenjackson · on May 27, 2011

Hmm... I never thought of it that way: that sports are a weighted random number generator, but the various weights are unknown, and the commentators are discussing theories as to what the weights are and how they are derived. (Although the cartoon seems to be saying the narratives are just about the numbers generated, which is more cynical, and frankly less interesting.)

yourcelf · on May 28, 2011

Actually, Larry Birnbaum over at Northwestern is doing exactly that: http://infolab.northwestern.edu/projects/stats-monkey/

They take the coded sports results, and automatically generate narratives using statistical speech models. They have a startup that is doing it too, don't recall the name of it....

EDIT: I believe this is it: http://narrativescience.com/

_grrr · on May 29, 2011

I've been monitoring the page this post points to with a bookmarking tool we've just released in beta. Here is the latest set of changes:

http://app.bookmarkerpro.com/changes?fmt=html&id=2573

Quite a few revisions since first posted to HN!

_grrr · on May 29, 2011

More revisions... http://tinyurl.com/3sabdc9

niels_olson · on May 27, 2011

This whole theory vs observation argument exists at the very pinnacle of human thought, expressed in the Copenhagen interpretation. If you want to contribute to the human understanding of this, you'll have to beat Bohr and the uncertainty principle.

Create · on May 27, 2011

Fourier was there first.

niels_olson · on May 28, 2011

My claim wasn't first, it was top. You up-end the Copenhagen interpretation, show the universe really is deterministic, and every other argument on this subject, in every discipline, collapses. As it is, the arguments have almost certainly failed, but it's not quite a cinch, because probability admits determinism as a special case. That is one of the deep points of Norvig's essay.

galactus · on May 28, 2011

It is interesting that in a completely different debate, Chomsky takes Norvig's position (he is accused of not looking for a "theory" and "whys" and he replies that it is pragmatic results that matter):

http://mindfulpleasures.blogspot.com/2011/01/noam-chomsky-on...

davidmathers · on May 28, 2011

Chomsky called the Watson computer that won Jeopardy "a bigger bulldozer." He goes into more detail about his AI opinions here: http://www.framingbusiness.net/archives/1366

borism · on May 27, 2011

And while it may seem crass and anti-intellectual to consider a financial measure of success

Why are the other metrics Norvig provides, like articles published or prevalence in practical applications, considered more intellectual?

And besides, I don't think "accurately modeling the world" is the end of it. Classical Newtonian mechanics correctly describes 99% of our activities in the real world and was considered the pinnacle of scientific achievement for several centuries. Yet we know today that it is just a limiting case of general relativity and quantum mechanics.