If you can excuse the slightly combative tone, data-driven (i.e. statistical) NLP is a big potato and Chomsky was dead on the money: you can model text, given enough examples of text, but you can't model language. Because text is not language.
Which is why we have excellent dependency parsers that are useless outside the Brown corpus (if memory serves; it might be the WSJ), and very successful sentiment classifiers for very specific corpora (IMDB), etc. But there is no system that can generate coherent language that makes sense in a given conversational context, and even the most advanced models can't model meaning to save their butts. And don't get me started on machine translation.
As I say, apologies for the combative tone, but in terms of overpromising, modern statistical NLP takes the biscuit. A whole field has been persisting with a complete fantasy (that it's possible to learn language from examples of text) for several decades now, oblivious to all the evidence to the contrary. A perfect example of blindly pursuing performance on arbitrary benchmarks, rather than looking for something that really works.
There are other issues too, like keeping track of context, which current models are (as of now) bad at. Right now the quality is more like text-skimming than actual "understanding" of text.
For understanding meaning, it seems that text is not enough; we need embodied cognition. Not necessarily walking robots (though that might help), but the ability to combine various senses. Some concepts are rarely communicated explicitly in words (hence learning from an arbitrarily large text corpus may not suffice), but we have plenty of experience of them from vision, touch, etc.
> while word embeddings capture certain conceptual features such as “is edible”, and “is a tool”, they do not tend to capture perceptual features such as “is chewy” and “is curved” – potentially because the latter are not easily inferred from distributional semantics alone.
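You can probe this claim yourself. Here's a minimal sketch (not the quoted paper's methodology) that compares cosine similarities between a few concrete nouns and conceptual vs. perceptual feature words in pretrained GloVe vectors, assuming gensim and its downloader are available; the specific probe words are my own illustrative choices:

```python
# Rough probe of the quoted claim: do distributional embeddings sit closer
# to conceptual features ("edible", "tool") than perceptual ones
# ("chewy", "curved")? Assumes gensim is installed.
import gensim.downloader as api

# Load 100-dimensional GloVe vectors trained on Wikipedia + Gigaword.
model = api.load("glove-wiki-gigaword-100")

probes = {
    "conceptual": ["edible", "tool"],
    "perceptual": ["chewy", "curved"],
}

for word in ["steak", "banana", "hammer"]:
    for kind, features in probes.items():
        for feat in features:
            sim = model.similarity(word, feat)  # cosine similarity
            print(f"{word:>7} ~ {feat:<7} ({kind}): {sim:.3f}")
```

A handful of word pairs proves nothing on its own, of course; the cited result rests on systematic feature-norm datasets, not spot checks like this.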