Summary: The Winograd Schema Challenge was run on July 11, 2016 at IJCAI-16. There were four contestants. The first round of the challenge was a collection of 60 PDPs (pronoun disambiguation problems). The highest score achieved was 58% correct, by Quan Liu of the University of Science and Technology of China. Hence, by the rules of the challenge, no prizes were awarded, and the challenge did not proceed to the second round.
It is scary to see that all the deep learning methods can be defeated by such simple problems (both in NLP and vision). I am genuinely curious why none of the recent Deep NLP approaches work here.
I heard a researcher describe deep learning as just standard ML with feature engineering replaced by architecture engineering.
If that is the case, we need fundamentally new ML approaches.
> I am genuinely curious why none of the recent Deep NLP approaches work here.
To my understanding, it's because deep learning is fundamentally aimed at pattern-recognition, but makes no attempt to have a model of "reality" underlying that recognition.
I've seen some people argue that these models of reality are, themselves, nothing more than higher orders of pattern-recognition, and therefore ought to be amenable to deep-learning type methodologies. I have my doubts. Deep learning requires reinforcement training on big datasets, and the process of modelling reality does not often accommodate that.
The process of learning to see things, navigate through space, etc., is something that you do through reinforcement training -- more or less like a neural network. Newborn babies don't see "things": they see colours and lights and motion, and it takes a couple of years of training before they reliably classify those sensory inputs. That's very much a big data exercise: every instant your eyes are open, you're collecting more data and training on it.
In contrast, consider the schema "The council denied the protesters a permit because they [feared/advocated] violence." The reason you can parse that sentence is not because you have trained yourself on a dataset of thousands of councils and tens of thousands of protests, and can therefore now recognise which might be fearing violence and which might be advocating it. Instead, you've built a model of the world out of vastly sparser data, via structured logical inferences. Which just isn't remotely the way that deep learning works.
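To make that concrete, here is a deliberately crude sketch of what resolving the pronoun from a world model looks like. Every fact and role label below is invented for illustration; no real system is this simple.

    # Toy sketch: hand-stated world knowledge, no corpus statistics.
    # The nouns' roles and the motive-to-role mapping are invented.
    ROLES = {"council": "authority", "protesters": "petitioner"}

    # When an authority denies a petitioner something "because they
    # X violence", which role does the motive X belong to?
    MOTIVE_HOLDER = {
        "feared": "authority",      # the denier acts on its own fear
        "advocated": "petitioner",  # the denier reacts to the other side
    }

    def resolve(verb):
        # Pick the noun whose role matches the one the motive attaches to.
        role = MOTIVE_HOLDER[verb]
        return next(noun for noun, r in ROLES.items() if r == role)

    assert resolve("feared") == "council"
    assert resolve("advocated") == "protesters"

The toy is trivial on purpose: the point is that every line of knowledge was stated, not counted out of a corpus.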
So current AI techniques seem to have gotten very good at matching (and surpassing) a subset of cognition, but I'm not convinced that they will scale to cover these kinds of schemas. I do think that fundamentally new ML approaches will be needed for these kinds of domains. Neural networks seem to be good at cerebellum and occipital-lobe sorts of tasks, but we'll need something else for the frontal lobe.
> To my understanding, it's because deep learning is fundamentally aimed at pattern-recognition, but makes no attempt to have a model of "reality" underlying that recognition.
Isn't the issue just that you're essentially making a crude analog of the first layers of the visual cortex, and then overtraining it to recognize only specific patterns while ignoring the rest of reality, because you simply don't have the higher-order facilities in current networks?
In other words, doesn't the critique boil down to ANNs being orders of magnitude simpler than the human brain, and thus only able to fulfill far narrower tasks than many combined layers of abstraction?
New techniques only get developed as hardware becomes capable of actually running more complex networks.
> doesn't the critique boil down to ANNs being orders of magnitude simpler than the human brain
No, it's not a difference of scale; it's a fundamentally different learning algorithm. The human brain trains interactively. From birth you begin experiments controlling the environment around you. Your ability to perceive the world and your ability to control it are one and the same.
You can't understand the world without participating in it. And if you are participating in it, you're stuck on the same slow timeline as every other animal. No more "train on 1 million hours of data in 30 seconds". You poke a quantum field, you wait for your bit of information back. There's no possible speedup because you're up against physics. Want to integrate the experiences of 1000 remote AIs? Fine, that'll be 1000 quanta please. In order, one at a time.
And you're going to have to relearn all of the trillions of models our genes already learned over billions of years before you even catch up to a human baby.
Well, neural response in biological creatures is very slow, far slower than the clock cycles of an electronic computer.
Yet somehow, biological animals seem to demonstrate more intelligence and adaptability than machines. Perhaps the issue is more with the machine architecture than with the speed.
Nah, no need for new ML approaches. It's just time for the pendulum to swing back again from the current craze of data-driven/"deep learning"/continuous modeling to the alternative of symbolic/"Good Old-Fashioned AI"/discrete modeling.
The switch happens every ~30 years: around the 1950s, in the late 1980s, and perhaps again soon...
I realize I'm glossing over many nuances of how the most successful AI/ML approaches combine together continuous and discrete modeling, and symbolic and numeric techniques.
Still, this is a real cultural divide for researchers. The "deep learning" types just want to push their match/success percentages as high as they will go, with little regard for how the sausage is made, while the symbolic types care about whether the model itself provides actual explanatory value.
I participated in the WSC with GOFAI, though I am not representative of the state of the art. I would recommend GOFAI logic combined with a knowledge database assembled through machine learning (my database was virtually empty). However, I found the main problem in this challenge to be that one had to solve all aspects of language processing before one could even begin on the pronouns. My home-made parser just wasn't up to processing relative clauses nested inside relative clauses.
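To give a flavour of that nesting problem, here is a toy grammar invented for this comment (nothing like my actual parser): an NP can contain a relative clause, which contains another NP, and so on, so anything short of full recursion breaks.

    # Toy recursive-descent fragment; the grammar is invented here.
    def parse_np(tokens, i):
        # NP -> "the" NOUN ( "that" VP )?
        assert tokens[i] == "the"
        i += 2                                # consume "the NOUN"
        if i < len(tokens) and tokens[i] == "that":
            i = parse_vp(tokens, i + 1)       # recurse into the clause
        return i

    def parse_vp(tokens, i):
        # VP -> VERB NP
        return parse_np(tokens, i + 1)        # consume VERB, then parse NP

    s = "the dog that chased the cat that ate the mouse".split()
    assert parse_np(s, 0) == len(s)           # whole string consumed

A pattern-matching parser with a fixed clause depth handles the first "that" and falls over on the second.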
https://artistdetective.wordpress.com/2016/07/31/winograd-sc...
The highest scoring entry used a deep neural network:
https://www.cc.gatech.edu/~alanwags/DLAI2016/(Liu+)%20IJCAI-...
I'd say it's not doing great because deep learning here merely observes the statistical co-occurrence of words, and hence is only correct insofar as the sentences are common. I'm sure that with more data it could get a good 75% of them right. More of a problem is that pure textual comparison is oblivious to the underlying logic. Sometimes the verbs may correlate as a common sequence of events, but a preposition like "in" denotes a logical physical constraint that should overrule the mere correlation of verbs.
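As a caricature of what the co-occurrence approach amounts to (the counts below are invented for illustration, not taken from any real corpus):

    # Toy sketch: score each candidate by how often it co-occurs with
    # the verb in some large corpus, then pick the higher count.
    from collections import Counter

    # Imagined counts for "<candidate> <verb> ... violence" patterns.
    corpus_counts = Counter({
        ("council", "feared"): 120,
        ("protesters", "feared"): 80,
        ("council", "advocated"): 3,
        ("protesters", "advocated"): 90,
    })

    def resolve(verb, candidates=("council", "protesters")):
        # No syntax, no logic: just frequency.
        return max(candidates, key=lambda c: corpus_counts[(c, verb)])

    print(resolve("feared"))     # council
    print(resolve("advocated"))  # protesters

It gets the right answers only because the made-up counts happen to line up with how people usually write; a rarer phrasing, or a constraint carried by a single preposition, counts for nothing.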