Solving Verbal Questions in IQ Test by Knowledge-Powered Word Embedding [pdf] (arxiv.org)
80 points by ilyaeck on June 13, 2015 | 21 comments



> Considering that IQ tests have been widely considered as a measure of intelligence, we think it is worth making further investigations whether we can develop an agent that can beat human on solving IQ tests ... we could be a further step closer to the true human intelligence.

This is a bit like saying, "the best runners tend to be pretty tall, and we've made a robot which is really tall – so running robots are just around the corner."

IQ tests certainly correlate well with intelligence (in humans), but they're a metric, not the thing itself. Another metric would be mental arithmetic; people who can do sums quickly tend to be pretty smart, but that doesn't mean that calculators are a step away from super-intelligences.

Interesting and cool work but let's be careful in the interpretation (and remember that people said the same things about chess playing not that long ago).


People fall victim to this fallacy in their own thinking all the time. They want to get good at doing X, but task Y is a lot easier and success at Y is mildly correlated with success at X, so they start practicing Y instead of X. It's the reason why things like "brain games" are so popular. In reality, people's time would almost always be better spent just practicing the primary task they want to be good at.

That being said, making an AI that does better on IQ tests than humans is a rather interesting and worthwhile endeavor.


In accuracy of decoding (pattern matching)? What does that have to do with IQ?

The decades-old system for handwritten digit recognition for the US Postal Service beat humans (now it's a few lines of Octave in Andrew Ng's course). Still, it cannot write a reply to a letter.


I'm reading the article and it's talking about solving analogies, antonyms and similar problems ... as found on a standard IQ test.

I understand this isn't intelligence and the title doesn't imply it but I'm not sure what your reference to decoding is about.


But it still seems like pattern recognition based on a training set only, which is, in my opinion, a task on a level prior to intelligence, like what the visual cortex does. It cannot make new references between words to produce (infer) new antonyms and analogies (not present in the training set), which is intelligence.


The system isn't training on antonyms and analogies - it's training on wikipedia. It's learning the meaning (and multiple senses) for every word it can find.

The test they use to see if it actually learned what these words meant, in a limited sense, is to test it against a subset of verbal IQ tests (not what it was trained on!). You could ask it the antonym, synonym, or analogy for anything in English. This is an extension of word2vec / word embeddings.
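
Roughly the flavor of the thing, as a minimal sketch using plain single-sense word vectors loaded through gensim (not the paper's knowledge-powered multi-sense model; the pretrained vector set named below is just an assumption for illustration):

    # Minimal sketch: answering an "a is to b as c is to ?" question with
    # ordinary pretrained word vectors (not the paper's multi-sense model).
    import numpy as np
    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-100")  # assumed pretrained vectors

    def solve_analogy(a, b, c, candidates):
        """Pick the candidate whose vector is closest to vec(b) - vec(a) + vec(c)."""
        target = vectors[b] - vectors[a] + vectors[c]
        cos = lambda u, v: float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
        return max(candidates, key=lambda w: cos(vectors[w], target))

    print(solve_analogy("man", "king", "woman", ["queen", "duke", "castle", "crown"]))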

That it beats the scores of college graduates impresses me.


"it's training on wikipedia. It's learning the meaning (and multiple senses) for every word it can find."

I don't think that is entirely correct. After a cursory reading of the paper, my understanding is that they look up a list of word senses for each word in a dictionary (or multiple dictionaries). And then they try to learn something about each of those word senses from wikipedia (that is, they create separate word embeddings for each of those senses). So what they do not do is learn what senses a word has. That is done by the humans who created the dictionaries.

What that means is that they cannot pick up new senses of words, which doesn't matter for answering IQ test questions because these questions rarely change and are typically based on well established word meanings.
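
A toy sketch of what the per-sense representation buys you (the sense inventory and vectors below are invented placeholders, not the paper's data):

    # Toy sketch: each word gets one vector per dictionary sense, and a
    # similarity question is scored under the most compatible sense pair.
    # The sense inventory and vectors here are made up for illustration only.
    import numpy as np

    senses = {
        "bank":  [np.array([0.9, 0.1]), np.array([0.1, 0.9])],  # finance / river senses
        "money": [np.array([0.8, 0.2])],
        "river": [np.array([0.2, 0.8])],
    }

    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def best_sense_similarity(w1, w2):
        """Max similarity over all sense-vector pairs of w1 and w2."""
        return max(cos(s1, s2) for s1 in senses[w1] for s2 in senses[w2])

    # A single averaged vector for "bank" would blur both senses; per-sense
    # vectors let "river" match the river-bank sense cleanly.
    for cand in ["money", "river"]:
        print(cand, best_sense_similarity("bank", cand))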

Unfortunately it makes this approach less than ideal for things like understanding the news (something I'm working on), where new contexts of words keep popping up all the time.


Well, it is teaching a computer to do well on an "intelligence test", and ironically many of the ways humans use to distinguish each other's intelligence tend not to measure the unique, flexible and adaptable properties of human intelligence; rather, they tend to be tests of more computer-like behavior in humans. Playing chess well was for a long time considered a measure of high intelligence, for example.


Personally I'm not sure this helps with generic AI (directly). But could it serve as an autonomous piece that helps the generic AI with linguistics (especially by shaving CPU cycles)? A bit like how you can use your visual memory for certain things?


How reliable is MTurk for an IQ test? I presume the respondents don't put in that much effort on these quizzes.


MTurk is representative of the human population. I often hear people presume that MTurkers put in no effort for a few cents. But for most it is a hobby or a nice pastime: they do it to make a few bucks instead of playing chess or reading a book. Most people do not get a nickel for playing chess, yet they really put in the effort. MTurkers are similar.

See this MIT experiment in blurry text transcription for the unexpectedly high accuracy that crowd-sourcing can produce: http://groups.csail.mit.edu/uid/deneme/?p=329



Not really relevant to this. That deals with images, where you can perturb a huge number of pixels slightly to exploit weird edge cases. Even linear classifiers break on it, e.g. https://karpathy.github.io/2015/03/30/breaking-convnets/
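
For what it's worth, the fragility is easy to see even without a convnet. A rough sketch of the fast-gradient-sign idea from the linked post, using a random linear scorer (the weights and "image" are placeholders, not a trained model):

    # Rough sketch: nudge every pixel by a tiny amount in the direction of the
    # weights and the linear score jumps by eps * sum(|w|), even though no
    # single pixel changes noticeably. Random placeholders, not a real model.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 224 * 224 * 3               # a flattened RGB image
    w = rng.normal(size=dim)          # weights of a linear class scorer
    x = rng.normal(size=dim)          # a "clean" input
    eps = 0.01                        # per-pixel change, visually negligible

    x_adv = x + eps * np.sign(w)      # push each pixel slightly along the weights

    print("clean score:       ", w @ x)
    print("adversarial score: ", w @ x_adv)                  # large jump
    print("max pixel change:  ", np.max(np.abs(x_adv - x)))  # just eps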


I'm pretty sure that if you shift the image just a little, the convnet is no longer fooled. So this kind of exploit works only for completely stationary images, and it is very easy to overcome.



Humans are able to recognize these as optical illusions (otherwise we wouldn't be calling them "illusions"). Further, none of the optical illusions shown are so wildly "off" and "crazy" as the algorithmic goofs demonstrated in the paper referenced above.


They're obviously crazy to us. Our illusions might look obviously crazy to them.

And generally humans don't natively know when they're experiencing an optical illusion. They have to be taught it. And in either case, it's not the human vision system that learns the lesson, it's some other part of the brain that learns to discount the vision system's conclusions.


Further, these inputs were /optimized/ to confuse the particular classification software. If only we were able to optimise optical illusions for a given human viewer...


I bet you could do it with sound, too. Imagine producing what would seem like abstract noise – unrelated to anything in the natural world – but with particular structures and sequences of tones optimised to produce particular emotive responses; pleasure, excitement, calm, energised dancing, romantic dancing...

... wait, don't we already do that?


Interesting and cool. It's almost like a magic trick, particularly in the sense that an almost unbelievable result has been achieved through putting in a serious amount of effort. It's not just a gimmick, though-- projects like this demonstrate the steady creep of progress on seemingly unsolvable problems[1]. But of course, there are thousands of such problems out there, with millions of potential customers ready to make whoever solves one a billionaire, so even if "true" AI fails to appear, odds are good we'll make a fair approximation for specific applications.

One thing that popped out at me, however, was the distribution of the human scores: monotonically increasing with age, which was somewhat odd. Shouldn't it be more or less normally distributed?

1. I wouldn't have thought to make a serious attempt at beating a verbal comprehension test with deep learning; the possibility of succeeding would've seemed tiny compared to the work required. Similarly, it's hard to think of trying to prove the Riemann Hypothesis as a real pursuit; at best it's a quixotic hobby.


It is great to see how deep learning is exploring various possibilities.



