Great project and I enjoyed reading the paper. Thanks for posting it.
I was on a DARPA neural network advisory panel in 1998 and 1999, and I used simple one- or two-hidden-layer backprop networks for several projects.
My mind is blown by both the computational tricks for training many-layer networks and the advances in using GPUs to train large networks. We built our own neural network hardware that was excellent for the time, but the progress in the last decade is enormous.
Still, I also believe in the power and utility of so-called 'symbolic AI', but I think I am in the minority.
This is the type of project which really interests me. In the future we plan on building a service to wholesale trivia questions to pubs, schools, fundraisers, etc. I'll read this paper with interest, because at a quick glance the generated questions do sound great.
The next step from generating good-quality questions that sound natural is making sure the questions are up to date and topical. This means keeping up with current affairs, memes, popular culture, and other trends. Questions like "which politician is in trouble for x" only remain relevant for a week or two and sit in a different category from the pure fact-style questions presented here.
It would be great to see this research team or others continue in this direction and look at an increasingly broad set of metrics for what makes a "good" question, in addition to whether the English phrasing sounds good.
Read this paper from July of last year and it'll make more sense as to why this research is interesting: http://arxiv.org/pdf/1506.05869.pdf.
Seeding a dataset with human labels, then generating new, better data is pretty cool. Similar to DeepMind watching Go games, then learning to play better than said games. On top of which, humans can't tell the difference between the human-generated data and the algorithm-generated data.
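As a rough sketch of that bootstrapping loop (train_model, generate_candidates and quality_score are hypothetical stand-ins, not the paper's actual pipeline): seed with human-labelled data, let the model write new examples, keep the ones that pass a quality filter, retrain.

    # Self-training sketch: a human-labelled seed set grows with filtered,
    # model-generated examples. All three callables are placeholders.
    def bootstrap(seed_examples, train_model, generate_candidates, quality_score,
                  rounds=3, keep_threshold=0.8):
        data = list(seed_examples)                  # start from the human labels
        model = train_model(data)
        for _ in range(rounds):
            candidates = generate_candidates(model)  # model writes new examples
            data += [c for c in candidates
                     if quality_score(c) >= keep_threshold]  # keep the good ones
            model = train_model(data)               # retrain on the expanded set
        return model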
We're about 3 months from where you can bootstrap a system like Microsoft's with freely available code and data. From there, if "memory modules" start working better so models can "remember specific context" (they can already remember general context), you'll have a bot that's a pretty good candidate for passing the Turing test.
I'm defining a practical Turing test as text-based, human gets 5 questions, other party replies 5 times, then human must predict: human/not human.
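To make that protocol concrete, here's a minimal sketch (the ask, respond and judge callables are assumptions you'd have to supply, and the pass criterion of judge accuracy staying near chance is my reading, not a standard definition):

    import random

    # One session of the practical Turing test described above: 5 questions,
    # 5 replies, then the judge guesses whether the other party was human.
    def practical_turing_test(ask, respond, judge, rounds=5):
        transcript = []
        for _ in range(rounds):
            question = ask()                          # judge's next question
            transcript.append((question, respond(question)))
        return judge(transcript)                      # True = "I think it's human"

    # Run many sessions against a random mix of humans and bots; a bot
    # "passes" if the judge's accuracy stays close to 50%.
    def judge_accuracy(ask, judge, human_respond, bot_respond, trials=100):
        correct = 0
        for _ in range(trials):
            is_human = random.random() < 0.5
            respond = human_respond if is_human else bot_respond
            correct += int(practical_turing_test(ask, respond, judge) == is_human)
        return correct / trials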
So, the generated question and answer match, but the answer isn't exclusively the right answer. There are thousands of "right answers".
They do note this: "We have also observed that the questions are often ambiguous: that is, one can easily come up with several possible answers that may fit the specifications of the question."
But they don't say anything more about it. I suppose that's just out of scope for them, and something someone else works on?
This doesn't make the second definition of factoid (in the GP post) wrong. Prefixes, suffixes, and even roots end up being used in words that don't follow the literal definition they suggest. Words typically don't start life that way; they usually begin closer to the literal meaning pedants will later insist on, but, over time, usage causes a definitional shift.