A Deep Learning Dissenter Thinks He Has a More Powerful AI Approach (technologyreview.com)
146 points by espeed on Dec 17, 2015 | 71 comments



I found the first comment to this article quite interesting:

"Marcus has a point - even is some of what is said about deep neural networks is incorrect (for instance, they can learn and generalize from very few example, one shot learning).

However, he got it wrong with the answer. The key for machines to reach the symbolic abstraction level is the way we train them. All training algorithms - supervised, unsupervised or reinforcement learning with LSTM - rely on the assumption that there is a "utility function" imposed by some external entity. The problem is that, by doing so, we are taking away the machines' capacity to ask questions and create meaning.

The most important algorithm for learning is "meaning maximization", not utility maximization. The hard part is that we cannot define what meaning is - maybe we can't at all, I'm not sure. That is something I will be glad to discuss."


You could always compare how much your perceived reality matches predictions from your mental model. If you look at the concept of 'schemas' in child development, you'll see children focusing on one thing over and over again, like 'containment'. Putting a block in a box, taking it out, putting it in, taking it out. Each time they're learning that when you put something inside something else, it's still there even if you can't see it, that you can pick them up together etc etc. They're slowly exercising the different ways they can interact with their environment, learning to model likely outcomes. Learn enough of these schemas and you can start making more complex plans to see what the world does in response.


> The hard part is that we cannot define what meaning is - maybe we can't at all, I'm not sure

I think you've correctly identified the million dollar question in AI, and that's why I believe the major breakthroughs in artificial general intelligence won't come from scientific researchers. They might even come from PG's dreaded literary theory types, ones who are also talented programmers (...Lit theory being maybe the only field that thinks about structuralism, deconstruction, how meaning is manufactured, etc. - things outright dismissed by the scientific community as useless, but which still hold potential to produce a 'meaning algorithm').


This is perhaps a bit philosophical, but assuming we had a "meaning maximization" function, what stops us from writing it as a loss function and using our current supervised machine learning frameworks?
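Mechanically, nothing stops us - a minimal sketch, assuming a hypothetical meaning_score existed (it's a placeholder; defining it is exactly the open problem above):

    def meaning_score(outputs, context):
        # Hypothetical placeholder: nobody knows how to write this function.
        raise NotImplementedError

    def loss_fn(outputs, context):
        # Maximizing meaning is just minimizing its negation, so any
        # gradient-based supervised framework could in principle optimize it,
        # *if* meaning_score were well-defined and differentiable.
        return -meaning_score(outputs, context)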


Just because we can formulate an optimization objective, it is not guaranteed that we will find an algorithm that solves it within a reasonable amount of time. In the case of humans, these objectives or preferred states are possibly very simple ones, like hunger and pain avoidance, reproduction and curiosity; and it is actually easy to write down an algorithm that optimizes these objectives (if you ignore how reality actually works): simply try out all possible ways of reacting to the environment and choose the best one.

This works in theory, but in practice you only have a limited amount of chances to try something out (because of the arrow of time). This makes learning a necessity. You need to keep a record of all trials you have performed so that you can reuse this information later when the same situation reoccurs. How to do this in an optimal way is described by Bayes' theorem.
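To make the record-keeping point concrete, here is a minimal toy (my own illustration, not a claim about how brains do it): a Beta-Bernoulli belief that accumulates past trials of an action and reuses them via Bayes' rule when the same situation recurs.

    class ActionBelief:
        """Posterior over an action's success probability: Beta(alpha, beta)."""

        def __init__(self, alpha=1.0, beta=1.0):
            self.alpha = alpha  # prior pseudo-count of successes
            self.beta = beta    # prior pseudo-count of failures

        def update(self, succeeded):
            # Bayes' rule for a Bernoulli likelihood with a Beta prior:
            # the posterior is again a Beta with incremented counts.
            if succeeded:
                self.alpha += 1
            else:
                self.beta += 1

        def expected_success(self):
            return self.alpha / (self.alpha + self.beta)

    belief = ActionBelief()
    for outcome in [True, True, False, True]:  # the recorded trials
        belief.update(outcome)
    print(belief.expected_success())           # reused when the situation recurs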

The key to AI will be a certain set of priors, biases and fixed function units that make this computationally tractable; we'll likely need things like invariance of the learned information to various changes so that it can be reused in different settings, segmentation of data coming from the world into episodes (hippocampus), attention, control (basal ganglia), mental rotation (cortex) and path integration (hippocampus, grid cells).


That's true; there are certainly many optimization objectives that are computationally intractable, or perhaps too abstract to be useful for learning.

However, I would argue the prior of Bayesian modeling can be just as nebulous and computationally intractable as an optimization objective. Like supervised learning, Bayesian modeling is just a tool.

I'm skeptical that we will reach AI through a deep understanding or modeling of the brain. Technology and computer science advance more quickly than the biological sciences, at least in recent times. You might argue a success in robotics like [0] is a motor control system. But they built it by extending mathematical frameworks, not by being biologically inspired, and the big wins there didn't come from fixating on a learning framework or on biological mimicry - just as humans learning to fly didn't come about from flapping wings like a bird. At some point we hacked an engine (invented for other purposes) onto a wing and came up with powered flight.

As an aside, only seeing input a limited number of times would likely improve your ability to find models that generalize, since your model must be able to take these one-off learnings and unify them in some way to achieve high training performance. With respect to human learning, a specific individual only has one chance, but nature has had many. We are only a selection of those chances that seemed to work well enough. There are many commonalities to existence that allow this to work well in practice.

[0] http://groups.csail.mit.edu/rrg/papers/icra12_aggressive_fli...


Your agent may need a way to ask for a "kind" of training instance from the world in order to maximize meaning. Like maybe I've seen mammals, and now I need to see another kind of animal to maximize my understanding of the meaning animal. A human being—perhaps instinctively, or perhaps by some other force—has the curiosity to go find / pay attention to fish and birds. A kid can tell you he wants to go to the zoo.

A supervised machine learning framework can't tell the researcher what training instances it needs to see in order to improve its meaning. A supervised learning framework can't imagine where it might find that training instance, or describe what it may look like. A supervised learning framework never asks to go to the zoo.


Yes, in theory an offline supervised learner should never beat an online reinforcement learner. Adding a set of actions A that can be used to bias future examples in a predictable manner is certainly an advantage that will yield better convergence properties in almost all scenarios, simply because it lets you gain more information per observation.
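A hedged toy of that advantage (uncertainty sampling on an invented 2D dataset, nothing to do with the article's method):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(500, 2))                      # the "world"
    y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)  # hidden labels

    # Start with a handful of labelled points from each class.
    labelled = list(np.flatnonzero(y_pool == 0)[:5]) + list(np.flatnonzero(y_pool == 1)[:5])
    model = LogisticRegression()

    for _ in range(20):                                     # 20 chances to "ask"
        model.fit(X_pool[labelled], y_pool[labelled])
        probs = model.predict_proba(X_pool)[:, 1]
        uncertainty = np.abs(probs - 0.5)                   # 0 = most uncertain
        candidates = [i for i in range(len(X_pool)) if i not in labelled]
        query = min(candidates, key=lambda i: uncertainty[i])
        labelled.append(query)                              # the action biases future data

    print(model.score(X_pool, y_pool))                      # more info per observation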


I'm just curious: Can there be intelligence without some externally imposed utility function? For us humans it usually consists of access to (in decreasing order of importance) oxygen, water, nutrition, shelter, closure, etc. If an AI was free of all these constraints - meaning it had no predetermined utility function to maximize - then what is it supposed to do? It would have no motivation, no reason to learn or do anything. And if it had no goals which it tries to reach, how could we determine its performance or rather its intelligence?


Interesting point. However, I'd like to argue that there is a fine line between the conventional definition of a utility function and the ability of an organism to survive in an environment. The latter is truly open-ended, and not really externally imposed, i.e. different animals/people fare throughout life "optimising" completely different functions (closure?). A quote from a Kozma paper comes to mind: "Intelligence is characterized by the flexible and creative pursuit of endogenously defined goals". I believe this quite well summarises the open-ended nature of the task at hand. As I understand it, goals, rewards, risks, hazards - they are all in the game and shape the decision-making of the agent, its "policy". But the way they are formalized for each individual and situation is probably subject to constant redefinition itself.


> All training algorithms - supervised, unsupervised or reinforcement learning with LSTM - rely on the assumption that there is a "utility function" imposed by some external entity. The problem is that, by doing so, we are taking away the machines' capacity to ask questions and create meaning.

It's just a simplification. We are consuming the algorithm's output, rather than it being for internal consumption in a larger mind. So we supply the goal, where in a larger more natural mind the goal comes from other parts of the mind - still computing ultimately against a fixed utility function. That function being our human instinctual basis and the "firmware" of culture it's primed to load.


Well, it's not just training that implies utility maximization; the basic internal processes of a neural network, such as backpropagation, are also based on maximizing a utility function. Backpropagation is an approach that lets the network do gradient descent on a utility function by propagating errors through the system.[1] Backpropagation is what has made deep networks tunable and effective; without a utility function, it's hard to see how one would tune them.
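Stripped down, the error-propagation loop is tiny; a one-neuron toy of my own (squared-error loss, which is just a utility with the sign flipped):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w                       # targets from a known linear rule

    w = np.zeros(3)
    lr = 0.1
    for _ in range(200):
        pred = X @ w
        error = pred - y                 # the propagated error signal
        grad = X.T @ error / len(X)      # gradient of 0.5 * mean squared error
        w -= lr * grad                   # descend on the loss

    print(w)  # approaches true_w; without a loss to descend on, there is no tuning signal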

And it's hard to think of what "meaning maximization" could be other than maximizing a utility function based on meaning.

And while you can construct a neural network that learns from a single example, the NN framework is fundamentally based on learning from multiple examples, so such a construct is mostly meaningless.

But the point about machine learning being limited by the training methodology is a good one; it's just that I'm pretty sure you'd need different algorithms if you used a different methodology - the existing algorithms are used specifically because they fit the methodology.

[1] https://en.wikipedia.org/wiki/Backpropagation


I'm glad people are working on better approaches to AI, but I don't think articles like this, based on sources like Marcus, contribute much.

Mostly because Marcus isn't willing to talk about what he's working on right now. Hand-wavy stuff about how children learn, a tired analogy that's been trotted out many times before. This article was too early, and it doesn't contain real news. Reproducible results, or it didn't happen...

Fwiw, the statement below is simply untrue.

> “If you want to get a robot to learn to walk, or an autonomous vehicle to learn to drive, you can’t present it with a data set of a million examples of it falling over and breaking or having accidents—that just doesn’t work.”

The entire premise of Deepmind's work combining DL and reinforcement learning shows that autonomous agents learn from millions of examples.

And in fact, data is not the limiting parameter here. Trying to create an AI that doesn't need big data will be great for edge computing -- but we're actually swimming in data and using it to set new records every year -- http://deeplearning4j.org/accuracy.html -- so we might as well rely on it. Ignoring big data in the pursuit of AI is like ignoring sunlight in the pursuit of clean energy.

Deep learning is one of the best ways we have to use that data. In a sense, deep is the new normal.


> but we're actually swimming in data ...

I think this is a "looking under the lamppost for your keys because that's where the light is" situation. We hear about tons-of-data and deep learning because teams like Google/FB have problems that fit those situations well. This is not representative of the vast sea of learning applications most organizations face.

I work on learning applications day-in-day-out. In a typical year I work with over 50 different organizations/projects. Almost none of them have data anywhere near what deep learning requires. Here are just a few examples from the past year:

1) Optimizing a recruiting pipeline: maybe ~1,000s of data points per year.

2) Medical billing applications: typically hundreds of data points per year.

3) Novel but slow-moving financial instruments: maybe 1 data point per day.

4) Automated sensor calibration, where each run of the hardware costs you a few hundred dollars: depends on your budget, but thousands of data points per year is representative.

I think Google/FB and the "web+mobile is everything hivemind" (I'm not saying that applies to you) have deeply distorted people's expectations for how much data is available to solve a typical problem.


> Ignoring big data in the pursuit of AI is like ignoring sunlight in the pursuit of clean energy. It's everywhere. Why not use it?

I'm pursuing clean energy without sunlight.

My argument in response would be that we are swimming in a lot of things.

Data and sunlight are only two of them.


I hope that anyone who incorporates this concept into their AI also measures the number of mistakes which become heuristics and biases in the AI. I would hypothesize that there is a link between the willingness or ability of the human mind to learn this way and the propensity to accept (even DESIRE) an answer to a question for which, in truth, there is not adequate information to answer correctly.

This tendency leads to both helpful and dangerous heuristics and biases. From these biases humans build false beliefs that are damaging to themselves, their communities, and in the long run the species as a whole. If AI is about enabling humans to do more and better, should we not accept the failings of the current technologies in favor of assuring that they do not fall prey to the same biases and heuristics that lead humans to slaughter each other over the religious dogma of the past, destroy the environment with impunity, and accept ideas that 'feel' correct over being correct?


A reason to fear human-like AIs would be that they would realize how destructive humans are, and how little they would perceive us as necessary to the evolution of the species.


You make a point that really needs far more consideration. Do we really want human-like AI? It strikes me that the very notion of a human-like AI at exponential capacity is a terrifying thing, not for its potential for advancement and good, but for its potential to fall for the same fallacies that humans fall for on a regular basis, and which are mostly relative as it is. Humans are wildly imperfect, even and especially when they think they are correct or right. That does not sound like a characteristic I would want an AI system to have.


OK, to be fair, I skimmed TFA and didn't read every word. But to the extent that I get the gist of it, I'd say this:

I don't know that anybody seriously proposes that deep learning is the be-all end-all of AI techniques. It's VERY powerful for a lot of things, but I think DL researchers are aware of things DL doesn't do / isn't good at. Look at the recent book The Master Algorithm which breaks down a lot of what it would take to create a truly general purpose learning algorithm: If you believe the author's thesis, Deep Learning (or something like that) is just one piece of a much larger picture.

And without trying to start a debate over the merits of ML versus "GOFAI" or symbolic computation, etc., I think it's fair to say that DL doesn't really add anything in terms of reasoning. It's great at saying "this picture has a cat in it" or "this wav file says 'Hello, my name is mindcrime'", but that's a pretty small part of what human intelligence can do.


I'm curious about that book (The Master Algorithm): I began reading it, but stopped early because I got the impression it would be too "entry-level" for me, thus a waste of my time. Should I consider keeping on with it, and why?


> I'm curious about that book (The Master Algorithm): I began reading it, but stopped early because I got the impression it would be too "entry-level" for me, thus a waste of my time. Should I consider keeping on with it, and why?

It's hard to say. It's not a deeply technical book, I will say that. The main value I found in it is that he covers a broad base of different techniques and then lays out some ideas on how they could all be combined to make a "general purpose learner". I found it worth reading, but YMMV.


If you're experienced in ML or AI then most of the material in the book will seem pretty elementary. His thesis is interesting as he expresses an interest in unifying the various different approaches to the creation of an intelligent machine. I would recommend youtubing one of the talks he's given in recent months. He's basically been going around giving a 1 hour summary of the book.


I would say no, if you already understand the varieties of techniques it covers. It doesn't get more advanced. However, it is a decent historical overview.

He also pushes the same idea over and over again that there must be a master algorithm, when I got the feeling very early on that it wasn't a necessary thing or a practical concept to guide research. It's like arguing that you have to choose between electricity and magnetism, when they might just be aspects of the same underlying forces. Also, his reasons for claiming that certain algorithms are not the master algorithm are pretty weak and based on current progress, not theoretical limitations.


What does he have to say about the No Free Lunch Theorem and the bias-variance tradeoff?


Does someone have a link to a concise technical explanation of what this is about? I can't handle the vague long-form writing about some guy's startup and two-year-old on my morning pass through the news. I'm reminded of the current South Park season - this feels like an ad being passed off as news.


TL;DR: Deep learning can't generalize to novel inferences as well as probabilistic programming.


> In contrast, a two-year-old’s ability to learn by extrapolating and generalizing—albeit imperfectly—is far more sophisticated.

How does he know? He only sees the output, not how the brain works or how much information has been processed beforehand. Maybe the brain simply took an easy shortcut and generated a new rule - and you could just write a bit of code to do that as well.

But isn't this rule learning just a few more layers in a learning network - basically a learning network that operates on concepts rather than on raw pixels?
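As a loose toy of that "layers on concepts" idea (my own made-up data, with PCA standing in for a perceptual front-end):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(3)
    concepts = rng.normal(size=(200, 8))                           # latent "concepts"
    mixing = rng.normal(size=(8, 64))
    pixels = concepts @ mixing + 0.1 * rng.normal(size=(200, 64))  # raw "pixels"
    labels = (concepts[:, 0] > 0).astype(int)

    model = make_pipeline(
        PCA(n_components=8),       # crude stand-in for a perceptual front-end
        LogisticRegression(),      # the "rule learner" that only sees concepts
    )
    model.fit(pixels, labels)
    print(model.score(pixels, labels))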

One thing I noticed with my kids is that they have a ridiculously good memory. For a while they almost fooled me into thinking they could read. They would remember a word in a book and then a month later point to it and pronounce the word. So I thought, oh wow, you could read that - but then realized they had just remembered it. It seems a lot of how a brain learns is just absorbing and processing massive amounts of data, not unlike a learning project at Google or these other companies.

Now, on the subject at hand, haven't there been expert systems that do what he proposes? In a way, IBM's Watson is a successor to many of those expert systems. So one could say this has already been implemented and tried; it works in some instances but not others.


Until our current big-data AI revolution, AI was one investment catastrophe after another. Now money is finally flowing into AI again, and intuitions like this idea, backed by a little big-data power, can be explored again.

So even though the idea is fairly simple - I think anyone who thinks about AI for even a trivial amount of time will come up with the idea that AIs somehow have to automatically generate imperfect generalizations - the final result might be a truly innovative product.

Actually building an AI that does this on a scale that results in useful applications is something that probably has not been practical for a long time, and might or might not be now.


> Until our current big-data AI revolution, AI was one investment catastrophe after another.

Maybe from the point of view of the investors themselves, in most cases, if they were expecting a big, immediate financial return. But the technology has been improving steadily(-ish) throughout the years. As AI people sometimes say, "AI is whatever computers can't do $today". That is, as soon as AI reaches a point where it can do something useful, people stop considering it AI.

I saw an example of this yesterday while having a conversation with somebody much younger than me (I'm 42) when we got on this topic. I was saying that things like the voice recognition that Google Now, Siri, Cortana, etc. use would be good examples of what would have been mind-blowing AI capabilities 20 or 30 years ago, and he said "Oh, that's AI?"

No, it's not AI now, because now it's common-place. :-)

Of course, none of that is to say that we aren't still a long way away from a true "AGI". But AI research has been yielding useful techniques for quite a long time, even if we aren't "there yet" on AGI.


Do you have a reference for "one investment catastrophe after another"? I know it failed to deliver on the promise of translating human language, for example.


This article lays out those failures: https://en.wikipedia.org/wiki/AI_winter


I saw it first hand at an investment management firm in the late 1990s. They hired a team of neural-nets experts out of academia. They never produced much of use, cost millions, and wasted lots of people's time. My understanding is that there were many similar situations at other firms around that time. As far as I know there were no great successes.


An interesting article, but (understandably, if frustratingly) very light on the details. It reminds me of some of the work being done in the Artificial General Intelligence (AGI) community. In AGI, you are looking to come up with more universal approaches to mimicking intelligence, rather than bake-an-architecture-solve-a-specific-problem.

In particular, this reminds me of some of the operational logic work done in OpenCog (http://opencog.org/) and, especially, Pei Wang's Non-Axiomatic Reasoning (http://cis-linux1.temple.edu/~pwang/papers.html). I've liked Non-Axiomatic Reasoning for a long time, and this sounds in the same broad area.

Pei Wang, incidentally, came up with the best operational definition of intelligence I've seen (and one I apply to other systems and approaches): Intelligence is the ability to act appropriately with limited knowledge and limited resources (including time and space).

For what it's worth, my (published, although very open to change) position is that a layered architecture (akin to old-school subsumption architectures) is probably going to be most effective here. Combining the 'symbol creation' abilities of a deep neural network with the evidential reasoning, generalisation, and planning capabilities of a cognitive layer will allow us to get the best of both worlds.


Personally I haven't looked into the AGI field mentioned above; however, it appears "obvious" that layered "hybrid" approaches will be fundamentally more useful than either deep learning or expert systems alone. In that context the article doesn't seem light on details to me, since it's primarily advocating layered approaches, and I think there is only a limited set of layered approaches that would be useful. Why a practitioner in AI would not agree that a hybrid approach is necessary is what I'm wondering. Are there arguments supported by theory or evidence to the contrary?

The definition from Wang makes sense. How does AGI differ from other methods? For example, I noticed this article on Bayesian Program Learning a few days ago, which IMHO takes a layered approach and is based on human learning models. [https://news.ycombinator.com/item?id=10723485] Interestingly, there were no comments on it.

Could you post refs to some of the relevant publications behind your published opinions? I'm curious about your approach!


> For what it's worth, my (published, although very open to change) position is that a layered architecture (akin to old-school subsumption architectures) is probably going to be most effective here.

I have been going back and spending some time with the old "blackboard architecture" idea lately. I harbor a suspicion that that, or something like that, will turn out to be a useful way to integrate the capabilities of various different elements of cognition.


Yes, although it is probably going to have a notion of flow to it too. As someone who subscribes to the cognitive science idea of embodiment, the concepts in our head have meaning entirely because of how they relate senses to actions (however indirectly, and vice versa). I believe AI will be no different.


If I had to guess I would say that:

- It involves generative models (needed to make inferences from very few examples)

- It is still a connectionist approach (typical probabilistic programming is great if you have a lot of insight into the model, but not if you're trying to solve a general case... unless you're doing program induction but you need to represent programs...)

- It doesn't involve MCMC sampling for inference, because that's too slow or even intractable.

Some type of variational program induction where programs are represented as differentiable neural networks would be in that corner. Or it could be something totally different, but speculating is fun.


I dunno, I still think it might be possible to make tractable program induction through MCMC.

I've been trying (but failing) to do so by clustering similar generative rules of different complexity together, then getting the algorithm to search the grammar space in a way that recursively tries simpler models first, then goes on to try generating (and learning) more complex rules that are known to have output similar to the best simpler ones.

My intuition is that you have to cluster your generative rules into a taxonomy that your algorithm can navigate from the top. It's exponentially inefficient to try to recognize "dog" until you have recognized that the simpler "animal" is a good approximation.
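Roughly the navigation shape I have in mind (an invented toy taxonomy and made-up scores, not my actual code):

    taxonomy = {
        "animal": {"mammal": {"dog": {}, "cat": {}}, "bird": {"crow": {}, "gull": {}}},
        "vehicle": {"car": {}, "bike": {}},
    }

    def search(tree, score, threshold=0.5, prefix=""):
        matches = []
        for label, children in tree.items():
            s = score(label)
            if s >= threshold:                        # a good enough approximation:
                matches.append((prefix + label, s))   # keep it and try its refinements
                matches += search(children, score, threshold, prefix + label + "/")
        return matches

    # Hypothetical scorer: how well a label explains the current observation.
    fake_scores = {"animal": 0.9, "mammal": 0.8, "dog": 0.85, "cat": 0.2,
                   "bird": 0.3, "vehicle": 0.1}
    print(search(taxonomy, lambda label: fake_scores.get(label, 0.0)))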

I also think that in the ultimate solution, the output leaf statements of the grammar will be parametrized like a normal programming language, so that, for example, the algorithm can generate a color and a radius and then generate 100 circles referencing this single color and radius - representing repeated patterns without having to learn their size and color individually when they are clearly homogeneous or invariant across the bunch. The Bayesian, Occam's-razor solution to a bunch of similar things you don't have a category for yet is shared parameters. These parameters enable learning from very few examples: the algorithm doesn't have to learn a full new category to make good predictions, it can simply notice a functional parametric pattern in part of a scene and extrapolate immediately.
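A throwaway illustration of the shared-parameter idea (not my real representation): one colour and one radius are sampled once and then referenced by 100 circles, so the whole group costs only a handful of parameters to describe.

    import random

    def generate_scene(n_circles=100):
        # Shared parameters: sampled once for the whole group.
        shared = {
            "color": random.choice(["red", "green", "blue"]),
            "radius": random.uniform(1.0, 5.0),
        }
        # Each circle stores only what varies (its position) plus a reference
        # to the shared parameters, rather than its own colour/radius.
        return [
            {"x": random.uniform(0, 100), "y": random.uniform(0, 100), "params": shared}
            for _ in range(n_circles)
        ]

    scene = generate_scene()
    print(scene[0]["params"] is scene[99]["params"])  # True: one shared description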

I haven't cracked it however. It's grueling to debug probabilistic algorithms compared to clear cut binary ones!


Having some type of hierarchical structure is probably the right inductive bias to have (it may be one of the reasons why deep learning is so successful), but you also need a search direction. It's hopeless to randomly hop in a high dimensional space until you find matches.


The hierarchical structure means you don't hop randomly. You recognize simple, vague, approximate models first, then bias your search towards the more complex rules that sit under your best top-level approximations in the hierarchy.

It's like a probabilistic binary search in the model taxonomy.

One thing that makes me hopeful that the learning process is possible is that the non-terminal grammar rules are generic and can always become anything. You don't tend to get stuck in local maxima, even if you only search a small part of the program space, when any node can always be swapped for a node of any other category.

My problem right now is that my MCMC sampler doesn't mix well. Even though I tried to shorten the jumps between models of different complexity for a particular thing, once the vague model traces are well fitted to the training examples, there is too deep a score chasm to jump to the next level of complexity.
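For reference, the core Metropolis-Hastings move stripped down (toy code, not my actual sampler) - the chasm bites exactly where log_score(proposed) falls far below log_score(current), driving the acceptance probability towards zero:

    import math
    import random

    def mh_step(current, propose, log_score):
        proposed = propose(current)
        log_accept = log_score(proposed) - log_score(current)
        # Accept with probability min(1, exp(log_accept)); a deep enough score
        # chasm makes this effectively zero and the chain stays put.
        if random.random() < math.exp(min(0.0, log_accept)):
            return proposed
        return current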

My simulations seem to get stuck in simpler models of things. I want simpler models to recursively open the door to learning and recognizing slightly more complex models just under them in the hierarchy, but this doesn't seem to be happening. The hierarchy isn't really even forming properly.

I might be missing something stupid. I don't consider myself an expert in these things; I just dabble for fun. That's one of the reasons I'm discussing it here - maybe someone more knowledgeable will give me an aha! moment.

There are still a bunch of things I could try, but like I said, it's long and grueling debugging work.


Interesting work to see, as I've promoted this view myself: specifically, that the brain is designed to rapidly acquire knowledge with relatively little data in childhood and then somehow changes over time. Not to mention people's reasoning ability, which often seems closer to a self-modifying expert system - something NNs don't resemble at all. So it all uses neurons, but different types that smoothly integrate.

Hence, DNNs modelling a small part of the brain that doesn't even do our thinking aren't going to make thinking machines. The machines will be really stupid and hard to train. Humans are hard enough to train: it takes two decades on average. The ability to automate abstraction, understanding, and feedback via example generation, all on small amounts of data, is critical.

If that's not found, then my next question is: "Will DNNs deliver acceptable models for the non-recognition tasks they're aiming at, or will they experience The Second AI Winter when the truth sets in?"


Perhaps Gary Marcus said it best himself when critiquing Numenta, yet another brain-inspired company [0]:

[Brain inspired] models are arguably closer to how the brain operates than artificial neural networks. "But they, too, are oversimplified," he says. "And so far I have not seen a knock-down argument that they yield better performance in any major challenge area."

There are a few of these brain-inspired machine learning companies, yet I can't think of a single one acquired by Google.

Probabilistic modeling is great when you don't have much data and want to inject prior knowledge in a cohesive way. However, Google's approach appears to be to create large and robust training sets to feed into somewhat conventional supervised learning frameworks.

[0] http://www.technologyreview.com/news/536326/ibm-tests-mobile...


It seems that he implies that brain models aren't good because they haven't piqued Google's interest. Almost immediately he goes on to say that his method is great even though Google isn't really interested in it either. Seems a little contradictory.


The article doesn't give any technical details. The website of the profiled company, Geometric Intelligence, doesn't have any technical details.

Cofounder Gary Marcus has a publication list at his academic website:

http://www.psych.nyu.edu/gary/marcus_pubs.html

Cofounder Zoubin Ghahramani has his research available at his academic website:

http://mlg.eng.cam.ac.uk/zoubin/


If you look around Zoubin's website you'll find this:

http://www.automaticstatistician.com/about/#

"The current version of the Automatic Statistician is a system which explores an open-ended space of possible statistical models to discover a good explanation of the data, and then produces a detailed report with figures and natural-language text. While at Cambridge, James Lloyd, David Duvenaud and Zoubin Ghahramani, in collaboration with Roger Grosse and Joshua Tenenbaum at MIT, developed an early version of this system which not only automatically produces a 10-15 page report describing patterns discovered in data, but returns a statistical model with state-of-the-art extrapolation performance evaluated over real time series data sets from various domains. The system is based on reasoning over an open-ended language of nonparametric models using Bayesian inference."


Talk is cheap. He is more than welcome to compete in ILSVRC or on Kaggle or any other competitions of his choosing. Maybe release some code on Github or even an API. There are a lot of options out there to really show how your technique is better, not just talk about it.

"How humans do it" isn't necessarily the only way to solve a problem. We don't know how we add 2 numbers in our minds; but the computer can add 2 numbers orders of magnitude faster than us.


From TFA it does not sound like he is interested in competing in ILSVRC. His point is that fitting a 500M-dimensional model to hundreds of millions of training examples != intelligence. That is, deep learning is incredibly good at pattern recognition, but it is not going to get us to AI as most people understand it.


Pattern recognition is just one component of intelligence. It's a big, important one though. People need to stop thinking that we'll one day discover some one algorithm for strong AI. It will be a composition of different parts, just like the human brain.


> We don't know how we add 2 numbers in our minds;

We actually have a pretty good idea of the mechanisms for basic arithmetic. We do things about as inefficiently as possible.


The part that has always looked strange to me about comparing AI to humans/children is that humans have had years of time to learn, while e.g. a neural network has had a few days at most.


Yeah, but it's not obvious if every single frame of the whole movie children have experienced is necessarily important.

To put it one way, what is the _rate_ of learning? How many _concepts_ do children learn per unit time?

Concepts: Our computer systems take whole images and train on them, but before a pixel-based image gets to the learning part of a child's brain, it has probably been conceptualized in some way already.

Rates: And then, from this feature vector—maybe 1 feature vector per 10 seconds, for a child—how often is it incorporated in a learned concept? Maybe one concept per hour? Per day? Think about how challenging it is for children to learn vocabulary. A young child doesn't know that much.
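Rough back-of-envelope numbers for those rates (with assumptions I'm inventing: 12 waking hours a day, one feature vector per 10 seconds):

    seconds_awake_per_day = 12 * 60 * 60                     # 43,200
    feature_vectors_per_day = seconds_awake_per_day // 10    # 4,320

    images_per_training_run = 1_000_000                      # a few days of network training
    days_of_child_input_to_match = images_per_training_run / feature_vectors_per_day
    print(round(days_of_child_input_to_match))               # roughly 230 waking days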

The important thing is that a child is a self-learning machine. It chooses what's important and has a way to shift its attention, a critical enhancement of new (2015) AI approaches, deep learning and otherwise. It can explore its own world and somehow choose the "training instances" that matter.

By comparison, a neural network training for days looks at millions of images. A very high rate. But the most effective approaches (the massively-multilayer networks, the deep in deep learning) are making concrete what a child's brain must do already: separate perceptual components from learning components, in a conclusively more structured and hierarchical way.


What if you can achieve such high learning rate only because (in ML) you process a really narrow view of the data which cannot lead even close to how a human perceives things? I mean in terms of conceptualizing and connecting things/thoughts.


Exactly. Our brains process information constantly, even while we sleep. And they also process multiple senses - not just pixels in an image, but sound, sight, touch, etc. I think it's a little early to assume that consciousness can't arise from complexity because we are far from representing that complexity with any technology today.


True, but humans are probably more power-efficient, although the math to compare this is not trivial.


Gary isn't the first to think of this.

In fact, I was working with Dr. Frank Guerin [1] back in 2008 on this exact approach. He wrote an interesting paper that approaches ML from a pedagogy perspective, titled "A Piagetian Model of Early Sensorimotor Development" [2].

[1] http://homepages.abdn.ac.uk/f.guerin/pages/

[2] http://homepages.abdn.ac.uk/f.guerin/pages/EpiRob2008.pdf


He "refused to explain exactly what products and applications ...for fear that a big company like Google might gain an advantage".

Does this actually happen, or is it baseless paranoia?


In my experience, if you start spreading a good idea out there, what can happen is that people will initially act as though your idea is outlandish and worthless, but later on, this idea will percolate in other people's mind, they'll do something with it, and claim it was purely their own invention. Google has more resources than this guy, so they can prototype things much quicker.


[deleted]


His partner is Zoubin Ghahramani. Zoubin is a brilliant researcher; he had one of the best invited talks at NIPS this year.


Which is what makes it so funny that people can't be bothered to read further into the article than "HERESY against glorious deep learning!".


"HERESY against glorious deep learning!"

Which is kinda silly when you think about it... This is science, and - ideally - science should not be "fad" driven. When we get to a place where a certain approach becomes orthodoxy and few people are willing to (consider|fund|think about|acknowledge|whatever) alternative approaches, that's a Bad Thing. I think some people would argue that a similar thing has happened in the past with physics where String Theory (in its various forms) became SO dominant that it was hard to get taken seriously if you were working on something that wasn't ST.


To be fair, there's not much to read in the article. There's fluff about Marcus as a person and some tiny amount of amazingly vague talk about what he's doing.


How exactly does someone run a company for one year during a sabbatical? It seems like you are signaling that you don't believe it will succeed, if you are pre-announcing that you only intend to run this new company for a year. Has this structure ever succeeded?


The Talking Machines podcast (which is great BTW) has an interview with Zoubin Ghahramani[1], one of the founders mentioned here.

He spoke about how he was determined to prove that non-deep learning methods could perform as well as deep learning at tasks like ImageNet. The interview was in March 2015, and AFAIK they haven't published anything yet.

OTOH, his work on the Automatic Statistician sounded very interesting.

[1] http://www.thetalkingmachines.com/blog/2015/3/26/3mixrq61fb0...


There were others thinking of this approach in AI before this guy:

"The idea for a search engine that maps associations came to Franks by way of his three young children. He noticed how each child processed information by taking two pieces of knowledge, combining them, and coming up with something new. Franks wondered whether he could get a computer to do the same thing" From:

A Search Engine that Thinks, Almost http://newscenter.lbl.gov/2005/03/31/a-search-engine-that-th...


I just realized why a wearable like Google Glass was such a big deal for Google. The human brain is constantly feeding in data and rationalizing the objects we see and the things we do. We are supervised as we grow, and we continue to rationalize things on our own terms.

With a wearable, if you allow the eyes of machines to read the world with you and hear your conversations, then with enough time your machine should be capable of learning enough to become like you. With millions of people doing so, you essentially create an AI that, like a human, is able to answer complex questions.


I feel like YouTube data should be enough for doing this.


The problem with YouTube data is that it is either very noisy and poor quality, or very short and meaningless. Sure, having Google Glass on you can produce a similar result, but the idea is that the data is more consistent with how we grow up. We don't travel every day, and the things we encounter every day are pretty consistent. Once you refine an AI that represents you, you can then take others' data and rationalize it. I have never been to France, but I kind of have an idea what Paris as a city may look like, the buses they have, how they are different from the MTA here in NY.


Unless it was trained on youtube comments in which case it would probably go something like this https://youtu.be/hDsSrCFvz_A?t=29s


From the article:

"A deep-learning system can be trained to recognize particular species of birds, but it would need millions of sample images and wouldn’t know anything about why a bird isn’t able to fly."

Deep-learning ≠ only image recognition

Of course, if you train it only on images, it will only know what a bird looks like; but if you train it on the properties of birds and their ability to fly, then it can learn why a given bird isn't able to fly.
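A toy illustration of that point (invented features and labels, obviously not a real model):

    from sklearn.tree import DecisionTreeClassifier

    # Features: [wing_span_to_body_ratio, body_mass_kg, wings_vestigial]
    X = [
        [1.8, 0.02, 0],   # hummingbird-ish
        [2.0, 0.5, 0],    # pigeon-ish
        [1.1, 30.0, 0],   # ostrich-ish: heavy, relatively small wings
        [0.6, 4.0, 1],    # penguin-ish: flipper-like wings
        [2.2, 1.0, 0],    # crow-ish
    ]
    y = [1, 1, 0, 0, 1]   # can it fly?

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[0.9, 20.0, 0]]))  # heavy, short-winged bird -> predicts it can't fly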


How weird that there isn't a single mention of SOAR in this article or the comments. http://soar.eecs.umich.edu/ I guess fashion is a powerful thing!


I have read far too much about some guy's groundbreaking new theory for AI in the past few decades.

Shut up and show us the code!



