Learning a hierarchy (blog.openai.com)
267 points by gdb on Oct 26, 2017 | 39 comments



It seems to me there's been an interesting turn in AI recently, toward focusing on adaptability as a goal in itself. Deep learning has shown that there is incredible power in stochastic gradient descent over a space of functions, but so far that has mostly been applied to rigid tasks. Now work like this is about turning that power towards adaptability itself as a goal, and it seems to me that this brings us towards "real" intelligence.

The logical extreme of this line of thinking would be agents that maximize the entropy of their future actions as the only objective function, as in [1].

[1] http://paulispace.com/intelligence/2017/07/06/maxent.html
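
A toy version of that objective, just to make it concrete (everything here is made up for illustration, not taken from the linked post): a gridworld agent that ignores reward entirely and greedily picks whichever move keeps the most future states reachable, a crude proxy for maximizing the entropy of future actions.

    # Toy sketch: pick the move that keeps the most options open.
    from itertools import product

    WALLS = {(1, 1), (1, 2), (2, 1)}            # hypothetical obstacle layout
    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def step(state, move, size=5):
        x, y = state[0] + move[0], state[1] + move[1]
        if 0 <= x < size and 0 <= y < size and (x, y) not in WALLS:
            return (x, y)
        return state                             # bump: stay in place

    def reachable(state, horizon=3):
        """States reachable within `horizon` steps."""
        frontier = {state}
        for _ in range(horizon):
            frontier |= {step(s, m) for s, m in product(frontier, MOVES)}
        return frontier

    def best_move(state):
        # Greedy: choose the move whose successor keeps the most options open.
        return max(MOVES, key=lambda m: len(reachable(step(state, m))))

    state = (0, 0)
    for _ in range(5):
        state = step(state, best_move(state))
    print(state)  # drifts toward open space, away from walls and corners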


A little philosophical rather than scientific, but I found this blog post [1] to be good for musing about the relationship between "intelligence" and "adaptability".

[1] http://www.actinginbalance.com/intelligence-is-adaptability/


Related article on similar hierarchical / compositional policies learned by maximum entropy optimization: http://bair.berkeley.edu/blog/2017/10/06/soft-q-learning/


Is this like maximizing for movement options in a Chess AI?


This reminds me of a thought I had some years ago. The idea was that we can think of general intelligence not as an optimization for a specific given goal, but as optimization for a special position from which some wider set of goals can be most rapidly converged upon. Thus optimizing for future flexibility rather than current results.

At the time, I remember being excited to hear of a physics paper[1] concerning an inverted pendulum, where they solved for dynamic forces that would keep the system at the position of maximum instability, and claimed that this was, in some sense, a description of dynamic intelligence. The analogy is that this is the unique position from which the pendulum can be efficiently made to move quickly in any 'required' direction (the 'goal').

I still think that idea has some merit, but putting together a coherent formalization of it seems really tricky, requiring genius far beyond my own meager pondering.

[1] I found the article: https://physics.aps.org/articles/v6/46
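
For the curious, here's a minimal control sketch of that setup (my own toy version, not the paper's method): linearize the pendulum about the upright position and compute an LQR controller that holds it at the unstable equilibrium. All constants are illustrative.

    import numpy as np
    from scipy.linalg import solve_continuous_are

    # Linearized inverted pendulum about upright: theta_ddot = (g/l)*theta + u
    g, l = 9.81, 1.0
    A = np.array([[0.0, 1.0],
                  [g / l, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)                    # state cost
    R = np.array([[1.0]])            # control cost

    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.inv(R) @ B.T @ P   # optimal gain: u = -K x

    # Simulate a small perturbation away from upright.
    x, dt = np.array([0.1, 0.0]), 0.01
    for _ in range(1000):
        u = -K @ x
        x = x + dt * (A @ x + (B @ u).ravel())
    print(x)  # approaches [0, 0]: held at the point of maximum instability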


As a step in that direction, you can take some inspiration from what the brain does: as you learn things better and better, the knowledge essentially gets pulled down to neuronal layers that are closer to sensory input. This leaves the higher layers more free to do other stuff (potentially reusing the results from the surface layers), which is a step in the direction of optimizing for future flexibility.

It's possible to create rules that operate on an already-trained network and push it in this direction without totally destroying what it's learned, by "fuzzing" the original network to generate a bunch of input/output pairs and then using that dataset to retrain smaller sub-networks. For instance, if you have a 5-layer network trained on a classification task, you can often use it as a teacher to train a smaller network to do pretty damn well on the same task, even in cases where training the smaller network directly would have been very difficult. There are several reasons this trick can work, not least that it is, in a sense, a way to expand the training set dramatically.
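
A rough sketch of what I mean (essentially knowledge distillation; the architectures and dimensions here are arbitrary placeholders):

    # "Fuzz the teacher to build a dataset, then retrain a smaller student."
    import torch
    import torch.nn as nn

    teacher = nn.Sequential(              # pretend this is the trained network
        nn.Linear(32, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 10),
    )
    student = nn.Sequential(              # smaller net to absorb the knowledge
        nn.Linear(32, 32), nn.ReLU(),
        nn.Linear(32, 10),
    )

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()                # match the teacher's outputs directly

    for _ in range(1000):
        x = torch.randn(64, 32)           # "fuzzing": random probe inputs
        with torch.no_grad():
            target = teacher(x)           # teacher's outputs become labels
        loss = loss_fn(student(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

In practice you'd want the probe inputs to come from somewhere near the real data distribution rather than pure noise, but the shape of the trick is the same.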

NB: the above approach is probably not how you'd implement this; there are less crude methods to incentivize shallower layers to have more activation than deeper ones that would probably work better.

I can easily imagine a phased training strategy that oscillates between a) learning new things by making the deeper layers more malleable and the shallower ones fairly rigid, and b) compressing all the data by opening up the shallow layers to change and replaying input/output into itself. I have no idea if there are any benchmarks around this sort of thing, though; benchmarks typically have fixed goals, so the ability to retrain for additional tasks isn't really measured.


Structures that can query and adapt their own structure. Reminds me of reflection in a managed language.


super cool that this was done by a high schooler



More discouraging to me, to be completely honest.


Why have a whole humanity if you think only the single best person has value?


"It is not enough that I should succeed - others should fail."


No, it's not that... as tempting as that often is...

It's that the spoils of the new economy are accumulating in a way that completely forgets the middle 90% of the country. Kevin is obviously really smart, but by virtue of being a sharp high schooler in Palo Alto he has access to things I don't even have at a state school, much less had when I was in high school.


Life is long. If his location gives him access to opportunities that you don't have, figure out a way to get access to those opportunities and execute on it once you graduate from college. Many prominent Silicon Valley people came from small towns in the Midwest (Marc Andreessen, Evan Williams) or immigrated from difficult political situations abroad (Sergey Brin, Jan Koum, Elon Musk).


Very well said, with concrete examples of people who made it to the top of Silicon Valley...

> Life is long. If his location gives him access to opportunities that you don't have, figure out a way to get access to those opportunities and execute on it once you graduate from college. Many prominent Silicon Valley people came from small towns in the Midwest (Marc Andreessen, Evan Williams) or immigrated from difficult political situations abroad (Sergey Brin, Jan Koum, Elon Musk).


I think Kevin may be diving into ML papers instead of writing on HN about things that discourage him ;) But seriously, access to knowledge is really easy now. There have been many HN threads about where to start and what the good materials are. Sure, having pros around you helps, but they don't gather around random people.

Being in the Bay Area already gives you a huge advantage over most of the population, especially compared with less developed countries.


> I think Kevin may be diving into ML papers

That's what I do the rest of the day, because it's part of my job.

I mean more the hardware-access part: at 15 my parents would never have given me their debit card to spend hundreds of dollars on GCP GPUs. Good luck training GANs on a laptop CPU!


You first say

> but by virtue of being a sharp high schooler in Palo Alto he has access to things I don't even have at a state school, much less had when I was in high school.

and then you go on to say

> I mean more the hardware-access part: at 15 my parents would never have given me their debit card to spend hundreds of dollars on GCP GPUs. Good luck training GANs on a laptop CPU!

That has literally nothing to do with location, as you seemed to allude in the earlier post. It has nothing to do with the spoils of an economy being distributed unequally. If the hardware were only accessible in certain parts of the country, sure, your point would make sense. But anybody with money could have bought it.

So your post now reads as "I'm going to blame my not achieving as much as Kevin on my parents for not spending money on me when I was young."

That article was encouraging, if anything. It shows exactly how available educational resources in AI have become, that a 15-year-old can get access to them and make significant progress. It shows that if you take initiative, you can actually go ahead and get things done.


> That has literally nothing to do with location, as you seemed to allude in the earlier post. It has nothing to do with the spoils of an economy being distributed unequally

Palo Alto is one of the wealthiest cities in the country.


No reason to be discouraged; there are plenty of good times to go around. AI is wide open, and there are plenty of basic discoveries to be had by those willing to look.

Furthermore, outside of AI there is so much fun to be had in the world that it's probably not worth being discouraged by discoveries made by some preternatural high schooler. https://xkcd.com/1024/


? The blog post has 5 authors


The first author on the paper is a high school student.


FWIW, authors on mathematics papers are conventionally listed in alphabetical order.


Author order matters for AI/ML papers, which this one is.


Does this optimise the hierarchy as the environment changes? For example, when cooking, I unpack food as needed, but when it starts to clutter the workspace I decide to fit in a 'clean-up cycle' while waiting on some other food to cook.


As far as I understood it, it learned sub-tasks and then learned when to apply those sub-tasks.

Kind of reminds me of the Soar system, except using deep learning.

https://en.wikipedia.org/wiki/Soar_(cognitive_architecture)


I was mulling over this idea yesterday in the context of RTS games... There's no reason to consider changing your overall strategy every frame. Nice to see it works!

It will be interesting to see how it performs with more tiers in the hierarchy, and with more structured tasks.

Controlling a virtual arm to play a board game, for example.


Found the paper from the Wired article. Link below:

https://s3-us-west-2.amazonaws.com/openai-assets/MLSH/mlsh_p...


There are buttons below the first video to read the paper and view the code.


Next step: transfer learning and sharing amongst sub-policies in the hierarchy. If an ant agent learns to "move up" to avoid an obstacle or reach a goal, why can't it infer the same for any cardinal or diagonal direction after observing the world around it? It's just a rotation or translation, after all.

Also, for small numbers of sub-policies, would Monte Carlo playouts be faster? We'd be searching over the next steps the ant may encounter, which is presumably a finite set of possible "wall-floor" configurations ;)
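
To make the rotation point concrete, something like this toy sketch (move_up_policy is a hypothetical stand-in for a learned sub-policy): rotate the observation into the policy's frame, let the "up" policy act, then rotate its action back into world coordinates.

    import numpy as np

    def rotation(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]])

    def move_up_policy(obs_xy):
        # Placeholder for a learned sub-policy that only knows "up" (+y).
        return np.array([0.0, 1.0])

    def move_in_direction(obs_xy, theta):
        """Drive toward heading theta by reusing the 'up' policy."""
        to_frame = rotation(np.pi / 2 - theta)       # world -> policy frame
        action_in_frame = move_up_policy(to_frame @ obs_xy)
        return to_frame.T @ action_in_frame          # rotate action back

    print(move_in_direction(np.array([1.0, 0.0]), 0.0))  # ~[1, 0]: moves right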

In any case, great work! Always love watching OpenAI vids...


> It's just a rotation or translation, after all.

My intuition from working on various computer vision tasks is that animal brains would do this more by rotating the perspective at the post-optic synapses, rather than by having a generalized plan. We still only know how to "move up"; we just change the angle we understand the scene from, and so change what "up" means.


Well, it's really hard to read text upside down.


I’ve heard this many times, but I’ve never had any issues reading text at any angle.


I don't understand where the 'hierarchy' comes into play. This reads to me like a standard computer program where you execute code, and some of those lines execute other segments of code which might be much more complex than what I see. If I execute the line printline('Hello World'), I only executed one line, but many other things happened that I did not directly execute. I'm sure I'm missing something and this is somehow different and novel, but I'm just not seeing it from the blog post.


It is effectively a system of reinforcement learning agents working in a command hierarchy to solve problems that single reinforcement learning agents fail to solve.

It's (somewhat) obvious that this is an idea worth trying, but that doesn't mean actually getting it to work is easy.
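
Concretely, the structure as I read the paper (this is a sketch, not OpenAI's actual code; env and the policies are stand-ins) is a master policy that picks one of K sub-policies every N timesteps, while the chosen sub-policy emits low-level actions at every step:

    import random

    K, N = 4, 10                     # sub-policy count, master decision period

    def master_policy(state):
        return random.randrange(K)   # stand-in for a learned chooser

    def sub_policy(k, state):
        return f"action-from-subpolicy-{k}"   # stand-in for learned skills

    def rollout(env, horizon=100):
        state = env.reset()
        for t in range(horizon):
            if t % N == 0:           # the master only "thinks" every N steps
                k = master_policy(state)
            state, reward, done = env.step(sub_policy(k, state))
            if done:
                break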


Got it, okay. So it is different from a traditional computer program, and more like a business or military unit, where a high-level agent "determines" an action and delegates it to a lower-level entity that doesn't necessarily know why it's doing this thing?


Is it just me or is there something revolting about the character model?

Good work nonetheless, but for god's sake give it six legs and make it black.


Great work, Kevin!


:)



