Learning a hierarchy (blog.openai.com)
267 points by gdb on Oct 26, 2017 | 39 comments



It seems to me there's been an interesting turn in AI recently, toward focusing on adaptability as a goal in itself. Deep learning has shown that there is incredible power in stochastic gradient descent over a space of functions, but so far that has mostly been applied to rigid tasks. Now work like this is about turning that power towards adaptability itself as a goal, and it seems to me that this brings us towards "real" intelligence.

The logical extreme of this line of thinking would be agents that maximize the entropy of their future actions as the only objective function, as in [1].

[1] http://paulispace.com/intelligence/2017/07/06/maxent.html
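
A toy version of that objective, just to make it concrete (everything here is made up for illustration, not taken from the linked post): a gridworld agent that ignores reward entirely and greedily picks whichever move keeps the most future states reachable, a crude proxy for maximizing the entropy of future actions.

    # Toy sketch: pick the move that keeps the most options open.
    from itertools import product

    WALLS = {(1, 1), (1, 2), (2, 1)}            # hypothetical obstacle layout
    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def step(state, move, size=5):
        x, y = state[0] + move[0], state[1] + move[1]
        if 0 <= x < size and 0 <= y < size and (x, y) not in WALLS:
            return (x, y)
        return state                             # bump: stay in place

    def reachable(state, horizon=3):
        """States reachable within `horizon` steps."""
        frontier = {state}
        for _ in range(horizon):
            frontier |= {step(s, m) for s, m in product(frontier, MOVES)}
        return frontier

    def best_move(state):
        # Greedy: choose the move whose successor keeps the most options open.
        return max(MOVES, key=lambda m: len(reachable(step(state, m))))

    state = (0, 0)
    for _ in range(5):
        state = step(state, best_move(state))
    print(state)  # drifts toward open space, away from walls and corners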


A little philosophical rather than scientific, but I found this blog post [1] to be good for musing about the relationship between "intelligence" and "adaptability".

[1] http://www.actinginbalance.com/intelligence-is-adaptability/


Related article on similar hierarchical / compositional policies learned by maximum entropy optimization: http://bair.berkeley.edu/blog/2017/10/06/soft-q-learning/


Is this like maximizing for movement options in a Chess AI?


This reminds me of a thought I had some years ago. The idea was that we can think of general intelligence not as an optimization for a specific given goal, but as optimization for a special position from which some wider set of goals can be most rapidly converged upon. Thus optimizing for future flexibility rather than current results.

At the time, I remember being excited to hear of a physics paper[1] concerning an inverted pendulum, where they solved for dynamic forces that would keep the system at the position of maximum instability, and claimed that this was, in some sense, a description of dynamic intelligence. The analogy is that this is the unique position from which the pendulum can be efficiently made to move quickly in any 'required' direction (the 'goal').

I still think that idea has some merit, but putting together a coherent formalization of it seems really tricky, requiring genius far beyond my own meager pondering.

[1] I found the article: https://physics.aps.org/articles/v6/46
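
For the curious, here's a minimal control sketch of that setup (my own toy version, not the paper's method): linearize the pendulum about the upright position and compute an LQR controller that holds it at the unstable equilibrium. All constants are illustrative.

    import numpy as np
    from scipy.linalg import solve_continuous_are

    # Linearized inverted pendulum about upright: theta_ddot = (g/l)*theta + u
    g, l = 9.81, 1.0
    A = np.array([[0.0, 1.0],
                  [g / l, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)                    # state cost
    R = np.array([[1.0]])            # control cost

    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.inv(R) @ B.T @ P   # optimal gain: u = -K x

    # Simulate a small perturbation away from upright.
    x, dt = np.array([0.1, 0.0]), 0.01
    for _ in range(1000):
        u = -K @ x
        x = x + dt * (A @ x + (B @ u).ravel())
    print(x)  # approaches [0, 0]: held at the point of maximum instability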


As a step in that direction, you can take some inspiration from what the brain does: as you learn things better and better, the knowledge essentially gets pulled down to neuronal layers that are closer to sensory input. This leaves the higher layers more free to do other stuff (potentially reusing the results from the surface layers), which is a step in the direction of optimizing for future flexibility.

It's possible to create rules that operate on an already-trained network and push it in this direction without totally destroying what it's learned, by "fuzzing" the original network to generate a bunch of input/output pairs and then using that dataset to retrain smaller sub-networks. For instance, if you have a 5-layer network trained on a classification task, you can often use it as a teacher to train a smaller network to do pretty damn well on the same task, even in cases where training the smaller network directly would have been very difficult. There are several reasons this trick can work, not least that it is, in a sense, a way to expand the training set dramatically.
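
A rough sketch of what I mean (essentially knowledge distillation; the architectures and dimensions here are arbitrary placeholders):

    # "Fuzz the teacher to build a dataset, then retrain a smaller student."
    import torch
    import torch.nn as nn

    teacher = nn.Sequential(              # pretend this is the trained network
        nn.Linear(32, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 10),
    )
    student = nn.Sequential(              # smaller net to absorb the knowledge
        nn.Linear(32, 32), nn.ReLU(),
        nn.Linear(32, 10),
    )

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()                # match the teacher's outputs directly

    for _ in range(1000):
        x = torch.randn(64, 32)           # "fuzzing": random probe inputs
        with torch.no_grad():
            target = teacher(x)           # teacher's outputs become labels
        loss = loss_fn(student(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

In practice you'd want the probe inputs to come from somewhere near the real data distribution rather than pure noise, but the shape of the trick is the same.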

NB: the above approach is probably not how you'd implement this; there are less crude methods to incentivize shallower layers to have more activation than deeper ones that would probably work better.

I can easily imagine a phased training strategy that oscillates between a) learning new things by making the deeper layers more malleable and the shallower ones fairly rigid, and b) compressing all the data by opening up the shallow layers to change and replaying input/output into itself. I have no idea if there are any benchmarks around this sort of thing, though; benchmarks typically have fixed goals, so the ability to retrain for additional tasks isn't really measured.


Structures that can query and adapt their own structure. Reminds me of reflection in a managed language.


super cool that this was done by a high schooler



More discouraging to me, to be completely honest.


Why have a whole humanity if you think only the single best person has value?


"It is not enough that I should succeed - others should fail."


No, it's not that... as tempting as that often is...

It's that the spoils of the new economy are accumulating in a way that completely forgets the middle 90% of the country. Kevin is obviously really smart, but by virtue of being a sharp high schooler in Palo Alto he has access to things I don't even have at a state school, much less had when I was in high school.


Life is long. If his location gives him access to opportunities that you don't have, figure out a way to get access to those opportunities and execute on it once you graduate from college. Many prominent Silicon Valley people came from small towns in the Midwest (Marc Andreessen, Evan Williams) or immigrated from difficult political situations abroad (Sergey Brin, Jan Koum, Elon Musk).


Very well said, with concrete examples of people who made it to the top of Silicon Valley...

> Life is long. If his location gives him access to opportunities that you don't have, figure out a way to get access to those opportunities and execute on it once you graduate from college. Many prominent Silicon Valley people came from small towns in the Midwest (Marc Andreessen, Evan Williams) or immigrated from difficult political situations abroad (Sergey Brin, Jan Koum, Elon Musk).


I think Kevin may be diving into ML papers instead of writing on HN about things that discourage him ;) But seriously, access to knowledge is really easy now. There have been many HN threads about where to start and what the good materials are. Sure, having pros around you helps, but they don't gather around random people.

Being in the Bay Area already gives you a huge advantage over most of the population, especially compared with less developed countries.


> I think Kevin may be diving into ML papers

That's what I do the rest of the day, because it's part of my job.

I mean more the hardware-access part: at 15 my parents would never have given me their debit card to spend hundreds of dollars on GCP GPUs. Good luck training GANs on a laptop CPU!


You first say

> but by virtue of being a sharp high schooler in Palo Alto he has access to things I don't even have at a state school, much less had when I was in high school.

and then you go on to say

> I mean more the hardware-access part: at 15 my parents would never have given me their debit card to spend hundreds of dollars on GCP GPUs. Good luck training GANs on a laptop CPU!

That has literally nothing to do with location, as you seemed to allude in the earlier post. It has nothing to do with the spoils of an economy being distributed unequally. If the hardware were only accessible in certain parts of the country, sure, your point would make sense. But anybody with money could have bought it.

So your post now reads as "I'm going to blame my not achieving as much as Kevin on my parents for not spending money on me when I was young."

That article was encouraging, if anything. It shows exactly how available educational resources in AI have become, that a 15-year-old can get access to them and make significant progress. It shows that if you take initiative, you can actually go ahead and get things done.


> That has literally nothing to do with location, as you seemed to allude in the earlier post. It has nothing to do with the spoils of an economy being distributed unequally

Palo Alto is one of the wealthiest cities in the country.


No reason to be discouraged; there are plenty of good times to go around. AI is wide open, and there are plenty of basic discoveries to be had by those willing to look.

Furthermore, outside of AI there is so much fun to be had in the world that it's probably not worth being discouraged by discoveries made by some preternatural high schooler. https://xkcd.com/1024/


? The blog post has 5 authors


The first author on the paper is a high school student.


FWIW, authors on mathematics papers are conventionally listed in alphabetical order.


Author order matters for AI/ML papers, which this one is.


Does this optimise the hierarchy as the environment changes? For example, when cooking, I unpack food as needed, but when it starts to clutter the workspace I decide to fit in a 'clean-up cycle' while waiting on some other food to cook.


As far as I understood it, it learned sub-tasks and then learned when to apply those sub-tasks.

Kind of reminds me of the Soar system, except using deep learning.

https://en.wikipedia.org/wiki/Soar_(cognitive_architecture)


I was mulling over this idea yesterday in the context of RTS games... There's no reason to consider changing your overall strategy every frame. Nice to see it works!

It will be interesting to see how it performs with more tiers in the hierarchy, and with more structured tasks.

Controlling a virtual arm to play a board game, for example.


Found the paper from the Wired article. Link below:

https://s3-us-west-2.amazonaws.com/openai-assets/MLSH/mlsh_p...


There are buttons below the first video to read the paper and view the code.


Next step: transfer learning and sharing amongst sub-policies in the hierarchy. If an ant agent learns to "move up" to avoid an obstacle or reach a goal, why can't it infer the same for any cardinal or diagonal direction after observing the world around it? It's just a rotation or translation, after all.

Also, for small numbers of sub-policies, would Monte Carlo playouts be faster? We'd be searching over the next steps the ant may encounter, which is presumably a finite set of possible "wall-floor" configurations ;)
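
To make the rotation point concrete, something like this toy sketch (move_up_policy is a hypothetical stand-in for a learned sub-policy): rotate the observation into the policy's frame, let the "up" policy act, then rotate its action back into world coordinates.

    import numpy as np

    def rotation(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]])

    def move_up_policy(obs_xy):
        # Placeholder for a learned sub-policy that only knows "up" (+y).
        return np.array([0.0, 1.0])

    def move_in_direction(obs_xy, theta):
        """Drive toward heading theta by reusing the 'up' policy."""
        to_frame = rotation(np.pi / 2 - theta)       # world -> policy frame
        action_in_frame = move_up_policy(to_frame @ obs_xy)
        return to_frame.T @ action_in_frame          # rotate action back

    print(move_in_direction(np.array([1.0, 0.0]), 0.0))  # ~[1, 0]: moves right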

In any case, great work! Always love watching OpenAI vids...


> It's just a rotation or translation, after all.

My intuition from working on various computer vision tasks is that animal brains would do this more by rotating the perspective at the post-optic synapses, rather than by having a generalized plan. We still only know how to "move up"; we just change the angle we understand the scene from, and so change what "up" means.


Well, it's really hard to read text upside down.


I’ve heard this many times, but I’ve never had any issues reading text at any angle.


I don't understand where the 'hierarchy' comes into play. This reads to me like a standard computer program where you execute code, and some of those lines execute other segments of code which might be much more complex than what I see. If I execute the line printline('Hello World'), I only executed one line, but many other things happened that I did not directly execute. I'm sure I'm missing something and this is somehow different and novel, but I'm just not seeing it from the blog post.


It is effectively a system of reinforcement learning agents working in a command hierarchy to solve problems that single reinforcement learning agents fail to solve.

It's (somewhat) obvious that this is an idea worth trying, but that doesn't mean actually getting it to work is easy.
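
Concretely, the structure as I read the paper (this is a sketch, not OpenAI's actual code; env and the policies are stand-ins) is a master policy that picks one of K sub-policies every N timesteps, while the chosen sub-policy emits low-level actions at every step:

    import random

    K, N = 4, 10                     # sub-policy count, master decision period

    def master_policy(state):
        return random.randrange(K)   # stand-in for a learned chooser

    def sub_policy(k, state):
        return f"action-from-subpolicy-{k}"   # stand-in for learned skills

    def rollout(env, horizon=100):
        state = env.reset()
        for t in range(horizon):
            if t % N == 0:           # the master only "thinks" every N steps
                k = master_policy(state)
            state, reward, done = env.step(sub_policy(k, state))
            if done:
                break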


Got it, okay. So it is different from a traditional computer program, and more like a business or military unit, where a high-level agent "determines" an action and delegates it to a lower-level entity that doesn't necessarily know why it's doing this thing?


Is it just me or is there something revolting about the character model?

Good work nonetheless, but for god's sake give it six legs and make it black.


Great work, Kevin!


:)



