As someone working on a reinforcement learning/neuroevolution problem right now, I find this to be extremely exciting. Fewer parameters, ceteris paribus, is always better—the fact that the experiments in this paper were run on one workstation, rather than on a massive farm of TPUs à la AlphaGo, implies quicker development iteration time and more accessibility to the average researcher.
The staging of components in this paper (compressor/controller), where neuroevolution is only applied to a low-dimensional controller, reminds me of Ha and Schmidhuber's recent paper on world models (which is briefly cited) [1]. They employ a variational autoencoder with ~4.4M parameters, an RNN with ~1.7M parameters, and a final controller with just 1,088 parameters! Though it's recently been shown that neuroevolution can scale to millions of parameters [2], the technique of applying evolution to as few parameters as possible and supplementing with either autoencoders or vector quantization seems to be gaining traction. I hope to apply some of the ideas in this paper to multiple co-evolving agents...
Thanks so much! I read this (and a few related papers) today. Besides the novel algorithm discussed in the new Atari paper, do you have a reference implementation of online vector quantization you might be able to recommend? I think I could probably figure it out from the paper alone, but sometimes it's nice to see code other people have already optimized. :)
Uhm, unfortunately I do not; I could search Google for one, but I doubt I would fare better than you at it. I went and coded my own version, it is quite straightforward. You can find it here: https://github.com/giuse/machine_learning_workbench/blob/mas... Although it is polluted by research trial and error, you can easily pick out the minimal code necessary to run it.
Here's an example of how to use it: https://github.com/giuse/machine_learning_workbench/blob/mas...
Let me know if that works for you or if you have further questions!
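If it is easier to read than digging through the repo, here is the core idea in a minimal sketch (my simplification here, not the actual workbench code): keep a fixed set of centroids and, for each incoming vector, nudge the nearest centroid a step toward it.

```ruby
# Minimal online vector quantization sketch. Note: this is a generic
# illustration, not the algorithm from the paper, which adapts the
# codebook differently.
class OnlineVQ
  attr_reader :centroids

  def initialize(ncentroids:, dims:, lrate: 0.1)
    @lrate = lrate
    # Random init in [0, 1); a real setup would pick something smarter.
    @centroids = Array.new(ncentroids) { Array.new(dims) { rand } }
  end

  # Index of the centroid closest (squared Euclidean) to vec.
  def encode(vec)
    @centroids.each_index.min_by do |i|
      @centroids[i].zip(vec).sum { |c, v| (c - v)**2 }
    end
  end

  # Move the winning centroid a step toward the observed vector.
  def train(vec)
    i = encode(vec)
    @centroids[i] = @centroids[i].zip(vec).map do |c, v|
      c + @lrate * (v - c)
    end
    i
  end
end
```

Usage is just `vq = OnlineVQ.new(ncentroids: 4, dims: 2)` followed by `vq.train(obs)` on each new observation; `vq.encode(obs)` gives you the low-dimensional code.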
Author here. The idea is low-hanging fruit indeed; several friends (including @togelius!) commented "I always wanted to do that -- eventually". Realizing it is another matter. Have a look at the mess necessary to make it work: we had to discard unsupervised-learning (UL) initialization in favor of online learning, accept that the encoding would grow in size, adapt the network sensibly to these changes, and tweak the ES to account for the extra weights.
I have been wolfing down RL articles, videos and publications for some time now, after an intro to deep learning via Manning's Deep Learning, and while the overall concept of RL is easy to grasp (agents, actions, states, etc.), some of the finer details and processes are quite confusing.
I am tempted to blame inconsistent terminology and implementations for this lack of understanding, but I suspect it has more to do with approaching the field through the lens of a developer rather than a researcher or academic: trying to understand the code without completely grasping the "science" of the mechanisms.
Edit: the point I forgot to mention was that I always feel like I am playing catch-up half the time, as the amount of new content being released exceeds what I can absorb.
Sure, familiarity matters. But I believe in the rational reasons that brought me back to this tool over and over, until I built familiarity indeed :)
I posted an overbloated discussion on it on Reddit, feel free to read as little as you need ;)
https://www.reddit.com/r/MachineLearning/comments/8p1o8d/r_p...
Sure! It's quite simple: works like a charm. Completely transparent. You `import` with Python-like syntax, and you get a Ruby object that transparently forwards any message (i.e. method calls) to the corresponding Python object on the underlying Python interpreter.
This means that Ruby does not need to know _anything_ about the Python object: whatever you call on the Ruby object is just forwarded to the Python one, and whatever result is passed back to Ruby.
About the overhead, I sincerely do not know. I expected to have some, so my code does part of the image pre-processing directly on the Python side (`narray`) in order to pass a smaller object to Ruby; besides that, I could perceive none -- grain of salt advised, as any overhead was possibly hidden by my computation in Ruby being orders of magnitude more complex/time-consuming than what was going on in the Python interpreter.
Definitely ping Murata-san either on GitHub https://github.com/mrkn/pycall.rb/ or Twitter `@mrkn`, I will send him a link to this thread so he can contribute if he feels like it. Personally, I am a fan of his work and elegant approach, I owe him for enabling me to keep working in Ruby while everybody publishes code in Python :)
Sounds great - I also prefer Ruby very strongly, and have tended to avoid Python code because I didn't expect to have an easy way of wrapping it, but will definitely have to play with PyCall.
Uhm, maybe I should have pointed this out earlier, but the algorithms' implementations can be found (independent of deep neuroevolution) in my Ruby machine learning workbench repo (in turn imported in DNE):
https://github.com/giuse/machine_learning_workbench
Okay, a true story, my second disappointing interaction with Atari marketing.
One fine day my boss came to me and said that he had an ask from Atari Marketing (in the Home Computer arm of the company).
The marketing drone came to my office (yes, we had offices in those days). "My idea is to pre-copyright all possible 8x8 bitmaps so that people can't use them without our permission. Can you print them out for me so we can submit them to the copyright office?" He actually meant all possible 8x8 bitmaps containing five colors, with the colors chosen from a 7- or 8-bit space (I forget which).
I told him the story of the guy who supposedly invented chess and was offered a choice of reward by his king. The fellow simply asked, "Just give me one grain of rice for the first square, two grains for the second, four for the third, and so on." Most of you know how this ends; it's grade-school math.
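For anyone who wants the punchline without doing the grade-school math themselves (the ~25 mg per grain is my assumption):

```ruby
# The chessboard/rice doubling series: 1 + 2 + 4 + ... over 64 squares.
grains = (0...64).sum { |sq| 2**sq }  # = 2**64 - 1
puts grains                           # 18446744073709551615
puts grains * 25e-6                   # ~4.6e14 kg of rice
```

Roughly half a trillion tonnes -- far more rice than has ever been grown.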
I explained to the marketing guy that the printout would probably outweigh the planet, maybe the solar system, maybe the galaxy. He went away, a little disgusted with those pesky engineers. (I don't know if he was the same oxygen waster who wanted me to write a 16K cartridge in just a couple of weeks, but he certainly was in the same department).
So I'm still sticking with three brain cells, despite all the downvotes :-)
Actually it's a LOT more than that, since each of the 5 colors has 128 possible values. There's some duplication (same color slot, trivial rotations and reflections and such) but I think the order of magnitude is probably "cluster of galaxies" at a minimum :-)
Wow, that increases it by a lot; it's much, much heavier than the mass of the visible universe [0]. The universe is ~1e50 kg; the marketing exec's request was ~1e500 kg. Now you can go back in time and tell them just how wrong they were ;)
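The exponent swings wildly depending on what you assume, so here is one quick back-of-the-envelope version (my assumptions: 64 pixels each showing one of 5 palette slots, each slot picked from 128 colors, one printout per bitmap at ~1 g of paper):

```ruby
patterns  = 5**64    # pixel patterns over the 5 palette slots
palettes  = 128**5   # ways to fill the 5 slots from 128 colors
bitmaps   = patterns * palettes

paper_kg    = bitmaps * 0.001  # 1 g per printout
universe_kg = 1.5e53           # visible universe, ordinary matter

printf "bitmaps:  ~1e%d\n",    Math.log10(bitmaps).floor      # ~1e55
printf "paper:    ~1e%d kg\n", Math.log10(paper_kg).floor     # ~1e52 kg
printf "universe: ~1e%d kg\n", Math.log10(universe_kg).floor  # ~1e53 kg
```

Under those assumptions the printout lands within an order of magnitude of the universe's own mass, which is already well past "disgusted marketing guy" territory.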
OK, you're so grayed out but your bio says you've been programming since '79 and you've written games for Atari. So perhaps all we need is some elaboration? They seem like a successful company, don't they?
There is not really much, if anything, left of the original Atari. It pretty much failed after the video game crash in 1983, was split in two, and bounced around various owners.
The Atari that brought out the Atari ST etc. was one of those; it too pretty much failed, and Tramiel merged it into JTS, which later sold the remains to Hasbro, which in turn sold them to Infogrames Entertainment. The current Atari Inc. used to be Infogrames and just licensed the name; Infogrames Entertainment itself then renamed itself Atari SA.
The other part of the original Atari, Atari Games Inc., failed in 2003. As far as I know, the intellectual property of that division is now owned by Warner.
[1] https://worldmodels.github.io
[2] https://arxiv.org/abs/1712.06567