Algorithms for Reinforcement Learning (ualberta.ca)
161 points by tzhenghao on Feb 19, 2019 | 19 comments



Does RL work in practice? I have seen many impressive applications of end-to-end deep learning, but nothing impressive at all when it comes to RL in the real world (especially compared to control theory in the robotics domain).


Yes, especially if you are willing to broaden your definition to include multi-armed or contextual bandits (of which the Markov decision processes that RL algorithms solve are a generalization). A lot of internet ad companies, among others, use those successfully.
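
To make the bandit connection concrete, here's a minimal epsilon-greedy sketch; the arm count and click-through rates are made up for illustration:

    import random

    def epsilon_greedy_bandit(pull_arm, n_arms=3, n_rounds=10000, epsilon=0.1):
        # Running estimate of each arm's mean reward, plus pull counts.
        estimates = [0.0] * n_arms
        counts = [0] * n_arms
        for _ in range(n_rounds):
            # Explore with probability epsilon, otherwise exploit the best estimate.
            if random.random() < epsilon:
                arm = random.randrange(n_arms)
            else:
                arm = max(range(n_arms), key=lambda a: estimates[a])
            reward = pull_arm(arm)
            counts[arm] += 1
            # Incremental mean update: Q <- Q + (r - Q) / n
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
        return estimates

    # Hypothetical ad example: arm 2 has the best click-through rate.
    ctrs = [0.02, 0.05, 0.11]
    print(epsilon_greedy_bandit(lambda a: 1 if random.random() < ctrs[a] else 0))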


What do you mean by "nothing impressive"? DeepMind's AlphaStar has beaten the world's best Protoss player 5-1 in Protoss vs. Protoss.

https://deepmind.com/blog/alphastar-mastering-real-time-stra...


Maybe the best foreign (non-Korean) Protoss player. The match that AlphaStar lost had its actions restricted to a single screen; before that, AlphaStar could see and control everything not in the fog of war simultaneously. Still quite impressive, though.


I agree with you and would like to add that he is probably not the best non-Korean Protoss either. The blog post presenting AlphaStar itself only claims he is one of the world's strongest players, and links to this page: https://liquipedia.net/starcraft2/2018_StarCraft_II_World_Ch... showing MaNa as the 5th-best non-Korean Protoss player in the World Championship Series.

AlphaStar has gotten criticism for having unfair advantages. It went 5-0 against MaNa when it could see and control the whole map at once, for example. But after a camera restriction was imposed (so it sees the map the way humans do), it lost 0-1.

With all that said, it is still impressive. By far the best bot we have in SC2!


Less than four years ago our state-of-the-art RL system (DQN) could only beat some Atari games. Now we can almost beat the best human players in StarCraft. That, to me, is very impressive.


Disclosure: I work at Google (and with the folks at OpenAI).

Are you familiar with OpenAI's Dactyl? [1] I could see you saying it's perhaps not a leap forward versus classical techniques, like their Dota bot (OpenAI Five), but I assume part of that is just that they're getting started.

There's also the PlaNet work from Google and DeepMind [2] that's been announced more recently.

[1] https://blog.openai.com/learning-dexterity/

[2] https://ai.googleblog.com/2019/02/introducing-planet-deep-pl...


Yes.

I don't know what your thresholds for impressive might be, or what your basis of comparison is, but pure control theory seems to be at a bit of a dead end, whereas reinforcement learning allows for greater flexibility and robustness for control tasks, subject to your willingness to gather (or simulate) a lot of training data[0].

See for example this video[1] in which OpenAI shows off a robotic arm with "unprecedented" dexterity.

-----

0. https://www.youtube.com/watch?v=ZVIxt2rt1_4 for a video (there's also an associated paper) about setting up RL tasks for robots.

1. https://www.youtube.com/watch?v=jwSbzNHGflM


DeepMind's AlphaGo, AlphaGo Zero, and DQN; OpenAI's Dota bot. There are a lot of RL success stories.


> nothing impressive at all when it comes to RL in the real world

Apparently, many people on this website think Go and StarCraft qualify as "the real world". The majority of them have probably never even heard of blocks world or SHRDLU.

AFAIK, the biggest problem in reinforcement learning is still connecting the final outcome with the individual actions you took to achieve it. This hits you hard when you move from toy domains to real-world problems, because you can't replay real-world scenarios thousands of times, and even if you could, you can get completely different results due to various random and external factors.

This can be somewhat mitigated by taking the Marvin Minsky approach and building/using a simulation of the problem instead of the real thing. However, in many domains building a realistic simulation is significantly harder than supervised learning.
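
To make the credit-assignment point concrete: the textbook workaround is to smear a delayed outcome back over the trajectory with a discount factor, and it's exactly this smearing that becomes unreliable when real-world episodes are noisy and unrepeatable. A minimal sketch (the episode data is invented):

    def discounted_returns(rewards, gamma=0.99):
        # Work backwards so each step folds in everything after it:
        # G_t = r_t + gamma * G_{t+1}
        returns = []
        g = 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        return list(reversed(returns))

    # One episode with no signal until a single terminal reward: every earlier
    # action is credited only via discounting, good and bad actions alike.
    print(discounted_returns([0, 0, 0, 0, 1.0]))
    # -> [0.9606, 0.9703, 0.9801, 0.99, 1.0]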


Somehow this reminds me of one I missed: all the simulation-based driving improvements at Waymo, Cruise and others. Hard to know if it's strictly RL or not from "we train neural networks while driving in simulation" but it's similar enough for my taste.

I'll be curious to see the set of domains where a hybrid approach (some real world, lots of simulation) works out. The nice thing about simulation is that you can experience lots of things that you never want happening in the real world (e.g., child runs in front of car with only X00 ms to impact). Trading off the difficulty of accurate simulation against needing to trust that the model will behave correctly in a situation you could have simulated will likely be an interesting liability challenge, for autonomous driving at the least.


A common misconception is that self-driving car companies (outside of a few smaller startups) are using RL to drive the car. They are not. They use deep learning for perception systems, which produce tangible outputs that can be processed by what amounts to expert systems.

I work in this space, and even if you could assume the RL would never make a mistake, it's not auditable in the way you would need it to be for things like insurance. In general, RL isn't ready to be used in complex situations where people can die when things go wrong. And that's ignoring the sample-efficiency challenges and the handling of unseen data.
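
A toy sketch of the split described above, with entirely hypothetical types: the learned component stops at perception, and an auditable rule-based layer makes the driving decision.

    from dataclasses import dataclass

    @dataclass
    class Detection:
        kind: str          # e.g. "pedestrian", "vehicle"
        distance_m: float

    def perceive(camera_frame):
        # Stand-in for the deep perception network: frame -> structured
        # detections. In this architecture it is the only learned component.
        raise NotImplementedError

    def plan(detections, speed_mps):
        # Expert-system-style rules: every decision is an inspectable,
        # testable branch, which is what makes the stack auditable.
        for d in detections:
            if d.kind == "pedestrian" and d.distance_m < 2.0 * speed_mps:
                return "BRAKE"
        return "CRUISE"

    print(plan([Detection("pedestrian", 15.0)], speed_mps=10.0))  # BRAKE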


I believe AlphaGo Zero used reinforcement learning. I would say that's quite an impressive application of it.


For sure, but usually a less general algorithm has better performance on a specific problem.


For some definition of "work", yes. Generating hype is very real and valuable.


Yes, I know a quant firm using it in the market.


Without going into identifiable specifics, could you provide some information on what type of decisions the model makes?


Great book from one of my favourite profs at UofA. It's very focused and provides examples of how to implement what I would call classic RL algorithms, and can serve as a decent intro. Don't expect to read it and implement deep RL right away, however.
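
For a sense of what "classic RL" means here, tabular Q-learning is the sort of algorithm the book covers. A minimal sketch, assuming a gym-style environment with reset()/step() and a discrete action count:

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Assumed interface: env.reset() -> state,
        # env.step(action) -> (next_state, reward, done), env.n_actions.
        Q = defaultdict(float)          # (state, action) -> value estimate
        actions = list(range(env.n_actions))
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # Epsilon-greedy behavior policy.
                if random.random() < epsilon:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda x: Q[(s, x)])
                s2, r, done = env.step(a)
                # Off-policy TD target: bootstrap from the greedy next action.
                best_next = 0.0 if done else max(Q[(s2, x)] for x in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q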


Any ETA on the printed edition of the updated version of the book?



