Does RL work in practice? I have seen many impressive applications of end-to-end deep learning, but nothing impressive at all when it comes to RL in the real world (especially when compared to control theory in the robotics domain).
Yes, especially if you are willing to broaden your definition to include multi-armed or contextual bandits (which are special cases of the Markov decision processes that RL algorithms solve). Lots of internet ad companies and the like use those successfully.
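To make the bandit case concrete, here is a minimal epsilon-greedy sketch for a Bernoulli multi-armed bandit, where each "arm" could be an ad and the reward is a click. The click probabilities, epsilon, and the function name are made up for illustration; epsilon-greedy is just one textbook bandit strategy, not necessarily what any ad company actually runs. Unlike a full MDP, there is no state and no delayed reward: each pull is scored immediately.

```python
import random


def epsilon_greedy_bandit(click_probs, steps=10_000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit; return (estimated values, pull counts).

    click_probs: true (hidden) reward probability of each arm, e.g. hypothetical per-ad click rates.
    """
    rng = random.Random(seed)
    n_arms = len(click_probs)
    counts = [0] * n_arms      # how many times each arm was pulled
    values = [0.0] * n_arms    # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                          # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])    # exploit: best estimate so far
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0  # simulated click / no click
        counts[arm] += 1
        # Incremental mean update: no state and no future rewards, only this pull matters.
        values[arm] += (reward - values[arm]) / counts[arm]

    return values, counts


if __name__ == "__main__":
    # Hypothetical click-through rates for three ads.
    values, counts = epsilon_greedy_bandit([0.02, 0.05, 0.03])
    print("estimated CTRs:", [round(v, 3) for v in values])
    print("pulls per ad:  ", counts)
```

The full RL problem adds state transitions and delayed rewards on top of this loop, which is a large part of why it is so much harder to get working in practice.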
Maybe the best foreign (non-Korean) Protoss player. The match that AlphaStar lost had its actions restricted to a single screen. Before that, AlphaStar could simultaneously see and control everything that was not in the fog of war. Still quite impressive, though.
I agree with you and would like to add that he is probably not the best non-Korean Protoss either. The blog post presenting AlphaStar itself only claims he is one of the world's strongest players and links to this page: https://liquipedia.net/starcraft2/2018_StarCraft_II_World_Ch... which shows Mana as the 5th best non-Korean Protoss player in the World Championship Series.
AlphaStar has received criticism for having unfair advantages. For example, it went 5-0 against Mana when it could see and control the whole map at once. But after a camera restriction was imposed (so it sees the map the way humans do), it lost 0-1.
With all this said, it is still impressive. By far the best bot we have in SC2!
Less than four years ago, our state-of-the-art RL system (DQN) could only beat some Atari games. Now we can almost beat the best human players in SC. To me, that is very impressive.
Disclosure: I work at Google (and with the folks at OpenAI).
Are you familiar with OpenAI's Dactyl [1]? I could see you saying that, unlike their DOTA bot (OpenAI Five), it is perhaps not a leap forward versus classical techniques, but I assume part of that is just that they're getting started.
There's also the PlaNet work from Google and DeepMind [2] that's been announced more recently.
I don't know what your thresholds for impressive might be, or what your basis of comparison is, but pure control theory seems to be at a bit of a dead end, whereas reinforcement learning allows for greater flexibility and robustness for control tasks, subject to your willingness to gather (or simulate) a lot of training data[0].
See for example this video[1] in which OpenAI shows off a robotic arm with "unprecedented" dexterity.
> nothing impressive at all when it comes to RL in the real world
Apparently, many people on this website think Go and StarCraft qualify as "the real world". Most of them have probably never even heard of blocks world and SHRDLU.
AFAIK, the biggest problem in reinforcement learning is still connecting the final outcome with the individual actions you took to achieve it. This hits you hard when you move from toy domains to real-world problems, because you can't replay real-world scenarios thousands of times, and even if you do, you can get completely different results due to various random and external factors.
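A small illustration of why this is hard (usually called the credit assignment problem): if the only reward arrives at the end of an episode, the standard discounted return G_t = r_t + gamma * G_{t+1} gives every earlier action a share of that final reward that depends only on how far it was from the end, not on whether it actually helped. The helper below is a plain sketch written for illustration; the reward sequence and gamma are made up.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of a single episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted rewards that came after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Hypothetical 5-step episode: no intermediate feedback, a win (+1) only at the end.
    sparse = [0.0, 0.0, 0.0, 0.0, 1.0]
    print([round(g, 4) for g in discounted_returns(sparse, gamma=0.9)])
    # Every step is credited purely by its distance to the end: a blunder at
    # step 1 that happened to be followed by a win is rewarded like a good move.
```

Only averaging over many episodes separates helpful actions from lucky ones, which is exactly what you can't afford when every episode is a real-world run.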
This can be somewhat mitigated by taking the Marvin Minsky approach and building or using a simulation of the problem instead of the real thing. However, in many domains building a realistic simulation is significantly harder than doing supervised learning.
Somehow this reminds me of one I missed: all the simulation-based driving improvements at Waymo, Cruise, and others. It's hard to know whether it's strictly RL from "we train neural networks while driving in simulation", but it's similar enough for my taste.
I'll be curious to see the set of domains where a hybrid approach (some real world, lots of simulation) works out. The nice thing about simulation is that you can experience lots of things that you never want to happen in the real world (e.g., a child runs in front of the car with only X00 ms to impact). The trade-off between the difficulty of accurate simulation and needing to trust that the model will behave correctly in a situation you could have simulated will (likely) be an interesting liability challenge, for autonomous driving at the least.
A common misconception is that self-driving car companies (outside of a few smaller startups) are using RL to drive the car. They are not. They use deep learning for perception systems which produce tangible outputs that can be processed by what amounts to expert systems.
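A toy sketch of the split described above, assuming nothing about any real company's stack: a (stubbed-out) learned perception model emits explicit, inspectable detections, and hand-written rules decide what to do with them. The `Detection` fields, the deceleration and reaction-time constants, and the rules themselves are all hypothetical; the point is only that each decision traces back to a readable rule, which is the kind of auditability an end-to-end RL policy lacks.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical perception output: one object found by a learned detector."""
    kind: str          # e.g. "pedestrian", "vehicle", "cone"
    distance_m: float  # distance ahead along the planned path, in metres
    in_path: bool      # whether the object overlaps the planned path


def stopping_distance_m(speed_mps, decel_mps2=6.0, reaction_s=0.2):
    """Reaction distance plus braking distance v^2 / (2a); both constants are assumptions."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2 * decel_mps2)


def decide(detections, speed_mps):
    """Rule-based planner over perception outputs; returns (action, human-readable reason)."""
    stop_dist = stopping_distance_m(speed_mps)
    # Consider the nearest objects first.
    for det in sorted(detections, key=lambda d: d.distance_m):
        if not det.in_path:
            continue
        if det.distance_m <= stop_dist:
            return "emergency_brake", f"{det.kind} in path at {det.distance_m} m <= {stop_dist:.1f} m"
        if det.kind == "pedestrian":
            return "slow_down", f"pedestrian in path at {det.distance_m} m"
    return "proceed", "no obstacle in path"


if __name__ == "__main__":
    # Hypothetical frame of detections at roughly 50 km/h (13.9 m/s).
    frame = [Detection("vehicle", 40.0, False), Detection("pedestrian", 12.0, True)]
    print(decide(frame, speed_mps=13.9))
```

Returning a reason string alongside the action is what makes a log of these decisions reviewable after the fact.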
I work in this space, and even if you could assume the RL policy would never make a mistake, it's not auditable in the way it would need to be for things like insurance. In general, RL isn't ready to be used in complex situations where people can die when things go wrong. And that's ignoring the sample-efficiency challenges and the handling of unseen data.
Great book from one of my favourite profs at UofA.
It's very focused, provides examples of how to implement what I would call classic RL algorithms, and can serve as a decent intro. Don't expect to read it and implement Deep RL right away, however.