After all these iterations of Alpha-[blank], [blank]-Zero, now MuZero, etc, I'm wondering:
If I'm interested in building a toy version following the DeepMind spec, one that can be trained to reach superhuman capability on a particular board game (Reversi, chess, checkers, possibly even Go given enough compute), which of these "versions" of the project would be the easiest for me to understand and implement? (Assume I have a basic understanding of the high-level concepts and lots of enthusiasm, but I'm not an expert.)
My understanding is, AlphaZero is not just stronger than AlphaGo, but architecturally simpler and more efficient. That's what I'm looking for -- the implementation with the highest result/difficulty ratio.
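For what it's worth, the piece shared by AlphaGo Zero, AlphaZero, and MuZero, and arguably the heart of any toy implementation, is the PUCT rule that MCTS uses to pick which child to expand. A minimal sketch (function and variable names are mine, not from the papers; a real implementation stores these statistics per tree node):

```python
import math

def puct_select(priors, visit_counts, q_values, c_puct=1.5):
    """Return the index of the child maximizing Q(s,a) + U(s,a),
    the AlphaZero-style PUCT rule.

    priors:       P(s,a), prior move probabilities from the policy head
    visit_counts: N(s,a), how often each child has been visited
    q_values:     Q(s,a), mean value of each child from search so far
    """
    total_visits = sum(visit_counts)
    best_action, best_score = None, -math.inf
    for action, (p, n, q) in enumerate(zip(priors, visit_counts, q_values)):
        # Exploration bonus: large for high-prior, rarely visited children
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action
```

The effect is that the search initially trusts the policy network's priors and gradually shifts toward moves whose searched value Q turns out to be high; `c_puct` trades off the two.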
AlphaGo Master, unsurprisingly, was significantly stronger than AlphaGo Zero, and AlphaZero, although it can play multiple games, was weaker still. In both cases, they compared the 40-block version of the one with the 20-block version of the other (they had to double the network size to approach the level of the predecessor).
Recently, KataGo has reached similar levels of strength using a small fraction of the resources: https://arxiv.org/abs/1902.10565
It depends on what you mean by "more efficient." The significance of AlphaZero was that you can reach good results in a variety of domains even without human expert knowledge to provide supervised learning data or engineer features. It's efficient in terms of engineering resources.
A precisely tailored approach can always get better results.
Do they have any sort of chart showing that the Zero variants are able to learn more games/state spaces with less domain-specific information and lower compute and memory requirements? For instance, if we are on an exponential tradeoff curve (which seems possible given the enormous number of GPUs involved), then it is hard to see how this will scale to human-level intelligence.
These one-off experiments make it hard to know whether AI is truly progressing. Naively, since the leaves of the decision tree grow exponentially with depth, I'd assume we are facing an inherently unscalable problem: the current gains come from advances in hardware, but the gains are only linear while the hardware improvements have to be exponential. Especially if Moore's law is giving out, even with parallel computation we might end up turning the earth into a giant GPU array before we reach parity with human intelligence.
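The "linear gains from exponential hardware" intuition can be made concrete with a back-of-the-envelope calculation (symbols and numbers here are illustrative, not from any of the papers): a uniform game tree with branching factor b has b^d leaves at depth d, so a hardware speedup of factor k buys only log_b(k) extra plies of full-width search.

```python
import math

def extra_depth_from_speedup(branching, speedup):
    """Extra full-width search depth bought by a hardware speedup.

    Solving branching ** (d + x) == speedup * branching ** d
    gives x = log(speedup) / log(branching): depth gained is
    logarithmic in compute, i.e. linear gains need exponential hardware.
    """
    return math.log(speedup) / math.log(branching)

# With chess-like branching (~35 legal moves), a 1000x speedup
# buys only about two extra plies of brute-force lookahead.
chess_gain = extra_depth_from_speedup(35, 1000)
```

This is exactly why the value network matters: it truncates the tree at shallow depth instead of searching to the end.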
Why do you assume this has been tried? It's not even clear what the game is. In this setting, what state and actions would the algorithm have access to?
In some games it could find an equilibrium where it keeps the game going indefinitely by moving back and forth, for example (which won't work in a game like Go[1], though).
> In Go, MuZero slightly exceeded the performance of AlphaZero, despite using less computation per node in the search tree (16 residual blocks per evaluation in MuZero compared to 20 blocks in AlphaZero). This suggests that MuZero may be caching its computation in the search tree and using each additional application of the dynamics model to gain a deeper understanding of the position.
It also strikes me as possible that just not giving the system the rules to start with might have allowed it to explore more efficient strategies.
"They trained the system for five hypothetical steps and a million mini-batches (i.e., small batches of training data) of size 2,048 in board games and size 1,024 in Atari, which amounted to 800 simulations per move for each search in Go, chess, and shogi and 50 simulations for each search in Atari"
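The "five hypothetical steps" refers to how far MuZero's learned model is unrolled during training. A simplified sketch of that unroll, where the three networks are passed in as plain functions (my own paraphrase of the paper's representation/dynamics/prediction networks, not DeepMind's code, and omitting the loss computation):

```python
def muzero_unroll(representation, dynamics, prediction, observation, actions):
    """Unroll MuZero's learned model for K hypothetical steps, K = len(actions).

    representation: observation -> hidden state
    dynamics:       (hidden state, action) -> (next hidden state, reward)
    prediction:     hidden state -> (policy, value)

    Returns a list of (policy, value, reward) tuples, one per step; during
    training these are matched against targets derived from real play and search.
    """
    state = representation(observation)
    outputs = []
    for action in actions:  # K = 5 in the paper's board-game setup
        policy, value = prediction(state)
        state, reward = dynamics(state, action)
        outputs.append((policy, value, reward))
    return outputs
```

Note that the hidden state is never forced to correspond to a real board position; it only has to support accurate policy, value, and reward predictions, which is what makes learning without the rules possible.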
If we end up following this approach, it seems clear it will be combined with some sort of virtual-world building: a machine builds an approximate world from real-world data, runs simulations inside that virtual world for eons, ends up with the best possible (but still imperfect, because the virtual world is not real) action model, goes back to the real world, adjusts the model, repeats, and so on.
It is even possible that our brains do the same thing, BTW. How many times do you run the scenario of a job interview in your head before you go? How many times does it run virtually in your subconscious? How many times does it happen in dreams? And more profoundly, how often are those scenarios in our heads wildly inaccurate and simplified, and yet they still help us act in the real world nonetheless?
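The loop described above, building an approximate world from real data, planning inside it at length, then correcting it against reality, is essentially Dyna-style model-based RL. A toy sketch of the control flow (every name here is illustrative, not any particular library's API):

```python
def model_based_loop(env, model, agent, iterations, dream_steps):
    """Alternate real experience, model fitting, and 'dreamed' planning,
    in the spirit of Dyna / world-model methods.

    env:   the real environment
    model: a learned, approximate copy of the environment
    agent: anything with act_in(environment) and improve_from(experience)
    """
    for _ in range(iterations):
        real_experience = agent.act_in(env)   # gather real-world data
        model.fit(real_experience)            # adjust the virtual world
        for _ in range(dream_steps):          # plan "for eons" inside it
            imagined = agent.act_in(model)
            agent.improve_from(imagined)      # learn from imagined rollouts
    return agent
```

Planning in the model is cheap, so most of the learning happens in the inner loop; the outer loop keeps the model from drifting too far from reality, which mirrors the "go back to the real world, adjust, repeat" cycle in the comment.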