> The parameters θ of the deep neural network in AlphaZero are trained by self-play reinforcement learning, starting from randomly initialised parameters θ. Games are played by selecting moves for both players by MCTS, a_t ∼ π_t. At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.
In this case it sounds like a better algorithm plus a lot of self-generated data (but no prior knowledge other than the rules).
Implicitly, the system has all of the observable data, plus an evaluation criterion. It can operate on that, including starting to mine it in a strategically effective order, because the whole game system, apart from the other player, is defined by a small set of rules.
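To make the quoted training loop concrete, here is a minimal self-play sketch, not AlphaZero's actual implementation: `mcts_policy` and the state API (`is_terminal`, `apply`, `score`, `player_to_move`) are hypothetical stand-ins for the paper's MCTS and game environment.

```python
import random

def self_play_game(mcts_policy, initial_state, temperature=1.0):
    """Play one self-play game, sampling each move a_t ~ pi_t from MCTS.

    Returns a list of (state, pi_t, z) training examples, where z is the
    game outcome (+1 win, 0 draw, -1 loss) from the perspective of the
    player to move in that state.
    """
    trajectory = []  # (state, search policy pi_t, player to move)
    state = initial_state
    while not state.is_terminal():
        pi = mcts_policy(state, temperature)  # visit-count distribution over legal moves
        # Sample a_t ~ pi_t rather than playing greedily, to keep exploring.
        move = random.choices(list(pi), weights=list(pi.values()))[0]
        trajectory.append((state, pi, state.player_to_move))
        state = state.apply(move)

    # Terminal position s_T scored by the game rules: +1 / 0 / -1
    # from the first player's perspective.
    z = state.score()
    # Relabel the shared outcome from each position's player-to-move side,
    # so every example carries the value target for its own player.
    return [(s, pi, z if player == 0 else -z) for s, pi, player in trajectory]
```

The key design point the quote implies is that a single terminal score z labels every position in the game; flipping its sign per player-to-move is what turns one game outcome into value targets for both sides.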