
What about AlphaZero? No prior data.



> The parameters θ of the deep neural network in AlphaZero are trained by self-play reinforcement learning, starting from randomly initialised parameters θ. Games are played by selecting moves for both players by MCTS, a_t ∼ π_t. At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.

In this case it sounds like a better algorithm plus a lot of self-generated data (but no prior knowledge other than the rules).
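The self-play loop described in the quote can be sketched roughly as follows. This is a toy illustration, not AlphaZero's actual implementation: the game here is a trivial counting game (first player to reach 10 wins, so there are no draws), and `select_move` is a random placeholder standing in for MCTS guided by the network.

```python
import random

def select_move(state):
    # Placeholder for MCTS move selection (a_t ~ pi_t in the paper);
    # here just a random increment of 1-3.
    return random.choice([1, 2, 3])

def self_play_game(target=10):
    """Play one self-play game and return (state, player, z) training tuples.

    z is the game outcome from the perspective of the player to move:
    +1 for a win, -1 for a loss (this toy game has no draws).
    """
    total, player = 0, +1
    history = []  # (state before the move, player to move)
    while total < target:
        history.append((total, player))
        total += select_move(total)
        player = -player
    winner = -player  # the player who made the last move reached the target
    # Score the terminal position and propagate the outcome z back
    # to every stored position.
    return [(s, p, +1 if p == winner else -1) for s, p in history]

random.seed(0)
examples = self_play_game()
```

In AlphaZero proper, these (state, outcome) pairs, along with the MCTS visit counts, are what the network parameters θ are then trained on; the point the quote makes is that none of this data existed before training started.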


It implicitly has all of the observable data, plus an evaluation criterion. It can operate on that, and even mine it in a strategically effective order, because the whole game system, except the other player, is defined by a small set of rules.



