
That is a staggering number of possible Go games; no wonder tree search failed to improve without the convnet pruning.

Makes me wonder if DeepMind could have learned Go without first training the convnet that prunes the tree on the big dataset of expert games.

Which implies that DeepMind couldn't learn to play Go without first being taught by us (via the expert games).

So AlphaGo learnt Go from us. It took a human brain to crack the problem of Go, and the AI learned from our solutions; it did not discover them itself. Still a very great breakthrough.

Would Lee Sedol have won if he could use MCTS to assist his evaluation?

(Arguably, MCTS is a non-AI component of DeepMind's system and does not learn.)

MCTS = Monte Carlo Tree Search, where repeated random playouts evaluate moves by randomly sampling the tree of subsequent possible moves (a googolplex of them).
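Here's the core idea on a toy game, since a real Go engine won't fit in a comment: evaluating moves by random playouts, in Python. The Nim rules and all names here are my own stand-ins, nothing from AlphaGo:

    import random

    # Toy illustration of evaluating moves by random playouts (the
    # core of MCTS), using Nim instead of Go: players alternate taking
    # 1-3 stones, and whoever takes the last stone wins.

    def legal_moves(stones):
        return [n for n in (1, 2, 3) if n <= stones]

    def playout(stones, to_move):
        """Finish the game with uniformly random moves; return the winner."""
        while True:
            stones -= random.choice(legal_moves(stones))
            if stones == 0:
                return to_move  # took the last stone
            to_move = 1 - to_move

    def evaluate_moves(stones, me=0, n_playouts=2000):
        """Estimate each candidate move's win rate by random sampling."""
        scores = {}
        for move in legal_moves(stones):
            remaining = stones - move
            if remaining == 0:
                scores[move] = 1.0  # immediate win
            else:
                wins = sum(playout(remaining, 1 - me) == me
                           for _ in range(n_playouts))
                scores[move] = wins / n_playouts
        return scores

    # From 10 stones, taking 2 leaves the opponent on 8 (a multiple of
    # 4, a lost position), which should get the highest estimated win rate.
    print(evaluate_moves(10))

Full MCTS builds a tree over these playouts and biases the sampling toward promising branches (e.g. with a UCB-style selection rule) instead of sampling every move uniformly.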




The tree searched by MCTS is WAY smaller than a googolplex, since it excludes the eye-filling moves that allow games to go on past 400 or so moves.


In the OP article, Norvig suggests 10^172 possible games, with games averaging around 200 moves.

The Walraet paper has 10^(10^103) possible games. That's (10^(10^100))^1000, a googolplex raised to the thousandth power (not merely 1000 googolplexes) - very large indeed.

Monte Carlo methods are specifically for guesstimating intractably large searches.

MCTS is close to the human heuristic for evaluating Go positions - counting who currently controls what, weighted by the strength of each position (in Go, two 'eyes' make a group impossible to capture). Early on this is very subtle to discern - which makes Go a subtle art.

The human heuristic is complex; the MCTS random playout is overly simple but uses gigaflops of random sampling. MCTS was the basis of the best previous computer players (the new innovation is the convnets that prune the tree before MCTS).
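To make "counting who currently controls what" concrete, here's a rough Python sketch of the area count done at the end of a playout. The board representation and the simplified empty-point rule are my own stand-ins, not AlphaGo's:

    # `board` maps (row, col) -> 'b', 'w', or None for every point;
    # a simplified stand-in for a real Go board.

    def area_score(board, size=19):
        """Stones plus empty points bordered by only one colour."""
        def neighbours(r, c):
            return [(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < size and 0 <= c + dc < size]

        score = {'b': 0, 'w': 0}
        for (r, c), stone in board.items():
            if stone in score:
                score[stone] += 1
            else:  # empty: credit it if all adjacent stones share one colour
                colours = {board[p] for p in neighbours(r, c)} - {None}
                if len(colours) == 1:
                    score[colours.pop()] += 1
        return score['b'] - score['w']  # positive: black controls more

Real scoring flood-fills whole empty regions rather than testing single points, but the idea is the same: at the end of a random playout, every point is cheaply attributable to one side.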

The convnets are trained on a big database of expert games, which is why I wonder whether the MCTS is the differentiating factor, and whether Lee Sedol assisted by an MCTS would beat the convnet + MCTS combination.

This is important because it raises the question: does AlphaGo plan?

If not, the implication is that planning is a poorer heuristic than whatever AlphaGo actually does.

The convnet provably doesn't plan: it estimates what a human expert would do, and it performs the same whether it is playing a game or just predicting the next move from random board configurations.

AI research has always held planning in high regard.


Actually, MCTS usually doesn't allow eye-filling. That's how it determines a game is over - all of the territory is "eyes". Otherwise, the playout would go on forever, or would stop at an arbitrary stopping point (board width * board length * 3 or something) which is not as accurate.
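As a sketch of that rule (the helpers `legal_moves`, `is_own_eye`, and `opponent` are hypothetical stand-ins for a real engine's API, and real engines use a somewhat more careful eye test):

    def playout_moves(state, player):
        """Legal moves minus moves that fill the player's own eyes."""
        return [m for m in state.legal_moves(player)
                if not is_own_eye(state, m, player)]

    def playout_over(state, player):
        # The playout ends when neither side has a non-eye-filling move
        # left: at that point all remaining empty points are eyes and
        # the position can be scored.
        return (not playout_moves(state, player)
                and not playout_moves(state, opponent(player)))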



