I would still love to see the Stockfish team go up against the AlphaZero team, with the Stockfish team allowed to modify their code however they need to get the performance required for the event. I think they should agree on some hardware equivalences upfront, since the two engines don't have the same requirements, and the Stockfish team should have an API to play against the AlphaZero algo in preparation, the same way that AlphaZero already has unfettered access to Stockfish.
I'd relish reading about whoever wins that: a fair competition.
Honestly, Leela Chess Zero was built on top of the AlphaZero algorithms, and I'd argue that LC0 is more relevant and interesting today than what happened two years ago.
Two open-source implementations under active development are on equal footing. What matters is the algorithm and how it improves, not a specific implementation that has been abandoned. For that we have TCEC and CCCC, which continually measure those two engines and a fair number of other implementations.
Stockfish and Leela were virtually tied after 100 games - Stockfish had one extra win. Leela has passed Houdini and Komodo at this point. The question now is whether Stockfish can improve at the same speed as Leela.
You might be missing the point. AlphaZero is a research project, not a commercial project.
The most interesting result is AlphaZero's playstyle. It's a lot less robotic than Stockfish's, and something that humans can actually learn from.
It's not code that's the big differentiator here. It's hardware.
Neural networks are an embarrassingly parallel problem that can be solved with accelerated matrix-multiplication hardware. The entire network can be represented as a series of matrix-multiplication problems.
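To make that concrete, here's a toy sketch in C++ (none of this is AlphaZero's actual code; the sizes and numbers are made up): a single dense layer computed as y = Wx + b. Batch a few thousand positions and that inner loop becomes one big matrix-matrix multiply, which is exactly the shape GPUs and TPUs accelerate.

    #include <cstdio>
    #include <vector>

    // Toy dense layer: y = W*x + b, with W stored row-major.
    // A real net stacks many of these (plus nonlinearities), but
    // every layer is still fundamentally a matrix multiply.
    std::vector<float> dense_layer(const std::vector<float>& W,
                                   const std::vector<float>& x,
                                   const std::vector<float>& b,
                                   int rows, int cols) {
        std::vector<float> y(rows);
        for (int r = 0; r < rows; ++r) {
            float acc = b[r];
            for (int c = 0; c < cols; ++c)
                acc += W[r * cols + c] * x[c];
            y[r] = acc;  // an activation like ReLU would follow here
        }
        return y;
    }

    int main() {
        // 2x3 weights, 3-wide input: purely illustrative values.
        std::vector<float> W = {1, 0, -1,
                                2, 1,  0};
        std::vector<float> x = {0.5f, -1.0f, 2.0f};
        std::vector<float> b = {0.1f, 0.2f};
        std::vector<float> y = dense_layer(W, x, b, 2, 3);
        std::printf("%.2f %.2f\n", y[0], y[1]);  // -1.40 0.20
    }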
Stockfish is still written in a classical fashion, and it has issues scaling beyond 64 cores. But neural networks can keep getting bigger and bigger, taking advantage of extremely parallel hardware like GPUs (10,000+ shaders per GPU) or Google's Tensor Processing Units.
In some ways, the fight will never be fair. GPGPUs and TPUs can simply throw far more math, using far less power, at the problem.
Furthermore, games of perfect information, like Chess and Go, lend themselves well to neural nets.
In effect, Neural Nets are being used as a tool to allow most computation to occur on far more parallel and efficient hardware.
-----------
I think the next big stage for Chess and Go AIs is for someone to figure out how to take advantage of SIMD compute to achieve similar scalability. As it is right now, both MCTS and alpha-beta pruning can only really be done on a CPU, so they can't leverage the huge computational power available today.
I think Chess would be the easier game: bitboards are represented as 64-bit integers and would map easily to just two 32-bit registers of a GPU shader. But I get stuck whenever I think of an algorithm that would have low divergence.
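As a taste of why bitboards feel GPU-friendly, here's standard knight-attack generation in C++ (textbook bitboard code, the same style CPU engines use; nothing GPU-specific about it): pure shifts and masks, zero branches, so a warp of 32 boards would never diverge.

    #include <cstdint>
    #include <cstdio>

    // Knight attacks for every knight on a bitboard at once.
    // Bit 0 = a1, bit 63 = h8; the masks cancel file wrap-around.
    uint64_t knight_attacks(uint64_t knights) {
        const uint64_t notA  = 0xfefefefefefefefeULL;  // clears file a
        const uint64_t notAB = 0xfcfcfcfcfcfcfcfcULL;  // clears files a,b
        const uint64_t notH  = 0x7f7f7f7f7f7f7f7fULL;  // clears file h
        const uint64_t notGH = 0x3f3f3f3f3f3f3f3fULL;  // clears files g,h
        return ((knights << 17) & notA)  | ((knights << 15) & notH)
             | ((knights << 10) & notAB) | ((knights <<  6) & notGH)
             | ((knights >> 17) & notH)  | ((knights >> 15) & notA)
             | ((knights >> 10) & notGH) | ((knights >>  6) & notAB);
    }

    int main() {
        uint64_t knight_on_b1 = 1ULL << 1;
        // Prints a board with a3, c3, and d2 set.
        std::printf("%016llx\n",
                    (unsigned long long)knight_attacks(knight_on_b1));
    }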
Perhaps Go would be easier due to the similarity of pieces?
I dunno, I've got way too many personal projects I wanna work on, so this isn't really a thing I can do. But it's definitely an interesting research problem. My best guess is to convert the whole chess-playing AI into a SAT problem and then write a GPU-accelerated SAT solver (because SAT evaluation is uniform in theory, I would expect a GPGPU-based search of the 3-SAT space to be low-divergence).
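Just to illustrate the "uniform" part (a purely hypothetical sketch, not a real solver; encoding chess into SAT is the hard part I'm hand-waving away): checking many candidate assignments against a 3-SAT formula costs the same fixed amount of work per assignment, which is the shape a one-thread-per-assignment GPU kernel wants.

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    // Literals as signed ints: +v means variable v, -v means NOT v
    // (variables numbered from 1).
    struct Clause { int lit[3]; };

    // Bit v-1 of 'a' holds variable v. No early exit: every
    // assignment costs the same, so threads stay in lockstep.
    bool satisfies(const std::vector<Clause>& cnf, uint32_t a) {
        bool ok = true;
        for (const Clause& c : cnf) {
            bool any = false;
            for (int i = 0; i < 3; ++i) {
                int v = c.lit[i];
                bool val = (a >> (std::abs(v) - 1)) & 1;
                any |= (v > 0) ? val : !val;
            }
            ok = ok && any;
        }
        return ok;
    }

    int main() {
        // (x1 OR x2 OR !x3) AND (!x1 OR x2 OR x3): tiny made-up instance.
        std::vector<Clause> cnf = { {{1, 2, -3}}, {{-1, 2, 3}} };
        // On a GPU this loop is one thread per assignment, all running
        // the same straight-line code.
        for (uint32_t a = 0; a < 8; ++a)
            if (satisfies(cnf, a))
                std::printf("x1=%u x2=%u x3=%u works\n",
                            a & 1, (a >> 1) & 1, (a >> 2) & 1);
    }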
I'm just shooting from the hip here, no serious suggestions really. But that's my quick two-minute thought process on this particular problem. You don't necessarily have to make the whole thing uniform: modern GPUs use either 32-wide warps (NVIDIA) or 64-wide wavefronts (AMD). So you just need to figure out how to consistently get batches of 32 or 64 threads to execute the same code without diverging across if-statements or loops...
Heh, easier said than done.
EDIT: Perhaps organize the code so that "all Bishop moves" are evaluated in a batch. Let's say 1 million chess boards are to be evaluated on a GPU: how would it be done? Perhaps, first, analyze all Bishop moves. Then all Rook moves. Then all Queen moves. Stockfish's goal was to "process the most boards" at any given time; rearchitecting for bandwidth-oriented GPGPU compute would be very much in the same spirit.
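Something like this rough, hypothetical sketch (not how any real engine is structured): the batch sits in structure-of-arrays form, and the bishop pass uses a branch-free Kogge-Stone fill, so every board in the batch executes the identical instruction stream.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // One diagonal of a Kogge-Stone occluded fill. D is the step in
    // bits (9 = NE/SW, 7 = NW/SE), Left picks the shift direction,
    // and 'wrap' cancels file wrap-around. No branches anywhere.
    template <int D, bool Left>
    uint64_t ray(uint64_t gen, uint64_t empty, uint64_t wrap) {
        auto sh = [](uint64_t b, int n) -> uint64_t {
            return Left ? (b << n) : (b >> n);
        };
        empty &= wrap;
        gen   |= empty & sh(gen, D);
        empty &= sh(empty, D);
        gen   |= empty & sh(gen, 2 * D);
        empty &= sh(empty, 2 * D);
        gen   |= empty & sh(gen, 4 * D);
        return sh(gen, D) & wrap;  // final step lands on blockers/edges
    }

    uint64_t bishop_attacks(uint64_t bishops, uint64_t occupied) {
        const uint64_t notA = 0xfefefefefefefefeULL;  // clears file a
        const uint64_t notH = 0x7f7f7f7f7f7f7f7fULL;  // clears file h
        uint64_t empty = ~occupied;
        return ray<9, true >(bishops, empty, notA)    // northeast
             | ray<7, true >(bishops, empty, notH)    // northwest
             | ray<7, false>(bishops, empty, notA)    // southeast
             | ray<9, false>(bishops, empty, notH);   // southwest
    }

    int main() {
        // Structure-of-arrays batch: all Bishop moves for every board
        // in one uniform pass (Rooks and Queens get their own passes).
        std::vector<uint64_t> bishops  = {1ULL << 2, 1ULL << 35};  // c1, d5
        std::vector<uint64_t> occupied = bishops;  // toy boards: bishop only
        for (size_t i = 0; i < bishops.size(); ++i)
            std::printf("board %zu bishop attacks: %016llx\n", i,
                        (unsigned long long)bishop_attacks(bishops[i], occupied[i]));
    }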