This is amazing. It sounds like the summary is that they turned matrix multiplication of various-sized matrices into a kind of game, then had AlphaZero, the game-playing AI, play the game and find optimal strategies. The new optimal strategies for “winning” the game in fewer moves were actually new matrix multiplication algorithms, some of which had been discovered previously, some of which were new. Many thousands of algorithms were found across a range of matrix sizes.
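For anyone who wants to see what “fewer moves” means concretely: Strassen’s classic 1969 scheme (not one of AlphaTensor’s new discoveries, just the best-known example of the idea) does a 2x2 product with 7 multiplications instead of the naive 8. A rough Python sketch:

    import numpy as np

    def strassen_2x2(A, B):
        # Strassen's 7-multiplication scheme for a 2x2 product.
        (a11, a12), (a21, a22) = A
        (b11, b12), (b21, b22) = B
        m1 = (a11 + a22) * (b11 + b22)
        m2 = (a21 + a22) * b11
        m3 = a11 * (b12 - b22)
        m4 = a22 * (b21 - b11)
        m5 = (a11 + a12) * b22
        m6 = (a21 - a11) * (b11 + b12)
        m7 = (a12 - a22) * (b21 + b22)
        return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                         [m2 + m4,           m1 - m2 + m3 + m6]])

    A = np.random.rand(2, 2)
    B = np.random.rand(2, 2)
    assert np.allclose(strassen_2x2(A, B), A @ B)

Applied recursively to matrix blocks, that single saved multiplication per level is what gives the better exponent (log2 7 ≈ 2.807 instead of 3).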
And of course aside from the intellectual curiosity, our GPUs are mega matrix multiplication machines and these new algorithms could be tuned to specific hardware and improve the overall graphics and ML performance by 10-20% on existing hardware.
"Since 1969 Strassen’s algorithm has famously stood as the fastest way to multiply 2 matrices - but with AlphaTensor. Deepmind has found a new algorithm that’s faster, with potential to improve efficiency by 10-20% across trillions of calculations per day!"
Strassen has _not_ stood as the fastest way, whether you look at the exponent or the constant factor.
It's probably more accurate to say Strassen was the first in a series of complexity improvements to matrix multiplication, most of which are theoretical curiosities with prohibitively high constant factors that prevent them from ever being used in practice.
Even if all you cared about was the exponent, Strassen was beaten decades ago.
In practice, the constant factor and the exponent need to be considered together. The constant factor on its own is very uninteresting; it's practically meaningless without the exponent.
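To put a toy number on that: model the runtime as c * n^omega, and the algorithm with the smaller exponent only wins once n passes the crossover point where the two curves meet. The constants below are made up purely for illustration, not measurements of any real implementation:

    def crossover(c_slow, omega_slow, c_fast, omega_fast):
        """Smallest n where c_fast * n**omega_fast beats c_slow * n**omega_slow."""
        # Solve c_slow * n**omega_slow = c_fast * n**omega_fast for n.
        return (c_fast / c_slow) ** (1.0 / (omega_slow - omega_fast))

    # Naive matmul (omega = 3, small constant) vs. a hypothetical "galactic"
    # algorithm with a better exponent but an enormous constant factor.
    print(crossover(c_slow=1.0, omega_slow=3.0, c_fast=1e6, omega_fast=2.37))
    # ~3.3e9: the asymptotically faster method only pays off for matrices far
    # larger than anything used in practice.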
>”Even if all you cared about was the exponent”
There are really only two things to care about: the exponent on its own, or practical runtime.
The claim is that there is a significant range of computations for which Strassen has stood as the fastest.
"If you just want the executive summary, here it is: these are definitely interesting algorithms from an arithmetic complexity theory standpoint – especially for the case of 4×4 matrices over finite fields, where (to the best of my knowledge) Strassen’s algorithm from 1969 was still the reigning champion.
These algorithms are also practically relevant, meaning that not only do they have better asymptotic lower bounds than Strassen’s algorithm, they are still algorithms you might actually use in practice, unlike essentially everything else that has been written on the topic in the last 50 years: these algorithms are correct, and will in principle win over Strassen’s algorithm with large enough matrices, but that cut-off is well beyond the sizes that anyone is actually doing computations with."
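A quick sanity check on the multiplication counts behind that 4×4 claim (the 47 figure below is my recollection of the paper, so treat it accordingly): recursive Strassen on a 4x4 product uses 7^2 = 49 scalar multiplications versus 8^2 = 64 for the naive method, and the AlphaTensor result reportedly gets below that when working mod 2:

    def mult_counts(n):
        """Scalar multiplication counts for an n x n product, n a power of 2."""
        k = n.bit_length() - 1
        assert 1 << k == n, "n must be a power of 2"
        return {"naive": 8**k, "strassen": 7**k}

    print(mult_counts(4))  # {'naive': 64, 'strassen': 49}
    # AlphaTensor's 4x4 scheme over GF(2) is reported at 47 multiplications,
    # which is the improvement over Strassen discussed in the quote above.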
I wonder if this technique could be used to find better, more mathematically efficient self-attention algorithms for transformers in general. Metaprogramming with AI…
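If it helps frame that question: standard scaled dot-product attention is already just a couple of matmuls wrapped around a softmax, so any matmul-level win would carry over directly. A minimal NumPy sketch (shapes and names are illustrative, nothing here is transformer-specific beyond the textbook formula):

    import numpy as np

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                   # matmul #1: (n, d) @ (d, n)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
        return weights @ V                                # matmul #2: (n, n) @ (n, d)

    n, d = 128, 64
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (128, 64)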
Cool!