This is amazing. It sounds like the summary is that they turned matrix multiplication of various-sized matrices into a kind of game, then had AlphaZero, the game-playing AI, play the game and find optimal strategies. The new optimal strategies for “winning” the game in fewer moves were actually new matrix multiplication algorithms, some of which had been discovered previously, some of which were new. Many thousands of algorithms were found across a range of matrix sizes.
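For anyone who wants to see what “fewer moves” means concretely: Strassen’s classic 1969 scheme (not one of AlphaTensor’s new discoveries, just the best-known example of the idea) does a 2x2 product with 7 multiplications instead of the naive 8. A rough Python sketch:

    import numpy as np

    def strassen_2x2(A, B):
        # Strassen's 7-multiplication scheme for a 2x2 product.
        (a11, a12), (a21, a22) = A
        (b11, b12), (b21, b22) = B
        m1 = (a11 + a22) * (b11 + b22)
        m2 = (a21 + a22) * b11
        m3 = a11 * (b12 - b22)
        m4 = a22 * (b21 - b11)
        m5 = (a11 + a12) * b22
        m6 = (a21 - a11) * (b11 + b12)
        m7 = (a12 - a22) * (b21 + b22)
        return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                         [m2 + m4,           m1 - m2 + m3 + m6]])

    A = np.random.rand(2, 2)
    B = np.random.rand(2, 2)
    assert np.allclose(strassen_2x2(A, B), A @ B)

Applied recursively to matrix blocks, that single saved multiplication per level is what gives the better exponent (log2 7 ≈ 2.807 instead of 3).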
And of course aside from the intellectual curiosity, our GPUs are mega matrix multiplication machines and these new algorithms could be tuned to specific hardware and improve the overall graphics and ML performance by 10-20% on existing hardware.
"Since 1969 Strassen’s algorithm has famously stood as the fastest way to multiply 2 matrices - but with AlphaTensor. Deepmind has found a new algorithm that’s faster, with potential to improve efficiency by 10-20% across trillions of calculations per day!"
Strassen has _not_ stood as the fastest way, whether you look at the exponent or the constant factor.
It's probably more accurate to say Strassen was the first in a series of complexity improvements to matrix multiplication, most of which are theoretical curiosities with prohibitively high constant factors that prevent them from ever being used in practice.
Even if all you cared about was the exponent, Strassen was beaten decades ago.
In practice, the constant factor and the exponent need to be considered together. The constant factor on its own is very uninteresting; it's practically meaningless without the exponent.
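To put a toy number on that: model the runtime as c * n^omega, and the algorithm with the smaller exponent only wins once n passes the crossover point where the two curves meet. The constants below are made up purely for illustration, not measurements of any real implementation:

    def crossover(c_slow, omega_slow, c_fast, omega_fast):
        """Smallest n where c_fast * n**omega_fast beats c_slow * n**omega_slow."""
        # Solve c_slow * n**omega_slow = c_fast * n**omega_fast for n.
        return (c_fast / c_slow) ** (1.0 / (omega_slow - omega_fast))

    # Naive matmul (omega = 3, small constant) vs. a hypothetical "galactic"
    # algorithm with a better exponent but an enormous constant factor.
    print(crossover(c_slow=1.0, omega_slow=3.0, c_fast=1e6, omega_fast=2.37))
    # ~3.3e9: the asymptotically faster method only pays off for matrices far
    # larger than anything used in practice.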
>”Even if all you cared about was the exponent”
There are really only two things to care about: the exponent on its own, or practical runtime.
The claim is that there is a significant range of computations for which Strassen has stood as the fastest.
"If you just want the executive summary, here it is: these are definitely interesting algorithms from an arithmetic complexity theory standpoint – especially for the case of 4×4 matrices over finite fields, where (to the best of my knowledge) Strassen’s algorithm from 1969 was still the reigning champion.
These algorithms are also practically relevant, meaning that not only do they have better asymptotic lower bounds than Strassen’s algorithm, they are still algorithms you might actually use in practice, unlike essentially everything else that has been written on the topic in the last 50 years: these algorithms are correct, and will in principle win over Strassen’s algorithm with large enough matrices, but that cut-off is well beyond the sizes that anyone is actually doing computations with."
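A quick sanity check on the multiplication counts behind that 4×4 claim (the 47 figure below is my recollection of the paper, so treat it accordingly): recursive Strassen on a 4x4 product uses 7^2 = 49 scalar multiplications versus 8^2 = 64 for the naive method, and the AlphaTensor result reportedly gets below that when working mod 2:

    def mult_counts(n):
        """Scalar multiplication counts for an n x n product, n a power of 2."""
        k = n.bit_length() - 1
        assert 1 << k == n, "n must be a power of 2"
        return {"naive": 8**k, "strassen": 7**k}

    print(mult_counts(4))  # {'naive': 64, 'strassen': 49}
    # AlphaTensor's 4x4 scheme over GF(2) is reported at 47 multiplications,
    # which is the improvement over Strassen discussed in the quote above.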
I wonder if this technique could be used to find better, more mathematically efficient self-attention algorithms for transformers in general. Metaprogramming with AI…
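If it helps frame that question: standard scaled dot-product attention is already just a couple of matmuls wrapped around a softmax, so any matmul-level win would carry over directly. A minimal NumPy sketch (shapes and names are illustrative, nothing here is transformer-specific beyond the textbook formula):

    import numpy as np

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                   # matmul #1: (n, d) @ (d, n)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
        return weights @ V                                # matmul #2: (n, n) @ (n, d)

    n, d = 128, 64
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (128, 64)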
Cool!