I'm not sure what you meant by the link, but the parent is right, so I'll add an explanation here: the M1 Ultra has about 400GB/s of theoretical bandwidth, but AnandTech showed that none of the SoC blocks can actually reach that; they fall well short of it. It seems Apple summed the bandwidth available to all the blocks to get that figure, which does mean something, but not that the GPU has access to all of it (the GPU memory controllers appear to be the bottleneck).
By contrast, a laptop RTX 3080 does reach 400GB/s; I routinely see this on AI workloads. That's part of the explanation for the subpar performance here (the other likely factors being matrix math and mixed precision).
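To make the theoretical-versus-achieved distinction concrete, here is a minimal sketch of the arithmetic. It assumes a laptop RTX 3080 with a 256-bit GDDR6 bus at 14 Gbps per pin (laptop variants ship with different memory clocks, so treat that as an assumption). The helper names and the "4 GB read in 10 ms" measurement are hypothetical, purely for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: bus width x per-pin data rate / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


def achieved_bandwidth_gbs(bytes_moved: int, seconds: float) -> float:
    """Effective bandwidth actually observed: bytes transferred divided by elapsed time."""
    return bytes_moved / seconds / 1e9


# Assumed config: 256-bit bus, 14 Gbps per pin (hypothetical laptop RTX 3080 variant).
peak = peak_bandwidth_gbs(256, 14.0)

# Hypothetical measurement: a kernel that streams 4 GB in 10 ms.
achieved = achieved_bandwidth_gbs(4_000_000_000, 0.010)

# Fraction of the theoretical peak that the workload actually uses.
fraction = achieved / peak

print(f"peak={peak:.1f} GB/s achieved={achieved:.1f} GB/s ({fraction:.0%} of peak)")
```

The point is that a spec-sheet number is only the first function; what matters for a workload is the second, and the ratio between them is what AnandTech's measurements exposed on the Apple side.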