I think the FLOPS comparison you’ve presented is not fair: for NVIDIA it is “tensor” FLOPS, not generic float multiplication (whose throughput is about 10 times lower), while for Intel it is generic float multiplication.
So for the i9 the number would be higher if FMA operations were used, no?
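To make the FMA point concrete, here is a minimal sketch in C using AVX2/FMA intrinsics (a toy loop with made-up values, not a real benchmark; assumes compilation with something like gcc -O2 -mfma). The point is just the accounting: each fused multiply-add does a multiply and an add per lane, so peak figures quoted assuming FMA are double the multiply-only number.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    /* Hypothetical operands, chosen only so the loop isn't optimized away. */
    __m256 a   = _mm256_set1_ps(1.0001f);
    __m256 b   = _mm256_set1_ps(0.9999f);
    __m256 acc = _mm256_setzero_ps();

    const long iters = 100000000L;
    for (long i = 0; i < iters; ++i) {
        /* One _mm256_fmadd_ps = 8 multiplies + 8 adds = 16 FLOPs,
           versus only 8 FLOPs for a bare _mm256_mul_ps. */
        acc = _mm256_fmadd_ps(a, b, acc);
    }

    float out[8];
    _mm256_storeu_ps(out, acc);
    printf("result: %f, FLOPs counted: %ld\n", out[0], iters * 16L);
    return 0;
}
```

Note this single dependency chain on acc measures FMA latency rather than throughput; a real peak-FLOPS benchmark would keep several independent accumulators in flight. The sketch only illustrates why FMA doubles the counted FLOPs.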
It doesn’t make sense. Why is it fair to compare matrix multiplication with generic float operations? It should be either matrix multiplication versus matrix multiplication, or generic float versus generic float.