I understand the rotation part and some uses, but what's the intuitive explanati...

xyzzyz · on Jan 17, 2022

When mathematicians think about matrix multiplication (and matrices in general), they don't really think about "rotating" matrices like in the animation above, but rather about operators and their composition. Matrix multiplication is defined the way it is because that's how function composition works.

Look: consider two functions, f(x, y) = (x + 2y, 3x + 4y), and g(x, y) = (-x + 3y, 4x - y). What's f(g(x, y))? Well, let's work it out, it's simple algebra:

f(g(x, y)) = f(-x+3y, 4x-y) = ((-x+3y)+2(4x-y), 3(-x+3y) + 4(4x-y)) = (-x + 3y + 8x - 2y, -3x + 9y + 16x - 4y) = (7x + y, 13x + 5y).

Whew, that was some hassle to keep track of everything. Now, here's what mathematicians typically do instead: they introduce matrices to make it much easier to keep track of the operations:

Let e_0 = (1, 0), and e_1 = (0, 1). Then f(e_0) = f(1, 0) = (1, 3) = e_0 + 3 e_1, and f(e_1) = f(0, 1) = (2, 4) = 2 e_0 + 4 e_1. Thus, mathematicians would write that f in basis e_0, e_1 is represented by the matrix

 [1 2]
 [3 4]

so that when you multiply it by the (column) vector [x, y], you get

      [x]
    * [y]   
 [1 2][x + 2y]
 [3 4][3x + 4y]

Similarly, g(e_0) = (-1, 4) = -e_0 + 4e_1, g(e_1) = (3, -1) = 3e_0 - e_1, so it's represented by the matrix:

 [-1  3]
 [ 4 -1]

Now, let's multiply the matrix of f by the matrix of g:

      [-1        3]
    * [ 4       -1]
 [1 2][-1*1+2*4  3*1-1*2]  = [7  1]
 [3 4][-1*3+4*4  3*3-1*4]    [13 5]

and when we multiply the resulting matrix by column vector [x, y]:

       [x]
     * [y]
 [7  1][7x + y]
 [13 5][13x + 5y]

So, what we got was in fact our original calculation of f(g(x, y)) = (7x + y, 13x + 5y).

The conclusion here is that matrix multiplication is what the function composition forces it to be.
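
The calculation above can be checked mechanically. Here is a minimal Python sketch using plain nested lists; the helpers `matmul` and `apply_matrix` are hypothetical names introduced only for this illustration, not something from the thread:

```python
# f and g exactly as defined in the comment above.
def f(x, y):
    return (x + 2 * y, 3 * x + 4 * y)

def g(x, y):
    return (-x + 3 * y, 4 * x - y)

# Their matrices in the basis e_0, e_1: the columns are the images
# of e_0 and e_1 under each function.
F = [[1, 2], [3, 4]]
G = [[-1, 3], [4, -1]]

def matmul(a, b):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply_matrix(m, v):
    """Multiply matrix m by the column vector v; the result is a tuple."""
    return tuple(sum(row[k] * v[k] for k in range(len(v))) for row in m)

FG = matmul(F, G)
print(FG)  # [[7, 1], [13, 5]]

# Composing the functions and applying the product matrix agree.
for point in [(1, 0), (0, 1), (2, -3)]:
    assert f(*g(*point)) == apply_matrix(FG, point)
```

Checking a few points is not a proof, but because both sides are linear, agreement on the basis vectors (1, 0) and (0, 1) is already enough to force agreement everywhere.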

jagrsw · on Jan 17, 2022

Thanks all in the thread for the explanations. I had matrix multiplication at uni (IT faculty, many years ago) as part of the algebra courses; I memorized it, passed the exam, and forgot the topic. I don't remember anyone explaining back then why it's used or useful, though.

My early understanding, after reading your responses and the wiki article, is that it's useful if we have some input data (a vector) which then undergoes sequential manipulation by several functions, and we want to know the result in one step instead of many?

ogogmad · on Jan 17, 2022

That's right, yeah.
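
To make the "one step instead of many" idea concrete, here is a small Python sketch. The three 2x2 matrices are arbitrary examples picked for illustration, and `matmul` and `apply_matrix` are hypothetical helper names, not something from the thread:

```python
from functools import reduce

def matmul(a, b):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply_matrix(m, v):
    """Multiply matrix m by the column vector v; the result is a tuple."""
    return tuple(sum(row[k] * v[k] for k in range(len(v))) for row in m)

# Illustrative pipeline, applied to the input in this order.
steps = [
    [[2, 0], [0, 2]],   # scale by 2
    [[1, 1], [0, 1]],   # shear along x
    [[0, -1], [1, 0]],  # rotate 90 degrees counterclockwise
]

# Later steps multiply on the left: combined = steps[2] * steps[1] * steps[0].
combined = reduce(lambda acc, m: matmul(m, acc), steps)

v = (3, 1)

# Many steps: push the vector through each matrix in turn.
one_by_one = v
for m in steps:
    one_by_one = apply_matrix(m, one_by_one)

# One step: apply the precomputed product.
assert apply_matrix(combined, v) == one_by_one
```

The payoff comes when the same pipeline is applied to many vectors: the product is computed once and each vector then costs a single matrix-vector multiplication.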

gizmo686 · on Jan 17, 2022

The short answer is that it leads to a more consistent notation.

The longer answer is that matrix multiplication is essentially 2 different operations. Consider the linear functions:

  f :: R3 -> R3
  h :: R3 -> R3

As well as the point

  x :: R3

If we fix a set of basis vectors, we can represent f and h as 3x3 matrices, and x as a 3x1 matrix.

The product [f][x]=[f(x)] then represents the result of a function application, while [f][h] = [f∘h] represents function composition.
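
A minimal Python sketch of those two uses of the same product. The matrices `F` and `H` and the point `x` are made-up values for illustration, and `matmul` is a hypothetical helper name:

```python
def matmul(a, b):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Illustrative linear maps on R3, written as 3x3 matrices.
F = [[1, 0, 2], [0, 1, 0], [0, 0, 1]]
H = [[0, 1, 0], [1, 0, 0], [0, 0, 3]]

# A point of R3, written as a 3x1 matrix (a column).
x = [[1], [2], [3]]

# Function application: [f][x] = [f(x)]
f_of_x = matmul(F, x)

# Function composition: [f][h] = [f∘h]
F_after_H = matmul(F, H)

# Both uses fit together: (f∘h)(x) equals f(h(x)).
assert matmul(F_after_H, x) == matmul(F, matmul(H, x))
```

Storing the point as a 3x1 matrix is what lets one `matmul` serve both purposes without any special cases.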

For the function application portion, there is no problem with your proposal. We simply represent the point as a 1x3 vector instead of a 3x1 vector (or similarly transpose the convention for representing a function as a matrix).

The problem comes with the function composition use-case. For your proposal to work, we would need to transpose only one of the two matrices, which means that the matrix representation of a function is determined both by the basis vectors and by a transposition parity bit, and multiplication as composition would only make sense when the transposition parities don't match.

ogogmad · on Jan 17, 2022

A different convention would make matrix-matrix multiplication no longer express composition of linear maps. So the geometric -- or deeper meaning -- would be lost.

The operation you're describing is nevertheless equal to A^T B. In other words, it can be expressed using a combination of matrix multiplication and matrix transpose. I don't see what it could be used for, though.
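
A short Python sketch of that point. It assumes the proposed operation pairs columns of A with columns of B, which is the operation that A^T B computes; the name `column_by_column` and the other helpers are hypothetical, and the matrices are just small examples:

```python
def matmul(a, b):
    """Standard row-by-column product of matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def column_by_column(a, b):
    """Assumed alternative product: entry (i, j) is the dot product of
    column i of a with column j of b."""
    return [[sum(a[k][i] * b[k][j] for k in range(len(a)))
             for j in range(len(b[0]))]
            for i in range(len(a[0]))]

A = [[1, 2], [3, 4]]
B = [[-1, 3], [4, -1]]

# The alternative is expressible with transpose plus ordinary multiplication.
assert column_by_column(A, B) == matmul(transpose(A), B)

# It does not give the matrix of the composed map.
assert column_by_column(A, B) != matmul(A, B)
```

For symmetric A the two products coincide, so a counterexample needs a non-symmetric matrix like the one used here.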