I understand the rotation part and some uses, but what's the intuitive explanation for why it's done this way? Convention?
IOW, what does matrix multiplication try to achieve by doing this manipulation which looks like rotation visually? Why not do it without rotation instead (with the precondition that the numbers of columns in the two matrices should be equal, instead of column vs row)?
When mathematicians think about matrix multiplication (and matrices in general), they don't really think about "rotating" matrices like in the animation above, but rather about operators and their composition. The matrix multiplication is what it is, because that's how function composition works.
Look: consider two functions, f(x, y) = (x + 2y, 3x + 4y), and g(x, y) = (-x + 3y, 4x - y). What's f(g(x, y))? Well, let's work it out, it's simple algebra:

f(g(x, y)) = f(-x + 3y, 4x - y)
           = ((-x + 3y) + 2(4x - y), 3(-x + 3y) + 4(4x - y))
           = (7x + y, 13x + 5y)
Whew, that was some hassle to keep track of everything. Now, here's what mathematicians typically do instead: they introduce matrices to make it much easier to keep track of the operations:
Let e_0 = (1, 0), and e_1 = (0, 1). Then f(e_0) = f(1, 0) = (1, 3) = e_0 + 3 e_1, and f(e_1) = f(0, 1) = (2, 4) = 2 e_0 + 4 e_1. Thus, mathematicians would write that f in basis e_0, e_1 is represented by the matrix
[1 2]
[3 4]
so that when you multiply it by the (column) vector [x, y], you get
[1 2]   [x]   [x + 2y]
[3 4] * [y] = [3x + 4y]
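A minimal Python sketch (not from the thread) of the recipe above: the columns of f's matrix are just f applied to the basis vectors e_0 and e_1.

```python
def f(x, y):
    return (x + 2*y, 3*x + 4*y)

e0, e1 = (1, 0), (0, 1)
cols = [f(*e0), f(*e1)]                # [(1, 3), (2, 4)] -- the columns
F = [list(row) for row in zip(*cols)]  # turn columns into rows of the matrix
print(F)  # [[1, 2], [3, 4]]
```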
Similarly, g(e_0) = (-1, 4) = -e_0 + 4e_1, and g(e_1) = (3, -1) = 3e_0 - e_1, so g is represented by the matrix

[-1  3]
[ 4 -1]
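And here is the punchline, as a small Python sketch (not from the thread): multiplying the two matrices with the usual row-by-column rule gives exactly the matrix of the composition f∘g worked out by hand above.

```python
def f(x, y):
    return (x + 2*y, 3*x + 4*y)

def g(x, y):
    return (-x + 3*y, 4*x - y)

def matmul(A, B):
    """Standard 2x2 matrix product: rows of A against columns of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

F = [[1, 2], [3, 4]]    # matrix of f
G = [[-1, 3], [4, -1]]  # matrix of g

FG = matmul(F, G)       # matrix of f∘g

def apply(M, x, y):
    """Multiply a 2x2 matrix by the column vector [x, y]."""
    return (M[0][0]*x + M[0][1]*y, M[1][0]*x + M[1][1]*y)

# For any point, applying FG equals computing f(g(x, y)) directly.
assert apply(FG, 5, 7) == f(*g(5, 7))
print(FG)  # [[7, 1], [13, 5]] -- i.e. f(g(x, y)) = (7x + y, 13x + 5y)
```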
Thanks all in the thread for explanations. I had matrix multiplication at the Uni (IT faculty, many years ago) as part of algebra courses, I memorized, I passed the exam, and forgot the topic. Though, I don't remember anyone explaining it then why it's used or useful.
My early understanding, after reading your responses and the wiki article, is that it's useful when we have some input data (a vector) which then undergoes sequential manipulation by several functions, and we want to know the result in one step, instead of many?
The short answer is that it leads to a more consistent notation.
The longer answer is that matrix multiplication is essentially 2 different operations. Consider the linear functions:
f :: R3 -> R3
h :: R3 -> R3
As well as the point
x :: R3
If we fix a set of basis vectors, we can represent f and h as 3x3 matrices, and x as a 3x1 matrix.
The product [f][x]=[f(x)] then represents the result of a function application, while [f][h] = [f∘h] represents function composition.
For the function application portion, there is no problem with your proposal. We simply represent the point as a 1x3 vector instead of a 3x1 vector (or similarly transpose the convention for representing a function as a matrix).
The problem comes with the function composition use-case. For your proposal to work, we would need to transpose only one of the two matrices, which means that the matrix representation of a function would be determined both by the basis vectors and by a transposition parity bit, with multiplication-as-composition only making sense when the transposition parities don't match.
A different convention would make matrix-matrix multiplication no longer express composition of linear maps. So the geometric -- or deeper meaning -- would be lost.
The operation you're describing is nevertheless equal to A^T B. In other words, it can be expressed using a combination of matrix multiplication and matrix transpose. I don't see what it could be used for, though.
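A minimal Python sketch (not from the thread) of one reading of that proposal: pairing column i of A with column j of B (which requires the two matrices to have equal numbers of rows) gives exactly the standard product A^T B. The function names here are made up for illustration.

```python
def transpose(A):
    """Swap rows and columns of a matrix given as nested lists."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Standard matrix product: rows of A against columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def column_product(A, B):
    """Hypothetical 'column vs column' product: dot column i of A
    with column j of B."""
    return [[sum(A[k][i] * B[k][j] for k in range(len(A)))
             for j in range(len(B[0]))]
            for i in range(len(A[0]))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# The proposed operation coincides with A^T B.
assert column_product(A, B) == matmul(transpose(A), B)
print(column_product(A, B))  # [[26, 30], [38, 44]]
```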