I was afraid to ask the same question. I can do it, I can program it, and I can check to see if it's correct, but I can't for the life of me understand why somebody saw fit to describe matrix multiplication the way they did.
Matrix multiplication is the composition of linear maps. That’s sometimes lost in the more computational approach, but we can think about it geometrically. The row-by-column rule is exactly what makes (AB)x = A(Bx) hold for every vector x: applying AB does the same thing as applying B first and then A. If you know the first matrix rotates the plane by some angle, and the second rotates it in the same direction by another angle, then their product must be the rotation matrix that rotates by the sum of the two angles. That’s a very simple example. Since every nonsingular matrix can be read as a change of basis, you can build a rich geometric understanding of multiplying matrices. Moreover, when things get more complicated, we can use quantities that don’t depend on the choice of basis, such as the determinant (which multiplies: det(AB) = det(A)det(B)) and the trace, to help guide our intuition.
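If it helps to see these claims checked numerically, here is a minimal pure-Python sketch. Matrices are plain lists of rows, and the helper names (matmul, apply, rotation, det2, trace, close) are hypothetical, chosen just for illustration. It checks that the product acts as the composition, that two rotations compose into a rotation by the summed angle, that the determinant multiplies, and that the trace survives a change of basis.

```python
import math

def matmul(A, B):
    """Row-by-column product: entry (i, j) is row i of A dotted with column j of B."""
    n, m, p = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

def apply(A, x):
    """Apply the linear map A to the vector x (a list of coordinates)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rotation(theta):
    """2x2 matrix that rotates the plane counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

def close(A, B, tol=1e-12):
    """Entrywise comparison with a tolerance, for floating-point matrices."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 2]]
x = [5, -7]

# 1. The product is the composition: doing B, then A, is the same as doing AB.
assert apply(matmul(A, B), x) == apply(A, apply(B, x))

# 2. Rotating by a, then by b, is the same as rotating by a + b.
a, b = 0.3, 1.1
assert close(matmul(rotation(b), rotation(a)), rotation(a + b))

# 3. The determinant multiplies (it measures how areas scale).
assert det2(matmul(A, B)) == det2(A) * det2(B)

# 4. The trace is unchanged by a change of basis: trace(P^-1 A P) == trace(A).
P = [[2, 1], [1, 1]]
P_inv = [[1, -1], [-1, 2]]  # inverse of P (det(P) = 1)
assert trace(matmul(P_inv, matmul(A, P))) == trace(A)
```

Pure Python keeps the sketch dependency-free; with NumPy the same checks could be written with the @ operator, but the explicit triple loop in matmul makes the row-by-column rule visible.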
I’d highly suggest watching 3Blue1Brown’s Essence of Linear Algebra videos. He won’t get you to understand everything (you’ll need to sit down and do problems for that), but he will help you see what intuition is out there in a very beautiful way. He makes a very good point: when we go through the computations without the geometric intuition, we can spend a ton of time crunching numbers to reach results that should have been obvious.