Don’t invert that matrix (johndcook.com)
90 points by KonaB on Jan 19, 2010 | 38 comments



Here's another piece of advice that covers a lot more than matrix inversion: don't solve math problems yourself in code if you can in any way avoid it.

Use a library instead, and use the most specific methods available: if there are specialized routines to solve Ax=b, use them. Don't reach for the matrix-inversion methods (which are probably also there), or even the LU-decomposition ones, even if Real Math tells you the result is equivalent. In numerical computation, mathematical equivalence is an entirely different thing from what "real" mathematicians concern themselves with.
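
For instance, in Octave/Matlab-flavoured terms (a sketch only; the same hierarchy of generic-to-specific routines exists in LAPACK, SciPy, and friends, and the Cholesky variant assumes you already know A is symmetric positive definite):

  x = inv(A) * b;      % most generic, and usually the worst choice
  x = A \ b;           % generic solve: dispatches to LU/QR/Cholesky depending on the matrix
  R = chol(A);         % if you know A is symmetric positive definite,
  x = R \ (R' \ b);    %   a Cholesky factorization is cheaper still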

Seriously. I don't care if you're a math PhD; I don't even care if you know a decent amount about numerical methods. Whatever code you put together to do the job will be inferior to what a mature library can do, simply because the library has been through many iterations and has had a lot more time to get patched up to handle all those special cases (usually related to finite precision or discretization) that Real Math doesn't have to worry about. And it will probably be vastly more optimized than anything you'll put together, because it has seen years of heavy use in critical research environments.

The only time you should be writing your own math code (beyond simple arithmetic) is if no mature libraries are available to you. In which case you should allocate at the very least twice the amount of time it will take to research and implement the methods, and probably a lot more, because you're going to have a lot of fiddling to do.


Another reason would be (self-)education. There's no better way to gain a deeper understanding of a subject or learn the ins-and-outs of optimization than writing some code and then possibly trying to improve it. If you always blindly depend on third party libraries, you will remain ignorant and dumb.

Then, you can go on and use third party libraries in production while feeling satisfied that you know what's going on inside the library.


If you always blindly depend on third party libraries, you will remain ignorant and dumb.

Depends on whether your goal is to become an aeronautical engineer or an assembly language hacker, really. Neither one's "better" than the other, but the former won't benefit much from understanding exactly how to optimize on SPARC vs x64... Nor will the latter need to worry about his compiler's stalling speed :-)


The other important reason to use a library is that if someone improves the algorithm someday, you can benefit from nothing more than installing and testing the new version of the library.

I remember this also being the case for Constraint Satisfaction Problem (CSP) algorithms in AI. If you can formulate your problem as a CSP and feed it to a library, you automatically benefit from future improvements in the field of CSP solvers.


What sort of keywords would one enter into a search engine to find such a library?


"linear algebra library" finds several good ones with Google.

[EDITED to add: That's if what you want is specifically linear algebra. "numerical library" does pretty well if your needs are more general.]


Are people interested enough in this for a blog post? I studied numerical analysis and would be happy to post some of the basics if people are interested.


What is usually missing from these articles is even a simplified error analysis.

We can all count the floating-point operations, and today caches sometimes influence matrix algorithms more than raw flop counts. The problem gets more interesting when the matrix becomes ill-conditioned and different algorithms produce very different results.
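
As a quick sketch of the kind of experiment this calls for (Octave/Matlab; the exact numbers depend heavily on the matrix and the implementation), compare inv(A)*b with A\b on an ill-conditioned Hilbert matrix:

  A = hilb(10); x = ones(10,1); b = A*x;   % cond(A) is around 1e13
  x1 = inv(A)*b;  x2 = A\b;
  norm(A*x1 - b), norm(A*x2 - b)           % residuals: backslash is usually noticeably smaller
  norm(x1 - x),   norm(x2 - x)             % forward errors: both degrade as cond(A) grows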


Yes! Please.


In the same vein as "Statistics Hacks" I'm sure those same fans will be fond of "Numerical Hacks". :)


I'd be interested, but probably if you went into less common subjects. "Don't invert that matrix" is probably the first thing I, or anyone else, learned in numerical analysis.

Side note: My professor also mentioned it at least 3 times in each lecture... so it is important.


I did numerical analysis too...I don't think most CS departments cover it though, so chances are a lot of people would be interested in it.


A Matlab/Octave example of inv being slower and less accurate:

  >> A = randn(100); x = randn(100,1); b = A*x;
  >> tic; x1 = inv(A)*b; toc
  Elapsed time is 0.142381 seconds.
  >> tic; x2 = A\b; toc     
  Elapsed time is 0.000736 seconds.
  >> max(abs(x-x1))
  ans =
     7.1054e-14
  >> max(abs(x-x2))
  ans =
     4.8406e-14
Edit: Actually this is a warning about timing stuff. The inv function hadn't "warmed up". On a second run both times drop, and the difference is much smaller:

  >> tic; x1 = inv(A)*b; toc
  Elapsed time is 0.000737 seconds.
  >> tic; x2 = A\b; toc
  Elapsed time is 0.000471 seconds.


What happens when you run it 1000 times, skipping the first?


If I use this:

http://www.mathworks.com/matlabcentral/fileexchange/18798-ti...

which warms up the function, averages over repetitions and tries to account for various overheads, the ratio stays about the same (roughly 1:1.5). I would normally use timeit(), but for a quick post, I was being sloppy.


Does anyone trying to solve problems involving matrices with millions of entries not have some idea of the computational complexity of various operations they can do on them, given the nature of their matrices, etc.? I’d expect that either (a) someone has a small enough problem that the computational time is irrelevant and they should just arrive at the solution in whatever manner is conceptually clearest, given their current mathematical knowledge, or (b) the problem is going to take a lot of computation time, in which case it’s worth exploring to learn exactly what the fastest method will be.


You can solve amazingly large problems with a sparse matrix without understanding what you are doing, as long as you use direct (factorization) methods. If you type A\b with some gigantic sparse A, Matlab will apply reordering heuristics that often just work, and are actually incredibly hard to improve upon by manual fiddling. When you get to really, really big problems, you need to use iterative methods (like conjugate gradient), and for those you have to have a very firm understanding of everything. The gap between direct and indirect methods is huge.
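
A toy sketch of the direct route (Octave/Matlab; the tridiagonal matrix here is made up purely for illustration): backslash notices the sparsity and picks a sparse factorization with a fill-reducing ordering on its own.

  n = 1e5;  e = ones(n,1);
  A = spdiags([-e 2.1*e -e], -1:1, n, n);   % a large, sparse, diagonally dominant system
  b = randn(n,1);
  x = A \ b;                                % sparse direct solve: reordering + factorization
  norm(A*x - b)                             % residual should be tiny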


Matlab's sparse backslash uses the GPL'd Umfpack (relicensed from the author). The most important thing to know about iterative solvers is that the Krylov method is a minor component; preconditioning is the hard part.


You need to have a basic understanding of the different Krylov subspace methods to be able to use them. If you are using CG on a matrix that isn't SPD, then you are doing something very wrong. Same goes for the other methods -- they all have their strengths, weaknesses and gotchas, and it's pretty important to know what they are if you are going to use them! Preconditioning can greatly reduce the number of iterations you have to do once you have chosen a method, but it's not particularly helpful if you are using the wrong method.
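
A minimal sketch of that constraint in Octave/Matlab (the test matrix is constructed purely so that it is SPD): pcg is the right tool only when A is symmetric positive definite; otherwise reach for gmres, bicgstab, minres, and so on.

  n = 1000;  R = sprandn(n, n, 0.01);
  A = R'*R + speye(n);              % an SPD test matrix by construction
  b = randn(n,1);
  x = pcg(A, b, 1e-8, 500);         % CG: cheap per iteration, but assumes A is SPD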


With CG, the preconditioner also needs to be SPD (or have suitable factors). You can get by pretty well using only GMRES and spend all your time thinking about preconditioners. Once you have some good candidate preconditioners, you can try other Krylov methods, but the gains are rarely very large. I have frequently found SPD systems that are more efficiently solved with GMRES and a non-symmetric preconditioner, and even more so for symmetric indefinite systems where popular wisdom says to use MINRES, CR, or SQMR.
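
A rough sketch of that workflow in Octave/Matlab (the matrix, the ILU options and the tolerances are placeholders, not recommendations):

  n = 1000;  A = sprandn(n, n, 0.01) + 5*speye(n);   % a nonsymmetric test matrix
  b = randn(n,1);
  opts.type = 'nofill';
  [L, U] = ilu(A, opts);                             % incomplete LU as a preconditioner
  x = gmres(A, b, 30, 1e-8, 100, L, U);              % restarted GMRES(30) with ILU(0)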

For amusement, it's really worth knowing about

http://www.comlab.ox.ac.uk/people/nick.trefethen/publication...

which shows that each of the popular nonsymmetric iterations can beat all others by a huge factor.

But this is all justification that you should use a library like PETSc for your linear solvers. Then all aspects of the solve become run-time (typically command-line) options including an algebra for composing preconditioners and other components, with a plugin architecture for every component.


If you are using CG on a matrix that isn't SPD, then you are doing something very wrong.

Actually, this isn't so weird. There are a number of domain decomposition schemes that use CG, but keep all iterates in a benign space on which the operator is SPD, even though the real system is indefinite (porous media and Stokes/mixed elasticity). See work by Axel Klawonn, and recent stuff from Xuemin Tu.


When I was working on an l1-minimization problem, I wrote a very simple CG solver that solves Ax=b for a sparse A (where x and b are dense), in C with cxcore from OpenCV:

http://github.com/liuliu/l1qc/tree/master/src/


Speed is not the only consideration; accuracy can also be a problem.


Not necessarily. The specific mathematical/programming techniques can be very complex. It is probably not reasonable to learn them just because you are dealing with large matrices. Of course, some knowledge of the computational complexity is definitely helpful.


I'm a bit rusty on my linear algebra. Without inverting the matrix, what's the algorithm for solving Ax=b? Factorization is mentioned but I'm not sure what it means in this case.


Gaussian elimination. Remember where you subtract multiples of one row from another, in order to get zeros below the diagonal? That way you can back-solve a system of equations by reading off solutions starting with the bottom-right corner.

The result of carrying out the elimination steps is a triangular matrix with all zeros below the main diagonal. Call it U, for "upper".

The elimination steps themselves can be encoded as a matrix, rather than describing them in words. For instance, "subtract 2 times row 1 from row 2" is the matrix:

  [ 1 0 0]
  [-2 1 0]
  [ 0 0 1]

So, to solve the system, you can perform elimination on A x = b, which gives you two matrices (U, and the elimination steps L). In some sense, you've factored A into L and U.

Example:

Using Strang's terminology, let's call the matrix that "eliminates (i.e. sets to 0) the (i,j) entry" Eij

So, to solve a 3x3 system A x = b, you want to do elimination as: (E32 E31 E21) A = U

I.e.

1) Take A and subtract some multiple of row 1 from row 2, to put a 0 in (2,1)

2) Take the result and subtract some multiple of row 1 from row 3, to put a 0 in (3,1)

3) Take the result and subtract some multiple of row 2 from row 3, to put a 0 in (3,2)

Now the goal of elimination is to get an upper-triangular matrix U. I.e. if you were solving some system of equations in x, y, z, then because U is triangular, you could just read off the answer for z. Then plug that into the equation for row 2 to solve for y, then plug that into the equation for row 1 to solve for x. That's back-substitution.

BUT, let's return to our operations. We have:

E32 E31 E21 A = U

(E32 E31 E21) A = U

inverse(E32 E31 E21) (E32 E31 E21) A = inverse(E32 E31 E21) U

I A = inverse(E32 E31 E21) U

Now, it just so happens that the inverse of (E32 E31 E21) is lower-triangular. I.e. all entries above the diagonal are zero, so let's call it L.

I A = L U

A = L U

LU-decomposition. Or "factorization", since we've factored A into two matrices: L, and U.
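
The same factor-then-substitute recipe, as a sketch in Octave/Matlab (lu also returns a permutation P because in practice rows get swapped for stability; more on that in the reply below):

  A = randn(5);  b = randn(5,1);
  [L, U, P] = lu(A);        % P*A = L*U, with P a row permutation from partial pivoting
  y = L \ (P*b);            % forward substitution with the lower-triangular factor
  x = U \ y;                % back substitution with the upper-triangular factor
  norm(A*x - b)             % residual: tiny for a reasonably conditioned A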


Yes, but it's a little more complicated than that. In most cases you cannot trust the solution of straight Gaussian elimination. You need to take additional measures to avoid excessive error growth, such as using partial pivoting.

see http://en.wikipedia.org/wiki/Pivoting and http://en.wikipedia.org/wiki/Gaussian_elimination


LU decomposition is the most common choice, which is a similar operation to matrix inversion in that you're iterating row (or column) operations on both sides of an equation. Numerical Recipes has a great section on this, IIRC.

But that kind of obscures the fact that, while algorithms like LU decomposition are faster than a pure matrix inversion, they're faster by a constant factor of about 2 or so. It's not a fundamentally different algorithm, just a different set of tricks to solve a bunch of simultaneous equations. If you aren't hugely performance constrained and already have a matrix inversion routine, you'd never bother to rewrite your equation solver just for this speedup.


As far as I know LU decomposition is more than a constant factor faster than inversion. And it is numerically more stable.


No. They're both O(N^3) in the dimension of the matrix. And it's more stable only in the sense that it's doing fewer operations. There's actually a great trick (again, in Numerical Recipes): when doing inversion, spend a few more (constant-factor) cycles iterating a correction until you hit the limit of your number representation's precision.

Really, check Numerical Recipes. I like their section a lot, though I don't have my copy handy to give you a cite.
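
The trick being referred to appears to be iterative refinement. A minimal sketch in Octave/Matlab (in Numerical Recipes the residual is accumulated in higher precision, which is where much of the benefit comes from):

  A = randn(500);  b = randn(500,1);
  [L, U, P] = lu(A);             % factor once
  x = U \ (L \ (P*b));           % initial solve
  for k = 1:3                    % a few refinement iterations
    r = b - A*x;                 % residual of the current solution
    x = x + U \ (L \ (P*r));     % reuse the factors to solve for the correction
  end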


The asymptotics are completely different for sparse and banded matrices. And NR really isn't a good reference unless you are looking for a mildly flawed 30 year old analysis.


Numerical Recipes has both sound theory exposition and working code (yes, with some bugs here and there). This blog post has neither. Nor, to my knowledge, does any other single reference out there. It's easily the best "how to think about numerics work" text I know; if you have a better one please point me there.

It's easy for a wonk to flame about something as pedestrian as a how-to text for working scientists, but it's really not helpful.


You are probably familiar with this page

http://www.fceia.unr.edu.ar/~fisicomp/apuntes/biblios/wnotnr...

and the suggested alternatives

http://www.fceia.unr.edu.ar/~fisicomp/apuntes/biblios/altnr....

In addition to those suggestions, here are a few more relevant to the present discussion. For dense matrices and how to think about linear algebra algorithms, I highly recommend Trefethen and Bau "Numerical Linear Algebra". For sparse direct solvers, Tim Davis' book (http://www.ec-securehost.com/SIAM/FA02.html) is good, and you can transition from the "learning"-level implementation to Umfpack which is his production-quality solver (it's what is behind Matlab's backslash). For iterative solvers, Saad is pretty standard. For multiphysics solvers, I don't think any book can match the Knoll and Keyes' 2004 review on Jacobian-free Newton-Krylov methods (it's very accessible).

The GSL has better implementations of many of the general-purpose algorithms in NR. SciPy is great if you work in Python. For scalable linear and nonlinear solvers, look at PETSc.


> And NR really isn't a good reference unless you are looking for a mildly flawed 30 year old analysis.

Could you recommend a better book?


They have old editions of Numerical Recipes online: http://www.nrbook.com/a/bookcpdf.php

Sadly they're using the horrid 'File open' encryption on the PDFs.


I guess I'll have to go back to my lecture notes. There was a reason why we did LU-decomposition (and similar things) instead of inverting matrices directly.


As others have said, Gaussian elimination; or, if the matrix is not invertible (square and singular, or not square), the Moore-Penrose pseudoinverse can be used to find the least-squares solution.


Except that the normal equations are rarely a good way to compute the action of the pseudoinverse (though explicitly computing it would be even worse). QR is the workhorse for under- and over-determined dense systems, with SVD as a fallback for especially nasty ones.
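
A sketch of the three routes in Octave/Matlab, for an over-determined least-squares problem:

  A = randn(100, 10);  b = randn(100, 1);   % over-determined: more equations than unknowns
  x_qr  = A \ b;                            % dense QR-based least squares (the workhorse)
  x_svd = pinv(A) * b;                      % SVD-based pseudoinverse: robust but more expensive
  x_ne  = (A'*A) \ (A'*b);                  % normal equations: squares the condition number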



