Fundamentals of Linear Algebra and Optimization [pdf] (upenn.edu)
339 points by mindcrime on Dec 14, 2017 | 59 comments



My test for linear algebra books is how they first present matrices and matrix multiplication.

If they define a matrix as an NxM table of numbers, with multiplication defined by a complicated formula with a couple of nested sigmas, and only much later mention a lemma saying that every linear transformation can be represented as a matrix and that the composition of two transformations is the matrix multiplication of their matrix forms, I throw the book away in disgust. I throw most books away in disgust.
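Concretely, the two presentations being contrasted (my notation, not any particular book's):

    % The "table of numbers" definition: multiplication as a formula
    (AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

    % The transformation-first definition: pick bases, write [T] for the
    % matrix of a linear map T, and *define* the product so that
    [S \circ T] = [S][T]

In the first style the sigma formula is an axiom and composition is a lemma; in the second it's the other way around, and the formula stops being mysterious.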

This is the second book I've seen that does it right, but unlike the other one [1] this one wraps the very correct presentation of matrices in so much technical language and such a boring cover story that I almost threw it away in disgust anyway.

My greatest wish in STEM education is that we teach linear algebra better. It's such low hanging fruit and could change so much!

[1] which I saw linked in HN a year ago and don't remember the name of, but it was something like "linear algebra taught the correct way" and was apparently well known in the States so ask your friends


I think the book you're talking about is Axler's "Linear Algebra Done Right." I worked through the problems in this book with some friends, and it is a very good presentation.

After doing that book I wanted to see what other books were out there, which ones were the good ones, and sort of classify them. I came up with this list: https://begriffs.com/posts/2016-07-24-best-linear-algebra-bo...

I classify the books as Generalist (like Axler's), Theoretical (starting from e.g. modules rather than vector spaces), Numerical (matrix normal forms and algorithms), Practice (books full of problems), and a few other categories.


> After doing that book I wanted to see what other books were out there, which ones were the good ones, and sort of classify them. I came up with this list: https://begriffs.com/posts/2016-07-24-best-linear-algebra-bo...

Ironically you put "Jänich, K. (1994). Linear algebra. New York: Springer-Verlag." into the "Theoretical" section: in Germany this book (in its German original) is not that well-regarded, since it is considered far too shallow. Much better (and harder to read) German textbooks are

Gerd Fischer - Lineare Algebra: Eine Einführung für Studienanfänger (Linear Algebra: An Introduction for freshmen)

Siegfried Bosch - Lineare Algebra (and its companion book: Siegfried Bosch - Algebra)


You're right, I made a mistake. Just reviewed the table of contents on Amazon and it looks pretty short, and not particularly theoretical. Also poorly reviewed. I don't remember how it got on my list but I'm removing it.


Awesome. Thank you. Jealous of you and your friends. Wish I had a Pythagorean squad :)


I've heard Shilov's text described as a great complement to Axler's.


Starting with an abstract formal description of vector spaces and linear transformations is not a pedagogically useful introduction. It leaves out a huge amount of the historical/motivational context explaining most of the conventions used in linear algebra, and even the mental/conceptual understanding most working mathematicians have of the meaning of linear models and their use in various parts of mathematics (not even to mention people in other STEM fields).

The two places to start historically (and easily accessible to high school students) are:

(1) understanding and working with displacement vectors in 3-dimensional Euclidean affine space and in general thinking about transformation geometry (sometime later this can be extended to other kinds of non-Euclidean or non-metrical geometry), especially with reference to problems in Newtonian mechanics.

(2) systems of linear equations: this one is inherently coordinate-heavy and matrix based, at least to start out, and explains our conventions for how matrices are written, index order, multiplication of matrices by "column" vectors on the right, the use of matrix equations to fold several equals signs into one, etc.
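For example, the "fold several equals signs into one" move in its smallest form (my illustration):

    % Two equations, two equals signs...
    2x + 3y = 7
    4x -  y = 5
    % ...become a single matrix equation Ax = b:
    \begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}
    \begin{pmatrix} x \\ y \end{pmatrix} =
    \begin{pmatrix} 7 \\ 5 \end{pmatrix}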

After that I'd call out the manipulation of vectors of polynomial coefficients as an accessible additional concrete example of a linear space.

Discussion of other linear spaces or more purely abstract treatments proving properties from axioms can come sometime later, after students have familiarity with some of those tools, and after they have applied them to some problems in statistics, multivariable calculus, computer graphics, ODEs, optimization, etc.


And to follow-up, note that this book in particular is for a graduate-level course in the department of computer and information science, which has an undergraduate-level linear algebra course as a prerequisite.


Strang’s textbook (and lectures) introduce matrix multiplication AB as combinations of A’s columns. He uses the cell-by-cell / “dot product” formulas just to check his work, which seems reasonable to me. I’m a big fan of “combinations of columns”. I sometimes even hear that in Strang’s voice when I’m doing some matrix multiplications.
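A minimal numpy sketch of the column view (my code, not Strang's): column j of AB is A applied to column j of B, i.e. a combination of A's columns weighted by the entries of B[:, j].

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[5., 6.], [7., 8.]])

    # Build A @ B one column at a time: each column is A times a column of B.
    cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

    assert np.allclose(cols, A @ B)  # same result as the cell-by-cell formula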

Axler was way too abstract for me as a first linear algebra book. I look forward to returning to it once I finish Strang.


Strang's exposition can make you a wizard of matrix multiplication and give you quite a good intuition for various properties of matrix operations, in my opinion.


Might [1] be "Linear Algebra Done Right" by Axler? http://www.springer.com/gp/book/9783319110790


To this book's credit, it covers a LOT more material than Axler's book. And it's more heavily focused on optimization methods, for which the details of tabular representations of matrices take center stage.


To me, matrices and matrix multiplication really represent the "calculus" of linear algebra. When you are studying real analysis, the theory part (closed and open sets, the definition of limit points, continuity, etc.) is not actually that helpful when doing a computation (calculating an integral or derivative). It often makes sense to teach a student the calculation side before the theory because it's easier to get an intuition for.


I'm curious what you think of the "No Bullshit Guide to Linear Algebra" [1]? I'm considering buying it to refresh my knowledge from school. Or what books do you suggest?

[1] https://www.amazon.com/No-bullshit-guide-linear-algebra/dp/0...


It... more or less passes the test?

I mean, it defines matrix-vector product in a nice but abstract way, and then in the next paragraph explains why we chose that definition. And it does say "this, this here is the one important idea in this book", which it gets many points for.

I really would prefer a textbook to start with "ok, here's something we want to do. Let's figure out a formula for it. Now let's give it the name matrix-vector product".

Same for matrix-matrix products.

I don't know any really good linear algebra books; in my school it was taught in the worst possible way (define a field, rote-learn the mechanical operation of matrix multiplication, talk about vector spaces for a while, talk about matrices, and only then mention linear transformations for the first time). Some people in this thread gave other suggestions; I'd start reading a few and choose the one I connect with best.

Anyway, I really learned linear algebra from using it.


Here is a direct link to the page in the preview of the book where the matrix-product-is-the-same-as-linear-transformation idea first appears: https://minireference.com/static/excerpts/noBSguide2LA_previ...

I can't quite tell if the book passes the test, since the entry-point is the definition of matrix-vector product which is very close to a "complicated formula with a couple of nested sigmas" but the book also mentions the notion of matrix representations, so hard to tell overall.


Another text that starts with linear transformations is Apostol’s Calculus. It’s somewhat more concrete than Axler, IIRC — e.g. it doesn’t try to avoid determinants, but starts with properties of volume-scaling and then develops the determinant formula out of those. We used this text for freshman/sophomore math back in the 80s, so my memory is fuzzy; but I liked it a lot.


I don’t necessarily agree though. I’ve seen books first trying to introduce linear transformations without matrices, and frankly, to a beginner it’s even more confusing to be told f is linear if f(ax+y)=af(x)+f(y). Some students really need a concrete grasp of matrices with actual numbers they are familiar with before going into the abstract definition.

I’ve seen worse though: those that attempt to shoehorn abstract algebra in the process by first rigorously defining a field.

And no, teaching multiplication doesn’t involve memorizing a formula; it’s a simple mechanical process of arranging one matrix on the left and the other at the top and multiplying/adding their corresponding rows and columns. Once this process is familiar to a student, they will have no trouble writing out the formula with nested sigmas.
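A small worked instance of that mechanical process (my numbers), after which the nested-sigma formula is just a name for what you already do:

    \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
    \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} =
    \begin{pmatrix} 1\cdot5 + 2\cdot7 & 1\cdot6 + 2\cdot8 \\
                    3\cdot5 + 4\cdot7 & 3\cdot6 + 4\cdot8 \end{pmatrix} =
    \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}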


I don't understand what's so wrong about that. Surely the interested student would like to learn about the properties of matrix arithmetic as well as vector spaces, linear transformations and their representation as matrices, etc.

I've had a quite accomplished mathematician for a professor who was very adamant that students should learn about basic matrix operations before the abstract theory. According to him, his undergraduate education at Princeton did things the "right" way, but left him quite confused about the main ideas and motivations. He ended up making some quite important contributions to the theory of representations of certain algebraic structures, if I recall correctly.


The way I was taught was like learning how to do long-form multiplication without mentioning that "the product of 5 and 6 is 5+5+5... 6 times".


The Friedberg, Insel, Spence book [1] passes your test. Linear transformations first, then matrices as a representation of transformations with respect to a specified basis. Depending on your tastes, you may find it too dry, but imo it is a clean presentation.

[1]: https://www.amazon.com/Linear-Algebra-4th-Stephen-Friedberg/...


Do you have any other tests for calculus, analysis, abstract algebra, discrete mathematics, probability or statistics books?


My test for a multivariable calculus text is its treatment of the chain rule. If the book says something like “derivative of a composition is the composition of the derivatives”, it’s good. If it instead says something involving a sigma and things like ∂f/∂x ∂x/∂t, it’s likely bad.


What do you mean by "derivative of a composition is the composition of the derivatives"?


You need to think of a derivative as just a function that takes a function and a point, and returns a linearized function that best approximates the original function at that point.

So basically the chain rule states that if you have two functions F and G composed together and you want to find the derivative (the linearized approximation), you simply compute the linearized function for each of F and G and compose the linearized versions afterwards.
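In symbols (my rendering of what I take the linked formula to be):

    % Coordinate-free: the derivative of a composition is the
    % composition of the derivatives, as linear maps
    D(F \circ G)(a) = DF(G(a)) \circ DG(a)

    % The coordinate version it compresses into one line:
    \frac{\partial f}{\partial t}
      = \sum_i \frac{\partial f}{\partial x_i} \frac{\partial x_i}{\partial t}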

You can read the Wikipedia article on the chain rule and it eventually gives this simple and elegant formula for the chain rule: https://wikimedia.org/api/rest_v1/media/math/render/svg/8b0f...

But the unfortunate reality is that too many textbooks formulate the chain rule in such a complicated manner that it obscures the simplicity and elegance of the chain rule.


Thank you for the explanations. I agree with you that this more abstract view of the chain rule (and the derivative itself) is superior to the sum-of-products formula one usually sees in a first course in multivariable calculus, but I feel most students have to learn the complicated, technical version first before they can see the beauty of the more abstract one.


IIRC, Apostol's Calculus, volume 2, does it right (linear transformations first, then matrices).


Stupid question: is the order of elements in matrix notation just convention or is there a deeper reason for it? Could we just as well have used „transposed matrices“?


I would teach linear algebra with a book that barely mentions linear transformations.

In finite dimensions, linear transformations and matrices are exactly the same mathematical objects, with very different notations (matrix notation (boxes with numbers inside) vs. the linear space/linear transformation notation). I would rather have students learn deeper mathematics in matrix notation only than master less substantial mathematics in both notations.

Teaching both notations may reinforce the idea that matrices and linear transformations are different mathematical objects in finite dimension. Teaching the more abstract notation is mainly useful in infinite dimension (Hilbert spaces).


This is exactly backwards.

Whether you're discussing the Jacobian of a function, or change of basis matrices, learning the matrix formula is a lot less useful than seeing how it falls out of the linear function definition.

The formula is hard to memorize and gives no intuition for why anything is true. But from the linear function definition it is easy to reconstruct the formula.

In fact this is so true that I would say that anyone who only knows the matrix definition does not actually understand linear algebra.


You have succumbed to the coordinate virus, http://geocalc.clas.asu.edu/pdf/MathViruses.pdf


> In finite dimensions, linear transformations and matrices are exactly the same mathematical objects

They are not: Matrices represent linear transformations with respect to a given basis. Linear transformations are completely independent from any chosen basis.
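A concrete illustration of the distinction (my example): one linear map, two different matrices depending on the basis.

    % T(x, y) = (y, x), reflection across the line y = x.
    % In the standard basis e_1, e_2:
    [T] = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
    % In the basis (1, 1), (1, -1) the same map is diagonal:
    [T]' = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}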


And you would forever cripple your students. As potential users of linear algebra, they would be mentally mutilated beyond all hope of regeneration.


> My greatest wish in STEM education is that we teach linear algebra better.

That's easy. Vector spaces and linear transformations are best understood abstractly. Matrices are best understood as visually and computationally convenient representations of tensor products, so don't mention matrices at all until well after you have established the basic properties of tensor products, in particular, the natural isomorphism between `L(V,W)` and `Dual(V) (x) W`.
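Spelled out (my notation, matching the parent's `L(V,W)` and `Dual(V) (x) W`): the isomorphism sends f (x) w to the rank-one map v ↦ f(v)w, and matrices appear only after choosing bases.

    % For f in Dual(V) and w in W, define the rank-one map
    (f \otimes w)(v) = f(v)\, w
    % Choosing a basis (e_j) of V with dual basis (e^j) and a basis (u_i) of W,
    % every T in L(V, W) decomposes as
    T = \sum_{i,j} a_{ij}\, e^j \otimes u_i
    % and the coefficients a_{ij} are exactly the matrix entries of T.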


I may be a troglodyte, but I was bored senseless in my linear algebra, calculus, discrete mathematics, statistics, etc. classes in college. Then they tack all the interesting stuff, like AI (we didn't call it Machine Learning back in the 90s), onto the end of your major, where you actually use it. I had to go back and relearn it all because I didn't pay attention the first time. Note to mathematicians (or at least math professors): lead with the interesting stuff before the theory. It'll make learning easier and more fun.


> lead with the interesting stuff before the theory. It'll make learning easier and more fun.

As a mathematician I say:

What is considered interesting differs from person to person. I personally, for example, deeply love this really abstract stuff (but of course I am aware that other people have different preferences). So I would say that even finding "interesting stuff" that many students in the lecture hall might be interested in is really, really hard.

Another important argument against your idea: to even be able to formulate the ideas from AI, one first has to learn and understand the words of the language in which one will formulate them. And these words are things like "vector space", "linear map", "tensor product", etc., and understanding their meaning means knowing theorems about them. Starting with advanced topics, such as your AI example, is like giving beginner language learners a really advanced text in the foreign language that they have just begun to learn. In other words: a really dubious idea.

Considering your post, I can only ask why you did not have a talk with the course advisor of your faculty. He would immediately have told you why these lectures are important for the things that you are actually interested in.


Same: I never gave a crap about algebra, or calculus, until AP physics. When I learned that the point (and the origin story!) of all this is to model the behavior of the universe I suddenly wished I had been paying attention for years prior.

Seconding your wish that curricula would lead with motivations and then drill mechanics rather than drilling mechanics for 11 years and finally giving you the motivation in year 12.


It reminds me a bit of the motivation behind fast.ai: that you should get your hands dirty with the things you care about instead of putting them off until you've learned all the tough fundamentals.

I found it easier to learn things like stats, linear algebra, and calculus once I had a personal application for them. Even with something that I love like Reinforcement Learning, I would sometimes get sleepy reading pages and pages of the Sutton and Barto book, but as soon as I worked on a coding exercise, I could spend hours actively engaged in trying to solve the problem.


I'm basically in the same boat, except I thought our AI class was basically just A* and other forms of search and didn't go back to learning any of it until after university!


Folks looking for introductory books in this area may also enjoy this new in-progress book: https://web.stanford.edu/~boyd/vmls/

As a follow-up I would recommend Trefethen & Bau, Numerical Linear Algebra.


Boyd and Vandenberghe will have a monopoly in anything optimization related if they keep on...


This is great!

There's so much hype about machine learning but so few people seem to appreciate the fundamental importance of linear algebra (especially in ML).

This fast.ai course is also great: http://www.fast.ai/2017/07/17/num-lin-alg/


Cool new course! I'm thinking about taking their Practical Deep Learning for Coders course at the moment: http://course.fast.ai/


If you're at all interested in actually doing deep learning, then do the course! It's by far my favorite resource for actually learning to do deep learning.

Of course it won't give you the math background, but you can and should pick that up after.


Wait till January, when they will release version 2 of this course, which teaches DL with PyTorch. I was one of the international fellows for that course, and it is so good you will want to do the updated one.


I took Prof. Gallier's class (he's the author of this document) when I was a graduate student at Penn. http://www.cis.upenn.edu/~jean/home.html

It is one of the classes I enjoyed the most when I was there. Some of his other books are quite enjoyable as well.

Dr. Gallier's PhD work was focused on logic, but his current research has a lot to do with computer graphics and computational geometry. He spent a long time studying graduate-level algebraic geometry and algebraic topology while carrying out his duties as a professor in the computer science department. His story is always an inspiration for me to learn more.


If you're into Clojure, there is a fast linear algebra library, Neanderthal (I'm the author), that covers both GPU and CPU, is easy to use, and comes with lots of tutorials that cover the chapters of a linear algebra textbook, as well as more advanced numerical computing uses (solvers, factorizations, etc.):

The list of guides is here: http://neanderthal.uncomplicate.org/articles/guides.html


I'd recommend watching these videos for some motivation: https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...

And remember, linear algebra has broader application than just geometry!


Is this going to be a textbook? I can't find it in print, but it looks like a great text.


It's probably the course book that the professor or department wrote for the specific course because they couldn't find a textbook that covered the material they wanted in the way they wanted. A lot of courses at U of M had them in the 90s; you'd buy the course book from a local printer's shop as a tape-bound, soft-cover book. They were very nice because they covered exactly what you needed using the same language as the professor. It was like having notes you could read before the lectures.


Good question. I'm honestly not sure. I just stumbled across it while looking for linalg resources. It appears to be legit, since the author is with UPenn and it's on a upenn.edu site. But I don't see an explicit license notification or anything, so not sure if you're even allowed to print it out yourself or not.


Was in his class at Penn; he gives all this stuff away with no issue. Didn't have to pay anything for course materials, and it's all open to the public.


Did fields other than R and C eat the authors' children or something?


Often linear algebra tools are applied to Hilbert spaces or Banach spaces, where fields other than R or C don't make much sense.

Looking at page 25 of TFA:

> In Definition 1.2, the field R may be replaced by the field of complex numbers C, in which case we have a complex vector space. It is even possible to replace R by the field of rational numbers Q or by any other field K (for example Z/pZ, where p is a prime number), in which case we have a K-vector space (in (V3), ∗ denotes multiplication in the field K). In most cases, the field K will be the field R of reals.


Yeah, lots of courses pay lip service to the existence of fields of characteristic p during the initial few weeks, then proceed to ignore them completely.


How relevant do you think finite fields are for solving applied optimization problems? What proportion of a course do you think should be devoted to them?


Every ordered field has characteristic 0, so I don't think finite fields are likely to be terribly useful for solving optimization problems. But:

(0) There is way more to linear algebra than linear programming.

(1) A linear algebra course is not the right place for an extensive study of how to solve optimization problems anyway. That belongs in a real analysis course.


This is a graduate-level course in the department of computer and information science.

> Prerequisite(s): Undergraduate course in linear algebra, calculus

> The goal of this course is to provide firm foundations in linear algebra and optimization techniques that will enable students to analyze and solve problems arising in various areas of computer science, especially computer vision, robotics, machine learning, computer graphics, embedded systems, and market engineering and systems. The students will acquire a firm theoretical knowledge of these concepts and tools. They will also learn how to use these tools in practice by tackling various judiciously chosen projects (from computer vision, etc.). This course will serve as a basis to more advanced courses in computer vision, convex optimization, machine learning, robotics, computer graphics, embedded systems, and market engineering and systems.


You had me at Bourbaki.



