The title is misleading. In general, I like an applications-first approach, but I have a hard time liking a tutorial with a title like that which then proceeds to say things like
>Neural networks consist of linear layers alternating with non-linear layers.
without providing a proper definition of, or intuition for, what is linear and what is not. In the next step they construct NN layers out of ReLUs without saying what a rectified linear unit is, and only barely hinting at what it is supposed to do ("non-linearity").
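To make the distinction concrete, here is a minimal NumPy sketch (not taken from the tutorial; the weights `W` and the vectors `a` and `b` are made up for illustration). A linear layer without bias satisfies f(a + b) = f(a) + f(b), while ReLU, which just replaces negative entries with zero, does not:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: keep positive entries, replace negatives with 0."""
    return np.maximum(x, 0.0)

def linear_layer(x, W):
    """A linear layer without a bias term: just a matrix-vector product."""
    return W @ x

# Illustrative values only (assumptions, not from the tutorial).
W = np.array([[1.0, 3.0], [-2.0, 1.0]])
a = np.array([1.0, -2.0])
b = np.array([-3.0, 1.0])

# Additivity holds for the linear layer ...
linear_is_additive = np.allclose(
    linear_layer(a + b, W), linear_layer(a, W) + linear_layer(b, W)
)
# ... but fails for ReLU: relu(a + b) is (0, 0), relu(a) + relu(b) is (1, 1).
relu_is_additive = np.allclose(relu(a + b), relu(a) + relu(b))

print(linear_is_additive, relu_is_additive)
```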
The article is not worthless; it's a nice tutorial on building a NN classifier for MNIST, but don't expect full mastery of the mathematics relevant to understanding NNs after reading it.
Chapter 2 of the book by Goodfellow et al. is a serviceable, concise summary of the relevant concepts, but I don't think that chapter alone is good primary learning material if you are not already familiar with the subject. For that I'd recommend reading a proper undergrad-level textbook (and doing at least some of the problem sets [1]), for example http://math.mit.edu/~gs/linearalgebra/, and then continuing with e.g. the rest of Goodfellow's book.
[1] There's no royal road to learning mathematics. Or, for that matter, to learning anything.
These are the notes for a 40 min talk Rachel will give at the O'Reilly AI conference. It was originally meant to be a longer tutorial, so the scope had to be cut down significantly, whilst the title remained. There are lots of links in the notebook to additional resources with more background info.
Having said that, there really isn't much more linear algebra you need to implement neural networks from scratch. You'll need convolutions of course, although that's not too different from what's shown here.
For those interested in much more detail, Rachel has a full computational linear algebra course online http://www.fast.ai/2017/07/17/num-lin-alg/ . Most of that isn't needed for most deep learning, however.
Hrm. I find a lot of stuff on linear algebra not so intuitive or motivating. It's a bit harder to get through. A big motivator is the sheer scale of application, along with calculus. Yeah it's a must for AI but really linear algebra underpins much much more and gives you awesome problem solving tools in general.
As for intuition, a better approach may be to get 3 good books and cycle through them. Where one doesn't feel so intuitive at a given point, the other 2 might. They will explain things in a different way with different examples. Connecting the ideas across 3 sources will help it sink in.
For a light intuitive introduction, give this a try:
Is there any book which actually explains where matrices and their rules come from, instead of throwing matrix multiplication rules at you in a dogmatic way so that you blindly follow them like a mindless robot?
I know there are lectures in YouTube from 3Blue1Brown:
Matrices are not fundamental or interesting by themselves as just Excel-like grids of numbers. The reason we care about them is because they are a convenient notation for a certain class of functions.
I have to catch a flight so I don't have time to explain this fully, but the key points are:
1/ A "linear function" is a function where each variable in the output is "linear" in all the variables of the input (i.e., a sum of constant multiples of the input variables). e.g. f(x,y) = (x + 3y, y - 2x) is a linear function, but g(x, y) = (x^2, sin(y)) is not.
2/ All linear functions can be represented by a matrix. The `f` I mentioned above corresponds to the matrix:
[  1  3 ]
[ -2  1 ]
3/ The rules of matrix multiplication are defined so that multiplying by the matrix of a linear function corresponds to applying that function.
For example, again using the definitions above:
f(7, 8) = (31, -6)
And notice that we get the same thing when we do matrix multiplication:
[  1  3 ]   [ 7 ]   [ 31 ]
[ -2  1 ] * [ 8 ] = [ -6 ]
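If you'd rather check this on a computer than by hand, here is a small NumPy sketch of the same example (the names `F` and `v` are mine):

```python
import numpy as np

def f(x, y):
    """The linear function from point 1/, written out directly."""
    return (x + 3 * y, y - 2 * x)

# The matrix of f: row i holds the coefficients of output i.
F = np.array([[1, 3],
              [-2, 1]])
v = np.array([7, 8])

direct = f(7, 8)      # apply the function
via_matrix = F @ v    # multiply by its matrix

print(direct, via_matrix)
```

Both routes give the pair (31, -6).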
4/ Matrix multiplication also corresponds to function composition. If `f` is as defined above, and `h` is defined by h(x, y) = (-3y, 4x + y), then the matrix for h is
[ 0 -3 ]
[ 4  1 ]
and the function `f ∘ h` you get by applying `h` and then `f` is (you can check this...)
(f ∘ h)(x, y) = (12x, 4x + 7y)
The matrix for this function happens to be
[ 12  0 ]
[  4  7 ]
But, lo and behold, this matches matrix multiplication:
[  1  3 ]   [ 0 -3 ]   [ 12  0 ]
[ -2  1 ] * [ 4  1 ] = [  4  7 ]
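The same check in NumPy, as a rough sketch (the names `F`, `H`, `FH` and the sample points are my own choices): applying `h` and then `f` to a few inputs agrees with multiplying by the product matrix.

```python
import numpy as np

def f(x, y):
    return (x + 3 * y, y - 2 * x)

def h(x, y):
    return (-3 * y, 4 * x + y)

F = np.array([[1, 3], [-2, 1]])   # matrix of f
H = np.array([[0, -3], [4, 1]])   # matrix of h
FH = F @ H                        # matrix of "h first, then f"

# Compare function composition with the product matrix on a few sample points.
points = [(1, 0), (0, 1), (7, 8), (-2, 5)]
agree = all(
    tuple((FH @ np.array([x, y])).tolist()) == f(*h(x, y))
    for x, y in points
)

print(FH)
print(agree)
```

Note that the order matters: `H @ F` is the matrix of "f first, then h", and for these two functions it is a different matrix from `F @ H`.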
5/ Why do we care about linear functions? Well, linear functions are interesting for a lot of reasons, but one in particular is that (differential) calculus is all about approximating arbitrary differentiable functions by linear ones. So you might have some weird function but, if it's differentiable, you know that "locally" it is approximated by some (constant plus a) linear function.
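As a rough numerical illustration of that last point, take the non-linear `g` from point 1/. Its matrix of partial derivatives (the Jacobian, which I worked out by hand here) gives the linear part of the local approximation. The base point `p0` and the `step` are arbitrary choices of mine:

```python
import numpy as np

def g(p):
    """The non-linear example from point 1/: g(x, y) = (x^2, sin(y))."""
    x, y = p
    return np.array([x**2, np.sin(y)])

def jacobian_g(p):
    """Matrix of partial derivatives of g at p (derived by hand)."""
    x, y = p
    return np.array([[2 * x, 0.0],
                     [0.0, np.cos(y)]])

p0 = np.array([1.0, 0.5])        # arbitrary base point
step = np.array([1e-3, -2e-3])   # a small displacement

exact = g(p0 + step)
# "Constant plus linear": g(p0) is the constant, the Jacobian is the linear part.
approx = g(p0) + jacobian_g(p0) @ step
error = np.max(np.abs(exact - approx))

print(exact, approx, error)
```

The error shrinks roughly with the square of the step size, which is what "locally approximated" means in practice.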
"Linear Algebra Done Right" is a fine book but its enduring popularity leads people to recommend it as a universal default answer.
The parent asked if there was a LA book that covered the material in the same style as 3Blue1Brown's videos. If that's the criterion, Sheldon Axler's book isn't the best choice. One can compare a sample chapter to the YouTube videos and see that they use different pedagogy:
I second this. Linear Algebra Done Right is an awesome book. It also comes with a helpful selection of exercises after each chapter with detailed answers available on the website, which is great for self-learners. If you are a student and your Uni has a Springer subscription, you might be able to get the PDF for free.
I like the look of the presentation and the active teaching method adopted. I also like the way Savov gives out the definitions and the facts as a pdf but keeps the exercises, investigations and examples for the paid version - the exercises are the value added in Maths in my limited experience.
I just bought the paper version off Lulu (I like being able to read and scribble and then go on the computer for the computational exercises).
And now to set up SymPy on Debian...
There was a web book I found a while ago that built up some sort of motivation for linear algebra. Unfortunately I don't remember what it was or the title.
This is the link for the first part. You'll find further articles there.
It walks you through the code, explains things briefly, and points you to the exact places in a good Linear Algebra with Applications textbook where this is explained in detail.
My linear algebra teacher taught it in a visual and proof-focused way and it was amazing. He also tied it upwards into abstract algebra WRT vector spaces. He also taught my abstract algebra class, where he tied things back into linear algebra. That was an amazing set of classes...
Thank you for this! I've been re-learning Linear Algebra since my question[1] was answered on an earlier HN thread[2] on fast.ai. This will definitely help.
This left me with a lot of questions; it could definitely use more explanation between the code examples. I had to do a lot of "reading between the lines".
If AI == neural networks only, maybe matrix knowledge is enough, but when you start to include NLP, expert systems, search algorithms such as A*, minimax, etc. in the AI category, you'll have to know more mathematics.
Even for NNs alone, those few operations are not enough. Not only that, but even to understand what the operations I directly need actually do, I need far more background knowledge of linear algebra.
I wonder why there's so much more emphasis on Linear Algebra over Calculus (I've seen a number of courses teaching LA to complement DL courses), given that without Calculus, it's hard to understand optimization and backpropagation. Rote memorization might help for copying the same network over and over but isn't enough when you have to customize it.
How do you know a stationary point is a local optimum? Eigenvalues of the Hessian matrix! Even though vector calculus is taught without assuming linear algebra, a lot of the material is coupled. Further, many of the "first steps" machine learning ideas are essentially linear algebra problems (e.g., linear regression, PCA, etc).
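A minimal sketch of that eigenvalue test, assuming you already have the (symmetric) Hessian at a stationary point. The helper name `classify_stationary_point`, the tolerance and the example Hessians are all made up for illustration:

```python
import numpy as np

def classify_stationary_point(hessian, tol=1e-12):
    """Second-derivative test via the eigenvalues of a symmetric Hessian."""
    eigenvalues = np.linalg.eigvalsh(hessian)  # eigvalsh: for symmetric matrices
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle point"
    return "inconclusive"  # some eigenvalue is (numerically) zero

# Hessian of x^2 + x*y + y^2 at the origin: eigenvalues 1 and 3.
bowl = np.array([[2.0, 1.0], [1.0, 2.0]])
# Hessian of x^2 - y^2 at the origin: eigenvalues 2 and -2.
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])

print(classify_stationary_point(bowl))
print(classify_stationary_point(saddle))
```

When an eigenvalue is zero, the test says nothing and you have to look at higher-order terms.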
> The purpose of this notebook is to serve as an explanation of two crucial linear algebra operations used when coding neural networks: matrix multiplication and broadcasting.
Um... no. While I like Rachel, and she's really smart and all that, this is a vaaast oversimplification.
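For anyone who hasn't opened the notebook, the two operations named in the quote look roughly like this in NumPy. This is my own sketch, not code from the notebook, and the arrays `X`, `W` and `b` are made-up values:

```python
import numpy as np

X = np.arange(6.0).reshape(3, 2)     # 3 samples, 2 features each
W = np.array([[1.0, 0.0, -1.0],
              [2.0, 1.0, 0.0]])      # maps 2 features to 3 outputs
b = np.array([10.0, 20.0, 30.0])     # one bias per output, shape (3,)

# Matrix multiplication: (3, 2) @ (2, 3) -> (3, 3).
# Broadcasting: b has shape (3,), so it is added to every row of the result.
out = X @ W + b

# The same thing with the broadcast written out explicitly.
out_explicit = X @ W + np.tile(b, (3, 1))

print(out)
```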
I have yet to go through the tutorial. LA is one of the areas where I am trying to improve... The Coding the Matrix course/book is well written for those looking for a good, intuitive approach to learning.
I may be biased because I have a background in mathematics, but after working with TensorFlow for about 6 months now, I don't think you really need to understand linear algebra or multivariate calculus to work with neural nets, unless you're trying to implement your own engine.
Edit: I don't mean to discourage anyone from learning, as an understanding of the relatively simple mathematics behind NNs may afford the user additional intuition about the behavior of neural nets, but it appears that one can treat neural nets almost like black boxes, given a suitable engine to work with like TF.
As an aside, TF is a pretty magnificent library. It pretty much works out of the box, and in addition to Python and C++ bindings, there appears to be an unofficial port to C#, although I haven't tried it yet. I strongly recommend the tutorials at tensorflow for anyone interested in experimenting.