I've been reading through the notes and you present the material extremely well. I especially like how you discuss naive approaches before moving on to a better way to do things (e.g., computing a gradient numerically vs. analytically). This is rare in teaching but, from a student's perspective, it really helps fill in the gaps in knowledge as you try to reason through and understand the process on your own.
Thanks! Unfortunately a lot of teaching is very relative and strongly depends on prior background. A different student gave me this feedback on that section: "Why are you expanding out all the random steps nonsense? Gradient descent takes one line to explain". It's the same for my lectures: no matter what I say or cover, at any point during the lecture some people are bored and some are completely lost. All you can hope for is hitting the median well and then learning to ignore (to some degree) the person who just asked a question that indicates that they are not following at all, and the person next to them who is yawning and on their phone.
I'd like to second karpathy here. When teaching a practical take on something like machine learning (and even deep learning!), I've had to cater to different tastes. Usually people in these classes fall into one of two groups: the more engineering side, where breaking down gradient descent can help, or the mathy side, where they've already done convex optimization, know the trade-offs of L-BFGS vs. conjugate gradient, and find all the properties of parametric models obvious. The best thing you can do here is work with the students one-on-one to fill in the gaps. There's no silver bullet, which is why I'd say taking the class in person is always going to be better than notes. I think karpathy is reaching a wider audience with the way he's handling the notes though.
Confused as to the downvote... but maybe I can clarify here. People wanting to apply deep learning tend to fall into one of two camps: heavy CS people with some applied ML experience, and mathematicians who might not have as much experience building things. In karpathy's case, he's likely going to get a mix of students who have taken different classes. There's a lot of variance either way.