Tangentially related (and also using the ubiquitous MNIST dataset), Sebastian Lague started a brilliant, but unfortunately unfinished video series on building neural networks from scratch.
This video was an absolute eye-opener for me [1] on what classification is, how it works, and why a non-linear activation function is required. I probably learned more in the five minutes of watching this than from doing multiple Coursera courses on the subject.
One "ah ha" moment I had about that required non-linear function was that if you just pass numbers through a series of linear functions, they can by definition be combined into a single linear equation — so stacking layers buys you nothing without a non-linearity in between.
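That collapse is easy to demonstrate numerically. A minimal sketch (using NumPy, with made-up layer sizes) showing that two affine layers without an activation are exactly equivalent to one:

```python
import numpy as np

# Two "layers" with no activation in between: y = W2 @ (W1 @ x + b1) + b2
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2

# Algebraically fold both layers into a single affine map: y = W @ x + b
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

# Identical output — the extra layer added no expressive power
assert np.allclose(two_layers, one_layer)
```

Insert a non-linearity like ReLU between the two matrix multiplications and this folding is no longer possible, which is exactly why the activation function matters.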
[1] https://www.youtube.com/watch?v=bVQUSndDllU