
If you enjoy these kinds of explanations, "Data Science from Scratch" by Joel Grus explains many machine learning algorithms and has you implement simple versions of them in Python as you read along. It also covers linear regression and I wonder if that book is where the author got the idea for this series of blogposts. Kudos anyway.

Something of a nitpick, but one thing that both Simon Ward-Jones and Joel Grus miss is that linear regression is typically not implemented using gradient descent at all: there's an analytical solution, and you can get the beta coefficients with straightforward matrix algebra. It's much harder to explain than gradient descent, so I get why they don't bother, but without that background it's hard to see why everybody talks about linear regression all the time, when with gradient descent or any other numerical optimizer there's really no limit to what f(x) can look like.
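
To give a rough idea, here is a minimal numpy sketch of that closed-form route (the toy data and variable names are just for illustration, not from the article):

    import numpy as np

    # toy data: y = 2*x + 1 plus noise
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=100)
    y = 2 * x + 1 + rng.normal(0, 0.5, size=100)

    # design matrix with a column of ones for the intercept
    X = np.column_stack([np.ones_like(x), x])

    # normal equations: beta = (X^T X)^{-1} X^T y
    # (np.linalg.lstsq is the numerically safer way to do the same thing)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta)  # roughly [1.0, 2.0]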




I don't think they've missed the fact that for linear regression there's an algebraic solution, but machine learners typically treat linear regression as a simple special case, and as soon as you want to go a bit beyond it you have to use an iterative numerical optimizer anyway, so why bother with the special-case solution.

My main criticism of the article would be that the nitty-gritty section only makes sense to a reader who has already done a linear algebra / multivariate calculus course, in which case they've likely already covered least squares in greater depth (including the exact solutions) than this article does. So I don't really see the purpose of the math section, except maybe to signal that the writer has a decent understanding of the algorithmic detail.


I had never actually seen "Data Science from Scratch" - sounds like a book I should read!

Fair point about gradient descent vs the normal equations closed-form solution. I am planning on working through a few algorithms, so I thought it would be better to introduce gradient descent with something simple before talking about gradient boosted decision trees and neural networks. Also I would have had to explain more complex matrix stuff like invertibility issues and linear dependence, like you said.

I guess I just dodged that bullet and went for gradient descent. Maybe another post for the linear algebra fans! Thanks for reading though!
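
For anyone curious what the gradient descent route looks like in practice, here is a minimal sketch on the same kind of toy data (the learning rate and iteration count are illustrative choices, not taken from the post):

    import numpy as np

    # toy data: y = 2*x + 1 plus noise
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=100)
    y = 2 * x + 1 + rng.normal(0, 0.5, size=100)

    b0, b1 = 0.0, 0.0   # intercept and slope
    lr = 0.01           # learning rate
    n = len(x)

    for _ in range(5000):
        y_hat = b0 + b1 * x
        # gradients of mean squared error w.r.t. each parameter
        grad_b0 = (2 / n) * np.sum(y_hat - y)
        grad_b1 = (2 / n) * np.sum((y_hat - y) * x)
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1

    print(b0, b1)  # converges towards roughly 1.0 and 2.0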


I’m working through Joel’s book and implementing the solutions in Swift:

https://github.com/melling/data-science-from-scratch-swift

Of the last 5, only the Linear Regression chapter is finished:

https://github.com/melling/data-science-from-scratch-swift/b...


> it's hard to see why everybody talks about linear regression all the time when with gradient descent or any other numerical optimizer there's really no limit to what f(x) can look like.

Even though there is no limit to what f can look like (except being differentiable), the kinds of models that you can reasonably fit and obtain good performance from are still very much limited by how much data you have, and of what quality. And data is indeed the bottleneck in the vast majority of cases.


Another resource I'd recommend is Andrew Ng's Machine Learning course on Coursera.



