
If you enjoy these kinds of explanations, "Data Science from Scratch" by Joel Grus explains many machine learning algorithms and has you implement simple versions of them in Python as you read along. It also covers linear regression and I wonder if that book is where the author got the idea for this series of blogposts. Kudos anyway.

Something of a nitpick, but one thing that both Simon Ward-Jones and Joel Grus miss is that linear regression is typically not implemented using gradient descent at all: there's an analytical solution, and you can get the beta coefficients with straightforward matrix algebra. It's much harder to explain than gradient descent, so I get why they don't bother, but without that background it's hard to see why everybody talks about linear regression all the time, when with gradient descent or any other numerical optimizer there's really no limit to what f(x) can look like.
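
To give a rough idea, here is a minimal numpy sketch of that closed-form route (the toy data and variable names are just for illustration, not from the article):

    import numpy as np

    # toy data: y = 2*x + 1 plus noise
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=100)
    y = 2 * x + 1 + rng.normal(0, 0.5, size=100)

    # design matrix with a column of ones for the intercept
    X = np.column_stack([np.ones_like(x), x])

    # normal equations: beta = (X^T X)^{-1} X^T y
    # (np.linalg.lstsq is the numerically safer way to do the same thing)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta)  # roughly [1.0, 2.0]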




I don't think they've missed the fact that for linear regression there's an algebraic solution, but machine learners typically treat linear regression as a simple special case, and as soon as you want to go a bit beyond it you have to use an iterative numerical optimizer anyway, so why bother with the special-case solution.

My main criticism of the article would be that the nitty-gritty section only makes sense to a reader who has already done a linear algebra / multivariate calculus course, in which case they've likely already covered least squares in greater depth (including the exact solutions) than this article does. So I don't really see the purpose of the math section, except maybe to signal that the writer has a decent understanding of the algorithmic detail.


I had never actually seen "Data Science from Scratch" - sounds like a book I should read!

Fair point about gradient descent vs the normal equations closed-form solution. I am planning on working through a few algorithms, so I thought it would be better to introduce gradient descent with something simple before talking about gradient boosted decision trees and neural networks. Also I would have had to explain more complex matrix stuff like invertibility issues and linear dependence, like you said.

I guess I just dodged that bullet and went for gradient descent. Maybe another post for the linear algebra fans! Thanks for reading though!
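
For anyone curious what the gradient descent route looks like in practice, here is a minimal sketch on the same kind of toy data (the learning rate and iteration count are illustrative choices, not taken from the post):

    import numpy as np

    # toy data: y = 2*x + 1 plus noise
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=100)
    y = 2 * x + 1 + rng.normal(0, 0.5, size=100)

    b0, b1 = 0.0, 0.0   # intercept and slope
    lr = 0.01           # learning rate
    n = len(x)

    for _ in range(5000):
        y_hat = b0 + b1 * x
        # gradients of mean squared error w.r.t. each parameter
        grad_b0 = (2 / n) * np.sum(y_hat - y)
        grad_b1 = (2 / n) * np.sum((y_hat - y) * x)
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1

    print(b0, b1)  # converges towards roughly 1.0 and 2.0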


I’m working through Joel’s book and implementing the solutions in Swift:

https://github.com/melling/data-science-from-scratch-swift

Of the last 5, only the Linear Regression chapter is finished:

https://github.com/melling/data-science-from-scratch-swift/b...


> it's hard to see why everybody talks about linear regression all the time when with gradient descent or any other numerical optimizer there's really no limit to what f(x) can look like.

Even though there is no limit to what f can look like (except being differentiable), the kinds of models that you can reasonably fit and obtain good performance from are still very much limited by how much data you have, and of what quality. And data is indeed the bottleneck in the vast majority of cases.


Another resource I'd recommend is Andrew Ng's Machine Learning course on Coursera.



