I'm very much not a fan of this introduction.

You use gradient descent, but do not introduce the normal equations. This is problematic for at least two reasons.

Case 1: Design matrix has full rank

Omitting the normal equations obfuscates what is really going on. The first-order condition of the least-squares objective is the normal equations, X^T X b = X^T y, and when X has full column rank you solve them by inverting X^T X. The objective is strictly convex, so that stationary point is the unique optimum.
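
A minimal numpy sketch of the full-rank case (the X and y below are made-up stand-ins, not anything from the article):

    import numpy as np

    # Hypothetical data: X has full column rank, so X^T X is invertible.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = rng.normal(size=100)

    # Solve the normal equations X^T X b = X^T y directly.
    # np.linalg.solve factorizes rather than forming an explicit inverse.
    b = np.linalg.solve(X.T @ X, X.T @ y)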

Case 2: Design matrix does not have full rank

Omitting the normal equations hides the fact that the first-order condition now has infinitely many solutions. Gradient descent will find one of them, but you need a principled way of selecting among them. The Moore-Penrose pseudo-inverse gives you the solution with the smallest L2 norm.
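
A sketch of the rank-deficient case, continuing the made-up X and y from above (duplicating a column is just an easy way to break full rank):

    # Duplicate a column so X^T X is singular: infinitely many minimizers.
    X_def = np.column_stack([X, X[:, 0]])

    # The pseudo-inverse picks the minimum-L2-norm solution.
    b_min_norm = np.linalg.pinv(X_def) @ y

    # np.linalg.lstsq (SVD-based) returns the same minimum-norm solution.
    b_lstsq, *_ = np.linalg.lstsq(X_def, y, rcond=None)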

Omitting these details is setting learners up for failure.




I agree with this 100%.

You may be amused to learn that "How would you program a solver for a system of linear equations?" was an informal interview question for a top machine learning PhD program, and applicants were not looked upon favorably if they mindlessly gave gradient descent as an answer.
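
For reference, a sketch of the sort of direct answer such a question is presumably fishing for: Gaussian elimination with partial pivoting. The function name and test data here are mine, not part of the question:

    import numpy as np

    def solve_linear_system(A, b):
        # Gaussian elimination with partial pivoting, O(n^3).
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        for k in range(n):
            # Swap in the row with the largest pivot for stability.
            p = k + np.argmax(np.abs(A[k:, k]))
            A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
            for i in range(k + 1, n):
                m = A[i, k] / A[k, k]
                A[i, k:] -= m * A[k, k:]
                b[i] -= m * b[k]
        # Back-substitution on the resulting upper-triangular system.
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        return x

    # Matches np.linalg.solve on a hypothetical well-conditioned system.
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    assert np.allclose(solve_linear_system(A, b), np.linalg.solve(A, b))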



