
I learned about machine learning way after I learned mathematics, so it always amused me that

backpropagation = reverse-mode (adjoint) differentiation = the chain rule

(with forward-mode differentiation being the same chain rule accumulated in the opposite order), and that different disciplines have different words for what is just the chain rule.
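
A minimal sketch in plain Python of the equivalence (the functions f1/f2/f3 and the input value are purely illustrative, not from any particular library): for a composition y = f3(f2(f1(x))), forward mode and reverse mode (adjoint / backprop) both compute dy/dx = f3'(f2(f1(x))) * f2'(f1(x)) * f1'(x); they only differ in the order the chain-rule factors are accumulated.

    import math

    def f1(x): return x * x          # f1(x) = x^2,    f1'(x) = 2x
    def f2(u): return math.sin(u)    # f2(u) = sin(u), f2'(u) = cos(u)
    def f3(v): return math.exp(v)    # f3(v) = exp(v), f3'(v) = exp(v)

    def d1(x): return 2 * x
    def d2(u): return math.cos(u)
    def d3(v): return math.exp(v)

    x = 0.7
    u = f1(x)
    v = f2(u)

    # Forward mode: push the derivative from the input toward the output.
    forward = d3(v) * (d2(u) * (d1(x) * 1.0))

    # Reverse mode (adjoint / backprop): pull it from the output toward the input.
    reverse = ((1.0 * d3(v)) * d2(u)) * d1(x)

    print(forward, reverse)          # identical up to floating-point rounding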




None of the parties mentioned actually deny the above equivalence. Backprop became a popular idea in deep learning because people started developing continuous models, in which the output (and hence the error) is a continuous, differentiable function of the inputs and the weights. That lets the chain rule be used to compute the gradients, which in turn lets you use gradient-descent methods. This shift from discrete units to continuous units is what was termed error backpropagation, not just the chain rule.
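
A hedged sketch of that shift in plain Python (the one-weight "unit", the single training example, and the learning rate are all made up for illustration): because the unit is a continuous, differentiable function of its input and weight, the chain rule gives the gradient of the error, and gradient descent can use it.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # One continuous unit: prediction p = sigmoid(w * x), squared error E = (p - t)^2.
    x, t = 2.0, 1.0      # single training example (input, target)
    w = 0.1              # initial weight
    lr = 0.5             # learning rate

    for step in range(5):
        z = w * x
        p = sigmoid(z)
        error = (p - t) ** 2

        # Chain rule ("error backpropagation" in this one-unit case):
        # dE/dw = dE/dp * dp/dz * dz/dw
        dE_dp = 2.0 * (p - t)
        dp_dz = p * (1.0 - p)
        dz_dw = x
        grad = dE_dp * dp_dz * dz_dw

        w -= lr * grad   # gradient-descent update
        print(step, round(error, 4), round(w, 4))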


It's kind of unfortunate, as it has forced everything to be continuous, which is not very computationally efficient or easily interpreted by humans.



