
I learned about machine learning way after I learned mathematics, so it always amused me that

backpropagation = reverse-mode (adjoint) differentiation = the chain rule

(with forward-mode differentiation being the same chain rule accumulated in the opposite order), and that different disciplines have different words for what is just the chain rule.
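
A minimal sketch in plain Python of the equivalence (the functions f1/f2/f3 and the input value are purely illustrative, not from any particular library): for a composition y = f3(f2(f1(x))), forward mode and reverse mode (adjoint / backprop) both compute dy/dx = f3'(f2(f1(x))) * f2'(f1(x)) * f1'(x); they only differ in the order the chain-rule factors are accumulated.

    import math

    def f1(x): return x * x          # f1(x) = x^2,    f1'(x) = 2x
    def f2(u): return math.sin(u)    # f2(u) = sin(u), f2'(u) = cos(u)
    def f3(v): return math.exp(v)    # f3(v) = exp(v), f3'(v) = exp(v)

    def d1(x): return 2 * x
    def d2(u): return math.cos(u)
    def d3(v): return math.exp(v)

    x = 0.7
    u = f1(x)
    v = f2(u)

    # Forward mode: push the derivative from the input toward the output.
    forward = d3(v) * (d2(u) * (d1(x) * 1.0))

    # Reverse mode (adjoint / backprop): pull it from the output toward the input.
    reverse = ((1.0 * d3(v)) * d2(u)) * d1(x)

    print(forward, reverse)          # identical up to floating-point rounding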




None of the parties mentioned actually deny the above equivalence. Backprop became a popular idea in deep learning because people started developing continuous models, in which the output (and hence the error) is a continuous, differentiable function of the inputs and the weights. That lets the chain rule be used to compute the gradients, which in turn lets you use gradient-descent methods. This shift from discrete units to continuous units is what was termed error backpropagation, not just the chain rule.
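
A hedged sketch of that shift in plain Python (the one-weight "unit", the single training example, and the learning rate are all made up for illustration): because the unit is a continuous, differentiable function of its input and weight, the chain rule gives the gradient of the error, and gradient descent can use it.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # One continuous unit: prediction p = sigmoid(w * x), squared error E = (p - t)^2.
    x, t = 2.0, 1.0      # single training example (input, target)
    w = 0.1              # initial weight
    lr = 0.5             # learning rate

    for step in range(5):
        z = w * x
        p = sigmoid(z)
        error = (p - t) ** 2

        # Chain rule ("error backpropagation" in this one-unit case):
        # dE/dw = dE/dp * dp/dz * dz/dw
        dE_dp = 2.0 * (p - t)
        dp_dz = p * (1.0 - p)
        dz_dw = x
        grad = dE_dp * dp_dz * dz_dw

        w -= lr * grad   # gradient-descent update
        print(step, round(error, 4), round(w, 4))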


It's kind of unfortunate, as it has forced everything to be continuous, which is not very computationally efficient or easily interpreted by humans.



