None of the parties mentioned actually denies the above equivalence. Backprop became a popular idea in deep learning because people started building continuous models, in which the output (and hence the error) is a continuous, differentiable function of the inputs and the weights. That made it possible to compute gradients with the chain rule and, in turn, to train the models with gradient-descent methods. It was this shift from discrete units to continuous units that was termed error backpropagation, not just the chain rule.
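To make that concrete, here is a minimal sketch (the single sigmoid unit, the data point, the learning rate, and the helper name loss_and_grads are all made up for illustration, not anyone's reference formulation): because the output is a smooth function of the weights, the gradient of the error falls out of the chain rule, and that gradient immediately drives a gradient-descent update.

```python
import math

# Toy continuous model: y_hat = sigmoid(w * x + b), squared-error loss
# L = (y_hat - y)**2.  Model, data point, and learning rate are invented
# purely for illustration.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(w, b, x, y):
    # Forward pass: the output is a smooth function of x, w, and b.
    z = w * x + b
    y_hat = sigmoid(z)
    loss = (y_hat - y) ** 2

    # Backward pass: nothing but the chain rule, factor by factor.
    dL_dyhat = 2.0 * (y_hat - y)       # dL/dy_hat
    dyhat_dz = y_hat * (1.0 - y_hat)   # sigmoid'(z)
    dL_dz = dL_dyhat * dyhat_dz        # chain rule
    dL_dw = dL_dz * x                  # dz/dw = x
    dL_db = dL_dz * 1.0                # dz/db = 1
    return loss, dL_dw, dL_db

# One gradient-descent step on a single (x, y) pair.
w, b, lr = 0.5, 0.0, 0.1
loss, gw, gb = loss_and_grads(w, b, x=2.0, y=1.0)
w -= lr * gw
b -= lr * gb
print(f"loss={loss:.4f}  dL/dw={gw:.4f}  dL/db={gb:.4f}")
```

In a multi-layer network the backward pass just repeats the same dL_dz bookkeeping through every layer.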
back propagation = chain rule = adjoint (reverse-mode) differentiation

Forward-mode differentiation is the same chain rule applied in the opposite order, and different disciplines simply have different words for what is just the chain rule.
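To illustrate that equivalence, here is a small self-contained sketch (the toy function f(x) = sin(x*x), the evaluation point, and the function names are arbitrary choices for illustration): the same derivative computed by a hand-written chain rule, by a forward-mode sweep carrying (value, derivative) pairs, and by a reverse-mode (adjoint) sweep, i.e. backpropagation.

```python
import math

# The derivative of f(x) = sin(x * x) computed three ways.
# This is a sketch of the terminology, not of any library's API.

def hand_chain_rule(x):
    # d/dx sin(x^2) = cos(x^2) * 2x, written out by hand.
    return math.cos(x * x) * 2.0 * x

def forward_mode(x):
    # Forward-mode differentiation: propagate (value, derivative) pairs
    # from input to output, applying the chain rule at each primitive op.
    v, dv = x, 1.0                        # seed: dx/dx = 1
    u, du = v * v, dv * v + v * dv        # product rule for x * x
    y, dy = math.sin(u), math.cos(u) * du # chain rule for sin
    return dy

def reverse_mode(x):
    # Reverse-mode (adjoint) differentiation, i.e. backpropagation:
    # run the forward pass, then sweep the chain rule from output to input.
    u = x * x                    # forward pass
    y = math.sin(u)
    y_bar = 1.0                  # seed: dy/dy = 1
    u_bar = y_bar * math.cos(u)  # multiply by dy/du
    x_bar = u_bar * 2.0 * x      # multiply by du/dx
    return x_bar

x0 = 1.3
print(hand_chain_rule(x0), forward_mode(x0), reverse_mode(x0))
# All three print the same number: they are the same chain rule,
# just organized (and named) differently.
```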