> So, in the case of (stochastic) gradient descent to train a neural network, it is the neural network that you want to use AD on, not the gradient descent algorithm. Obviously.
The specific proposal of differentiable programming, as a paradigm that goes beyond simple applications of gradient descent for optimising NNs, is exactly to apply gradient descent to optimise learning (and other) algorithms themselves. This may be to select hyperparameters, to select among a family of algorithms, to include differentiable constraints in the solver, etc.
I wonder, are you familiar with the now classic paper, Learning to learn by gradient descent by gradient descent? [0]
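To make the "optimise the learning algorithm itself" point concrete, here is a minimal sketch (JAX, with a made-up toy least-squares problem; the function names and the choice of differentiating a learning rate are just illustrative, not anything from the paper): a few SGD steps are unrolled, and AD then gives the gradient of the validation loss with respect to the learning rate.

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        return jnp.mean((x @ w - y) ** 2)

    def train_then_evaluate(log_lr, w0, x_tr, y_tr, x_val, y_val, steps=10):
        lr = jnp.exp(log_lr)                 # parameterise the lr so it stays positive
        w = w0
        for _ in range(steps):               # unrolled inner SGD loop
            w = w - lr * jax.grad(loss)(w, x_tr, y_tr)
        return loss(w, x_val, y_val)         # meta-objective: validation loss after training

    # toy data for illustration
    true_w = jnp.array([1.0, -2.0, 0.5])
    x_tr = jax.random.normal(jax.random.PRNGKey(0), (32, 3))
    y_tr = x_tr @ true_w
    x_val = jax.random.normal(jax.random.PRNGKey(1), (32, 3))
    y_val = x_val @ true_w
    w0 = jnp.zeros(3)

    # d(validation loss) / d(log learning rate), computed by AD straight through
    # the training loop -- gradient descent applied to the optimiser, not the model.
    hyper_grad = jax.grad(train_then_evaluate)(jnp.log(0.1), w0, x_tr, y_tr, x_val, y_val)
    print(hyper_grad)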
I don't really see the distinction. In the basic case of using gradients to optimize a NN, what's really being optimized is the set of parameters of a function expressing the NN's error. We're just following the gradient downhill to find whatever minimum of this function the gradient leads us to (hopefully the global minimum, as is likely to be the case with lots of parameters).
In this "learning to learn" paper, the update rule itself (the function that turns gradients into parameter updates at each step of gradient descent) has been parameterized, and we instead use gradient descent on those parameters to minimize the loss that results from running the learned optimizer.
So, yes, differentiable programming isn't limited to minimizing the losses of neural nets, but as the grandparent poster pointed out, the whole concept obviously only applies to parameterized programs that calculate some numerical output we want to minimize.
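For what it's worth, the paper's idea looks roughly like the following sketch when stripped to the bone (again JAX, again a made-up toy problem; the paper uses an LSTM optimizer trained across many tasks, whereas this stand-in uses a two-parameter update rule on a single least-squares problem, purely to show where the outer gradient descent acts):

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        return jnp.mean((x @ w - y) ** 2)

    def learned_update(theta, g):
        # tiny parameterized update rule applied element-wise to the gradient
        return theta[0] * g + theta[1] * jnp.tanh(g)

    def meta_loss(theta, w0, x, y, steps=20):
        w = w0
        for _ in range(steps):                   # unroll the inner optimizer
            g = jax.grad(loss)(w, x, y)
            w = w + learned_update(theta, g)     # the optimizer is now a parameterized function
        return loss(w, x, y)                     # how well did the learned optimizer do?

    # toy data
    x = jax.random.normal(jax.random.PRNGKey(0), (64, 4))
    y = x @ jnp.array([0.3, -1.0, 2.0, 0.7])
    w0 = jnp.zeros(4)
    theta = jnp.array([-0.1, -0.1])              # initial update-rule parameters

    # outer loop: gradient descent on the optimizer's parameters
    meta_grad = jax.jit(jax.grad(meta_loss))
    for _ in range(100):
        theta = theta - 0.01 * meta_grad(theta, w0, x, y)
    print(meta_loss(theta, w0, x, y))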
[0] https://arxiv.org/abs/1606.04474