I don't really see the distinction. In the basic case of using gradients to optimize a NN, what's really being optimized are the parameters of a function expressing the NN error. We're just following the gradient downhill to find whatever minimum of this function the gradient leads us to (hopefully the global minimum, as is likely to be the case with lots of parameters).
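
To make that concrete, here's a minimal sketch in JAX (the tiny network, the synthetic data and the learning rate are all invented for illustration): the loss is just a scalar function of the parameters, and each step moves the parameters a little way down its gradient.

    import jax
    import jax.numpy as jnp

    def predict(params, x):
        w1, b1, w2, b2 = params          # a tiny one-hidden-layer network
        h = jnp.tanh(x @ w1 + b1)
        return h @ w2 + b2

    def loss(params, x, y):
        # the NN error, viewed as a function of the parameters
        return jnp.mean((predict(params, x) - y) ** 2)

    key = jax.random.PRNGKey(0)
    k1, k2, kx = jax.random.split(key, 3)
    params = (jax.random.normal(k1, (3, 8)), jnp.zeros(8),
              jax.random.normal(k2, (8, 1)), jnp.zeros(1))
    x = jax.random.normal(kx, (32, 3))
    y = jnp.sin(x).sum(axis=1, keepdims=True)

    lr = 0.05
    for step in range(200):
        grads = jax.grad(loss)(params, x, y)   # gradient of the error w.r.t. the parameters
        params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)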

In this "learning to learn" paper the optimization function (used to update the gradients at each step of the gradient descent algorithm) has been parameterized, and we're instead using gradient descent to find a minimum of that parameterized function.

So, yes, differentiable programming isn't limited to minimizing the losses of neural nets, but as the grandparent poster pointed out, the whole concept obviously only applies to parameterized programs that calculate some numerical output we want to minimize.
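
For instance (a toy, made-up example), the same machinery can tune the parameter of an ordinary numerical program, here the launch angle in an ideal projectile-range formula, by descending the gradient of its squared miss distance:

    import jax
    import jax.numpy as jnp

    def miss(angle, v0=10.0, target=8.0, g=9.81):
        # squared distance between where an ideal projectile lands and the target
        landing = (v0 ** 2) * jnp.sin(2.0 * angle) / g
        return (landing - target) ** 2

    angle = jnp.array(0.1)
    for _ in range(500):
        angle = angle - 0.001 * jax.grad(miss)(angle)   # follow the gradient downhill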



