> So, in the case of (stochastic) gradient descent to train a neural network, it is the neural network that you want to use AD on, not the gradient descent algorithm. Obviously.
The specific proposal of differentiable programming, as a paradigm that goes beyond simple applications of gradient descent for optimising NNs, is exactly to apply gradient descent to optimise learning (and other) algorithms themselves. This may be to select hyperparameters, to select among a family of algorithms, to include differentiable constraints in the solver, etc.
I wonder, are you familiar with the now classic paper, Learning to learn by gradient descent by gradient descent? [0]
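To make the "optimise the learning algorithm itself" point concrete, here is a minimal sketch (JAX, with a made-up toy least-squares problem; the function names and the choice of differentiating a learning rate are just illustrative, not anything from the paper): a few SGD steps are unrolled, and AD then gives the gradient of the validation loss with respect to the learning rate.

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        return jnp.mean((x @ w - y) ** 2)

    def train_then_evaluate(log_lr, w0, x_tr, y_tr, x_val, y_val, steps=10):
        lr = jnp.exp(log_lr)                 # parameterise the lr so it stays positive
        w = w0
        for _ in range(steps):               # unrolled inner SGD loop
            w = w - lr * jax.grad(loss)(w, x_tr, y_tr)
        return loss(w, x_val, y_val)         # meta-objective: validation loss after training

    # toy data for illustration
    true_w = jnp.array([1.0, -2.0, 0.5])
    x_tr = jax.random.normal(jax.random.PRNGKey(0), (32, 3))
    y_tr = x_tr @ true_w
    x_val = jax.random.normal(jax.random.PRNGKey(1), (32, 3))
    y_val = x_val @ true_w
    w0 = jnp.zeros(3)

    # d(validation loss) / d(log learning rate), computed by AD straight through
    # the training loop -- gradient descent applied to the optimiser, not the model.
    hyper_grad = jax.grad(train_then_evaluate)(jnp.log(0.1), w0, x_tr, y_tr, x_val, y_val)
    print(hyper_grad)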
I don't really see the distinction. In the basic case of using gradients to optimize a NN, what's really being optimized is the set of parameters of a function expressing the NN's error. We're just following the gradient downhill to find whatever minimum of this function the gradient leads us to (hopefully the global minimum, as is likely to be the case with lots of parameters).
In this "learning to learn" paper, the update rule itself (the function that turns gradients into parameter updates at each step of gradient descent) has been parameterized, and we instead use gradient descent on those parameters to minimize the loss that results from running the learned optimizer.
So, yes, differentiable programming isn't limited to minimizing the losses of neural nets, but as the grandparent poster pointed out, the whole concept obviously only applies to parameterized programs that calculate some numerical output we want to minimize.
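For what it's worth, the paper's idea looks roughly like the following sketch when stripped to the bone (again JAX, again a made-up toy problem; the paper uses an LSTM optimizer trained across many tasks, whereas this stand-in uses a two-parameter update rule on a single least-squares problem, purely to show where the outer gradient descent acts):

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        return jnp.mean((x @ w - y) ** 2)

    def learned_update(theta, g):
        # tiny parameterized update rule applied element-wise to the gradient
        return theta[0] * g + theta[1] * jnp.tanh(g)

    def meta_loss(theta, w0, x, y, steps=20):
        w = w0
        for _ in range(steps):                   # unroll the inner optimizer
            g = jax.grad(loss)(w, x, y)
            w = w + learned_update(theta, g)     # the optimizer is now a parameterized function
        return loss(w, x, y)                     # how well did the learned optimizer do?

    # toy data
    x = jax.random.normal(jax.random.PRNGKey(0), (64, 4))
    y = x @ jnp.array([0.3, -1.0, 2.0, 0.7])
    w0 = jnp.zeros(4)
    theta = jnp.array([-0.1, -0.1])              # initial update-rule parameters

    # outer loop: gradient descent on the optimizer's parameters
    meta_grad = jax.jit(jax.grad(meta_loss))
    for _ in range(100):
        theta = theta - 0.01 * meta_grad(theta, w0, x, y)
    print(meta_loss(theta, w0, x, y))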
[0] https://arxiv.org/abs/1606.04474