
In my experience, if you have even a little smoothness in your problem's cost manifold, taking advantage of gradients is invaluable for sample efficiency. Many losses that don't seem differentiable can be reformulated as such - you can look around and see a wide array of algorithms being put into end-to-end learned frameworks. If the dimensionality is small, second-order methods (or approximations thereof) can do dramatically better still. However, I'm also a fan of evolutionary algorithms, and I see no reason why evolutionary rules can't be defined with awareness of gradient signals.
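
For a concrete toy example of the kind of reformulation I mean - this is just a sketch I'm making up, with an arbitrary sigmoid temperature, not any particular paper's method:

    import jax
    import jax.numpy as jnp

    def hard_loss(theta, x, y):
        # 0/1 misclassification rate: piecewise constant, so gradients are zero almost everywhere
        return jnp.mean(0.5 * (1.0 - y * jnp.sign(x @ theta)))

    def smooth_loss(theta, x, y, temperature=10.0):
        # sigmoid relaxation of the same quantity - now there is usable gradient signal
        return jnp.mean(jax.nn.sigmoid(-temperature * y * (x @ theta)))

    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (100, 5))
    y = jnp.where(x[:, 0] > 0, 1.0, -1.0)   # toy labels
    theta = jnp.zeros(5)

    print(jax.grad(hard_loss)(theta, x, y))    # all zeros - nothing to follow
    print(jax.grad(smooth_loss)(theta, x, y))  # informative gradient

The relaxed loss targets the same quantity but gives the optimizer something to descend.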



> Many losses which don't seem differentiable can be reformulated as such...

agreed, especially with policy gradients.
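
for anyone unfamiliar, a minimal sketch of the score-function / REINFORCE trick (toy reward and sample count made up): the reward itself is never differentiated, only the log-probabilities are.

    import jax
    import jax.numpy as jnp

    def blackbox_reward(actions):
        # non-differentiable, could be anything
        return jnp.where(actions == 2, 1.0, 0.0)

    def surrogate(logits, key, n_samples=1024):
        actions = jax.random.categorical(key, logits, shape=(n_samples,))
        logp = jax.nn.log_softmax(logits)[actions]
        rewards = blackbox_reward(actions)
        # the gradient of this surrogate is an unbiased estimate of -grad E[R]
        return -jnp.mean(rewards * logp)

    logits = jnp.zeros(4)
    print(jax.grad(surrogate)(logits, jax.random.PRNGKey(0)))
    # a gradient-descent step on this raises the probability of the rewarded action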

> If the dimensionality is small, second-order methods (or approximations thereof) can do dramatically better yet.

i have not seen second-order derivatives used in practice, presumably due to memory limitations. can you point me to examples?


They aren't common in deep learning, but if you look at estimation problems like odometry, optimal control, and calibration, the typical approach is to build a least-squares estimator that optimizes with a Gauss-Newton approximation to the Hessian, or with other quasi-Newton methods. Gradient descent exhibits comparatively slow convergence in these cases, especially when the condition number is large. For an actual quadratic loss function, the problem can (by definition) be solved in one iteration if you have the Hessian. However, getting the full Hessian efficiently within most learning frameworks is difficult, as they primarily only compute VJPs or HVPs.
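
To make that concrete, here's a rough sketch of a Gauss-Newton step on a toy exponential-fit problem I'm making up (small damping added for stability - not anyone's production code), plus the HVP pattern frameworks do expose:

    import jax
    import jax.numpy as jnp

    t = jnp.linspace(0.0, 1.0, 20)
    y_obs = 2.0 * jnp.exp(-1.5 * t)

    def residuals(params):
        a, b = params
        return a * jnp.exp(b * t) - y_obs

    def gauss_newton_step(params, damping=1e-6):
        r = residuals(params)
        J = jax.jacfwd(residuals)(params)        # 20 x 2 Jacobian
        H = J.T @ J + damping * jnp.eye(2)       # Gauss-Newton approximation to the Hessian
        g = J.T @ r                              # gradient of 0.5 * ||r||^2
        return params - jnp.linalg.solve(H, g)

    params = jnp.array([1.0, 0.0])
    for _ in range(10):
        params = gauss_newton_step(params)
    print(params)   # heads to roughly [2.0, -1.5] within a few steps

    # Hessian-vector product without materializing the Hessian
    # (forward-over-reverse), the kind of product frameworks give you cheaply:
    def loss(params):
        return 0.5 * jnp.sum(residuals(params) ** 2)

    v = jnp.array([1.0, 0.0])
    hvp = jax.jvp(jax.grad(loss), (params,), (v,))[1]

The J^T J matrix is p x p in the parameter count, which is exactly the memory cost that keeps this out of reach for large deep learning models, as mentioned upthread.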



