
In my experience, if you have even a little smoothness in your problem's cost manifold, taking advantage of gradients is invaluable for sample efficiency. Many losses that don't seem differentiable can be reformulated as such - you can look around and see a wide array of algorithms being put into end-to-end learned frameworks. If the dimensionality is small, second-order methods (or approximations thereof) can do dramatically better still. However, I'm also a fan of evolutionary algorithms, and I see no reason why evolutionary rules can't be defined with awareness of gradient signals.
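
For a concrete toy example of the kind of reformulation I mean - this is just a sketch I'm making up, with an arbitrary sigmoid temperature, not any particular paper's method:

    import jax
    import jax.numpy as jnp

    def hard_loss(theta, x, y):
        # 0/1 misclassification rate: piecewise constant, so gradients are zero almost everywhere
        return jnp.mean(0.5 * (1.0 - y * jnp.sign(x @ theta)))

    def smooth_loss(theta, x, y, temperature=10.0):
        # sigmoid relaxation of the same quantity - now there is usable gradient signal
        return jnp.mean(jax.nn.sigmoid(-temperature * y * (x @ theta)))

    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (100, 5))
    y = jnp.where(x[:, 0] > 0, 1.0, -1.0)   # toy labels
    theta = jnp.zeros(5)

    print(jax.grad(hard_loss)(theta, x, y))    # all zeros - nothing to follow
    print(jax.grad(smooth_loss)(theta, x, y))  # informative gradient

The relaxed loss targets the same quantity but gives the optimizer something to descend.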



> Many losses which don't seem differentiable can be reformulated as such...

agreed, especially with policy gradients.
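
for anyone unfamiliar, a minimal sketch of the score-function / REINFORCE trick (toy reward and sample count made up): the reward itself is never differentiated, only the log-probabilities are.

    import jax
    import jax.numpy as jnp

    def blackbox_reward(actions):
        # non-differentiable, could be anything
        return jnp.where(actions == 2, 1.0, 0.0)

    def surrogate(logits, key, n_samples=1024):
        actions = jax.random.categorical(key, logits, shape=(n_samples,))
        logp = jax.nn.log_softmax(logits)[actions]
        rewards = blackbox_reward(actions)
        # the gradient of this surrogate is an unbiased estimate of -grad E[R]
        return -jnp.mean(rewards * logp)

    logits = jnp.zeros(4)
    print(jax.grad(surrogate)(logits, jax.random.PRNGKey(0)))
    # a gradient-descent step on this raises the probability of the rewarded action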

> If the dimensionality is small, second-order methods (or approximations thereof) can do dramatically better yet.

i have not seen second-order derivatives used in practice, presumably due to memory limitations. can you point me to examples?


They aren't common in deep learning, but if you look at estimation problems like odometry, optimal control, and calibration, the typical approach is to build a least-squares estimator that optimizes with a Gauss-Newton approximation to the Hessian, or with other quasi-Newton methods. Gradient descent exhibits comparatively slow convergence in these cases, especially when the condition number is large. For an actual quadratic loss function, the problem can (by definition) be solved in one iteration if you have the Hessian. However, getting the full Hessian efficiently within most learning frameworks is difficult, as they primarily only compute VJPs or HVPs.
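
To make that concrete, here's a rough sketch of a Gauss-Newton step on a toy exponential-fit problem I'm making up (small damping added for stability - not anyone's production code), plus the HVP pattern frameworks do expose:

    import jax
    import jax.numpy as jnp

    t = jnp.linspace(0.0, 1.0, 20)
    y_obs = 2.0 * jnp.exp(-1.5 * t)

    def residuals(params):
        a, b = params
        return a * jnp.exp(b * t) - y_obs

    def gauss_newton_step(params, damping=1e-6):
        r = residuals(params)
        J = jax.jacfwd(residuals)(params)        # 20 x 2 Jacobian
        H = J.T @ J + damping * jnp.eye(2)       # Gauss-Newton approximation to the Hessian
        g = J.T @ r                              # gradient of 0.5 * ||r||^2
        return params - jnp.linalg.solve(H, g)

    params = jnp.array([1.0, 0.0])
    for _ in range(10):
        params = gauss_newton_step(params)
    print(params)   # heads to roughly [2.0, -1.5] within a few steps

    # Hessian-vector product without materializing the Hessian
    # (forward-over-reverse), the kind of product frameworks give you cheaply:
    def loss(params):
        return 0.5 * jnp.sum(residuals(params) ** 2)

    v = jnp.array([1.0, 0.0])
    hvp = jax.jvp(jax.grad(loss), (params,), (v,))[1]

The J^T J matrix is p x p in the parameter count, which is exactly the memory cost that keeps this out of reach for large deep learning models, as mentioned upthread.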



