
Pure genius. Even with his explanation of the problem domain, I have no idea how he arrived at the 'magical idea' of random double losses.



I don't understand this. Could someone give a quick explanation? Specifically, why does changing the cost function help?

Is the point that the cost functions have incompatible gradients around local minima/different local minima?


> Is the point that the cost functions have incompatible gradients around local minima/different local minima?

I think that is part of it: the different cost functions can have different local minima and also different saddle points; ideally even different ridge/valley configurations.

In machine learning there is a well-known technique called Stochastic Gradient Descent (SGD) [1]. There the cost function is the sum of a very large number of terms reflecting how well each element of the training set has been reproduced.

With SGD the optimisation steps use randomly chosen cost functions which are obtained by choosing a random subset of the training set.
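To make that concrete, here is a minimal sketch of mini-batch SGD in plain Python. It assumes a toy one-parameter model `y ~ w * x` with a mean squared error cost. The function name, data, and hyperparameters are all made up for illustration and are not taken from the writeup.

```python
import random


def minibatch_sgd(xs, ys, batch_size=8, lr=0.01, steps=2000, seed=0):
    """Fit y ~ w * x by SGD on the mean squared error (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    for _ in range(steps):
        # Each step sees a different random subset of the training set,
        # so each step effectively descends a different cost function.
        batch = rng.sample(range(n), batch_size)
        # Gradient of the batch MSE with respect to w.
        grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        w -= lr * grad
    return w


# Synthetic data (assumed for the example): true slope 3.0 plus small noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x + data_rng.gauss(0.0, 0.1) for x in xs]

w_hat = minibatch_sgd(xs, ys)
print(w_hat)  # should land close to the true slope of 3.0
```

Each step costs `batch_size` gradient terms rather than all 200, which is the computational saving; the per-step randomness is the side effect discussed in this thread.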

I had thought the advantage of SGD was purely in saved computation: by computing the approximate cost function on only a small batch, you pay only a tiny fraction of the computational expense. I assumed that if you could afford larger batches, it would always help convergence.

This demo writeup makes me realize there may be a benefit from the randomness. Different cost functions may have different local minima, different saddles, different ridges. That helps you not get stuck or even slowed at these points.

[1] https://en.m.wikipedia.org/wiki/Stochastic_gradient_descent


I gather that's the idea, but mean squared error and mean absolute error are fairly correlated, so I'm not sure if that would be an advantage or disadvantage.

I'm running an MSE hill climbing thing at the moment, so I might give it a go and see if it helps.
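For anyone who wants to try the same experiment, here is a rough sketch of hill climbing where each step picks MSE or MAE at random and accepts a candidate only if the chosen loss does not get worse. This is a guess at what "random double losses" could look like in a toy setting, not the author's actual method. The target vector, step size, and function names are hypothetical.

```python
import random


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def hill_climb_random_loss(target, steps=5000, step_size=0.1, seed=0):
    """Hill climbing that judges each step by a randomly chosen loss."""
    rng = random.Random(seed)
    losses = (mse, mae)
    current = [0.0] * len(target)
    for _ in range(steps):
        loss = rng.choice(losses)  # pick the cost function for this step
        # Perturb every coordinate so that MSE and MAE can disagree
        # about whether the candidate is an improvement.
        candidate = [c + rng.uniform(-step_size, step_size) for c in current]
        if loss(candidate, target) <= loss(current, target):
            current = candidate
    return current


target = [1.0, -2.0, 0.5, 3.0]  # made-up target for the example
result = hill_climb_random_loss(target)
print(mse(result, target), mae(result, target))
```

Because both losses share the same minimum here, this toy problem only shows the mechanics. Whether the alternation helps would need a landscape where a single loss actually gets stuck.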


My guess is that it's the randomness that helps. Noisy hill climbing (as in simulated annealing) is a technique for addressing local optima.
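As a point of comparison, a bare-bones simulated annealing loop looks something like the sketch below. The objective `bumpy`, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def bumpy(x):
    """A 1-D objective with several local minima (made up for the example)."""
    return x * x + 10.0 * math.sin(3.0 * x)


def simulated_annealing(f, x0, steps=5000, t_start=5.0, t_end=0.01,
                        step_size=0.5, seed=0):
    """Minimise f with noisy hill climbing under a cooling temperature."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        cand = x + rng.uniform(-step_size, step_size)
        fc = f(cand)
        # Always accept improvements; accept a worse candidate with
        # probability exp(-(fc - fx) / t), which shrinks as t cools.
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
    return best_x, best_fx


x0 = 4.0
best_x, best_fx = simulated_annealing(bumpy, x0)
print(best_x, best_fx)
```

The occasional acceptance of worse candidates is what lets the search climb out of a local minimum, which is the same role the random choice of loss would play if this guess is right.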


Ah, now that makes sense.


That was also my understanding.



