Is the point that the cost functions have incompatible gradients around local minima/different local minima?
I think that is part of it: the different cost functions can have different local minima and also different saddle points; ideally even different ridge/valley configurations.
In machine learning there is a well-known technique called Stochastic Gradient Descent (SGD) [1]. There the cost function is the sum of a very large number of terms reflecting how well each element of the training set has been reproduced.
With SGD the optimisation steps use randomly chosen cost functions which are obtained by choosing a random subset of the training set.
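For concreteness, here is a minimal NumPy sketch of that idea for a linear model with an MSE cost: each step estimates the gradient on a random mini-batch rather than on the whole training set. The synthetic data, model, and hyperparameters (`learning_rate`, `batch_size`) are illustrative assumptions, not taken from any particular library or paper.

```python
# Minimal mini-batch SGD sketch for a linear model with MSE loss (NumPy).
# Data, model, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: y = 3x + 1 plus noise.
X = rng.uniform(-1.0, 1.0, size=(1000, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.standard_normal(1000)

w, b = 0.0, 0.0              # model parameters
learning_rate = 0.1
batch_size = 32

for step in range(2000):
    # Each step evaluates the cost (and its gradient) on a random subset
    # of the training set rather than on all examples.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx, 0], y[idx]

    pred = w * xb + b
    err = pred - yb          # residuals on this mini-batch

    # Gradients of the mini-batch MSE, mean(err**2).
    grad_w = 2.0 * np.mean(err * xb)
    grad_b = 2.0 * np.mean(err)

    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should approach 3 and 1
```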
I had thought the advantage of SGD was purely in saved computation: by computing the approximate cost function on only a small batch, you incur only a tiny fraction of the computational expense, and if you could afford larger batches it would always help convergence.
This demo writeup makes me realize there may be a benefit from the randomness itself. Different cost functions may have different local minima, different saddles, and different ridges, which helps you avoid getting stuck, or even slowed down, at these points.
I gather that's the idea, but mean squared error and mean absolute error are fairly correlated, so I'm not sure if that would be an advantage or disadvantage.
I'm running an MSE hill-climbing experiment at the moment; I might give it a go and see if it helps.
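As a rough illustration of the kind of experiment being floated here, below is a toy hill-climbing sketch that picks MSE or MAE at random when scoring each proposed step. The synthetic data, step size, and acceptance rule are my own assumptions for the sketch, not the commenter's actual setup.

```python
# Toy hill climbing that randomly alternates between MSE and MAE when
# scoring each proposed step -- a guess at the experiment described
# above, not the commenter's actual setup.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from y = 2x - 0.5 plus noise.
X = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * X - 0.5 + 0.1 * rng.standard_normal(200)

def mse(params):
    w, b = params
    return np.mean((w * X + b - y) ** 2)

def mae(params):
    w, b = params
    return np.mean(np.abs(w * X + b - y))

params = np.zeros(2)   # [w, b]
step_size = 0.05

for _ in range(5000):
    # Pick one of the two (fairly correlated) cost functions at random.
    cost = mse if rng.random() < 0.5 else mae
    candidate = params + step_size * rng.standard_normal(2)
    # Accept the move only if it improves the randomly chosen cost.
    if cost(candidate) < cost(params):
        params = candidate

print(params)  # should approach [2, -0.5]
```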