I'm surprised that genetic algorithms running on resource-bounded, general-purpose virtual machines have not surpassed neural nets in their problem-solving ability.
Is the computational advantage of neural nets on GPUs so dramatic?
It comes down to gradients. When you have useful gradients to work with, backprop on a GPU is vastly faster and more directed than a GA. One could uncharitably say that all of the impressive results from NNs in the last decade come from this ability to throw large amounts of data into highly parallel training, rather than from new theory about the expressiveness and capabilities of NNs.
When you don't have, or can't use, a gradient, GAs become the go-to tool for optimization (if only because you don't have much in the way of other options).
GAs can be way, way, way more compute-intensive as your population size increases.
That said, I use GAs regularly and find them an incredibly useful tool for a variety of optimization tasks - just generally not for learning the weights of NNs, where GPU + backprop reigns supreme (or at least works well enough, and much faster than a GA).
You might be interested in the "compact GA", which only requires keeping the % of 1s (or 0s) at each bit position rather than a full population.
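To make that concrete, here's a minimal sketch of a compact GA on a toy OneMax problem (maximize the number of 1s); the update size, generation count, and fitness function are just illustrative:

    import random

    def compact_ga(fitness, n_bits, pop_size=100, generations=2000):
        # Probability vector: p[i] is the current estimate of the fraction of 1s at bit i.
        p = [0.5] * n_bits

        def sample():
            return [1 if random.random() < pi else 0 for pi in p]

        for _ in range(generations):
            a, b = sample(), sample()
            winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
            # Nudge each bit's probability toward the winner by 1/pop_size.
            for i in range(n_bits):
                if winner[i] != loser[i]:
                    p[i] += 1.0 / pop_size if winner[i] else -1.0 / pop_size
                    p[i] = min(1.0, max(0.0, p[i]))
        return [1 if pi >= 0.5 else 0 for pi in p]

    # Toy usage: OneMax, i.e. fitness is simply the number of 1s.
    best = compact_ga(fitness=sum, n_bits=32)

The entire search state is one float per bit, instead of a whole population of bit strings.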
I suspect that with some more engineering and attention from people doing ML, GA-style algorithms could be made about as memory- and compute-efficient as gradient methods, while giving better results and being more widely applicable.
With a genetic algorithm you have to try a bunch of weight variations and see which one works best. With backpropagation you compute the gradient once and immediately know which direction to move every weight. It's hugely more efficient.
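A toy illustration of that difference in per-evaluation efficiency (the quadratic loss, step size, and candidate count are made up purely to show the shape of the two updates):

    import numpy as np

    # Toy loss: squared distance to a target weight vector (illustrative only).
    target = np.array([1.0, -2.0, 0.5])
    loss = lambda w: np.sum((w - target) ** 2)

    w = np.zeros(3)

    # Backprop-style step: one gradient tells every weight which way to move.
    grad = 2 * (w - target)            # analytic gradient of the toy loss
    w_gd = w - 0.1 * grad

    # GA/ES-style step: sample many perturbations and keep the best one.
    candidates = [w + 0.1 * np.random.randn(3) for _ in range(50)]
    w_ga = min(candidates, key=loss)

    print(loss(w_gd), loss(w_ga))      # the single gradient step typically wins here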
GA-style search isn't actually taking multiple samples to decide on a new point to move to - it's taking multiple samples to decide on a smaller region to focus on. This can be more efficient than backprop, depending on how "easy" it is to tell via sampling which subregion has better performance.
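As a toy example of "sampling to pick a region" rather than "sampling to pick a point", here's a gradient-free search that repeatedly keeps the half-interval whose samples look better (the objective and sample counts are arbitrary):

    import random

    f = lambda x: (x - 0.73) ** 2              # toy 1-D objective to minimize; 0.73 is arbitrary
    lo, hi = 0.0, 1.0
    for _ in range(20):
        mid = (lo + hi) / 2
        left = min(f(random.uniform(lo, mid)) for _ in range(8))
        right = min(f(random.uniform(mid, hi)) for _ in range(8))
        lo, hi = (lo, mid) if left < right else (mid, hi)
    print((lo + hi) / 2)                       # homes in near 0.73 without ever computing a gradient

Whether this beats gradient descent depends entirely on how reliably a handful of samples can distinguish the better subregion.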
Traditional GAs have a practical problem for training large models: keeping a large population of model weights in memory is not feasible. If you had to keep 1,000 variations of the GPT-3 weights in memory during training, that's a non-starter. Though people have ideas for how to address that as well (again, see the post I linked).
You can use genetic algorithms to train the weights of a neural network. This is called neuroevolution, and it's competitive on reinforcement learning tasks, where useful gradients are harder to come by...
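A minimal sketch of that idea, evolving the weight vector of a tiny network on a stand-in task (XOR here in place of an RL return; the architecture, population size, and mutation scale are all just illustrative):

    import numpy as np

    # Tiny 2-4-1 network; we evolve its flattened weight vector instead of backpropagating.
    def forward(w, x):
        W1, b1, W2, b2 = w[:8].reshape(2, 4), w[8:12], w[12:16].reshape(4, 1), w[16]
        h = np.tanh(x @ W1 + b1)
        return np.tanh(h @ W2 + b2).ravel()

    # Stand-in task: XOR; fitness is negative error (in RL it would be the episode return).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)
    fitness = lambda w: -np.mean((forward(w, X) - y) ** 2)

    rng = np.random.default_rng(0)
    pop = rng.normal(size=(50, 17))              # population of candidate weight vectors
    for _ in range(300):
        scores = np.array([fitness(w) for w in pop])
        parents = pop[np.argsort(scores)[-10:]]  # keep the 10 fittest
        children = parents[rng.integers(0, 10, size=40)] + 0.1 * rng.normal(size=(40, 17))
        pop = np.vstack([parents, children])     # elitism + mutation; no gradients anywhere

    best = pop[np.argmax([fitness(w) for w in pop])]
    print(forward(best, X))                      # should move toward [0, 1, 1, 0]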