It comes down to gradients. When you have useful gradients to work with, backprop on a GPU is vastly faster and more directed than GAs. One could uncharitably say that all of the impressive results from NNs in the last decade come from this ability to throw large amounts of data into highly parallel training, rather than from any new theory about the expressiveness and capabilities of NNs.
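To make "directed" concrete, here's a toy sketch in plain Python (the objective, learning rate, and names are all mine, purely for illustration): with a gradient, every single update knows exactly which direction to move and roughly how far.

```python
# Toy gradient descent on f(x) = x^2. The analytic gradient 2x
# points every update straight at the minimum.
def f(x):
    return x * x

def grad_f(x):
    return 2 * x

x = 5.0    # arbitrary starting point
lr = 0.1   # arbitrary learning rate for this sketch
for _ in range(50):
    x -= lr * grad_f(x)  # each step is directed by the gradient

print(x)  # converges toward the minimum at 0
```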
When you don't have or can't use a gradient, GAs become the go-to tool for optimization, if only because you don't have many other options.
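And here's a correspondingly minimal GA sketch, again with a made-up objective; this one is deliberately flat almost everywhere, so a gradient tells you nothing. All you can do is evaluate candidates, select, recombine, and mutate. Hyperparameters are arbitrary.

```python
import random

# A non-differentiable toy objective: flat almost everywhere, so
# gradient methods get no signal. Maximized when round(x) == 42.
def fitness(x):
    return -abs(round(x) - 42)

POP_SIZE, GENERATIONS, MUT_SCALE = 50, 100, 1.0  # arbitrary choices

population = [random.uniform(-100, 100) for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    # Selection: keep the fitter half as parents.
    population.sort(key=fitness, reverse=True)
    parents = population[: POP_SIZE // 2]
    # Crossover + mutation: average two parents, then perturb.
    children = []
    while len(children) < POP_SIZE - len(parents):
        a, b = random.sample(parents, 2)
        children.append((a + b) / 2 + random.gauss(0, MUT_SCALE))
    population = parents + children

print(max(population, key=fitness))  # should land near 42
```

Note that the GA only ever calls `fitness` as a black box; that's exactly why it still works when no gradient exists, and also why it's so much less directed than backprop.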