GAs can be way, way, way more compute-intensive as your population size increases.
That said, I use GAs for optimization and find them to be an incredibly useful tool for various optimization tasks - just generally not for learning weights of NNs, where a GPU + backprop generally reigns supreme (or at least works well enough/much faster than a GA).
You might be interested in "compact GA" which only requires keeping the % of 1s (or 0s) at each bit position.
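For a rough sense of how that works, here's a minimal sketch of the compact GA idea (the function names, `pop_size`, and the OneMax fitness are just my illustration, not from the original cGA paper's code): instead of storing a whole population, you keep one probability per bit, sample two candidates from it, and nudge each probability toward the winner by 1/pop_size.

```python
import random

def compact_ga(fitness, n_bits, pop_size=100, generations=10000):
    # Probability vector: p[i] is the estimated fraction of 1s at bit i.
    p = [0.5] * n_bits

    def sample():
        return [1 if random.random() < p[i] else 0 for i in range(n_bits)]

    for _ in range(generations):
        a, b = sample(), sample()
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Shift each differing bit's probability toward the winner by 1/pop_size,
        # which simulates one tournament in a virtual population of pop_size.
        for i in range(n_bits):
            if winner[i] != loser[i]:
                p[i] += (1.0 / pop_size) if winner[i] == 1 else -(1.0 / pop_size)
                p[i] = min(1.0, max(0.0, p[i]))
        # Stop once every bit position has converged to 0 or 1.
        if all(x in (0.0, 1.0) for x in p):
            break
    return p

# Example: maximize the number of 1s (OneMax) on a 32-bit string.
if __name__ == "__main__":
    final = compact_ga(sum, n_bits=32)
    print([round(x, 2) for x in final])
```

Memory is O(n_bits) regardless of the (virtual) population size, which is the whole appeal.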
I suspect that with some more engineering and attention from people doing ML stuff, GA-style algorithms can be made just as memory- and compute-efficient as gradient methods, while giving better results and being more widely applicable.