It's nice to see an article about GA, I've been working on them myself recently, fascinating stuff.
However the definition of Tournament Selection[1] is a little bit off. With tournament selection you don't take the fittest "k" individuals but you take a subset of "k" individuals at random and then apply proportionate selection.
That is absurd. Gradient descent assumes you can compute the gradient of your objective function. For a great many functions you cannot define the gradient: indeed you can't even define the objective function well.
Genetic algorithms belong to a family of metaheuristics, algorithms designed to tackle such ill-defined functions where gradient descent is simply impossible to use.
My concern with this webpage is that it is a far too simplistic treatment of genetic algorithms, to the point of being misleading, and has quite a number of outright errors. I wouldn't use it.
Totally obsurd. Gradient descent will 'almost' always get stuck in a local optima. The point of evolutionary algorithms, or any naturally inspired algorithm for that matter, is that it does not tend to converge upon local optima and has the possibility of moving out of this space towards the global optima. The fact that most problems it is applied to are NP-Complete means that it is unlikely it will find the optimal solution in polynomial time.
In practice genetic algorithms don't do much better than hill climbing with random restarts.
You can do random restarts, stochasticism, or population based stuff on gradient methods too.
Yes it does require at least an estimate of the gradient, but that's not too hard to get and it converges much faster. Meta-heuristics are only a last resort when you can't get a gradient.
Although they are a bit simpler so there is nothing wrong with using them if the search space isn't too big.
> Totally obsurd. Gradient descent will 'almost' always get stuck in a local optima
Well... just as evolutionary algorithms are essentially elaborate versions of hill-climbing which enable global search, likewise there are many well-established forms of gradient descent which are global searchers as well.
I think the key distinguishing feature is that gradient descent makes strong assumptions about the nature of the space being searched; and metaheuristics make much more general assumptions.
Meta-heuristics have the flexibility of providing good performance on all Combinatorial Optimisation Problems.
But meta-heuristics, can also be given domain knowledge. and/or be combined with a a heuristic. E.g. Ant Colony Optimisation techniques for Travelling Salesman Problems that combines ACO with Local Search.
Yes. Gradient descent with random restarts is the pragmatists first optimizations technique.
GA's often don't converge to anything useful and require tons of unsubstantiated unintuiative tuning.
I'd love for someone who knows more to chime in but its my understanding that gradient descent does not perform well on non-differentiable functions and only finds local optima.
I spoke with an optimization expert from Argonne National Lab and he gave me the same impression, that gradient descent techniques far outperform evolutionary ones. They're more specialized, make it easier to incorporate domain knowledge, and using stochasticity compensates for local optima.
> I spoke with an optimization expert from Argonne National Lab and he gave me the same impression, that gradient descent techniques far outperform evolutionary ones.
To rephrase this, you might say that "for conventional, well-understood, combinatorial problems, a gradient descent technique may be the best approach."
It's worth noting that in the optimisation community (and often, in the wider Computer Science community) evolutionary algorithms are often looked down upon.
I think the main reason for this is the lack of a solid theoretical foundation. Whilst approaches such as schema theory have proposed some explanations for the way GAs solve problems, they remain somewhat "black-box" and gradient descent methods are much more easily understood.
I'm not an expert in GAs, but I am an expert in Genetic Programming. A simple local search approach to the kind of problems GP is regularly used to solve would be useless in most cases, due to the nature of the search space.
I would imagine the competitiveness of a GA very much depends on the interaction between decision variables in the genome.
I wrote a non-evolutionary algorithm for symbolic regression, the main and most tractable of the GP applications. PGE outperofms GP by several orders of magnitude.
"PGE outperofms GP by several orders of magnitude."
[edit] -- just noticed the referenced paper...
The above is not a fair conclusion to draw from your paper. I haven't looked at it in detail, but comparing an algorithm on a set of benchmarks for one domain doesn't mean that it outperforms "GP". It does claim that your algorithm outperforms some published results on 22 (sensibly chosen) benchmarks.
Don't want to get into a huge debate here, but you should have recreated those results yourself rather than relying on previously published results. Maybe I misunderstood -- I don't have time to go through the paper in detail. If you did recreate them, great, but what about other forms of GP and parameter settings?
How does your system do when turned to the problem of, say, bug-fixing, or circuit board optimisation? Generally speaking, any domain-specific approach that uses more knowledge about the search space is going to outperform a vanilla optimisation algorithm.
I'm not disagreeing that GP isn't a good algorithm for symbolic regression, only that your results do not support your sweeping statement above.
[/edit]
I'm not here to defend GP. I think it has many flaws and I'm working on alternative algorithms. However, GP has an established track record of solutions to problems in specific domains unmatched by other methods. A good place to start is to look at the GECCO Human-Competitive awards.
I hope this doesn't come off as flamey or troll-like. I've been researching Symbolic Regrssion for 5 years, currently writing a dissertation on the subject.
My second paper has the better comparison between GP and PGE. Both papers compare to the most "state-of-the-art" implementations, per Korns. The only domain-specific knowledge PGE uses is the associativity and commutativity of addition and multiplication. Add partial ordering on the node types, non-linear regression of parameters, and techniques for memoization. All of the above is applicable to, and should be a part of, a GP implementation. They are separate from the actual search method, GP, GGGP, Cartesian GP, (insert more GP variants), and PGE.
PGE uses a priority queue to progress the search with local exploration functions. You basically end up with a variation best-fit search. One could use PGE on circuit board optimization or "atomic-regression" by observing that associativity and commutativity can be likened to symmetry and rotations in these spaces. The vast majority of my work has been in Symbolic Regression because it is the most useful of the GP application domains outside of research. Imagine scientists having a microscope for their data!
Fundamentally, the application problems you listed are all graph search problems. They all have domain-specific solutions which outperform a more general problems solving technique such as GP. We have many graph search algorithms (Prim,Djikstra,best-first,A* and variants) which are widely used outside of the academic research environment. Why omit the structural information about the problem, that is embedded within the problem definition and partial solutions, only to perform a randomized search? This is a question I have yet to hear a good answer to.
One of the main problems in the GP field is the difficulty in making legitimate comparisons, for several reasons.
1) Reference implementations, data, and code are scarce or don't work as well as the papers claim.
2) Benchmarks and metrics for comparison vary widely, see the GECCO 2012 paper titled "GP needs better benchmarks"
3) The reproducibility of GP literature is largely impossible, because of point 1&2, but more inherently because it is a randomized search.
I am skeptical that any randomized search method, such as GP/GA/EC, will find wide acceptance outside the research community due to reproducibility concerns and that it doesn't have a solid theoretical foundation on which to stand.
> I hope this doesn't come off as flamey or troll-like.
Not at all, don't worry! I hope mine didn't either.
I don't think I listed any application problems -- I guess you mean the GECCO awards --- but I don't think they are all graph problems in the conventional CS sense, i.e. A* and similar algorithms do not apply.
I don't think we're in much disagreement. I'm not a cheerleader for GP, I just disagreed with your claim that your algorithm simply "outperformed" it.
GAs and GP have been used widely outside of the research community, particularly GAs. They're used for problems where there isn't an obvious approach, human intuition fails, the maths is intractable, etc. There are toolboxes for major applications like Matlab. Sometimes it's not obvious that this is the case. It's been a few years since I've been to GECCO (I was there when the benchmarks paper was presented), but they used to have a lot of industrial applications presented each year.
There are plenty of stochastic algorithms that are more effective by some measure than their non-stochastic counterparts. No rationale for bias there.
Anyway, this is probably a little too detailed a discussion for HN, ha!
Technically combining stochasticity with gradient descent is not pure gradient descent.
As for the domain knowledge - this is not specific to gradient descent either, genetic algorithms and their problem encoding can be adapted to a domain to minimise the search space considered - therefore improving performance of the algorithm.
I didn't read the article, but did read a book about twenty five years ago. My take away was that the big deal with genetic algorithms is they work without much/any domain specific knowledge. That is one of the great underpinning of biological evolution.
Outperform on what? Can you give specific examples? There are a large number of Combinatorial Optimisation Problems. It is a very sweeping statement to say 'gradient descent techniques far outperform evolutionary ones'.
He said something along the lines of "I have yet to see an application where evolutionary techniques outperform gradient descent approaches." His talk was about parameter optimization for costly experiments (e.g., manufacturing and testing a vehicle), so his emphasis was on optimization that minimizes the number of function applications. But he seemed to be an expert with a lot of experience.
Some evolutionary algorithm use gradient descent, like cma-es algorithm. Population based EA can resolve Multi Objective problem, i'm not sure simple gradient descent can.
A couple more points. Depending on how diverse the population is, a high mutation rate may not help when selecting the fittest children. There is also a chance that you will find a local maxima, and not the best solution overall. Introducing a concept of multiple "species" to the population will sometimes help finding other solutions outside the local maxima, but then this complicates the crossover phase. If the "genes" are of equal length and therefore closely related species, you may be able to generate some offspring in the crossover phase, but then you are usually introducing another species at that time. Depending on the solution you are trying to find, this may or may not be appropriate.
I wrote a toy program that takes a source string and uses a genetic algorithm to transform it into a target string. It only took about an hour and was a fun process. I ended up adding in different types of crossover and trying to figure out which would solve it in the fewest generations on average. Fun stuff. Highly recommend it as a quick project for anyone.
However the definition of Tournament Selection[1] is a little bit off. With tournament selection you don't take the fittest "k" individuals but you take a subset of "k" individuals at random and then apply proportionate selection.
[1] http://en.wikipedia.org/wiki/Tournament_selection