A Story to Explain Genetic Algorithms

jacques_chester · on Feb 16, 2013

If you're interested in the world of "computational intelligence" aka "nature-inspired computation", this book is a good high level survey: http://www.cleveralgorithms.com/

(You can also find stuff under the heading of "metaheuristics" -- http://cs.gmu.edu/~sean/book/metaheuristics/).

GAs are one of four (!) different independently developed strands of thought -- Genetic Algorithms, Genetic Programming, Evolutionary Programming and Evolutionary Strategies. Look out for that when you hit the Googletrons.

jdherman · on Feb 16, 2013

Those are some good links, thanks. Are there clear distinctions between these four areas? It seems like the separation is a historical artifact, when they're all basically doing the same thing.

jacques_chester · on Feb 16, 2013

I mostly looked at GAs and GPs. Evo strategy has some interesting mathematical stuff.

Basically GA and GP are the dominant strands. And there's a key difference.

Genetic Algorithms are historically about finding paramaters into a function. You have some function f(X1 ... Xn), and you are trying to find the best set of parameters X1...Xn. So a common representation is a string of floats or integers or whatever, and then you do the crossovers and mutations based on positions in the string.

Genetic Programming is about creating new functions. So instead of a string of parameters, the genome is (usually) a tree of instructions to be interpreted. You do mutations by swapping node types, crossovers by chopping subtrees and moving them etc.

One artefact of the historical "parallel evolution" (teehee) of these fields is that GP practitioners use crossovers much less than GA practitioners.

randomsearch · on Feb 16, 2013

To a large degree you are correct, the different algorithms are often interchangeable. Sometimes this is not the case: e.g. Genetic Programming is used to search for algorithms that are not easily represented and searched for with other evolutionary algorithms.

stcredzero · on Feb 16, 2013

> Then our Charles simply had to figure out how Mrs Kipling scored the cakes and he could genetically evolve the best cake!

"And then a miracle happens." Isn't the fitness function the "key ingredient?" (It's in the source code, but not in the text for a reason.)

Still a good explanation of the rest.

GuiA · on Feb 16, 2013

> "And then a miracle happens."

One of the favorite quotes of my graduate advisor, thanks for the chuckle :)

For those who are unfamiliar with the reference:

http://star.psy.ohio-state.edu/coglab/Miracle.html (mirror image: http://i.imgur.com/k7gm70t.jpg)

jacques_chester · on Feb 16, 2013

One of the things that optimising algorithms are really good at doing is demonstrating that your fitness function is incompletely specified.

helloiamdave · on Feb 16, 2013

Once upon there was another local search algorithm. It was called GA. The end.

randomsearch · on Feb 16, 2013

Actually, GAs are mostly described as global search algorithms, because they use populations with recombination and therefore simultaneously consider potentially disparate areas of the search space.

fmax30 · on Feb 17, 2013

Once upon a time there was a local search algorithm It was called GA and then it mutated and became a Global Search Algorithm. The End.

jmduke · on Feb 16, 2013

Unrelated: this simvla network thing looks like a pale imitation of Svbtle.

FaisalAbid · on Feb 16, 2013

@OP- Hey Jake, awesome article, really understand what Genetic Algorithms are now.

Also wanted to let the people that think Simvla is a ripoff of Svbtle know that I recently acquired Simvla from its creator and it's my first job to give Simvla a unique look. I respect Svbtle and I feel it's unfair to Dustin that Simvla copies the look he created.

ipince · on Feb 16, 2013

I would suggest making the fitness function more complex. I think its a bit confusing to see the optimal result be equal to the fitness function's parameters, and it makes it easier to miss the point.

perone · on Feb 17, 2013

I've written something [1] related to this if someone is interested.

[1] http://pyevolve.sourceforge.net/wordpress/?p=576

jmcdowell · on Feb 16, 2013

I don't have much of a knowledge of genetic algorithms so it might be good to explicitly say what part of GA algorithms set them apart from other branches of algorithms.

Nice story otherwise.

gweinberg · on Feb 16, 2013

It's not at all clear from the story why or if having multiple recipes "mate" works better than just using one recipe and fiddling with it.

jules · on Feb 16, 2013

It doesn't. At least there is zero evidence that it does and there is a lot of evidence that it does not (i.e. simple hill climbing beats GA in every practical situation). I've mentioned this before on HN and reddit, but from time to time these kind of stories come up again. There is a weird fascination with genetic algorithms. Even though dozens of people have a career based on it, as far as I can tell the whole field of biologically inspired optimization algorithms is basically pseudoscience. Note optimization algorithms, so this does certainly not include e.g. neural networks, which work fantastically well.

randomsearch · on Feb 16, 2013

That's a very insightful comment. The debate over whether recombination gives you anything "extra" continues in academia to this day. For example, papers on the controversy surrounding crossover in Genetic Programming (Luke and Spector in the 90s is a good starting point).

datz · on Feb 17, 2013

This is stupid. The whole is not simply a sum of its parts in cake baking and biology.

zapdrive · on Feb 16, 2013

"After thinking long and hard about the problem at hand our Charles came to the conclusion that he knew two things"

Lol... long and hard.

youngerdryas · on Feb 16, 2013

Neat little story, despite the grammar and capitalization. Is it common to use _ as an index variable?

Edit: for _ in range(4):

mikeash · on Feb 16, 2013

_ is a typical name to use when the language requires a variable name that the program doesn't otherwise need. In this case, the program just needs to run some code four times, and doesn't care about the index. The language doesn't have a plain "run this code X times" construct, but it does have a "iterate over this list" construct, so the code iterates over the list [0, 1, 2, 3] and puts the current element into a dummy variable since it's not being used.

JosephHatfield · on Feb 16, 2013

Not sure how common that is, but I've seen it used before where the index variable itself is not referenced inside the body of the statement.

aghull · on Feb 16, 2013

Common in Lua, Ruby, Python. In fact, ruby gives _ a minor amount of special meaning to make it more convenient as a throw-away variable, e.g. it can be specified multiple times in an argument list without an error and generates no warning if left unused.

monochromatic · on Feb 16, 2013

I've seen it before, but it's a little jarring. Not sure what's wrong with just using i.

jdiez17 · on Feb 16, 2013

It says explicitly that you don't care about that value. If you used i, for example, you wouldn't know whether the loop depended on the iteration number.

monochromatic · on Feb 16, 2013

Yeah, I get it. And that's a useful thing to do, I guess. It just seems a little... unpythonic.

randomsearch · on Feb 16, 2013

To avoid unused variable warnings.

yen223 · on Feb 16, 2013

It's better to use two underscores __ to represent throwaway variables, because _ has a meaning in Python: it returns the last output.

GuiA · on Feb 16, 2013

True, but only in the REPL.

For example try to run the following script (not from the REPL):

  _ = 12
  print _
  42 * 42
  print _

mertd · on Feb 16, 2013

Most of the code is a little verbose for Python. For example make() function could simply be:

    return [random.randint(1,10000) for x in range(4)]

Also remember that a little kitten dies every time you write a for loop as follows:

    for index in range(len(some_list)):

ipince · on Feb 16, 2013

I'm a python noob. Why does a kitten die? You're creating a new list of the same size of the source list (once) and iterating over it. How can you achieve the same w/out creating a new list?

mertd · on Feb 16, 2013

Python god takes the kitten as a sacrifice for the unnecessary list of indexes you created :).

range(len(some_list)) is a convoluted way iterating over the elements of a list. What you are doing is creating a new list with elements [0,... lengh-1] and iterating over the elements of this list, only to immediately throw them away after using them as array index to grab the item you really want from the original list.

Instead you can just directly iterate over your original list: "for item in some_list". If you really need to know the index value, you can use enumerate: "for index, item in enumerate(some_list)"

One nice thing is that the "for x in y" lets you iterate over things (iterables) that you may be generating on the fly (e.g., all prime numbers smaller than N).