It looks like it no longer makes much progress after some time. Many years ago I wrote one of these, just a few days after the famous 2008 blog post doing the same (http://rogeralsing.com/2008/12/07/genetic-programming-evolut...). However, after some time I switched the code to use simulated annealing, and it performed better. I'm running it for a few minutes against the Mona Lisa image to see what happens. Btw I had a plan to release the code after some cleanup; maybe tonight is the right moment, after 6 years of this code hanging around on my HD.
I took a course on this stuff in uni, it was so fantastically fun to play with.
Simulated annealing has less cachet than GAs (and other evolution-inspired approaches) but tends to perform well enough on lots of problems that you might as well skip GAs.
I haven't dug much into the linked GA, but it looks like it converges too quickly. That's a classic problem (exploration vs exploitation, aka coverage vs convergence) in optimisation. The problem I practiced on was Ackley's Function[1]. Even in 2 dimensions it's really good at tripping up algorithms that converge too eagerly.
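For anyone who wants to play with it, here's a minimal 2D Ackley's function in JavaScript, using the usual constants (a=20, b=0.2, c=2π):

```javascript
// 2D Ackley's function. Global minimum is 0 at (0, 0); the ring of
// local minima around the origin punishes eagerly-converging algorithms.
function ackley(x, y) {
  var a = 20, b = 0.2, c = 2 * Math.PI;
  var meanSq = 0.5 * (x * x + y * y);
  var meanCos = 0.5 * (Math.cos(c * x) + Math.cos(c * y));
  return -a * Math.exp(-b * Math.sqrt(meanSq)) - Math.exp(meanCos) + a + Math.E;
}

console.log(ackley(0, 0)); // ~0 (the global minimum)
console.log(ackley(1, 1)); // > 3, despite being right next to the origin
```

Plot it in even 2D and you can see why: a greedy search lands in one of the local dimples and never climbs back out.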
Some GA methods actually make the parameters of the system part of the genome, but as I recall it doesn't make a huge difference overall. Fun to think about though.
Mind you, the No Free Lunch Theorem means that there's a place for everyone in the heuristic computing tent. And simple random sampling (monte carlo method) and/or graphing can be very helpful for looking for the contours of a solution space. There's still room for humans too.
> That's a classic problem (exploration vs exploitation, aka coverage vs convergence) in optimisation problems.
The nicest approach to exploration/exploitation I've seen for simulated annealing is to run two searches side-by-side. Their relative "temperatures" (ie. probability of accepting a worse solution) depends on their relative fitness: if both are the same fitness, both have the same temperature; if one has higher fitness than the other, its temperature lowers (so it can "focus" on that high-fitness region) whilst the other's temperature raises (so it can "spread out" to find higher-fitness regions).
Also, regarding the No Free Lunch Theorems, I don't give them much thought since they don't take the computational complexity of the problem into account. They say that every search algorithm, when averaged over all problems, achieves total performance that's the same as random search. The real world doesn't contain all problems; in particular, we care more about (computationally) simple problems than unrealistically-complex ones. Worrying about NFL for search is like worrying about adversaries with infinite computing power in crypto; yes, your algorithm will be weak against such an attacker, but it's unrealistic to care about.
Totally agree: one reason it converges so fast is that independent solutions are unlikely to produce a better offspring when crossed, so my feeling is that it ends up with a very similar population and most of the evolution happens just because of the mutations... that's why I eventually switched to SA. Another funny thing is when the improvement you get from the algorithm is not compensated by the drop in performance: working with just a set of triangles that fits in the cache can be a true boost.
Ah, good point about crossing. Each individual actually faces enormous local selection pressure, so once a given triangle converges closely enough, offspring just aren't going to be selected upon crossover with a (possibly very different) neighbour.
So actually I think this demonstrates a kind of speciation in a very sharply divided landscape. It'd probably work better on simple images with smooth gradients and the like.
Excellent observation! In this video, Dr Christos Papadimitriou lays out a compelling argument for why genetic algorithms are poor at optimizing individual fitness when compared to simulated annealing. The video is about asexual vs sexual evolution and how it connects to multiplicative update/weighted majority type algorithms. https://www.youtube.com/watch?v=cWv-s6KuDlM
Gets to ~97% in the first few minutes and then takes a long time hunting for very small gains. I was hoping to see more dramatic improvement after the first couple of minutes as well, but I was amused by the thought that followed after my evolving image failed to look successively more like a Mondrian.
We were told early in public school that we share 98% (or some such very high percentage) of our DNA with our simian relatives. I was always amazed at just how small a section of the genome was responsible for such amazing diversity among primates, or even mammals in general.
Seeing the wild differences that result in these images when you're given 3% of leeway between generations really helps me appreciate the vast number of individuals in our species!
Hello! Chris here. I wrote this after being inspired by a GA+JS implementation by Jacob Seidelin (attributed). Unfortunately his blog is returning a 404, but I think he in turn attributed Roger Alsing. I am absolutely not claiming the idea of genetic algorithms and art to be my own, but the code is, so I think it's unfair of you to call me a thief.
EvoLisa was a great post. I spent a few nights trying to reproduce it and I'm sure I wasn't the only one. I eventually got to the point where I was using SSE instructions (I think it was PSADBW) to speed up the fitness function. It was loads of fun. OpenGL is probably the solution to really upping the generation frequency, but I stopped messing with it before I got there.
I'd love to see reports of basic genetic properties of the population. What's the variance in fitness within each generation? This turns out to be the key factor in the rate of evolution [1], assuming that one defines fitness the way that geneticists do.
Also, estimates of the relative strengths of mutation, selection, and drift would be nice, as would options for distinguishing between hard and soft selection. These sorts of diagnostics and options could really improve the method's effectiveness.
Why is the collection of properties for an individual mutated shard called DNA? It doesn't really have anything to do with the actual molecule, as far as I can tell...
By the same logic, this is a C2H6O simulator:
var s = "I'm going to get drunk";
for (var i = 0; i < 10000; i++) {
    // pick a random character and remove it from the string
    var idx = Math.floor(Math.random() * s.length);
    var c = s.substr(idx, 1);
    s = s.substr(0, idx) + s.substr(idx + 1);
    // reinsert it at a random position
    idx = Math.floor(Math.random() * (s.length + 1));
    s = s.substr(0, idx) + c + s.substr(idx);
}
(Wow, the downvotes are aggressive these days. Relax - it's just a joke!)
That kind of data structure is just an abstraction of hereditary material, so a good way to think about it is that it is "like" artificial DNA in some gross ways. I don't think the author is trying to claim he's recreated the complexity of living organisms in his model.
You are right. The "DNA" label is inaccurate, but jal278 explained the reasoning perfectly. I'd be happy to hear suggestions for a better term though! ;)
A few years ago I wrote a genetic algorithm to solve a travelling salesman problem - it was a ton of fun - I need to port it to JS one day.
I've also thought about making an algorithm that would breed JS programs to solve basic tasks - sort an array of input or find the longest consecutive group of numbers. Maybe one day I'll find the time.
> I've also thought about making an algorithm that would breed JS programs to solve basic tasks
When you do, a useful tip: in the literature, Genetic Algorithms are distinct from Genetic Programming.
GAs evolve collections of parameters that get fed into a fixed function or group of functions.
GPs evolve actual executable programs. Mostly as interpreter trees, though some experiments have been done with evolving strings of machine code for real and virtual machines.
So in your case, the place to start is Genetic Programming.
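To make the distinction concrete, a GP individual is typically an expression tree that gets interpreted directly; crossover splices subtrees between parents. A toy sketch (the node shapes here are my own invention, not a standard GP library):

```javascript
// Toy GP-style expression tree: internal nodes are operators,
// leaves are constants or the input variable "x".
function evalTree(node, x) {
  if (node.op === "const") return node.value;
  if (node.op === "x") return x;
  var a = evalTree(node.left, x);
  var b = evalTree(node.right, x);
  switch (node.op) {
    case "+": return a + b;
    case "*": return a * b;
    default: throw new Error("unknown op: " + node.op);
  }
}

// Represents (x * x) + 1 -- the kind of "program" GP crossover
// would recombine by swapping subtrees between two parents.
var tree = {
  op: "+",
  left: { op: "*", left: { op: "x" }, right: { op: "x" } },
  right: { op: "const", value: 1 }
};
console.log(evalTree(tree, 3)); // 10
```

A GA, by contrast, would just evolve a flat array of numbers and leave the interpretation of those numbers fixed.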
ps. it's freakishly slow and the GP literature is really fixated on favouring one evolution step almost to the exclusion of others (I can't remember if it's the crossover step or mutation step).
In a lot of GA applications the effects of mutation are negligible. Given a large and diverse population, crossover provides enough variation already. Mutation will only be necessary to escape if the entire population has converged to a poor local optimum, but that doesn't tend to happen often in practice, and even if it does, there's no guarantee that mutation will help; hence the lack of focus on mutation operators in general.
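For reference, the variation crossover alone provides looks like this — a sketch of one-point crossover over flat array genomes (representation assumed for illustration):

```javascript
// One-point crossover: the child takes parentA's genes up to the cut
// point and parentB's genes after it. With a large, diverse population,
// recombining like this already generates plenty of variation.
function crossover(parentA, parentB, cutPoint) {
  return parentA.slice(0, cutPoint).concat(parentB.slice(cutPoint));
}

var child = crossover([1, 1, 1, 1], [0, 0, 0, 0], 2);
console.log(child); // [1, 1, 0, 0]
```

Once every individual carries the same alleles, though, every such child is identical to its parents — which is exactly the converged-population situation where only mutation can reintroduce variation.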
It's very difficult for the algorithm to optimize the hexagons. If you change the number of vertices from 6 to 3, it runs slightly faster and the results are much better.
I love genetic algorithms. I wrote one for the 0-1 knapsack problem back in school, then used a genetic-ish algorithm (didn't have enough time for a true GA) to win a rock paper scissors tourney at work.
It seems to me that one needs another genetic algorithm to manage all the variables (such as mutation rate, population size, crossover, etc) because they have such an impact on the effectiveness of the algorithm.
> It seems to me that one needs another genetic algorithm to manage all the variables (such as mutation rate, population size, crossover, etc) because they have such an impact on the effectiveness of the algorithm.
This has been tried in the literature -- usually by adding those parameters to the genome. Basically it's a wash.
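A sketch of what "adding those parameters to the genome" means in practice — each individual carries its own mutation rate, which gets perturbed along with the genes (the representation and update rule here are made up for illustration):

```javascript
// Self-adaptive mutation: the strategy parameter (mutation rate) rides
// along in the genome and mutates too, so selection can tune it.
function mutate(individual) {
  // Perturb the rate first with a small multiplicative update.
  var rate = individual.rate * Math.exp(0.1 * (Math.random() - 0.5));
  // Then mutate each gene with probability equal to the new rate.
  var genes = individual.genes.map(function (g) {
    return Math.random() < rate ? g + (Math.random() - 0.5) : g;
  });
  return { genes: genes, rate: rate };
}

var child = mutate({ genes: [1, 2, 3], rate: 0.5 });
console.log(child.rate); // drifted slightly away from 0.5
```

The hope is that individuals in need of exploration evolve higher rates and converged ones evolve lower rates — which, as the parent comment says, tends not to beat a well-chosen fixed schedule by much.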
Great project. Love that it's configurable and open source. The analytics are a nice touch. I'm a little confused by the fitness percentage though. 98% or even 99% fitness seems to have very little immediate resemblance to the target image.
The fitness value represents the difference between the pixels of the "goal" image and the pixels of the current individual [1]. It will not necessarily reflect the perceived similarity between two images, which will change from person to person.
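As I understand it, the percentage is essentially normalised per-pixel error — something like this sketch over flat arrays of 0–255 channel values (not the project's exact code):

```javascript
// Fitness as 100 * (1 - total channel difference / maximum possible
// difference). goal and candidate are flat RGB(A) byte arrays, e.g.
// the .data of a canvas ImageData object.
function fitnessPercent(goal, candidate) {
  var diff = 0;
  for (var i = 0; i < goal.length; i++) {
    diff += Math.abs(goal[i] - candidate[i]);
  }
  return 100 * (1 - diff / (255 * goal.length));
}

console.log(fitnessPercent([0, 128, 255], [0, 128, 255])); // 100
console.log(fitnessPercent([0, 128, 255], [10, 120, 250])); // ~97
```

This is why 98% can still look wrong: small per-channel errors spread over the whole image barely move the sum, while human perception weights edges and faces far more heavily than a uniform pixel metric does.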
It'd be nondeterministic and quite time-consuming, but this suggests an image compression algorithm that could conceivably come up with very compact representations of its input.
I tried this as a college project inspired by the Roger Alsing post. Unfortunately I didn't create anything usable. If I was doing it again I'd use a SA approach or pure hillclimbing like Alsing - as antirez points out in a sibling post, crossover between diverse individuals rarely gives viable offspring with this problem.
Would be cool to be able to upload your own photos / pull in from Instagram, etc.
You might make a little coin by piping the finished images to a t-shirt or skin printer.
We'd be happy to give a free CreditCover credit card skin to anyone who wants one using your imagery. We have a standing $10 gift code for Hacker News users (hackernews), but we could make something specific for you if you like.
I haven't looked at the specific source, but I can't imagine there's anything preventing this from being modified to run in Web Workers across multiple threads.
EDIT: after 6 minutes (64 shapes) -> http://imgur.com/gyZCjBD
EDIT2: after ~15 minutes -> http://imgur.com/5XMU8vH
Now trying with just triangles, again with 64 shapes; I remember it was much better than using circles for most images.
EDIT3: Oh, much better now, after 6 minutes (64 triangles) -> http://imgur.com/x9E9TmK
EDIT4: 10 minutes more and she got a nose: http://imgur.com/bX8UTIQ
EDIT5: Code released! https://news.ycombinator.com/item?id=8710418