
Makes me wonder if anyone has looked into using genetic algorithms combined with RL where the genetics determine the reward function.

This seems to be how humans have evolved. Ultimately, all living animals are here thanks to one reward function: having had an uninterrupted chain of reproduction. Our nervous system provides stimuli, and our brain's chemistry provides positive or negative rewards (pleasure or pain) that steer us toward actions that keep that chain unbroken (it's why sex feels good and putting your hand on a stove feels bad).

Presumably, both the reward function within our brain and the signals it interprets (via the nervous system) evolved to find a more optimal combination of inputs and per-input reward scalars, all to maximize this singular goal (reproduction).

Maybe we need to frame RL goals in much simpler terms, and allow genetic algorithms to evolve their own inputs and reward functions on their own.

RL is one of my weakest areas within AI, so I'm sure some of this has been tried before; I'm curious how much, and what the results have been.
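To make the idea concrete, here is a toy sketch of my own (not an established method, and all names and numbers are invented for illustration): an outer genetic loop evolves the weights of a shaped reward, while an inner loop "trains" an agent against that reward alone. Selection acts only on a sparse outer objective the agent never sees, analogous to reproduction in the framing above.

```python
import random

random.seed(0)  # reproducibility for this sketch

TRUE_TARGET = 0.7  # hidden "survival" optimum; the agent never sees it

def shaped_reward(weights, p):
    """Evolved reward: a weighted sum of two hand-picked proxy features."""
    return weights[0] * -abs(p - 0.5) + weights[1] * -abs(p - 1.0)

def inner_loop(weights, steps=50):
    """'Train' an agent (hill-climb a 1-D policy parameter) using only
    the evolved reward, never the true objective."""
    x = 0.0
    for _ in range(steps):
        candidate = x + random.uniform(-0.1, 0.1)
        if shaped_reward(weights, candidate) > shaped_reward(weights, x):
            x = candidate
    return x

def true_fitness(x):
    """Sparse outer objective the 'genes' (reward weights) are selected on."""
    return -abs(x - TRUE_TARGET)

def evolve(pop_size=20, generations=30):
    pop = [[random.random(), random.random()] for _ in range(pop_size)]
    for _ in range(generations):
        # rank genomes by how well an agent trained on their reward survives
        ranked = sorted(pop, key=lambda w: true_fitness(inner_loop(w)),
                        reverse=True)
        parents = ranked[:pop_size // 2]
        # each surviving genome leaves two mutated offspring
        pop = [[g + random.gauss(0, 0.05) for g in p]
               for p in parents for _ in range(2)]
    return max(pop, key=lambda w: true_fitness(inner_loop(w)))

best = evolve()
```

The point of the sketch is the separation of timescales: the inner loop only ever optimizes the proxy reward, and the outer loop decides which proxy rewards are worth having.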




> genetic algorithms combined with RL where the genetics determine the reward function.

I have been working on this problem for years (2+ as a researcher, 2 as a PhD student).

The main issue is that evolution is massively parallel and has had plenty of runtime to get to human-level intelligence.

The person that pushes this evolution/evolved reward point is Andrew G. Barto and his students/collaborators over the years.

Satinder Singh in particular is actively working on gradient based algorithms to find rewards (e.g. https://arxiv.org/abs/2102.06741)

> Maybe we need to frame RL goals in much simpler terms, and allow genetic algorithms to evolve their own inputs and reward functions on their own.

I was checking HN while the current iteration of this algorithm runs (gradient-based now; genetic was my master's thesis). The main complexity is figuring out:

1) What are the sub-goals, e.g. grasping things
2) How to solve those goals, e.g. motor control
3) How to do something useful, e.g. surviving

Balancing those three processes is the current hurdle.

For more info my email is delvermm at mila.quebec


Also, evolution isn't trying to get to human-level intelligence. It's just one out of millions of adaptations that work, it's recent, and it's rare. Change Earth's parameters a little over the past several million years, and maybe we don't evolve.


It's an extremely open-ended process.


> The main issue is that evolution is massively parallel and has had plenty of runtime to get to human-level intelligence.

How many entities are we talking about for substantial evolution? I know roughly 100 billion "humans" have ever lived (not that the category is clear-cut), so I'm guessing this is on the order of trillions of entities to simulate for some evolution (though maybe I'm really underestimating the early tail of the tons and tons of microorganisms and small, short-lived life that got us to this point).

Is the bandwidth of evolution that much larger than what we could possibly simulate with computation, especially for a much simpler world/task than "generally survive"?


Given that a single teaspoon of soil probably has about a billion bacterial organisms in it, I suspect you're many orders of magnitude short.
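A quick order-of-magnitude check of the gap, using rough, commonly cited figures (my own assumptions, not from this thread):

```python
SECONDS_PER_YEAR = 3.15e7

bacteria_alive = 5e30        # rough published estimate of bacteria on Earth
generation_time_s = 1e5      # ~1 day per bacterial generation, very rough
runtime_s = 3e9 * SECONDS_PER_YEAR   # ~3 billion years of evolution

generations = runtime_s / generation_time_s
organism_lifetimes = bacteria_alive * generations   # bacteria alone
humans_ever = 1e11           # the "100 billion humans" figure above

print(f"bacterial generations: {generations:.1e}")
print(f"organism-lifetimes:    {organism_lifetimes:.1e}")
print(f"humans ever lived:     {humans_ever:.0e}")
```

By this rough count, a simulation of "trillions of entities" undershoots the bacterial contribution alone by tens of orders of magnitude.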


The purpose of any scientific field is to generate knowledge, i.e. to actually understand the conceptual underpinnings of something like intelligence.

This idea that all that's necessary for engineering intelligence is to throw some chemicals into a bucket and turn the heat on is a bad one. By that logic you could just write a universe simulator, wait a million years, and maybe solve AI as a side effect. If AI is just evolution, evolution is just genetics, and genetics is just physics, then just solve physics and we're good to go.

It's like someone trying to build a bridge by cobbling things together until it stands up and then praying it doesn't fall down. That's obviously not how engineering works, but that's the attitude we have towards AI.

What AI needs at this point is the very opposite: an actual high-level theory of intelligence, because we haven't really made progress on that front in decades.


The problem with this view is that the outside world contains such an immensely vast amount of data that the problem becomes computationally intractable.

It took ~3 billion years of real evolution to reach humans. This evolution occurred at planet scale, including naturally formed barriers that rose and fell, changing climatic conditions, and even a few stellar events to shake things up. There isn't even a compelling reason to think humans couldn't have arisen anytime within the last ~300 million years, which implies that the probability of intelligent life emerging is low, or that the conditions for it are poorly understood and rare.

Effectively, every attempt to train an agent that has to interact with the real world runs into these problems. The usual response is to make the reward more complex and the simulated environment more realistic, both of which increase the computational cost of the problem faster than the improvements arrive.


Well, the failure of multilayer perceptrons to converge to useful models was down to limited compute, which was eventually solved by the advent of powerful GPUs and CUDA, which popularised repurposing parallel hardware meant for graphics rendering for the linear-algebra operations used in backpropagation.

Maybe the problem isn't that the RL algorithm is wrong, but that it just doesn't work without a 3-billion-year, atom-resolution, planet-scale computer.


> and allow genetic algorithms to evolve their own inputs and reward functions on their own

I've been playing with genetic algorithms for years as a hobby, and this type of approach was a dead end for me: the GA entities would just "game the system," min/maxing the reward in surprising ways.

My latest genetic algorithm creation https://littlefish.fish has performed far better at pattern recognition than I expected. I really think they've got massive potential.


I tried

1 4 9 16 25 16 9 4 1 4 9

It did not predict what I had in mind.


What did you have in mind?


16
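For what it's worth, the sequence fits squares of a base that bounces between 1 and 5 (1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, ...). A minimal generator, assuming that was the intended pattern:

```python
def bouncing_squares(length, lo=1, hi=5):
    """Squares of a base that counts up from lo to hi, then back down."""
    seq, n, step = [], lo, 1
    for _ in range(length):
        seq.append(n * n)
        if n == hi:
            step = -1
        elif n == lo:
            step = 1
        n += step
    return seq

print(bouncing_squares(12))
```

The first 11 terms reproduce the input above, and the 12th comes out to 16, matching the reply.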


> anyone has looked into using genetic algorithms combined with RL where the genetics determine the reward function

There is an introductory guide to Rust, deep learning, and genetic algorithms that I really like, called "Learning to Fly" [0]

[0]: https://pwy.io/en/posts/learning-to-fly-pt1/


I think the stove and sex examples are on the right track, but those are qualities every animal experiences. Well... judging by the face of a dog when he's humping your leg, I'm sure it feels good for him.

Anyway, I think there's another missing ingredient, one that we humans uniquely have: the fear of death, the knowledge that for all our intellect and powers as humans, we will inevitably die. It's better summed up by terror management theory, I believe.


Dogs have achieved an unbroken chain of survival dating back as far as your ancestors have, so they've achieved the same survival goals as you.

They've managed to do so without our intellectual abilities, which goes to show that our goal of making RL algorithms "smart" may be malformed, since high-level, complex, abstract reasoning skills apparently aren't a necessary prerequisite for survival, at least in Earth's environment.


Worker ants end their own chain of survival but are still necessary, so chain of survival among individuals is a flawed metric.


Indeed: the genetics of a living thing aren't evidence of its fitness until it has reproduced; once it has, they are.


I found this video an excellent take on open-ended goals in RL: https://youtu.be/lhYGXYeMq_E


One benefit of genetic algorithms is that they can handle multiple objectives, as in the NSGA-II algorithm. I used it to evolve a neural net in my master's thesis.
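For readers unfamiliar with it, the heart of NSGA-II is ranking by Pareto dominance instead of a single scalar fitness. A minimal sketch of just that core idea (maximisation convention; the full algorithm adds non-dominated sorting into fronts and crowding distance):

```python
def dominates(a, b):
    """True if objective vector a is at least as good as b everywhere
    and strictly better on at least one objective."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(scores):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for s in scores
            if not any(dominates(t, s) for t in scores if t is not s)]

# e.g. two objectives (accuracy, efficiency), hypothetical values:
scores = [(1.0, 0.2), (0.8, 0.9), (0.5, 0.5), (0.2, 1.0)]
print(pareto_front(scores))  # (0.5, 0.5) is dominated by (0.8, 0.9)
```

Instead of collapsing objectives into one weighted score, selection keeps the whole trade-off surface, which is what makes multi-objective GAs attractive here.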



