Darwin Machines (vedgie.net)
164 points by calepayson 3 months ago | 82 comments



> This layering forms the dominant narrative of how intelligence may work and is the basis for deep neural nets. The idea is, stimulus is piped into the "top" layer and filters down to the bottom layer, with each layer picking up on more and more abstract concepts.

Popular deep artificial neural networks (LSTMs, LLMs, etc.) are highly recurrent, in the sense that they simulate not deep networks but shallow networks that process information in loops many times.
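To make the unrolling concrete, here's a minimal toy sketch (my own illustration, not from any particular paper): a recurrent net applies the *same* shallow cell at every step, so depth comes from the loop, not from stacking distinct layers.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(4, 4))  # hidden-to-hidden weights
W_x = rng.normal(scale=0.1, size=(4, 3))  # input-to-hidden weights

def step(h, x):
    """One pass through the same shallow layer."""
    return np.tanh(W_h @ h + W_x @ x)

h = np.zeros(4)
inputs = rng.normal(size=(10, 3))  # a sequence of 10 input vectors
for x in inputs:                   # the loop, not extra layers, adds depth
    h = step(h, x)

print(h.shape)  # (4,)
```

Unrolled over 10 steps this is a 10-layer network with tied weights, which is the sense in which "shallow networks processing information in loops" and "deep networks" coincide.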

> columns.. and that's about it.

I'd recommend not oversimplifying the structure here. What you're describing is only the high-level structure of a single part of the brain (the neocortex).

1. The brain has many other structures: basal ganglia, cerebellum, midbrain, etc., each with its own characteristic micro-circuits.

2. Brain networks are highly interconnected over long ranges. Neurons project (i.e., send signals) to very distant parts of the brain, and similarly receive projections from other distant parts of the brain.

3. The temporal dimension is important. Your article is very ML-like, focusing on information processing devoid of a temporal dimension. If you want to draw parallels to real neurons in the brain, you need to explain how it fits into temporal dynamics (oscillations in neurons and circuits).

4. Is this competition in the realm of abeyant (what you can think in principle) or current (what you are thinking now) representations? What are the timescales and the neurological basis for this?

Overall, my take is that this is a bit of ML-like talk. If it is meant to describe real neurological networks, it needs a closer and stronger neurological footing.

Here is some good material if you want to dive into neuroscience: "Principles of Neurobiology" (Liqun Luo, 2020) and "Fundamental Neuroscience" (McGraw Hill).

More resources can be found here:

http://neuroscience-landscape.com/


> Popular deep artificial neural networks (LSTMs, LLMs, etc.) are highly recurrent, in the sense that they simulate not deep networks but shallow networks that process information in loops many times.

Thanks for the info. Is there anything you would recommend to dive deeper into this? Books/papers/courses/etc.

> I'd recommend not oversimplifying the structure here. What you're describing is only the high-level structure of a single part of the brain (the neocortex).

Nice suggestion. I added a bit to make it clear that I'm talking about the neocortex.

> 1 & 2

Totally. I don't think AI is as simple as building a Darwin Machine, much like it's not as simple as building a neural net. But I think the concept of a Darwin Machine is an interesting, and possibly important, component.

My goal with this post was to introduce folks who hadn't heard of this concept and, hopefully, get in contact with folks who had. I left out the others so I could focus on what matters.

> The temporal dimension is important. Your article is very ML-like, focusing on information processing devoid of a temporal dimension. If you want to draw parallels to real neurons in the brain, you need to explain how it fits into temporal dynamics (oscillations in neurons and circuits).

Correct me if I misunderstand, but I believe I did. The spatio-temporal firing patterns of minicolumns contain the temporal dimension. I touched on the song analogy but we can go deeper here.

Let's imagine the firing pattern of a minicolumn as a melody that fits within the period of some internal clock (I doubt there's actually a clock but I think it's a useful analogy). Each minicolumn starts "singing" its melody over and over, in time with the clock. Each clock cycle, every minicolumn is influenced by its neighbors within the network and they begin to sync up. Eventually they're all harmonizing to the same melody.

A network might propagate a bunch of different melodies at once. When they meet, the melodies "compete". Each tries to propagate to a new minicolumn and fitness is judged by other inputs to that minicolumn (think sensory) and the tendencies of that minicolumn (think memory).
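The spreading-and-competing dynamic can be made concrete with a toy simulation (entirely my own sketch, not Calvin's actual model): minicolumns on a ring each "sing" one of two patterns, and on every clock cycle each column adopts the pattern most common in its neighborhood, so patterns propagate and form competing domains.

```python
import numpy as np

rng = np.random.default_rng(1)
cols = rng.integers(0, 2, size=50)  # 50 minicolumns, each singing pattern 0 or 1

def clock_cycle(cols):
    """One clock period: each column adopts its neighborhood's majority pattern."""
    new = cols.copy()
    for i in range(len(cols)):
        neighborhood = [cols[i - 1], cols[i], cols[(i + 1) % len(cols)]]
        new[i] = max(set(neighborhood), key=neighborhood.count)  # majority vote
    return new

for _ in range(100):
    nxt = clock_cycle(cols)
    if np.array_equal(nxt, cols):  # patterns have stopped spreading
        break
    cols = nxt

print(len(set(cols.tolist())))  # distinct surviving patterns (1 or 2)
```

Even this crude majority rule produces borders between domains of agreeing minicolumns; the real proposal adds "fitness" at those borders, judged by sensory input and memory, rather than a bare vote.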

I think evolution is an incredible algorithm precisely because it relies as much as it does on time.

> Is this competition in the realm of abeyant (what you can think in principle) or current (what you are thinking now) representations? What are the timescales and the neurological basis for this?

I'm not familiar with these ideas but let me give it a shot. Feel free to jump in with more questions to help clarify.

Neural Darwinism points to structures - minicolumns, cortical columns, and interesting features of their connections - and describes one possibility for how those structures might lead to thought. In your words, I think the structures are the realm of abeyant representations while the theory describes current representations.

The neurological basis for this, the description of the abeyant representation (hope I'm getting that right), is Calvin's observations of the structure of the brain. Observations based on his and others' research.

To a large extent, neuroscience doesn't have a great through-line story of how the brain works. For example, the idea of regions of the brain responsible for specific functions - like the hippocampus for memory - doesn't exactly play nice with Karl Lashley's experimental work on memory.

What I liked most about this book is how Calvin tried to relate his theory to both structure and experimental results.

> Overall, my take is that this is a bit of ML-like talk. If it is meant to describe real neurological networks, it needs a closer and stronger neurological footing.

If by "ML-like talk" you mean a bit woo-woo and hand-wavy, then yeah, I agree. Ideally I'd be a better writer. But I'm not, so I highly recommend the book.

It's written by an incredible neuroscientist and, so far, none of the neuroscience researchers I've given it to have expressed anything other than excitement about it. And I explicitly told them to keep an eye out for places they might disagree. One of them is currently reading it a second time with the goal of verifying everything. If it all checks out, he plans on presenting the ideas to his lab. I'll update the post if he, or anyone in his lab, finds something that doesn't check out.

> Here is some good material if you want to dive into neuroscience: "Principles of Neurobiology" (Liqun Luo, 2020) and "Fundamental Neuroscience" (McGraw Hill).

Why these two textbooks? I got my B.S. in neuroscience so I feel good about the foundations. Happy to check these out if you believe they add something that many other textbooks are missing.


Big-picture, the idea is that different modalities of sensory data (visual, olfactory, etc.) are processed by different minicolumns in the brain, i.e., different subnetworks, each outputting a different firing pattern. These firing patterns propagate across the surface area of the brain, competing with conflicting messages. And then, to quote the OP, "after some period of time a winner is chosen, likely the message that controls the greatest surface area, the greatest number of minicolumns. When this happens, the winning minicolumns are rewarded, likely prompting them to encode a tendency for that firing pattern into their structure." And this happens in multiple layers of the brain.

In other words, there's some kind of iterative mechanism for higher-level layers to find which lower-level subnetworks are most in agreement about the input data, inducing learning.

Capsule-routing algorithms, proposed by Hinton and others, seek to implement precisely this idea, typically with some kind of expectation-maximization (EM) process.
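For a concrete feel, here's a minimal routing-by-agreement sketch (my own simplification, closer to Sabour et al.'s dynamic routing than to full EM routing): lower capsules cast vote vectors for higher capsules, and couplings to votes that agree with the consensus are iteratively strengthened.

```python
import numpy as np

def squash(v):
    """Shrink a vector to length < 1 while preserving direction."""
    n2 = np.sum(v ** 2)
    return (n2 / (1 + n2)) * v / (np.sqrt(n2) + 1e-9)

def route(votes, iters=3):
    """votes: (num_lower, num_upper, dim) predictions from lower capsules."""
    n_low, n_up, _ = votes.shape
    logits = np.zeros((n_low, n_up))
    for _ in range(iters):
        c = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijd->jd", c, votes)          # coupling-weighted sum
        out = np.array([squash(v) for v in s])         # upper-capsule outputs
        logits += np.einsum("ijd,jd->ij", votes, out)  # reward agreement
    return out

rng = np.random.default_rng(0)
votes = rng.normal(size=(8, 3, 4))  # 8 lower capsules, 3 upper, 4-dim poses
out = route(votes)
print(out.shape)  # (3, 4)
```

The agreement step plays the role of the "consensus" in the parent comment: subnetworks whose predictions line up with the emerging higher-level output get a larger say on the next iteration.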

There are quite a few implementations available on github:

https://github.com/topics/capsules

https://github.com/topics/em-routing

https://github.com/topics/routing-algorithm


I haven't heard anyone talk about Hinton's capsule network concepts for some time. In 2017-18 it seemed exciting, both because of Hinton but also because the pose/transformation idea sounded pretty reasonable. I don't know what would count as an "explanation", but I'd be curious to hear any thoughts about why it seems they didn't really pan out. (Are there any tasks for which capsule methods are the best?)


If you take a bird's-eye view, fundamental breakthroughs don't happen that often. The "Attention Is All You Need" paper also came out in 2017. It has now been 7 years without a breakthrough at the same level as transformers. Breakthrough ideas can take decades before they are ready. There are many false starts and dead ends.

Money and popularity are orthogonal to pathfinding that leads to breakthroughs.


Well said


The short answer as to why capsule networks have "fallen out of fashion" is... Transformers.

Transformers came out at roughly the same time[a] and have proven to be great at... pretty much everything. They just work. Since then, most AI research money, effort, and compute has been invested to study and improve Transformers and related models, at the expense of almost everything else.

Many promising ideas, including routing, won't be seriously re-explored until and unless progress towards AGI seems to stall.

---

[a] https://arxiv.org/abs/1706.03762


I think this is a non-answer in some sense. Yes, transformers have been clearly very successful across a very wide range of tasks. But what about the approach taken in capsules is comparatively deficient?

Some kinds of explanations which I think are at least plausible (but IDK if any evidence exists for them):

- The attention structure in transformers allows any chunk to be learned to be important for any other chunk. And pretty quickly people tended towards these being pretty deep. By comparison, the capsule + routing structure (IIUC) came with a built-in kind of sparsity (from capsules at a level in the hierarchy being independent), and because the hierarchy was meant to align with composition, it often (I think) didn't have a huge number of levels? Maybe this flexibility + depth are key?

- Related to capsules being independent, an initial design feature in capsule networks seems to have been smaller model sizes. Perhaps this was at some level just a bad thing to reach for? I think at the time, "smaller models means optimization searches over a smaller space, which is faster to converge and requires less data" was still sort of in people's heads, and I think this view is pretty much dead.

- I've heard some people argue that one of the core strengths of transformers is that they support training in a way that allows for maxing out available GPUs. I think this is mostly in comparison to previous language models, which were explicitly sequential. But are capsule networks less conducive to efficient training?


It's hard to make a fair comparison, because there hasn't been anywhere near as much money, effort, or compute invested in trying to scale up routing methods.

Promising approaches are often ignored for reasons that have little to do with their merits. For example, Hinton, Bengio, and LeCun spent much of the 1990s and all of the 2000s on the fringes of academia, unable to get much funding, because few others were interested in or saw any promise in deep learning! Similarly, Katalin Karikó lost her job and spent almost two decades in obscurity because few others were interested in or saw any promise in RNA vaccines!

Now, I'm not saying routing methods will become more popular in the future. I mean, who the heck knows?

What I'm saying is that promising approaches can fall out of favor for reasons that are not intrinsic to them.


Great summary. Thanks for the links. These are awesome.


The domain of Artificial Life is highly related and has an ongoing conference series and journal; it might be worth mining for more inspiration:

https://en.wikipedia.org/wiki/Artificial_life

https://direct.mit.edu/artl

https://alife.org


FYI Evolutionary Algorithms have been an active area of research for decades.[1]

Among the many uses, they have been applied to ‘evolving’ neural networks.

Famously a guy whose name I can’t remember used to generate programs and mutations of programs.

My recommendation if you want to get into AI: avoid anything written in the last 10 years and explore some classics from the '70s.

[1] https://en.m.wikipedia.org/wiki/Evolutionary_algorithm
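The core loop of an evolutionary algorithm is tiny. Here's a sketch (my own minimal example) on the classic OneMax problem, where fitness is simply the number of 1-bits and mutation plus truncation selection climbs toward the all-ones genome:

```python
import random

random.seed(0)
GENOME_LEN, POP, GENS = 32, 40, 200

def fitness(g):
    return sum(g)  # OneMax: count the 1-bits

def mutate(g, rate=1 / GENOME_LEN):
    return [1 - b if random.random() < rate else b for b in g]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]  # truncation selection keeps the fitter half
    pop = parents + [mutate(random.choice(parents)) for _ in range(POP - len(parents))]

best = max(pop, key=fitness)
print(fitness(best))  # climbs toward 32
```

Swap the bitstring for network weights (or, as in NEAT, network topology) and the fitness function for task performance, and you have the "evolving neural networks" setup mentioned above.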


I'm sure it's not who you're thinking of, but I can't miss an opportunity to mention Tom Ray and Tierra: https://tomray.me/tierra/whatis.html


In the Creatures artificial life / virtual pet series, the creatures have about 900 (maybe more in later versions) or so neurons. Each neuron is a little virtual machine that is designed in such a way that programs remain valid even with random mutation.


A friend of mine made this in-browser neural network engine that could run millions of multi-layer NNs in a simulated world at hundreds of updates per second and each network could reproduce and evolve. It worked in the sense that the networks exhibited useful and varied behaviors. However, it was clear that larger networks were needed for more complex behaviors and evolution just starts to take a lot longer.

https://youtu.be/-1s3Re49jfE?si=_G8pEVFoSb2J4vgS



There is the case of Blondie24, an evolutionary neural net (evolved with a genetic algorithm) that developed a very strong checkers-playing capability through self-play with no human instruction. It was later extended to play other games.

https://en.wikipedia.org/wiki/Blondie24


I read this book:

     <https://books.google.ca/books/about/Artificial_Intelligence_Through_Simulate.html?id=QMLaAAAAMAAJ>
in 1972. It was published in 1966.


Your recommendation to explore the classics is a good one. You can gain a deeper appreciation by studying these foundational works.


> avoid anything written in the last 10 years

Why?


Presumably because it's saturated with a monoculture, and the hope (rightly or wrongly) is that some of the other roads might lead to an alternative breakthrough.


I think this is over-simplified and possibly misunderstood. I haven't read the book this article references but if I am understanding the main proposal correctly then it can be summarised as "cortical activity produces spatial patterns which somehow 'compete' and the 'winner' is chosen which is then reinforced through a 'reward'".

'Compete', 'winner', and 'reward' are all left undefined in the article. Even given that, the theory is not new information and seems incredibly analogous to Hebbian learning which is a long-standing theory in neuroscience.

Additionally, the metaphor of evolution within the brain does not seem apt. Essentially what is said is that given a sensory input, we will see patterns emerge that correspond to a behaviour deemed successful. Other brain patterns may arise but are ignored or not reinforced by a reward. This is almost tautological, and the 'evolutionary process' (input -> brain activity -> behaviour -> reward) lacks explanatory power. This is exactly what we would expect to see. If we observe a behaviour that has been reinforced in some way, it would obviously correlate with the brain producing a specific activity pattern.

I don't see any evidence that the brain will always produce several candidate activity patterns before judging a winner based on consensus. The tangent of cortical columns ignores key deep brain structures and is also almost irrelevant, the brain could use the proposed 'evolutionary' process with any architecture.


While it does build on established concepts like Hebbian learning, I think the theory offers a potentially insightful way of thinking about brain function.


> I think this is over-simplified and possibly misunderstood.

I'm with you here. I wrote this because I wanted to drive people towards the book. It's incredible and I did it little justice.

> "cortical activity produces spatial patterns which somehow 'compete' and the 'winner' is chosen which is then reinforced through a 'reward'"

A slight modification: spatio-temporal patterns*. Otherwise you're dead on.

> 'Compete', 'winner', and 'reward' are all left undefined in the article.

You're right. I left these undefined because I don't believe I have a firm understanding of how they work. Here's some speculation that might help clarify.

Compete - The field of minicolumns is an environment. A spatio-temporal pattern "survives" when a minicolumn is firing in that pattern. It's "fit" if it's able to effectively spread to other minicolumns. Eventually, as different firing patterns spread across the surface area of the neocortex, a border will form between two distinct firing patterns. They "Compete" insofar as each firing pattern tries to "convert" minicolumns to fire in their specific pattern instead of another.

Winner - This has two levels. First, an individual firing pattern could "win" the competition by spreading to a new minicolumn. Second, amalgamations of firing patterns, the overall firing pattern of a cortical column, could match reality better than others. This is a very hand-wavy answer, because I have no intuition for how this might happen. At a high level, the winning thought is likely the one that best matches perception. How this works seems like a bit of a paradox as these thoughts are perception. I suspect this is done through prediction. E.g. "If that person is my grandmother, she'll probably smile and call my name". Again, super hand-wavy, questions like this are why I posted this hoping to get in touch with people who have spent more time studying this.

Reward - I'm an interested amateur when it comes to ML, and folks have been great about pointing out areas that I should go deeper. I have only a basic understanding of how reward functions work. I imagine the minicolumns as small neural networks and alluded to "reward" in the same sense. I have no idea what that reward algorithm is or if NNs are even a good analogy. Again, I really recommend the book if you're interested in a deeper explanation of this.
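Hedging heavily, here's how the three pieces above might fit together in code (my own toy construction to illustrate the speculation, not Calvin's actual mechanism): candidate patterns score by how well they match a sensory input plus a learned bias, minicolumns adopt the locally best-scoring pattern, the pattern holding the most columns "wins", and the winner's bias is nudged up as the "reward".

```python
import numpy as np

rng = np.random.default_rng(2)
patterns = rng.normal(size=(3, 8))   # 3 candidate spatio-temporal patterns
bias = np.zeros(3)                   # learned tendencies ("memory")
sensory = patterns[1] + rng.normal(scale=0.3, size=8)  # noisy view of pattern 1

# Compete: fitness is match to perception plus memory, with per-column noise.
scores = patterns @ sensory + bias
columns = rng.normal(scale=0.05, size=(100, 3)) + scores
adopted = columns.argmax(axis=1)     # each of 100 minicolumns picks a pattern

# Winner: the pattern controlling the greatest surface area.
winner = np.bincount(adopted, minlength=3).argmax()

# Reward: encode a tendency for the winning firing pattern.
bias[winner] += 0.1
print(winner)  # most likely 1, the pattern behind the sensory input
```

Every named quantity here (`bias`, the 0.1 reward, the noise scales) is an assumption made up for the sketch; the point is only that "compete / winner / reward" can be stated as a concrete loop, which makes the theory's gaps easier to probe.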

> the theory is not new information and seems incredibly analogous to Hebbian learning which is a long-standing theory in neuroscience.

I disagree with you here. Hebbian learning is very much a component of this theory, but not the whole. The last two constraints were inspired by it and, in hindsight, I should have been more explicit about that. But Hebbian learning describes a tendency to average: "cells that fire together wire together". Please feel free to push back here, but the concept of Darwin Machines fits the constraints of Hebbian learning while still offering a seemingly valid description of how creative thought might occur. Something that, if I'm not misunderstanding, is undoubtedly new information.

> I don't see any evidence that the brain will always produce several candidate activity patterns before judging a winner based on consensus.

That's probably my fault in the retelling, check out the book: http://williamcalvin.com/bk9/index.htm

I think if you read Chapters 1-4 (about 60 pages and with plenty of awesome diagrams) you'd have a sense for why Calvin believes this (whether you agree or not would be a fun conversation).

> The tangent of cortical columns ignores key deep brain structures and is also almost irrelevant, the brain could use the proposed 'evolutionary' process with any architecture.

I disagree here. A common mistake I think we tend to make is assuming evolution and natural selection are equivalent. Some examples of natural selection: a diversified portfolio, or a beach with large grains of sand due to some intricacy of the currents. Dawkinsian evolution is much, much rarer. I can only think of three examples of architectures that have pulled it off. Genes, and their architecture, are one. Memes (imitated behavior) are another. Many animals imitate, but only one species has been able to build architecture that allows those behaviors to undergo an evolutionary process: humans. And finally, if this theory is right, spatiotemporal patterns and the columnar architecture of the brain are the third.

Ignoring Darwin Machines, there are only two architectures that have led to an evolutionary process. Saying we could use "any architecture" seems a bit optimistic.

I appreciate the thoughtful response.


Thanks for the considered reply.


I don't think it matters so much how the brain is made, what matters is the training data. And we obtain data by searching. Search is a great concept: it covers evolution, intelligence, and creativity, and it's also social. Search is discrete, recursive, combinatorial, and based on some kind of language (DNA, or words, or just math/code).

Searching the environment provides the data the brain is trained on. I don't believe we can understand the brain in isolation, without its data engine and the problem space where it develops.

Neural nets showed that given a dataset, you can obtain similar results with very different architectures, like transformer and diffusion models, or transformer vs Mamba. The essential ingredient is data, architecture only needs to pass some minimal bar for learning.

Studying just the brain misses the essential - we are search processes, the whole life is search for optimal actions, and evolution itself is search for environment fitness. These search processes made us what we are.


What in the world

Most "diffusion models" use backbone architectures similar to transformers (often paired with a VAE). Diffusion isn't an architecture, it's a problem framing.

As for the rest of this, I'm torn between liking the poetry of it and pointing out that this is kind of that thing where you present something as a mind-blowing insight when it's well-known and pretty obvious. Most people familiar with learning theory already understand learning algorithms of any kind as a subset of probabilistic search algorithms with properties that make them responsive to data.

The idea that the structure of the information processing system doesn't matter, and that there's just some general factor of learning capacity a thing has, is... not well supported by the way research has progressed in the entire period of time when this has been relevant to most people. Sure, in theory any neural network is a general function approximator and could in principle learn any function it's complex enough to represent. Also, we can arrive at the solution to any computable problem by representing it as a number and guessing random numbers until we can verify a solution.

Learning algorithms can almost be defined as attempts to do better search via structured empiricism than can be done with the assumption that structure doesn't matter. Like, sometimes multiple things work, sure. That doesn't mean it's arbitrary.

TL;DR: Of course learning is a kind of search, but discovering structures that are good at learning is the whole game


Yeah, I really don’t understand this recently popular viewpoint that the algorithm doesn’t matter, just how much data you throw at it. It doesn’t seem to be based on anything more than wishful thinking.

One can apply Hutter search to solve just about any problem conceivable given the data and guess what—you’ll approach the optimal solution! The only downside is that this process will take more time than available in our physical universe.

I think people forget the time factor and how the entire field of computational complexity theory arose because the meta problem is not that we can’t solve the problem—it’s that we can’t solve it quickly enough on a timescale that matters to humans.

Current NN architectures are missing something very fundamental related to the efficiency of problem solving, and I really don’t see how throwing more data at them is going to magically convert an EXPTIME algorithm into a PTIME one. (I’m not saying NNs are EXPTIME; I’m saying that they are incapable of solving entire classes of problems that have both PTIME and EXPTIME solutions, as the NN architecture is not able to “discover” PTIME solutions, thus rendering them incapable of solving those classes of problems in any practical sense).


Also, one of the major classes of problem that gets solved and we view as "progress" in machine learning is framing problems. Like we couldn't define "draw a good picture" in a way we could actually assess well, GANs and Diffusion turn out to be good ways to phrase problems like that. In the former case, it creates a way to define the problem as "make something indistinguishable from an example pulled from this dataset" and in the latter case, "I've randomized some of these pixels, undo that based on the description"

The notion of "efficiency" and "progress" is a post-hoc rationalization that people who never understood the problem, and instead pay people to understand it, apply to problems once a working solution is in hand. It's a model that is inherently as dumb as a model can be, and the assumption it makes is that there is some general factor of progress on hard problems that can be dialed up and down.

Sure, you can pay more scientists and probabilistically increase the rate at which problems are solved, but you can't predict how long it will take, how much failure it will involve, whether a particular scientist will solve a particular problem at all, whether that problem is even solvable in principle sometimes. Businesspeople and investors like models where you put in money and get out growth at a predictable rate on a controllable timescale, and if that doesn't work you just kick it harder, and this ill fits most frontier research. Hell, it ill suits a lot of regular work.


> Sure, you can pay more scientists and probabilistically increase the rate at which problems are solved, but you can't predict how long it will take, how much failure it will involve, whether a particular scientist will solve a particular problem at all, whether that problem is even solvable in principle sometimes.

Fully agree, search is hard, unpredictable and expensive. Also a matter of luck, being at the right place and time, and observing something novel. That is why I put the emphasis of AI doing search, not just imitating humans.


Okay, but what does that mean? AI is a search process. Do you mean you want the AI to formulate queries? Test hypotheses? Okay. How? What does that mean? What we know how to do is to mathematically define objective functions and tune against them. What objective function describes the behavior you want? Is there some structural paradigm we can use for this other than tuning the parameters on a neural network through optimization toward an objective function? If so, what is it?

I'm sorry to be a little testy but what you've basically said is "We should go solve the hard problems in AI research". Dope. As an AI researcher I fully agree. Thanks. Am I supposed to clap or something?


Not "throwing more data at them" but having the AI discover things by searching. AI needs to contribute to the search process to graduate from the parrot label.


> Of course learning is a kind of search, but discovering structures that are good at learning is the whole game

No, you missed the essential. I mentioned search in the context of discovery, or in other words expanding knowledge.

Training neural nets is also a search for the best parameters that fit the data, but it's secondary. Many architectures work: there have been a thousand variations on the transformer architecture, and plenty of RNN-like approaches since 2017 when the transformer was invented, and none of them is significantly better or worse than the current one.

Also, across the human population, the number of neurons in the brain, the synapses, and the wiring are very different at the micro level from person to person, yet we all learn. The difference between the top 5% and bottom 5% of humans is small compared with other species, for example. What makes a big difference between people is education, in other words experiences, or training data.

To return to the original idea - AI that simply learns to imitate human text is capable only of remixing ideas. But an AI that actively explores can discover novel ideas, like AlphaZero and AlphaTensor. In both these cases search played a major role.

So I was generalizing the concept of "search" across many levels of optimization, from protein folding to DNA and human intelligence. Search is essential for progress across the stack. Even network architecture evolves by search - with human researchers.


> I don't think it matters so much how the brain is made, what matters is the training data.

I agree that training data is hugely important, but I think it does matter how the brain is made. Structures in the brain are remarkably well preserved between species, despite the fact that evolution loves to try different methods if it can get away with it.

> Searching the environment provides the data brain is trained on. I don't believe we can understand the brain in isolation without its data engine and the problem space where it develops.

I completely agree and suspect we might be on the same page. What I find most compelling about the idea of Darwin Machines is the fact that it relies on evolution. In my opinion, true Dawkinsian evolution is the most efficient search algorithm.

I'd love to hear you go deeper on what you mean by data engine and problem space. To (possibly) abuse those terms, I think evolution is the data engine. The problem space is fun and I love David Eagleman's description of the brain as sitting in a warm bath in a dark room trying to figure out what to do with all these electric shocks.

> Neural nets showed that given a dataset, you can obtain similar results with very different architectures, like transformer and diffusion models, or transformer vs Mamba. The essential ingredient is data, architecture only needs to pass some minimal bar for learning.

My understanding of neural nets, and please correct me if I'm wrong, is that they solve system-one thinking: intuition. As of yet, they haven't been able to do much more than produce an average of their training data (which is incredible). With a brute-force approach they can innovate in constrained environments, e.g. move 37 (or so I'm told, I haven't played Go :)). I haven't seen evidence that they might be able to innovate in open-ended environments. In other words, there's no suggestion they can do system-two thinking, where time spent on a problem correlates with the quality of the answer.

> Studying just the brain misses the essential - we are search processes, the whole life is search for optimal actions, and evolution itself is search for environment fitness.

I completely agree. I even suspect that, in a few years, we'll see "life" and "intelligence" as synonymous concepts, just implemented in different mediums. At the same time, studying those mediums can be a blast.


I've been noodling on how to combine neural networks with evolution for a while. I've always thought that to do this, you need some sort of evolvable genetic/functional units, and I've had trouble fitting traditional artificial neurons with backprop into that picture.

My current rabbit hole is using Combinatory Logic as the genetic material, and have been trying to evolve combinators, etc (there is some active research in this area).

Only slightly related to the author's idea, but it's cool that others are interested in this space as well.


Then you probably know about NEAT (the genetic algorithm) by now. I'm not sure what has been tried in directly using combinatory logic instead of NNs (do Hopfield networks count?); any references?

I've tried learning simple look-up tables (like, 9 bits of input) using the cross-entropy method (CEM), and it worked well. But it was a very small search space (way too large to just try all solutions, but still a tiny model). I haven't seen CEM used on larger problems, though there is a cool paper about learning Tetris with the cross-entropy method, using a bit of feature engineering.
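For readers who haven't seen it, the cross-entropy method is only a few lines. Here's a sketch (my own toy example) on a 1-D objective: sample candidates from a Gaussian, keep the elite fraction, refit the Gaussian to the elites, repeat. The same loop scales to look-up tables or Tetris weights, just with bigger parameter vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    return -(x - 3.0) ** 2  # objective unknown to the optimizer, peak at 3

mu, sigma = 0.0, 5.0
for _ in range(30):
    samples = rng.normal(mu, sigma, size=200)
    elites = samples[np.argsort(score(samples))[-20:]]  # top 10%
    mu, sigma = elites.mean(), elites.std() + 1e-6      # refit the sampler

print(round(mu, 2))  # converges to ~3.0
```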


I am familiar with NEAT, it was very exciting when it came out. But, NEAT does not use back propagation or single network training at all. The genetic algorithm combines static neural networks in an ingenious way.

Several years prior, in undergrad, I talked to a professor about evolving network architectures with GA. He scoffed that squishing two "mediocre" techniques together wouldn't make a better algorithm. I still think he was wrong. Should have sent him that paper.

IIRC NEAT wasn't SOTA when it came out, but it is still a fascinating and effective way to evolve NN architecture using genetic algorithms.

If OP (or anyone in ML) hasn't studied it, they should.

https://en.m.wikipedia.org/wiki/Neuroevolution_of_augmenting... (and check the bibliography for the papers)

Edit: looking at the continuation of NEAT it looks like they focused on control systems, which makes sense. The evolved network structures are relatively simple.


Maybe a key innovation would be to apply backpropagation to optimize the crossover process itself. Instead of random crossover, compute the gradient of the crossover operation.

For each potential combination, "learn" (via normal backprop) how different ways of crossover impacts on overall network performance. Then use this to guide the selection of optimal crossover points and methods.

This "gradient-optimized crossover" would be a search process in itself, aiming to find the best way to combine specific parts of networks to maximize improvement of the whole. It could make "leaps", instead of small incremental steps, due to the exploratory genetic algorithm.
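A toy version of the idea, on a linear model where "backprop" is just a few gradient steps (everything here — model, sizes, scoring rule — is made up for illustration):

```python
import numpy as np

def loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def grad(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

def gradient_guided_crossover(a, b, X, y, lr=0.1, steps=100):
    """Score every one-point crossover of parents a and b by the loss the
    child reaches after a short run of gradient descent; keep the best."""
    best_child, best_loss = None, float("inf")
    for cut in range(1, len(a)):
        child = np.concatenate([a[:cut], b[cut:]])
        for _ in range(steps):              # the "learned" part of crossover
            child = child - lr * grad(child, X, y)
        l = loss(child, X, y)
        if l < best_loss:
            best_child, best_loss = child, l
    return best_child, best_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 6))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5])
y = X @ true_w
a, b = rng.normal(size=6), rng.normal(size=6)
child, child_loss = gradient_guided_crossover(a, b, X, y)
```

For real networks you'd replace the exhaustive cut-point loop with something sampled, since the number of candidate crossovers explodes.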

Has anything like this been tried?


Thermodynamic annealing over a density parameter space


Fantastic speculation here, explains a lot, and has testable hypotheses.

For example, there should be a relationship between rate of learning and the physical subcolumns - we should be able to identify when a single column starts up / is fully trained / is overused

Or use AI to try to mirror the learning process, creating an external replica that makes the same decisions as the person

Marvin Minsky was spot on about the general idea 50 years ago, seeing the brain as a collection of 1000s of atomic operators (society of mind?)


> Fantastic speculation here, explains a lot, and has testable hypotheses.

Calvin is the man.

> For example, there should be a relationship between rate of learning and the physical subcolumns - we should be able to identify when a single column starts up / is fully trained / is overused

This sounds super interesting. Could you break down what you're thinking here?

> Marvin Minsky was spot on about the general idea 50 years ago, seeing the brain as a collection of 1000s of atomic operators (society of mind?)

I'm very much an amateur in this field and was under the impression that Minsky, while also breaking intelligence into pieces, was trying to specify each of those operations by hand. What I find so enticing about Neural Darwinism is the lack of specification needed. Ideally, once you get the underlying process right, there's a cascade of emergent properties.

Using the example of a murmuration of starlings I picture Minsky trying to describe phase transitions between every possible murmuration state. On the other hand I see Neural Darwinism as an attempt to describe the behavior of a single starling which can then be scaled to thousands.

Let me know if that's super wrong. I've only read second hand descriptions of Minsky's ideas, so feel free to send some homework my way.


> I've only read second hand descriptions of Minsky's ideas, so feel free to send some homework my way.

Here you go: https://breckyunits.com/marvin-minsky.html

I think you are right in that Minsky was missing some important details in the branches of the tree, particularly around cortical columns, but he was old when Hawkins and Numenta released their stuff.

In terms of the root idea of the mind being a huge number of concurrent agents, I think he was close to the bullseye and it very much aligns with what you wrote.


Awesome post, thanks. I ordered society of mind.

Reminds me of when I took "The Philosophy of Cognitive Science" in college. The entire class was on AI. When I asked the professor why, she explained: "You don't understand something unless you can build it".

It's cool to learn that quote might have been because she's a fan of Minsky.

> In terms of the root idea of the mind being a huge number of concurrent agents, I think he was close to the bullseye and it very much aligns with what you wrote.

I think you're right here and I'd like to add a bit. One common mistake people make when thinking of evolution, is where in the hierarchy it takes place. In other words, they misidentify the agents by an order of magnitude.

For example, in biology I commonly see it taught that the individual is the subject of natural selection (or worse, the population).

Really, it's the gene. The beauty of evolution is that it can take an agent as simple as the gene and shape it into the litany of complex forms and functions we see all around us.

If evolution is at play in the brain, I suspect that Minsky's agents are the individual firing patterns. Like genes, the base of the hierarchy, the fundamental unit. Also like genes, they slowly build increasingly complex behaviors from the ground up. Starting before birth and continuing for most of our lives.


Right, the Selfish Gene is one of the best books I ever read.

There's also a paper I recently came across (https://warpcast.com/breck/0xea2e1a35) which talks about how causation is a two way street: low level nodes cause things in higher level nodes, but higher level nodes in turn cause things in lower level nodes.

In other words, just because genes have really been the drivers and our bodies just the vehicles, doesn't mean that's not cyclical (sometimes it could cycle to be the higher level ideas driving the evolution in lower level agents).

> I suspect that Minsky's agents are the individual firing patterns.

I like this idea. The biggest open question in my mind in regards to Minsky still is exactly on this: what physically is an agent? How many are there? My margin of error here is wild -- like 10 orders of magnitude.


Regarding Minsky: the most interesting thoughts I've read about theories of mind are in his books, namely The Society of Mind and The Emotion Machine, which should be more widely known.

More of Minsky's ideas on “Matter, Mind, and Models” are mentioned here: https://www.newyorker.com/magazine/1981/12/14/a-i

And let's not forget Daniel Dennett: In “Consciousness Explained,” a 1991 best-seller, he described consciousness as something like the product of multiple, layered computer programs running on the hardware of the brain. [...]

Quoted from https://www.newyorker.com/magazine/2017/03/27/daniel-dennett...


I read the followup:

Lingua ex Machina: Reconciling Darwin and Chomsky with the Human [2000]

https://www.amazon.com/Lingua-Machina-Reconciling-Darwin-Cho...

Completely changed my worldview. Evolutionary processes every where.

My (turrible) recollection:

Darwinian processes for comprehending speech, the process of translating sounds into phonemes (?).

There's something like a brain song, where a harmony signal echoes back and forth.

Competition between and among hexagonal processing units (what Jeff Hawkins & Numenta are studying). My paraphrasing: meme PvP F4A battlefield where "winning" means converting your neighbor to your faction.

Speculation that the human brain leaped from proto-language (noun-verb) to Chomsky language (recursively composable noun-verb-object predicates), and further speculation about how that might be encoded in our brains.

Etc.
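The "convert your neighbor to your faction" dynamic above can be caricatured as a toy voter-model grid (entirely my own toy, nothing to do with Calvin's actual hexagonal circuitry):

```python
import random

def compete(width=20, height=20, factions=50, steps=20000, seed=1):
    """Toy 'meme battlefield': each cell holds a faction; each step, a
    random cell converts one random neighbor to its own faction."""
    rng = random.Random(seed)
    grid = [[rng.randrange(factions) for _ in range(width)]
            for _ in range(height)]
    for _ in range(steps):
        x, y = rng.randrange(width), rng.randrange(height)
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        grid[(y + dy) % height][(x + dx) % width] = grid[y][x]  # convert
    return grid

grid = compete()
survivors = {f for row in grid for f in row}
```

Run long enough, factions coarsen into territorial patches and most die out, which is the "winning means converting your neighbor" picture.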


> These connections result in a triangular array of connected minicolumns with large gaps of unconnected minicolumns in between. Well, not really unconnected, each of these are connected to their own triangular array.

> Looking down on the brain again, we can imagine projecting a pattern of equilateral triangles - like a fishing net - over the surface. Each vertex in the net will land on a minicolumn within the same network, leaving holes over minicolumns that don't belong to that network. If we were to project nets over the network until every minicolumn was covered by a vertex we would project 50-100 nets.

Around this part I had a difficult time visualizing the intent here. Are there any accompanying diagrams or texts? Thanks for the interesting read!


http://williamcalvin.com/bk9/index.htm

I'd recommend just banging out chapters 1-4 of the book (~60 pages). Lots of diagrams and I think you'll get the meat of the idea.

Thanks for the feedback!


This reminds me a little of Jeff Hawkins' book on the Thousand Brains theory. His company Numenta has done this kind of research and they have a mailing list. I'm not an expert but I've read Jeff's book and noodled at the mailing list.


Hawkins' proposal is missing the key innovation that Calvin proposes, which is that learning takes place by evolutionary means. But Hawkins' proposal does fit squarely within current popular ideas around predictive coding.

The key structures in Hawkins' architecture are cortical columns (CC). What his VP of Eng (Dileep George) did is to analyze Hawkins' account of the functionality of a CC, and then say that a CC is a module which must conform to a certain API, and meet a certain contract. As long as a module obeys the API and contract, we don't actually care how the CC module is implemented. In particular, it's actually not important that the module contain neurons. (Though the connections between the CCs may still have to look like axons or whatever, I don't know.)
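In code terms, the "CC as a module behind an API" idea might look like this (the method names and contract here are my invention for illustration, not Numenta's actual interface):

```python
from abc import ABC, abstractmethod
from collections import Counter

class CorticalColumn(ABC):
    """The contract: anything implementing these methods can fill the CC
    slot; whether the implementation contains 'neurons' is a detail."""
    @abstractmethod
    def learn(self, pattern): ...
    @abstractmethod
    def predict(self): ...

class FrequencyColumn(CorticalColumn):
    """A deliberately dumb drop-in: predicts the most frequent pattern."""
    def __init__(self):
        self.counts = Counter()
    def learn(self, pattern):
        self.counts[pattern] += 1
    def predict(self):
        return self.counts.most_common(1)[0][0] if self.counts else None

cc = FrequencyColumn()
for p in ["edge", "edge", "corner"]:
    cc.learn(p)
```

The point is only that the network cares about the API and contract, so a statistical learner can sit behind it just as well as a neuron model.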

Then Dileep George further figured out that there is an off-the-shelf algorithm that works perfectly for the CC module. He selected an algorithm based on statistical learning theory (STL).

STL based algorithms are an excellent choice for the CCs, IMNSHO. They are fast, theoretically sound, etc. They are also understood in great mathematical detail, so we can characterize _exactly_ what they can and can't do. So there is zero mystery about the capabilities of his system at the CC level. Note that in Hawkins' case, the STL based algorithms are used for pattern recognition.

Now Hawkins' proposal isn't just a single CC, it's a network of CCs all connected together in a certain way. My memory is a little hazy at this point, but as best I recall, his architecture should have no problem identifying sequences of patterns (as for sound), or spatial patterns across time (as for vision). And I bought his argument that these could be combined hierarchically, and that the same structure could also be used for playing back (outputting) a learned pattern, and for recognizing cross-modal patterns (that is, across sensory modalities).

But is all this enough?

I say no. My reading of evolutionary epistemology suggests to me that pattern identification is insufficient for making and refuting conjectures in the most general sense. And ultimately, a system must be able to create and refute conjectures to create knowledge. Hawkins has a very weak story about creativity and claims that it can all be done with pattern recognition and analogy, but I am not convinced. It was the weakest part of the book. (pp 183-193)

I don't know if it's clear to you or not why pattern recognition is insufficient for doing general conjectures and refutations. If it's not clear, I should attempt to expand on that ...

The idea is, that it is not always possible to arrive at a theory by just abstracting a pattern from a data set. For example:

What set of data could Newton have looked at to conclude that an object in motion stays in motion? I suppose he knew of 7 planets that stayed in motion, but then he had millions of counter examples all around him. For a pattern recognition algorithm, if you feed it a million + 7 data points, it will conclude that objects in motion always come to a stop except for a few weird exceptions which are probably noise.
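That last point is easy to make concrete (a toy majority-rule "learner", obviously not a real model):

```python
# 1,000,000 everyday objects that come to rest, 7 planets that don't.
observations = ["stops"] * 1_000_000 + ["stays in motion"] * 7

# An accuracy-maximizing learner with no background theory just predicts
# the majority label and writes the planets off as noise.
prediction = max(set(observations), key=observations.count)
accuracy = observations.count(prediction) / len(observations)
```

The learner is 99.999% accurate and completely wrong about the underlying law, which is the conjecture-vs-pattern-extraction gap in miniature.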


This is an awesome write up. I especially love the Newton analogy. Thanks.


I ordered the book and I'm checking out the website rn. Looks awesome. Thanks a ton for sharing!


my impression of hawkins from a distance is that he can reproduce the success of the current orthodoxy, but is always a few years behind sota.


Correction: it is generally accepted that DNA was confirmed as genetic material in the Hershey-Chase experiment (https://en.m.wikipedia.org/wiki/Hershey%E2%80%93Chase_experi...), which predates the determination of the structure of dna by about a year


The image of the flattened out brain could use some illustrations, or more specific instructions on what we should be visualising.

> First, if you look at a cross-section of the brain (eye-level with the table)

I thought it was flat on the table? Surely if we look at it side-on we just see the edge?

Without a clear idea of how to picture this, the other aspect (columns) doesn't make sense either.


There's lots of room for cross-pollination between bio/life sciences and ML/AI. One key insight is the importance of what you pick as your primary representation of data (is everything a number, a symbolic structure, a probability distribution, etc). I believe a lot of these bio-inspired approaches over-emphasize the embodied nature of intelligence and how much it needs to be situated in space and time, which downplays all the sub-problems that need to be solved in other "spaces" with less obvious "spatiotemporal" structure. I believe space and time are emergent, at least for the purposes of defining intelligence, and there are representations where both space and time arise as dimensions of their structure and evolution.


The book "Cerebral Code" is made available for free by the author on his website: http://williamcalvin.com/bk9/

For a more modern treatment on the subject, read this paper: An Attempt at a Unified Theory of the Neocortical Microcircuit in Sensory Cortex https://www.researchgate.net/publication/343269087_An_Attemp...


I took more notes on this blog post than anything else I've read this month.


Man, this has me grinning like an idiot. Thanks.


This project employs a Darwinian approach. Initially, it was an experiment in traditional program and user interface generation that incorporated evolutionary feedback into the mutation process. A combination of PG and AL. It has achieved some success with small programs and is now exploring a potential combination with LLMs.

https://youtu.be/sqvHjXfbI8o?si=7qwpc15Gn42mUnKQ&t=513


I don't think this is true as stated. Evolutionary algorithms are not the most efficient way to do most things because they, handwavily, search randomly in all directions. Gradient descent and other gradient-based optimizers are way way faster where we can apply them: the brain probably can't do proper backprop for architectural reasons but I am confident it uses something much smarter than blind evolutionary search.


The OP is not about evolutionary algorithms in the usual sense (random mutation and selection over many generations).

It's about mechanisms in the brain that plausibly evolved over time.


> A Darwin Machine uses evolution to produce intelligence. It relies on the same insight that produced biology: That evolution is the best algorithm for predicting valid "solutions" within a near infinite problem space.

It seems to be suggesting that neuron firing patterns (or something like that?) are selected by internal evolutionary processes.


> Evolutionary algorithms are not the most efficient way to do most things because they, handwavily, search randomly in all directions.

I think we agree and would love to dive a bit deeper with you here. My background is in biology and I'm very much an enthusiastic amateur when it comes to CS.

When I first read about Darwin Machines, I looked up "evolutionary algorithms in AI", thought to myself "Oh hell ya, these CS folks are on it" and then was shocked to learn that "evolutionary algorithms" seemed to be based on an old school conception of evolution.

First, evolution is on your team: it hates random search. In biology point mutations are the equivalent of random search, and organisms do everything in their power to minimize them.

As I said in the article, if we were building a skyscraper and someone told us they wanted to place some bricks at random angles "so that we might accidentally stumble upon a better design," we would call them crazy. And rightfully so.

Evolution still needs variation though, and it gets it through recombination. Recombination is when we take traits that we know work, and shuffle them to get something new. It provides much more variation with a much smaller chance of producing something that decreases fitness.

It took me a while to grok how recombination produces anything novel: if we're shuffling existing traits, how do we get a new trait? I still don't have a "silver-bullet" answer for this, but I find that I usually visualize these concepts too far up the hierarchy. When I think of traits I think of eye color or hair color (and I suspect you do too). A trait is really just a protein (sometimes not even that), and those examples are the outliers where a single protein is responsible.

It might be better to think of cancer suppression systems, which can be made up of thousands of proteins and pathways. They're like a large code base that proofreads. Imagine this code base has tons of different functions for different scenarios.

Point mutations, which evolution hates, are like going into that code base and randomizing some individual characters. You're probably going to break the relevant function.

Recombination, what evolution loves, is like going in and swapping two functions that take the same input, produce the same output, but are implemented differently. You can see how this blind shuffling might lead to improvements.
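In code terms, the contrast looks something like this (a toy, with the "implementations" as plain strings):

```python
import random

# Two "genomes": each slot holds a working implementation of the same
# interface (same inputs and outputs, different internals).
genome_a = {"repair": "repair_v1", "pump": "pump_v1", "suppress": "suppress_v1"}
genome_b = {"repair": "repair_v2", "pump": "pump_v2", "suppress": "suppress_v2"}

def point_mutate(genome, rng):
    """Randomize one character of one gene -- usually yields gibberish."""
    mutated = dict(genome)
    key = rng.choice(sorted(mutated))
    chars = list(mutated[key])
    chars[rng.randrange(len(chars))] = rng.choice("abcdefghijklmnopqrstuvwxyz_")
    mutated[key] = "".join(chars)
    return mutated

def recombine(a, b, rng):
    """Swap whole working units -- every slot still holds a known-good
    implementation, just in a new combination."""
    return {key: rng.choice([a[key], b[key]]) for key in a}

rng = random.Random(42)
child = recombine(genome_a, genome_b, rng)
```

Point mutation can hand you any string at all, almost all of them broken; recombination only ever hands you combinations of parts that already worked.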

How evolution creates new functions is a much more difficult topic. If you're interested, I recommend "The Selfish Gene". It's the best book I've ever read.

> Gradient descent and other gradient-based optimizers are way way faster where we can apply them

The second point is based on my (limited) understanding of non-biology things. Please point me in the right direction if you see me making a mistake.

Gradient descent etc. are way faster when we can apply them. But I don't think we can apply them to these problems.

My understanding of modern machine learning is that it can be creative in constrained environments. I hear move 37 is a great example but I don't know enough about go to feel any sort of way about it. My sense is: if you limit the problem space, gradient descent can find creative solutions.

But intelligence like yours or mine operates in an unconstrained problem space. I don't think you can apply gradient descent because, how the heck could you possibly score a behavior?

This is where evolution excels as an algorithm. It can take an infinite problem space and consistently come up with "valid" solutions to it.

> the brain probably can't do proper backprop for architectural reasons but I am confident it uses something much smarter than blind evolutionary search.

I think Darwin Machines might be able to explain "animal intelligence". But human intelligence is a whole other deal. There's some incredible research on it that is (as far as I can tell) largely undiscovered by AI engineers that I can share if you're interested.


> When I first read about Darwin Machines, I looked up "evolutionary algorithms in AI", thought to myself "Oh hell ya, these CS folks are on it" and then was shocked to learn that "evolutionary algorithms" seemed to be based on an old school conception of evolution.

I think a lot of the genetic algorithms people do implement recombination-like things. Most of the things operated on aren't really structured like genomes so it makes less sense there.

> But intelligence like you or I's operates in an unconstrained problem space. I don't think you can apply gradient descent because, how the heck could you possibly score a behavior?

> This is where evolution excels as an algorithm. It can take an infinite problem space and consistently come up with "valid" solutions to it.

Evolutionary search also relies on scoring. Genetic algorithms on computers hardcode a "fitness function" to determine what solutions are good and should be propagated and biological evolutionary processes are implicitly selecting on "inclusive genetic fitness" or something. You can't apply gradient-based optimizers directly to all of these, though, because they are not (guaranteed to be) differentiable. There are lots of ways to optimize against nondifferentiable functions in smarter ways than evolutionary search, and these come under "reinforcement learning", which does work but is generally more annoying than (self-)supervised algorithms.
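For concreteness, the "hardcoded fitness function" loop looks like this (toy objective and arbitrary sizes, just to show that the scoring step is there either way):

```python
import random

def fitness(bits):
    """Nondifferentiable score: length of the longest run of 1s."""
    best = run = 0
    for b in bits:
        run = run + 1 if b else 0
        best = max(best, run)
    return best

def evolve(n_bits=30, pop_size=40, gens=100, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]           # one-point crossover
            if rng.random() < 0.2:              # occasional point mutation
                child[rng.randrange(n_bits)] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

Note nothing here is differentiated; the fitness function is only ever compared, which is why this family of methods works on objectives gradient descent can't touch.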

> I think Darwin Machines might be able to explain "animal intelligence". But human intelligence is a whole other deal. There's some incredible research on it that is (as far as I can tell) largely undiscovered by AI engineers that I can share if you're interested.

As far as I know human brains are more or less a straight scaleup of smaller primate brains.


I think that by considering the brain as a Darwin Machine, we can explore new dimensions of how our minds work.


Nitpick: lots of text descriptions of visual patterns - this article could use at least 5 visual aid images.


The book provides a ton. I'll write another version that follows the book more closely and uses them. Thanks for the feedback.


This post analogizes between a specific theory of human intelligence and a badly caricatured theory of evolution. It feels like better versions of the arguments for Darwin Machines exist that would neither a) require an unsupportable neuron-centric view of evolution, nor b) view evolution through the programmer's lens.

> Essentially, biology uses evolution because it is the best way to solve the problem of prediction (survival/reproduction) in a complex world.

1. This is anthropocentric in a way that meaningfully distorts the conclusion. The vast majority of life on earth, whether you count by raw number, number of species, weight, etc. do not have neurons. These organisms are of course, microbes (viruses and prokaryotes) and plants. Bacteria and viruses do not 'predict' in the way this post speaks of. Survival strategies that bacteria use (that we know about and understand) are hedging-based. For example, some portion of a population will stochastically switch certain survival genes on (e.g. sporulation, certain efflux pumps = antibiotic resistance genes) that have a cost benefit ratio that changes depending on the condition. This could be construed as a prediction in some sense: the genome that has enough plasticity to allow certain changes like this will, on average, produce copies in a large enough population that enable survival through a tremendous range of conditions. But that's a very different type of prediction than what the rest of the post talks about. In short, prediction is something neurons are good at, but it's not clear it's a 'favored' outcome in our biosphere.
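That bet-hedging strategy is easy to illustrate with a toy simulation (the numbers are invented for illustration, not taken from any real microbiology model):

```python
import random

def simulate(switch_prob, generations=200, seed=0):
    """Toy bet-hedging: each generation is 'normal' or (10% chance)
    'stress'. Growing cells double in normal times but die under stress;
    dormant cells skip growth but survive stress."""
    rng = random.Random(seed)
    growing, dormant = 100.0, 0.0
    for _ in range(generations):
        # stochastic switching into and out of the protected state
        to_dormant = growing * switch_prob
        to_growing = dormant * 0.5
        growing = growing - to_dormant + to_growing
        dormant = dormant + to_dormant - to_growing
        if rng.random() < 0.1:     # stress event (say, an antibiotic pulse)
            growing = 0.0
        else:
            growing *= 2.0
    return growing + dormant

hedgers = simulate(switch_prob=0.05)   # a few cells switch on survival genes
gamblers = simulate(switch_prob=0.0)   # everyone grows flat out
```

The hedgers pay a growth tax every generation but survive the first stress event; the gamblers are wiped out by it. No individual cell "predicted" anything.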

> It relies on the same insight that produced biology: That evolution is the best algorithm for predicting valid "solutions" within a near infinite problem space.

2. This gets the teleology reversed. Biology doesn't use anything, it's not trying to solve anything, and evolution isn't an algorithm because it doesn't have an end goal or a teleology (and it's not predicting anything). Evolution is what you observe over time in a population of organisms that reproduce without perfect fidelity copy mechanisms. All we need say is that things that reproduce are more likely to be observed. We don't have to anthropomorphize the evolutionary process to get an explanation of the distribution of reproducing entities that we observe or the fact that they solve challenges to reproduction.

> Many people believe that, in biology, point mutations lead to the change necessary to drive novelty in evolution. This is rarely the case. Point mutations are usually disastrous and every organism I know of does everything in its power to minimize them. Think, for every one beneficial point mutation, there are thousands that don't have any effect, and hundreds that cause something awful like cancer. If you're building a skyscraper, having one in a hundred bricks be laid with some variation is not a good thing. Instead Biology relies on recombination. Swap one beneficial trait for another and there's a much smaller chance you'll end up with something harmful and a much higher chance you'll end up with something useful. Recombination is the key to the creativity of evolution, and Darwin Machines harness it.

3. An anthropocentric reading of evidence that distorts the conclusion. The fidelity (number of errors per cycle per base pair) of the polymerases varies by maybe 8 orders of magnitude across the tree of life. For a review, see figure 3 in ref [1]. I don't know about Darwin Machines, but the view that 'recombination' is the key to evolution is a conclusion you would draw if you examined only a part of the tree of life. We can quibble about viruses being alive or not, but they are certainly the most abundant reproducing thing on earth by orders of magnitude. Recombination doesn't seem as important for adaptation in them as it does in organisms with chromosomes.

4. There are arguments that seem interesting (and maybe not incompatible with some version of the post) that life should be abundant because it actually helps dissipate energy gradients. See the Quanta article on this [0].

[0] https://www.quantamagazine.org/a-new-thermodynamics-theory-o...

[1] Sniegowski, P. D., Gerrish, P. J., Johnson, T., & Shaver, A. (2000). The evolution of mutation rates: separating causes from consequences. BioEssays, 22(12), 1057–1066. doi:10.1002/1521-1878(200012)22:12<1057::aid-bies3>3.0.co;2-w


This strongly reminds me of the algorithm used by swarming honeybees (if anyone's interested I'd highly recommend reading Honeybee Democracy). I reckon there's something to this.

I might have a go implementing something along these lines.


The title of the referenced book by Erwin Schrödinger is “What Is Life?”, I believe.

https://archive.org/details/whatislife0000erwi


Thanks for pointing this out. I’ll change it when I’m at my computer.


I'm obsessed with the idea of Darwin Machines (and I think you should be too).

I've been tinkering with the idea in python but I just don't have enough ML experience.

If you, or anyone you know, is interested in Darwin Machines please reach out!


It is the focus of my long-term crazy side project. https://youtu.be/sqvHjXfbI8o?si=7qwpc15Gn42mUnKQ&t=513


Thank-you. Was wondering about your thoughts on emotions. Are they merely byproducts of the chemical evolution of the brain, or are they emergent artifacts of intelligence?

A system cannot feel, but we can map neurochemistry as mechanistically as any other process. It would be interesting to discover whether a "pure brain" exists, or whether even its inputs, when considered in whole, colored its nature.


Aren't emotions just the brain's way of experiencing the pain/pleasure feedback system for non-physical stimuli?


Yes, feedback for good and bad. Usually we want to control or ignore it, by conscious training or its negative associations.


Honestly, no idea.

I could imagine a Darwin Machine being a "pure brain", as you put it, while we have emotions because our brains were built atop an existing and messy infrastructure.

Or, emotions could just be the subjective experience of thoughts competing.

Calvin goes deeper into things like this in the book, but I suspect emotions help intelligence to some extent insofar as they provide environmental change. It's good for the long term health of an ecosystem to shake things up so that nothing gets too stagnant.


Thank-you for your reply. I think it would be interesting to see whether things get complex enough to where, necessarily running on its own and outpacing our monitoring of it, there will be a need for a governing module that the machine identifies as, well, our emotions.

Fear, authority, or whatever it identifies them as. I assume it will need to diagnose and repair itself, so it will need to reflect. Who knows, maybe it's just fiction.


There is a lot of quibbling over details, but this is a 1-2 page high level elevator pitch, so will have some things glossed over. To that end, it seems like some interesting concepts for further exploration.


Oh dude this is so cool. I think you’re dead right.

If you’ll pardon some woo, another argument I see in favour of message passing/consensus, is that it “fits” the self similar nature of life patterns.

Valid behaviours that replicate and persist, for only the reason that they do.

Culture, religion, politics, pop songs, memes… “Egregore” comes to mind. In some ways “recombination” could be seen as “cooperation”, even at the level of minicolumns.

(Edit: what I mean to say is that it kinda makes sense that the group dynamics between constituent units of one brain would be similar in some way to the group dynamics you get from a bunch of brains)


Isn't this the same as genetic algorithms?



