Hacker News

Google's Imagen was trained on about as many images as a 6-year-old would have seen over their lifetime at 24fps, plus a whole lot more text. It can draw a lot better and probably has a better visual vocabulary, but it is also way outclassed by the child in many other ways.
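The rough arithmetic behind that comparison (back-of-envelope only; the 16 waking hours/day and the 24 fps sampling rate are assumptions, not measurements):

```python
# Back-of-envelope: how many "frames" does a 6-year-old see?
SECONDS_PER_YEAR = 365.25 * 24 * 3600
frames = 6 * SECONDS_PER_YEAR * (16 / 24) * 24  # years * s/yr * waking fraction * fps
print(f"~{frames:.1e} frames")                  # on the order of 10^9
```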

Poverty of the stimulus is a real problem, and it may mean our starting-point architecture from genetics has a lot more learning built in than just a bunch of uninitialized, randomly connected weights. In many species, a newborn animal can get up and walk almost right away.

https://www.youtube.com/watch?v=oTNA8vFUMEc

Humans are born with giant heads and weak muscles, but can swim around like little seals pretty quickly after birth.




Definitely. I do think video is much more important than images, because video implicitly encodes physics, which is a huge deal.

And, as you say, there are probably some structural/architectural improvements to be made in the neural network as well. The mammalian brain has had a few hundred million years to evolve such a structure.

It also remains unclear how important learning causal influence is. These networks are essentially "locked in" from inception. They can only take the world in. Whereas animals actively probe and influence their world to learn causality.


The mammalian brain has had a few hundred million years to evolve neural plasticity [1], which is the key function missing in AI. The brain's structure isn't set in stone but develops over one's lifetime, and can even carry out major restructuring on a short time scale in some cases of massive brain damage.

Neural plasticity is the algorithm running on top of our neural networks that optimizes their structure as we learn so not only do we get more data, but our brains get better tailored to handle that kind of data. This process continues from birth to death and physical experimentation in youth is a key part of that development, as is social experimentation in social animals.
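A toy illustration of that distinction (purely a sketch, not a biological model): a network whose connection *structure* changes during learning, not just its weight values, via a Hebbian-style update plus pruning of weak connections and occasional growth of new ones:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 32
W = rng.normal(0, 0.1, (n, n))        # synaptic weights
mask = np.ones_like(W, dtype=bool)    # which synapses currently exist

for step in range(100):
    x = rng.normal(size=n)            # random input pattern
    y = (W * mask) @ x                # activity through existing synapses
    W += 0.01 * np.outer(y, x)        # Hebbian-style weight update
    mask &= np.abs(W) > 1e-3          # prune synapses that stay weak
    grow = rng.random(W.shape) < 0.001  # rarely, grow a new synapse
    new = grow & ~mask
    W[new] = rng.normal(0, 0.1, np.count_nonzero(new))
    mask |= new

print(f"surviving connections: {mask.sum()} / {mask.size}")
```

Standard deep learning fixes `mask` (the architecture) forever at initialization and only changes `W`; the claim above is that the biological version rewrites both, all lifetime long.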

I think it "remains unclear" only to the ML field. From the perspective of neuroscientists, current neural networks aren't even superficially at the complexity of axon-dendrite connections with ion channels and threshold potentials, let alone the whole system.

A family member's doctoral thesis was on the potentiation of signals, and based on my understanding of it, every neuron takes part in the process with its own "memory" of sorts, and the potentiation she studied was just one tiny piece of the neural plasticity story. We'd need to turn every component in the hidden layers of a neural network into its own massive NN with its own memory to even begin to approach that kind of complexity.

[1] https://en.m.wikipedia.org/wiki/Neuroplasticity


> our starting point architecture from genetics has a lot of learning built in

I don't doubt that evolution provided us with great priors to help us be fast learners, but there are two more things to consider.

One is scale - the brain is still some 10,000x more complex than large language models. We know that smaller models need more training data, so a brain many orders of magnitude larger than GPT-3 would naturally learn faster.
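For scale (both figures are rough outside estimates, not from the comment): commonly quoted human-brain synapse counts of 1e14 to 1e15 against GPT-3's 175B parameters give a ratio in the hundreds to thousands:

```python
# Rough ballpark estimates only; brain synapse counts are not precisely known.
gpt3_params = 175e9                    # GPT-3 parameter count
for synapses in (1e14, 1e15):          # common range of brain synapse estimates
    print(f"{synapses:.0e} synapses -> ~{synapses / gpt3_params:,.0f}x GPT-3")
```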

The second is social embedding - we are not isolated; our environment is made of other human beings. Similarly, an AI would need to be trained as part of human society, or even as part of an AI society, but not alone.


> Google's Imagen was trained on about as many images as a 6 year old would have seen over their lifetime at 24fps

The six-year-old has the advantage of being immersed in a persistent world where images have continuity and don't jump around randomly. For example, infants learn very quickly that most objects stay put even when they aren't being observed. In contrast, a dataset of images scraped from the internet doesn't really demonstrate how the world works.


> It can draw a lot better

Drawing involves taking a mental image and converting it into a sequence of actions that replicate the image on a physical surface. Imagen does not do that. I think the images it generates are more analogous to the image a person creates in their mind before drawing something.


I was too loose with that. There are CLIPDraw and other models that operate at the stroke/action level, though they haven't been trained on as much data. Still impressive at the time:

https://www.louisbouchard.ai/clipdraw/



