
The hypothesis that you can't learn some things from text alone, that you need real-life experience, is intuitive, and I used to think it was true. But there are interesting results from just a few days ago suggesting that text by itself is also enough:

> We test a stronger hypothesis: that the conceptual representations learned by text only models are functionally equivalent (up to a linear transformation) to those learned by models trained on vision tasks. Specifically, we show that the image representations from vision models can be transferred as continuous prompts to frozen LMs by training only a single linear projection.

Linearly Mapping from Image to Text Space - https://arxiv.org/abs/2209.15162
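
To make the mechanism concrete, here is a minimal PyTorch sketch of that idea. The dimensions, the prompt length, and the random tensors standing in for the frozen vision encoder and frozen LM are all assumptions for illustration; the point is that the only trainable piece is a single linear layer.

    import torch
    import torch.nn as nn

    VISION_DIM = 768   # e.g. a ViT feature size (assumed)
    LM_DIM = 1024      # frozen LM embedding size (assumed)
    PROMPT_LEN = 4     # soft-prompt vectors per image (assumed)

    class LinearImagePrompt(nn.Module):
        """One linear map from image features to a short sequence of
        vectors in the LM's embedding space -- the only trainable part."""
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(VISION_DIM, PROMPT_LEN * LM_DIM)

        def forward(self, image_features):
            # (batch, VISION_DIM) -> (batch, PROMPT_LEN, LM_DIM)
            batch = image_features.shape[0]
            return self.proj(image_features).view(batch, PROMPT_LEN, LM_DIM)

    # Frozen pieces, faked with random tensors to keep the sketch self-contained:
    image_features = torch.randn(2, VISION_DIM)    # output of a frozen vision model
    text_embeddings = torch.randn(2, 10, LM_DIM)   # embedded caption tokens from a frozen LM

    prompt = LinearImagePrompt()(image_features)         # trainable projection
    lm_input = torch.cat([prompt, text_embeddings], 1)   # continuous prompt prepended to text
    print(lm_input.shape)  # torch.Size([2, 14, 1024]) -- fed to the frozen LM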




The claim isn’t that you can’t learn it from text, but rather that this is why models require so much text to train on - because they’re learning the stuff that humans learn from video.


The key issue is learning effort (such as energy and time). Congenitally deafblind humans with no accompanying cognitive disability from a shared cause can learn just fine as children without any video or sound, relying on comparatively low-bandwidth channels like proprioception and touch.

Another issue is that what we really care about is scientific reasoning, and there, if anything, nature has given us an anti-bias, at least at the level of interfacing with facts. People aren't born biased towards learning metric tensors and Christoffel symbols, yet it takes only a few years, at a handful of hours a day and a small number of joules, for many humans to get it (I'm counting from all the grade-school prerequisites, versus GPU watts x time). Far fewer years for genius children.
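
For a sense of scale on that joule count, here's a back-of-envelope Python sketch. Every figure in it is an assumption to be swapped out: roughly 20 W for whole-brain power draw (a commonly cited estimate), a few study hours a day over the school years, and placeholder GPU wattage and hours.

    # Back-of-envelope version of the joule comparison above.
    # All numbers are assumptions for illustration; swap in your own.
    BRAIN_WATTS = 20       # approximate whole-brain power draw
    HOURS_PER_DAY = 3      # assumed study time
    YEARS = 8              # assumed span from grade-school prerequisites onward

    study_seconds = YEARS * 365 * HOURS_PER_DAY * 3600
    human_joules = BRAIN_WATTS * study_seconds
    print(f"human study budget: ~{human_joules:.1e} J")   # ~6.3e+08 J

    GPU_WATTS = 400        # one data-center GPU (assumed)
    GPU_HOURS = 10_000     # placeholder; plug in a real training run
    gpu_joules = GPU_WATTS * GPU_HOURS * 3600
    print(f"GPU budget at those settings: ~{gpu_joules:.1e} J")  # ~1.4e+10 J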


Great comment. Something I hadn’t considered. So a question might be: how much data do touch and spatial reasoning amount to?


And how many Joules of energy are consumed by each of them?


They're also learning all the things we learned through evolution (a rather data-inefficient process).

People's brains are wired for language; it's natural that we have a head start.


I'm testing this argument out, but doesn't this apply to all tasks, not just language? I can learn to paint from scratch in what, 300 attempts? 1,000? It takes far more examples to train a guided diffusion model. I'd struggle to believe that our brains are hardwired for painting.


Could be something like:

1. Hardwired for creating visual models of 3D / 2D space

2. Hardwired for fine-grained hand motor movements.

So the learning is just combining those two skills into the ability to make the brush strokes needed to output the model onto paper.



