> How is the human brain able to achieve a higher level of success with 1% of the data?
The most obvious answer is "the human brain uses a shit-ton more compute", for 18+ years as well.
We spend data, which we have in abundance, to save on compute, which we do not. Even at the most generous low-end estimates of the human brain's computing power, we are only barely there; on the high-end estimates that people in love with the ineffable mysteries of the brain love to cite, even the biggest supercomputers fall multiple orders of magnitude short of matching the brain. So no matter which way you slice it, we are extremely compute-poor.
Feeding a lot of data through an extremely lightweight optimizer like first-order SGD is one way to cope with lacking compute: https://www.gwern.net/docs/ai/scaling/2013-bottou.pdf Bottou asks why (even in 2013!) SGD is so hard to dethrone when we can empirically see plenty of optimizers, like second-order gradient descent algorithms, that beat SGD quite solidly. His observation is that while they are much better than SGD in terms of iterations or _n_, they lose on compute/wallclock, because SGD can just go-brrrr through the data much faster than they can.
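To make the trade-off concrete, here's a toy sketch (mine, not Bottou's; the least-squares problem and sizes are arbitrary) comparing the per-step cost of minibatch SGD against a full Newton step. Newton solves this problem in a single step, but that step costs O(n·d² + d³), while each SGD step is only O(batch·d), so SGD streams through far more data per second of wall-clock:

```python
# Toy illustration of the SGD vs. second-order trade-off (not from Bottou's paper).
import time
import numpy as np

rng = np.random.default_rng(0)
n, d = 50_000, 500                             # examples, parameters (arbitrary sizes)
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def sgd_step(w, lr=1e-3, batch=32):
    """One minibatch SGD step: ~O(batch * d) work."""
    idx = rng.integers(0, n, size=batch)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
    return w - lr * grad

def newton_step(w):
    """One full Newton step: building the Hessian is O(n * d^2), the solve is O(d^3)."""
    grad = X.T @ (X @ w - y) / n
    hess = X.T @ X / n
    return w - np.linalg.solve(hess, grad)

w = np.zeros(d)
t0 = time.perf_counter()
for _ in range(1_000):
    w = sgd_step(w)
print(f"1,000 SGD steps: {time.perf_counter() - t0:.2f}s")

w = np.zeros(d)
t0 = time.perf_counter()
w = newton_step(w)                             # exact minimizer for least squares
print(f"1 Newton step:   {time.perf_counter() - t0:.2f}s")
```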
Yeah, there are ~100B neurons, ~1Q synapses, but how much compute is the brain actually using over time?
Some quick googling gives this:
- Generation of an action potential seems to use ~2.5×10^−7 J [0]
- The brain consumes around 20W during normal activity
This seems to imply that there are around 8×10^7, call it 10^8, activations per second [1].
Apparently, the average neuron has 1000 synapses. Let's say each synapse requires 10 mulacc operations per activation. Doing that math gives about 10^12 FLOPs/s [2].
Integrate that over 18 years, and you get roughly 5.7×10^20 FLOPs [3].
PaLM required 2.56×10^24 FLOPs to train [4]. So, we have (way more than) enough compute, we're just not using it efficiently. We're wasting a lot of FLOPs on dense matrix multiplication.
There's plenty of wiggle room in these calculations. I checked over the math, but I'd appreciate it if someone would let me know if I've missed something.
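For anyone who wants to poke at the wiggle room, here's the arithmetic as a short script. Every constant is just an assumption from above ([0]-[4] are the footnotes in this comment), so swap in your own numbers and see how the conclusion moves:

```python
# Replaying the envelope math: brain activations/s, FLOP/s, and 18-year total vs. PaLM.
J_PER_SPIKE  = 2.5e-7                      # energy per action potential [0]
BRAIN_WATTS  = 20.0                        # brain power draw, J/s
SYNAPSES     = 1_000                       # average synapses per neuron
OPS_PER_SYN  = 10                          # mulacc operations per synapse per activation
SECONDS_18Y  = 18 * 365.25 * 24 * 3600     # ~5.7e8 seconds

spikes_per_s = BRAIN_WATTS / J_PER_SPIKE                 # ~8e7 activations/s [1]
ops_per_s    = spikes_per_s * SYNAPSES * OPS_PER_SYN     # ~8e11, rounded to 1e12 above [2]
lifetime_ops = ops_per_s * SECONDS_18Y                   # ~4.5e20 (5.7e20 if you round to 1e12 first) [3]
palm_flops   = 2.56e24                                   # PaLM training compute [4]

print(f"{spikes_per_s:.1e} spikes/s, {ops_per_s:.1e} FLOP/s, {lifetime_ops:.1e} FLOPs over 18 years")
print(f"PaLM used ~{palm_flops / lifetime_ops:,.0f}x that")
```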
There is a long history of connectionist attempts to ballpark brain compute in order to constrain AI timelines, going back to von Neumann/Turing/Good. The most recent one would be https://www.openphilanthropy.org/brain-computation-report You can see in Figure 1 that your 10^12 steady-state estimate is at the very low end. If you're interested in seeing where your envelope estimate differs from the others, well, it has the references.