
It is ultimately a hardware problem. To simplify it greatly, an LLM neuron is a single input single output function. A human brain neuron takes in thousands of inputs and produces thousands of outputs, to the point that some inputs start being processed before they even get inside the cell, by structures on its outside. An LLM neuron is an approximation of this. We cannot yet manufacture a human-level neuron that is small, fast, and energy-efficient enough. A human brain has something like 80 or 90 billion of them, plus other types of cells that outnumber neurons by, I think, two orders of magnitude. The entire architecture is massively parallel and has a complex feedback network instead of the LLM's rigid, mostly forward processing. And when I say massively parallel I don't mean a billion tensor units; I mean a quintillion input superpositions.

And the final kicker: the human brain runs on like two dozen watts. An LLM takes a year of running on a few MW to train and several kW to run.

Given this I am not certain we will get to AGI by simulating it in a GPU or TPU. We would need a new hardware paradigm.



A bee is an autonomous walking, climbing, and flying drone that investigates its environment, collects resources, builds structures, and coordinates with other drones.

We're totally incapable of building an AI that can do anything resembling that. We're still at the phase where robots walking on rough terrain without falling over remains a bit impressive.

I doubt the limitation is that we can't produce enough raw compute to replace a single bee.


I agree with you that raw compute isn't the only missing piece, but I don't agree that we have enough compute to fully simulate even simple insect brains.


I feel like we are talking past one another here. I disagree that all the computer processors in the world combined don't have enough raw processing power to simulate a single bee brain. That to me is an absurd idea.

Now, if you meant we don't know enough about bees to model what their brains do and build a processor that could run an artificial bee brain, then yes, we don't have that ability yet.

----

Aside: when I watched the Netflix show Black Mirror, such as the "grain" episode, I always got stuck on how they were powering this tiny device... What kind of battery technology works here was my question, even though this is science fiction.


> I disagree that all the computer processors in the world combined don't have enough raw processing power to simulate a single bee brain. That to me is an absurd idea.

A bee weighs 100mg. If they are 5% brain, their brain weighs 5mg. 5mg of carbon is 2.5 * 10^20 atoms.

The largest supercomputers, at hundreds of trillions of transistors, come in at 2-5 * 10^14 transistors.

I don't think that combining all the computers in the world would give us 1 transistor per atom in a bee brain, and I additionally have to imagine a single transistor falls short of simulating an atom by multiple orders of magnitude (i.e. 1 atom would require > 1 * 10^4 transistors) in realtime.

So, I would argue that our level of compute is still insufficient to simulate even a simple insect brain in realtime. Perhaps you could compute 1 second of bee thinking in an hour using all our compute.

And then, of course, comes into play that we have no idea how to simulate an atom in full fidelity.
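The atom-counting arithmetic above, spelled out as a quick back-of-the-envelope in Python (same assumptions: a 100mg bee that is 5% brain by mass, treated as pure carbon, versus a few times 10^14 transistors):

    # Atoms in a bee brain vs. transistors in the largest machines.
    AVOGADRO = 6.022e23           # atoms per mole
    CARBON_MOLAR_MASS_G = 12.0    # grams per mole

    brain_mass_g = 0.100 * 0.05   # 5 mg
    atoms = brain_mass_g / CARBON_MOLAR_MASS_G * AVOGADRO
    print(f"atoms in brain: {atoms:.1e}")                                     # ~2.5e20

    supercomputer_transistors = 5e14                                          # upper figure quoted above
    print(f"atoms per transistor: {atoms / supercomputer_transistors:.0f}")   # ~500,000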


> A bee weighs 100mg. If they are 5% brain, their brain weighs 5mg. 5mg of carbon is 2.5 * 10^20 atoms.

Any rational characterization of the problem has left the building here.

Nobody wants to simulate any creature's brain or nervous system atom for atom.

Most of what atoms do in any cell, including neurological cells, is operate a vast number of complex but common survival systems. And the very small fraction supporting specifically neurological behavior does not meaningfully contribute at the level where specifics of individual atoms matter, but at a level many orders of magnitude higher in scale.


We're talking about whether or not humanity has enough compute to simulate a bee brain; how can our having orders of magnitude fewer transistors than there are atoms in the system we wish to simulate possibly be hand-waved away? A single transistor doesn't do much, regardless of how much you wish to baldly assert that atoms are trivial.


Perhaps you are meaning to compare transistors with neurons? If we do that, then yes, a transistor is a lot simpler than a neuron. But transistors are also insanely fast compared to neurons, so transistors do far FAR more, per unit of time.

A bee brain might have a million neurons, operating at maximum speed of about 250 Hz.

10^6 neurons x 250 Hz

= 2.5 x 10^8 Hz for a bee brain.

We can easily model that many neurons with a trillion transistors operating at 100 GHz. (If that sounds fast to you, keep in mind that CPU clock speeds account for many transistors switching in series.)

10^12 transistors x 10^11 Hz

= 10^23 Hz for a CPU

That is a factor of 4 x 10^14 more powerful.
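The same arithmetic in a couple of lines of Python, using the figures assumed above (10^6 neurons at 250 Hz vs. 10^12 transistors at 100 GHz):

    # Switching-event throughput: bee brain vs. a trillion-transistor machine.
    bee_neurons, bee_rate_hz = 1e6, 250        # ~1 million neurons at ~250 Hz
    transistors, switch_rate_hz = 1e12, 1e11   # 10^12 transistors at 100 GHz

    bee_events_per_s = bee_neurons * bee_rate_hz          # 2.5e8
    chip_events_per_s = transistors * switch_rate_hz      # 1e23
    print(f"{chip_events_per_s / bee_events_per_s:.1e}")  # ~4e14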

So yeah, the only problem for modeling a bee brain is identifying the organization of its neurons.

Nature does a lot with a little. Trillions of bee life years went into designing a really efficient bee brain.

---

Atoms:

A bee brain might have 10^20 atoms. Atoms interact at speeds that are far beyond anything we are talking about here. But they don't "switch"; they bounce around every which way and slowly end up reacting when they meet the right conditions. This is called Brownian motion, and it's not computation; it's nature's way of using the chaos produced by heat to give compounds so many chances to find the right context that they eventually do.

While we could not easily model that many atoms, nobody wants to (with regard to bees). Atoms are neither the "transistors" nor the neurons of a bee brain.


Oh yeah, you're right, we have 4 * 10^14 more compute than an insectoid brain in real time. Come the fuck off it.

This is the same guy who said rational characterization had left the building, right before waving away the complexity of a trillion atoms. Painful to imagine how smarmy you are.


Well you have left your written record of (1) persistently confusing atoms with neurons, (2) an inability to explain where your confusion comes from outside of just repeating it, alongside (3) snide responses to someone who in good faith took the time to explain the difference to you.

One more time:

Not understanding something is not the same as needing high compute. And the amount of compute we have today is colossal.

Simulating a high level system (neurons or transistors) is not the same as simulating their low level implementations (all the atoms and electrons in either of them).

Some large LLMs (interactive summaries of a large percentage of the entire human race's verbalized knowledge at moderate depth) can be run on a single Mac Studio M3 Ultra with 512GB of RAM.

Modeling a bee brain will require a tiny fraction of that computationally.

> Most of what atoms do in any cell, including neurological cells, is operate a vast number of complex but common survival systems. And the very small fraction supporting specifically neurological behavior does not meaningfully contribute at the level where specifics of individual atoms matter, but at a level many orders of magnitude higher in scale.


I don't see where I reference neurons; care to quote me rather than adding misleading citation-looking (1) strings to your text?

Indeed, looking towards full compute simulation of a physical system, I reference one of its smallest physical components - atoms - and draw an analogy to the transistor in compute. You make no argument for why an atom needn't be simulated in such a system but nonetheless hand-wave it away, lifting yourself to neurons as if their behavior is well understood, and finally acting as if the clock speed of a transistor is somehow related to the unsourced "clock speed" of a neuron.

So, to me it seems pretty transparently obvious you’ve failed to grasp the two topics in play here - whether there is enough brute force compute available, and what models of simulating a system like a brain exist - and in your confusion, went ahead and self aggrandized your own typing to cast out the conversation that played out before you joined.

I encourage you to not respond, and instead to use that energy to read someone else’s argument from a steel man perspective rather than this diminishing arrogance you display.


> I don't see where I reference neurons; care to quote me rather than adding misleading citation-looking (1) strings to your text?

A brain operates at the level of neurons, not individual atoms.

The vast majority of atoms in the brain are there to do non-cognitive things.

The tiny fraction actually doing neural things are insanely redundant.

You can simulate a 1 kg ball dropping from 100 meters, to time its collision with the Earth, by simulating all of its atoms. But the bulk properties of the ball are what matter, not the individual atoms' masses and movements.
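For the ball, a minimal sketch of the high-level model (ignoring air resistance):

    import math

    # Time for a ball dropped from 100 m to hit the ground, no atoms simulated.
    h, g = 100.0, 9.81                     # drop height (m), gravitational acceleration (m/s^2)
    t = math.sqrt(2 * h / g)
    print(f"time to impact: {t:.2f} s")    # ~4.5 s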

Likewise, you can simulate a brain by simulating all its atoms. But the net movements of transmitters are what matter, not all those individual atomic movements.

Neurons and transistors both operate on net signals precisely so they can be insensitive to individual atomic behaviors. If they were sensitive to individual atoms, they would be noise generators instead of reliable information processors.

And because they are designed to be insensitive to individual atomic behaviors, we don't have to model them at the atomic level.

(Which is impossible anyway. The quantum field equations for even two interacting atoms are complex. There is no visible future in which the atoms of a single neuron could all be simulated accurately, much less a brain. Even then, because of quantum noise, the atomic model wouldn't be any more accurate.)

--

My apologies on being so blunt at the beginning of this conversation. That was unnecessary.


You're not trying to simulate a brain atom for atom. Individual atoms don't do much on their own. Even if you did do that, you'd really need to simulate electron flow, which is a whole other level.

Now, simulating what outputs the bee's brain yields from a set of stimuli — that could be done. Whether it could be done as fast as a brain is a whole other question, and I'm not sure of the answer.


I think the problem here is the physical hardware that will navigate and collect information from this environment? In this case, a biological robot makes more sense than a mechanical/electronic one. If you go down that route, though, the best AI will be a human brain. We have been training and selecting these for quite a while now.


But can a bee create a video of Will Smith eating pasta?


On the other hand, a large part of the complexity of human hardware randomly evolved for survival and only recently started playing around in the higher-order intellect game. It could be that we don't need so many neurons just for playing intellectual games in an environment with no natural selection pressure.

Evolution is winning because it's operating at a much lower scale than we are and needs less energy to achieve anything. Coincidentally, our own progress has also been tied to the rate of shrinking of our toys.


Evolution has won so far because it had a four billion year head start. In two hundred years, technology has gone from "this multi-ton machine can do arithmetic operations on large numbers several times faster than a person" to "this box produces a convincing facsimile of human conversation, but it only has on the order of a trillion parameters standing in for neurons, and they're not nearly as sophisticated as the real thing."

I do think we probably need a new hardware approach to get to the human level, but it does seem like it will happen in a relative blink of an eye compared to how long the brain took.


> Evolution has won so far because it had a four billion year head start. In two hundred years, technology has gone from

I dunno, whenever I leave the silicon technology alone with plenty of power and cooling, nothing changes. :p

If the effect requires the involvement of swarms of ancient nanobots, then maybe that's the hardware and software that really deserves the credit.


But we don't even need a human brain. We already have those, they take months to grow, take forever to train, and are forever distracted. Our logic-based processes will keep getting smaller and less power hungry as we figure out how to implement them at even lower scales, and eventually we'll be able to solve problems with the same building blocks as evolution but in intelligent ways, with LLMs likely playing only a minuscule part in the larger algorithms.


I think current LLMs are trying to poorly emulate several distinct systems.

They're not that great at knowledge (and we're currently wasting most of the neurons on memorizing Common Crawl, which... have you looked at Common Crawl?)

They're not that great at determinism (a good solution here is that the LLM writes 10 lines of Python, which then feed back into the LLM; see the sketch below. Then the task completes 100% of the time, and much cheaper too).

They're not that great at complex rules (surprisingly good actually, but expensive and flaky). Often we are trying to simulate what are basically 50 lines of Prolog with a trillion params and 50KB of vague English prompts.

I think if we figure out what we're actually trying to do with these things, then we can actually do each of those things properly, and the whole thing is going to work a lot better.
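For what it's worth, the "write a few lines of Python and feed the output back in" loop is simple to sketch. `call_llm` below is a hypothetical stand-in for whatever model API you use, and the subprocess call is deliberately naive (a real system would sandbox it):

    import subprocess, sys

    def call_llm(prompt: str) -> str:
        raise NotImplementedError  # hypothetical stand-in for your model API

    def solve_with_code(task: str) -> str:
        # Ask the model for a small deterministic script instead of a direct answer.
        code = call_llm(f"Write a short Python script that prints the answer to: {task}")
        # Run it and capture the output.
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True, timeout=10)
        # Feed the deterministic result back to the model for the final reply.
        return call_llm(f"Task: {task}\nScript output: {result.stdout}\nGive the final answer.")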


>But we don't even need a human brain. We already have those, they take months to grow, take forever to train

This is a weird argument considering LLMs are composed of the output of countless hours of human brains. That makes LLMs, by definition, logarithmically worse at learning.


Not all artificial neurons are LLMs. Machine learning can be applied to any kind of large data set, not just human prose, and will start finding useful patterns before a human brain has time to learn how many fingers it has.


The brain is logical by design, it implements logic?


We use logic to design our technology, but evolution does it by literally shaking all the atoms into place, no design involved. Our brains were created randomly.


It appears logical, but it falls under the same threshold. We say it has function, which is logical, but it's used for something we may or may not have planned for. That's evolution. The logic is a pretext.


If it quacks like a Markov chain...


That's unrelated to evolution.


To be fair to the raw capabilities of the semiconductor industry, a 100mm^2 die at 3nm can contain on the order of 1~10 trillion features. I don't know that we are actually that far off in terms of scale. How to arrange these features seems to be the difficult part.

The EDA [0] problem is immune to the bitter lesson. There are certainly specific arrangements of matter that can solve this problem better than a GPU/TPU/CPU can today.

[0] https://en.wikipedia.org/wiki/Electronic_design_automation


The bigger issue though is that we can't scale that die to the approximate volume of a human brain in any dimension.

Those feature sizes are tiny, yes, but we struggle to put them in a block the size of a human brain and keep it cool enough to be useful (or even make it affordable).


Unlike connections between neurons, those features can't be rewired on the fly, so the comparison is not even meaningful.



This is a great summary! I've joked with a coworker that while our capabilities can sometimes pale in comparison (such as dealing with massively high-dimensional data), at least we can run on just a few sandwiches per day.


One sandwich is roughly the energy equivalent of running two modern desktop CPUs flat out for about an hour.
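Rough numbers, assuming a ~400 kcal sandwich and ~230 W per CPU under full load:

    # Sandwich energy vs. two desktop CPUs flat out for an hour (illustrative figures).
    sandwich_joules = 400 * 4184            # ~400 kcal       -> ~1.7 MJ
    two_cpus_hour_joules = 2 * 230 * 3600   # 2 x 230 W x 1 h -> ~1.7 MJ
    print(sandwich_joules / two_cpus_hour_joules)   # ~1.0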


> To simplify it greatly, an LLM neuron is a single input single output function. A human brain neuron takes in thousands of inputs and produces thousands of outputs

This is simply a scaling problem, eg. thousands of single I/O functions can reproduce the behaviour of a function that takes thousands of inputs and produces thousands of outputs.

Edit: As for the rest of your argument, it's not so clear cut. An LLM can produce a complete essay in a fraction of the time it would take a human. So yes, a human brain only consumes about 20W but it might take a week to produce the same essay that the LLM can produce in a few seconds.

Also, LLMs can process multiple prompts in parallel and share resources across those prompts, so again, the energy use is not directly comparable in the way you've portrayed.
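To put rough (and admittedly contestable) numbers on the essay comparison: assume the brain's 20 W for a week of on-and-off writing, versus a few kW of serving hardware for 30 seconds:

    # Energy per essay under the assumptions above -- illustrative, not measured.
    human_joules = 20 * 7 * 24 * 3600   # 20 W for a week -> ~12 MJ
    llm_joules = 3000 * 30              # 3 kW for 30 s   -> ~0.09 MJ
    print(human_joules / llm_joules)    # ~134x, before amortizing training energy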


> This is simply a scaling problem, eg. thousands of single I/O functions can reproduce the behaviour of a function that takes thousands of inputs and produces thousands of outputs.

I think it's more than just scaling, you need to understand the functional details to reproduce those functions (assuming those functions are valuable for the end result as opposed to just the way it had to be done given the medium).

An interesting example of this neuron complexity that was published recently:

As rats/mice (can't remember which) are exposed to new stimuli, the axon terminals of a single neuron do not all transmit a signal when there is an action potential; they transmit in a changing pattern after each action potential and ultimately settle into a more consistent pattern of some transmitting and some not.

IMHO: There is interesting mathematical modeling and transformation going on in the brain that is the secret sauce for our intelligence, and it has yet to be figured out. It's not just scaling of LLMs; it's finding the right functions.


Yes, there may be interesting math, but I didn't mean "scaling LLMs", necessarily. I was making a more general point that a single-I/O function can pretty trivially replicate a multi-I/O function, so the OP's point that "LLM neurons" are single-I/O and bio neurons are multi-I/O doesn't mean much. Estimates of brain complexity have already factored this in, which is why we know we're still a few orders of magnitude away from the number of parameters needed for a human brain in a raw compute sense.

However, the human brain has extra parameters that a pure/distilled general intelligence may not actually need, eg. emotions, some types of perception, balance, and modulation of various biological processes. It's not clear how many of the parameters of the human brain these take up, so maybe we're not as far as we think.

And there are alternative models such as spiking neural networks which more closely mimic biology, but it's not clear whether these are really that critical. I think general intelligence will likely have multiple models which achieve similar results, just like there are multiple ways to sort a set of numbers.


> Yes, there may be interesting math, but I didn't mean "scaling LLMs", necessarily.

Ya, I realized later that the LLM scaling part of my post sounded like it misinterpreted what you said when it was really a separate point unrelated to the topic of neurons that just happened to include the word "scaling" also.

I do agree with you somewhat that just because biological neurons are vastly more complex and functional than typical artificial neurons, it just means we need more artificial neurons to achieve similar functionality.

> Estimates of brain complexity have already factored this in

I don't agree with this, the estimates I've seen don't seem to factor it in, and many of those estimates were prior to things discovered within just the last 5 years that expose significantly more complexity and capability that needs to be understood first.

> And there are alternative models such as spiking neural networks which more closely mimic biology, but it's not clear whether these are really that critical.

I kept reading that people wanted to use spiking networks and I thought the same thing as you, it didn't seem to provide a benefit. A while ago I read some paper about why they want to use spiking networks and I can't remember the details but they described some functional capabilities that really were much easier with spiking. I vaguely remember that it had to do with processing real-time sensory information, it was easier to synchronize signals based on frequency instead of trying to rely on precise single signal timing (something like that). And I think there were benefits in other areas also.


I agree with both of you, but scaling isn't feasible with this paradigm. You could need continent-sized hardware to approximate general intelligence with the current paradigm.


> You could need continent-sized hardware to approximate general intelligence with the current paradigm.

I doubt it, if by "current paradigm" you mean the hardware and general execution model, eg. matrix math. Model improvements from progress in algorithms have been outpacing performance improvements from hardware progress for decades. Even if hardware development stopped today, models will continue improving exponentially.


> We would need a new hardware paradigm.

It's not even that. The architecture(s) behind LLMs are nowhere close to that of a brain. The brain has multiple entry points for different signals and uses different signaling across different parts. The brain of a rodent is much more complex than LLMs are.


LLM 'neurons' are not single input/single output functions. Most 'neurons' are mat-vec computations that combine dozens or hundreds of weighted inputs from the prior layer.
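Concretely, one such 'neuron' looks roughly like this (a minimal NumPy sketch; the 4096 width is just illustrative):

    import numpy as np

    d_in = 4096                        # width of the previous layer (illustrative)
    x = np.random.randn(d_in)          # activations coming in from the previous layer
    w = np.random.randn(d_in)          # one row of the weight matrix
    b = 0.1

    pre_activation = w @ x + b         # many inputs -> one scalar
    output = max(0.0, pre_activation)  # ReLU; this single output then fans out to the next layer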

In our lane the only important question to ask is, "Of what value are the tokens these models output?" not "How closely can we emulate an organic brain?"

Regarding the article, I disagree with the thesis that AGI research is a waste. AGI is the moonshot goal. It's what motivated the fairly expensive experiment that produced the GPT models, and we can look at all sorts of other harebrained goals that ended up making revolutionary changes.


> "Of what value are the tokens these models output?" not "How closely can we emulate an organic brain?"

Then you build something that is static and does not learn. This is as far from AI as you can get. You're just building a goofy search engine.


"To simplify it greatly, an LLM neuron is a single input single output function". This is very wrong unless I'm mistaken. A synthetic neuron is multiple input single output.


Tens of thousands of extremely complex analog inputs, one output with several thousand targets that MIGHT receive the output with different timing and quality.

One neuron is unfathomably complex. It's offensive to biology to call a cell in a mathematical matrix a neuron.


It's even worse than the number of inputs/outputs, the number of neurons, efficiency, or directional feedback.

The brain also has plasticity! The connections between neurons change dynamically - an extra level of meta.


Connections between LLM neurons also change during training.


a) "during training" is a huuuuge asterisk

b) Do you have a citation for that? My understanding is that while some weights can go to zero and effectively be removed, no (actually used in prod) network architecture or training method allows arbitrary connections.


“And the final kicker: the human brain runs on like two dozen watts. An LLM takes a year of running on a few MW to train and several kW to run.”

I've always thought about how nature didn't evolve to use electricity as its primary means of energy. Instead it uses chemistry. It's quite curious, really.

Like a tiny insect is chemistry powered. It doesn’t need to recharge batteries, it needs to eat and breathe oxygen.

What if our computers started to use biology and chemistry as their primary energy source?

Or will it be the case that in the end using electricity as the primary energy source is more efficient for “human brain scale computation”, it’s just that nature didn’t evolve that way…


"Wetware", as it were. I remember some research article some time back where they grew some kind of 'brainlets' and had them functional as far as memory was concerned [1]. Would be interesting to see how that tech would progress while incorporated into current silicon/photonic devices for input/output. Downside would be the durability of the organic matter and replacement.

[1] https://sciencesensei.com/scientists-created-thinking-brain-...


>Downside would be the durability of the organic matter and replacement.

thinking about the consequences of consumer grade "organic computing" gets weird really fast. How do you interface biological matter with peripherals? What about toxicity? What about pathogens? Not only as targets, but as vectors too. What about senescence? Would my computer catch a cold or get Alzheimer's? What about energy? Would I have to buy proteins/sugar for my PC? Would a "beefy PC master race" kind of machine be big enough to gain sentience? Would my PC need to literally sleep?!

Funny to think about it


I always thought optical signalling would be an interesting I/O pathway, being non-intrusive in human-computer interaction, rather than electrical signalling probes a la Neuralink.

Yeah, and sentience opens a whole new can of ethical/moral implications to sort through.


Minor correction here. You are correct about hardware being an issue, but the magnitude is much greater. You have a lot more than "thousands" of inputs. In the hand alone you have ~40,000+ tactile corpuscles (sensing regions). And that's just one mode. The eye has ~7 million cones and ~80 million rods. There is processing and quantization performed by each of those cells and each of the additional cells they signal, throughout the entire sensory-brain system. The amount of data the human brain processes is many orders of magnitude greater than even our largest exascale computers handle. We are at least 3 decades from AGI if we need data processing equivalent to the human brain's, and that's optimistic.

Like you mention, each individual neuron or synapse includes fully parallel processing capability, with signals conveyed by dozens of different molecules. Each neuron (~86 billion of them) holds state information in addition to processing. The same is true for each synapse (~600 trillion). That is how many ~10 Hz "cores" the human computational system has.

The hubris of the AI community is laughable considering the biological complexity of the human body and brain. If we need anywhere close to the same processing capability, there is no doubt we are multiple massive hardware advances away from AGI.


I agree with this up until the claim that we must be very far from AGI. I don't think we're close, but the scale of human inputs doesn't tell us anything about it. A useful AGI need not be capable of human-level cognition, and human-level cognition need not require the entire human biological or nervous system - we're a product of millions of years of undirected random evolution, optimized to run a fleshy body and survive African plains predators. This whole thing we do of thinking and science and engineering is a quirk that made us very adaptable, but how much of what we are is required to implement it isn't clear (e.g. a human minus a hand can still understand advanced mathematics; there are blind programmers, etc.)


I'm pretty sure human level cognition requires human level processing power. We are still multiple orders of magnitude away from that.

A blind programmer still has human processing power. The "usually-sight" regions of the brain don't just shut down. They're still used.


Sure, but there are animals with much larger brains on Earth which have - we believe - a reasonably high level of cognition but have not achieved the technological and engineering feats which we have (i.e. dolphins, whales, elephants as naive examples based on brain mass and complexity).

Conversely you have birds - with much smaller brains - which also don't achieve those things but display advanced language skills, have apparent societal structure and despite our inability to understand it seem to have enough language to communicate advanced concepts.


You are absolutely correct. There are multiple algorithmic advances required in addition to hardware advances.

Parrots have hundreds of millions to a few billion neurons, which are just as much parallel, local, state-processing units as the ones in the mammalian brain. We haven't done a great job of simulating a ~300-neuron C. elegans worm. Not that simulation is required for equivalent intelligence. I'm just saying these analog machines are much, much more complex and powerful than the average AI aficionado gives credit for. So complex that we are nowhere near AGI.


Med resident here: AFAIK the 80-90 billion neuron figure is misleading: more than 80% of them are in the cerebellum and are mostly a low-pass filter for motor signals. People born with no cerebellum are of normal intelligence. And we don't know how much of the neocortex is actually useful for consciousness, but apparently a minority of it.


I wrote a concrete expected-value model for AGI that anchors rewards in the 15-30T USD Western white-collar payroll, adds spillovers on 60T GDP, includes transition costs, and varies probability explicitly. Three scenarios (optimistic, mid, pessimistic) show when the bet is rational versus value-destroying - no mysticism, just plug-and-play numbers. If you're debating AGI's payoff, benchmark it against actual payroll and GDP, not vibes.

Read: https://pythonic.ninja/blog/2025-11-15-ev-of-agi-for-western...


it is an architecture problem, too. LLMs simply aren't capable of AGI


Why not?

A lot of people say that, but no one, not a single person, has ever pointed out a fundamental limitation that would prevent an LLM from going all the way.

If LLMs have limits, we are yet to find them.


We have already found limitations of the current LLM paradigm, even if we don't have a theorem saying transformers can never be AGI. Scaling laws show that performance keeps improving with more params, data, and compute, but only following a smooth power law with sharply diminishing returns. Each extra order of magnitude of compute buys a smaller gain than the last, and recent work suggests we're running into economic and physical constraints on continuing this trend indefinitely.
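To make the diminishing returns concrete, here is the power-law form of those scaling laws, with an exponent in the ballpark reported by Kaplan et al. (treat the exact constants as assumptions):

    # Loss vs. parameter count under L(N) = (Nc / N) ** alpha.
    alpha, Nc = 0.076, 8.8e13   # roughly the values reported for language models

    def loss(n_params: float) -> float:
        return (Nc / n_params) ** alpha

    for n in [1e9, 1e10, 1e11, 1e12]:
        print(f"{n:.0e} params -> loss {loss(n):.2f}")
    # Every 10x in parameters multiplies the loss by the same ~0.84,
    # so each extra order of magnitude buys a smaller absolute gain.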

OOD generalization is still an unsolved problem; they basically struggle under domain shifts, long-tail cases, or when you try systematically new combinations of concepts (especially on reasoning-heavy tasks). This is now a well-documented limitation of LLMs/multimodal LLMs.

Work on CoT faithfulness shows that the step-by-step reasoning they print doesn't match their actual internal computation; they frequently generate plausible but misleading explanations of their own answers (look up the Anthropic paper). That means they lack self-knowledge about how/why they got a result. I doubt you can get AGI without that.

None of this proves that no LLM based architecture could ever reach AGI. But it directly contradicts the idea that we haven't found any limits. We've already found multiple major limitations of the current LLMs, and there's no evidence that blindly scaling this recipe is enough to cross from very capable assistant to AGI.


A lot of those failings (e.g. CoT faithfulness) are straight-up human failure modes.

LLMs failing the same way humans do, on the same tasks, is a weak sign of "this tech is AGI capable" in my eyes. Because it hints that LLMs are angling to do the same things the human mind does, and in similar enough ways to share the failure modes. And the human mind is the one architecture we know to support general intelligence.

Anthropic has a more recent paper on introspection in LLMs, by the way. With numerous findings. The main takeaway is: existing LLMs have introspection capabilities - weak, limited and unreliable, but present nonetheless. It's a bit weird, given that we never trained them for that.

https://transformer-circuits.pub/2025/introspection/index.ht...

You can train them to be better at it, if you really wanted to. A few other papers tried, although in different contexts.


This is all nonsense and you are just falling for marketing that you want to be true.

The whole space is largely marketing at this point, intentionally conflating all these philosophical terms because we don't want to face the ugly reality that LLMs are a dead end to "AGI".

Not to mention, it is not on those who don't believe in Santa Claus to prove that Santa Claus doesn't exist. It is on those who believe in Santa Claus to show how AGI can possibly emerge from next-token prediction.

I would question if you even use the models much, really, because I thought this in 2023, but I just can't imagine how anyone who uses the models all the time can possibly think we are on the path to AGI with LLMs in 2025.

It is almost like the idea of a thinking being emerging from text was a dumb idea to start with.


You are falling for the AI effect.

Which is: flesh apes want to feel unique and special! And "intelligence" must be what makes them so unique and special! So they deny "intelligence" in anything that's not a fellow flesh ape!

If an AI can't talk like a human, then it must be the talking that makes the human intelligence special! But if the AI can talk, then talking was never important for intelligence in the first place! Repeat for everything.

I use LLMs a lot, and the improvements in the last few years are vast. OpenAI's entire personality tuning team should be loaded into a rocket and fired off into the sun, but that's a separate issue from raw AI capabilities, which keep improving steadily and with no end in sight.


Breaking down in -30C temperatures is also a human failure mode, but that doesn't make cars human. They both exhibit the exact same behavior (not moving), but are fundamentally different.


The similarities go quite a bit deeper than that.

Both rely on a certain metabolic process to be able to move. Both function in a narrow temperature range, and fail outside it. Both have a homeostatic process that attempts to keep them in that temperature range. Both rely on chemical energy, oxidizing stored hydrocarbons to extract power from them, and both take in O2-rich air, and emit air enriched in CO2 and water vapor.

So, yes, the cars aren't humans. But they sure implement quite a few of the same things as humans do - despite being made out of very different parts.

LLMs of today? They implement abstract thinking the same way cars implement aerobic metabolism. A nonhuman implementation, but one that does a great many of the same things.


Real-time learning that doesn't pollute limited context windows.


You can mimic this already. Unreliable and computationally inefficient, but those are not fundamental limitations.


LLMs are bounded by the same bounds computers are. They run on computers, so a prime example of a limitation is Rice's theorem. Any 'AI' that writes code is unable (just like humans) to determine if the output is or is not error-free.

This means a multi-agent workflow that writes code without a human may or may not be error-free.

LLMs are also bounded by runtime complexity. Could an LLM find the shortest Hamiltonian path between two cities in polynomial time?

LLMs are bounded by in-model context: could an LLM create and use a new language with no context for it in its model?
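The first point is the classic diagonalization argument; a rough sketch of why a perfect "is this code error-free?" checker can't exist (the checker below is hypothetical by construction, and the self-reference is informal, but this is the shape of the proof):

    def halts(program_source: str, arg: str) -> bool:
        """Hypothetical perfect halting checker -- assumed to exist, for contradiction."""
        ...

    PARADOX = '''
    def paradox(src):
        if halts(src, src):
            while True:   # loop forever exactly when the checker says we halt
                pass
    paradox(PARADOX)
    '''
    # Whatever halts(PARADOX, PARADOX) answers is wrong, so no such checker exists.
    # Rice's theorem extends the same argument to any semantic property of code,
    # including "this program is error-free" -- for humans and LLMs alike.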


Assuming you want to define the goal, "AGI", as something functionally equivalent to part (or all) of the human brain, there are two broad approaches to implement that.

1) Try to build a neuron-level brain simulator - something that is a far distant possibility, not because of compute, but because we don't have a clear enough idea of how the brain is wired, how neurons work, and what level of fidelity is needed to capture all the aspects of neuron dynamics that are functionally relevant rather than just part of a wetware realization

OR

2) Analyze what the brain is doing, to extent possible given our current incomplete knowledge, and/or reduce the definition of "AGI" to a functional level, then design a functional architecture/implementation, rather than neuron level one, to implement it

The compute demands of these two approaches are massively different. It's like the difference between an electronic circuit simulator that works at gate level vs one that works at functional level.

For the time being we have no choice other than following the functional approach, since we just don't know enough to build an accurate brain simulator even if that were for some reason to be seen as the preferred approach.

The power efficiency of a brain vs a gigawatt systolic array is certainly dramatic, and it would be great for the planet to close that gap, but it seems we first need to build a working "AGI" or artificial brain (however you want to define the goal) before we optimize it. Research and iteration require a flexible platform like GPUs. Maybe when we figure it out we can use more of a dataflow, brain-like approach to reduce power usage.

OTOH, look at the difference between a single-user MoE LLM and one running in a datacenter simultaneously processing multiple inputs. In the single-user case we conceptualize the MoE as saving FLOPs/power by only having one "expert" active at a time, but in the multi-user case all experts are active all the time handling tokens from different users. The potential of a dataflow approach to save power may be similar, with all parts of the model active at the same time when handling a datacenter load, so a custom hardware realization may not be needed/relevant for power efficiency.


Or

3) Pour enough computation into a sufficiently capable search process and have it find a solution for us

Which is what we're doing now.

The bitter lesson was proven right once again. LLMs prove that you can build incredibly advanced AIs without "understanding" how they work.


You could do an architectural search, and Google previously did that for CNNs with its NASNet (Neural Architecture Search) series of architectures, but the problem is you first need to decide what architectural components you want your search process to operate over, so you are baking in a lot of assumptions from the start and massively reducing the search space (because this is necessary to be computationally viable).

A search or evolutionary process would also need an AGI evaluator to guide the search, and this evaluator would then determine the characteristics of the solution found, so it smacks of benchmark gaming rather than the preferred approach of designing for generic capabilities instead of specific evaluations.

I wouldn't say we don't know how LLMs "work" - clearly we know how the transformer itself works, and it was designed intentionally with a certain approach in mind - we just don't know all the details of what representations it has learnt from the data. I also wouldn't say LLMs/transformers represent a bitter lesson approach, since the architecture is so specific - there are a lot of assumptions baked into it.


The hard problem of consciousness seems way harder to solve than the easy one, which is a purely engineering problem. People have been thinking about why the brain thinks for a very long time and so far we have absolutely no idea.


> People have been thinking about why the brain thinks for a very long time and so far we have absolutely no idea

I'm not sure what you mean by this.

I think there is a pretty large consensus that our neocortex is a prediction machine (predicting future observations/outcomes from past experience), and the reason WHY it would have evolved to be this is because there is obvious massive survival benefit in successfully predicting how predators and prey will react ahead of time, what will be the outcome of your own actions, etc, etc. Prediction unlocks you from being stuck in the present having to react to things as they happen and lets you plan ahead.

Thinking = Reasoning/Planning is just multi-step prediction.

I don't think consciousness is the big deal most people think it is - it seems to be just the ability to self-observe (which helps to self-predict), but if we somehow built AGI that wasn't conscious, then who cares?


Not why it was created, why the systems in the brain lead to consciousness. Your option B requires understanding not just mechanically how but the fundamental reason for why consciousness appears. If we just understand the mechanics all we can confidently do is work toward a more and more accurate representation of the brain. AGI without consciousness is speculated but hard for me to believe in.


As noted, consciousness seems to just be the ability to self-observe, which is useful as another predictive input.

I would expect that all intelligent animals are conscious, and any AI we build with a roughly brain-like architecture in terms of connections, looping, and being prediction based would also report itself to be conscious and describe a similar subjective experience. LLMs seem much too simple (just layer-wise pass-thru data flow) to be conscious.

It's possible that some of the neural connections supporting consciousness may have evolved, or been enhanced, due to the evolutionary value of enhanced self-prediction (i.e. this is the reason), but as noted I expect it basically "comes for free" with any complete enough cognitive/sensory architecture.


> As noted, consciousness seems to just be the ability to self-observe, which is useful as another predictive input.

As far as I know, consciousness is referring to something other than self-referential systems, especially with regards to the hard problem of consciousness.

The philosophical zombie thought experiment (https://en.wikipedia.org/wiki/Philosophical_zombie) is well-known for imagining something with all the structural characteristics that you mention but without conscious experience, as in "what-it's-like" to be someone.


It seems entirely possible that the "philosophical zombie" is an impossible/illogical construct, and that in fact anything with all the structure necessary for consciousness will of necessity be conscious.

When considering the structural underpinnings of consciousness, it's interesting to note the phenomenon of "blindsight", which is essentially a loss of visual consciousness without an actual loss of vision!

Note that anything with mental access to its own deliberations and sensory inputs will by definition always be able to report "what it's like" to be themselves - what they are experiencing (what's in their mind). If something reports to you their qualia of vision or hearing, isn't this exactly what we mean by "what it's like" to be them - how they feel they are experiencing the world?!


> It seems entirely possible that the "philosophical zombie" is an impossible/illogical construct, and that in fact anything with all the structure necessary for consciousness will of necessity be conscious.

Yes, and that's pretty much exactly the point: we don't know of any way of determining whether someone is a p-zombie or a being with conscious phenomenal experience. We can certainly have an opinion or belief or assume that sufficient structure means consciousness, which is a perfectly reasonable stance to take and one that many would take, but we have to be careful to understand that's not a scientific stance since it isn't testable or falsifiable, which is why it's been called the "hard problem" of consciousness. It's an unfounded belief we choose out of reasons like psychological comfort.

With regards to your latter point, I think you are making some sophisticated distinctions regarding the "map and territory" relation, and it seems you've hit upon the crux of the matter: how can we report "what it's like" for us to experience something the other person hasn't experienced, if it's not deconstructible to phenomenal states they've already experienced (and therefore constructible for them based off of our report)? The landmark paper here is "What Is It Like to Be a Bat?" by Thomas Nagel, and if you're ever curious it's a pretty short read.

With regards to "blindsight", since I'm not familiar with it and am curious, how do we distinguish between loss of visual consciousness and loss of information transfer between conscious regions, or loss of memory about conscious experience?


I'm not sure how much, if any, work has been done to study the brains of people with blindsight. I'm also not sure I would differentiate between loss of visual consciousness and loss of information transfer ... my understanding is that it's the loss of information transfer that is causing the loss of consciousness (e.g. maybe your visual cortex works fine, so you can see, and you can perform some visual tasks that have been well practiced and/or no longer need general association cortex, but if the connection between visual cortex and association cortex were lost, then perhaps this is where you become unaware of your ability to see, i.e. lose visual consciousness).

I don't think it's a memory issue - one classic test of blindsight is asking the patient to navigate a cluttered corridor full of obstacles, which the patient succeeds in doing despite reporting themselves as blind - so it's a real-time phenomenon, not one of memory.


> Yes, and that's pretty much exactly the point: we don't know of any way of determining whether someone is a p-zombie or a being with conscious phenomenal experience.

That seems to come down to defining, in a non hand-wavy way, what we mean by "conscious phenomenal experience". If this is referring to personal subjective experience, then why is just asking them to report that subjective experience unsatisfactory ?!

I get that consciousness is considered as some ineffable personal experience, but as a thought experiment, what if the experimenter, defining themselves as "conscious", wanted to probe whether some subject's subjective experience differed from their own? Then they could at least attempt to verbalize any and all aspects of their own (the experimenter's) subjective experience and ask the subject if they felt the same, and the more (unconstrained) questions they asked without finding any significant difference, the more asymptotically unlikely it would be that there was any difference.

> which is why it's [p-zombie detection] been called the "hard problem" of consciousness

AFAIK the normal definition of the hard problem is basically how and why the brain gives rise to qualia and subjective experience, which really seems like a non-problem...

We have thoughts and emotions, and mental access to these, so it has to feel like something to be alive and experience things. If we introspect on what having, say, vision, is like, or what it is like to have our eyes open vs shut, then (assuming we don't have blindsight!) we are obviously going to experience the difference and be able to report it - it does "feel" like something.

Qualia are an interesting thing to discuss - why do we experience what we do, or experience anything at all for that matter when we see, say a large red circle. Why does red feel "red"? Why and how does music feel different in nature to color, and why does it feel the way it does, etc?

I think these are also really non-problems that disappear as soon as you start to examine them! What are the differences in qualia between seeing a small red circle vs a large red circle, or a large blue circle vs a large red one, when you consider differences in qualia vs the fact that we experience anything at all (which is proved by our ability to report that we do)? Color is perceived as a surface attribute with a spatial extent, with colors differentiated by what they remind us of. Blue brings to mind water, sky and other blue things; red brings to mind fire, roses, and other red things. Perception of color can be proven to be purely associative, not absolute, by Ivo Kohler's chromatic adaptation experiments, which had the subject wear colored goggles whose effect "wears off" after a few days, with normal subjective perception of color returning.


I'm actually curious here, because maybe our experiences are different. When you look at something red, before any associations or thoughts kick in, before you start thinking "this reminds me of fire" or analyzing it, is there something it's like for that redness to be there? Some quality to it that exists independent of what you can say about it?

For me, I can turn off all the thinking and associations and just... look. And there's something there that the looking is of or like, if that makes sense. It's hard to put into words because it's prior to words, and can possibly be independent of them.

But maybe that's not something universal? I know some people don't have visual imagery or an inner voice, so maybe phenomenal experience varies more than we assume. Does that distinction between the experience itself and your ability to think/talk about it track for you at all?


> And there's something there that the looking is of or like, if that makes sense

I think I know what you mean, but if you consider something really simple like the patch of a single color, even without any color associations (although presumably they are always there subconsciously) then isn't the experience just of "a surface attribute, of given spatial extent". There is something there, that is the same in that spatial region, but different elsewhere.

At least, that's how it seems to me, and isn't that exactly how the quale of a color has to be - that is the essence of it ?!


Yeah that pretty much seems to be it.

> then isn't the experience just of "a surface attribute, of given spatial extent".

I don't know why this seems to be so hard for me to think about and even put into words, but isn't "the experience of the surface attribute of a given spatial extent" something other than the experience of the surface attribute of a given spatial extent itself?

I mean that the words we use to describe something aren't the something itself. Conceivably, you can experience something without ever having words, and having words about a phenomenal visual experience doesn't seem to change the experience much or at all (at least for me).

Maybe another way of phrasing this would be something like: can we talk about red blotches using red blotches themselves, in the same way that we can talk about words using words themselves? And then, supposing that we could talk about red blotches using red blotches (maybe the blotches are in the form of words or structured like knowledge, I dunno), can we talk about red blotches without ever having experienced red blotches? I learned this idea from Mary's Room thought experiment, but I still don't know what to think about it.


Yes - the experience / quale has nothing to do with words.

The point (opinion) I'm trying to make is that something like the quale of vision, which is so hard to describe, basically has to be the way it is, because that is its fundamental nature.

Consider starting with your eyes closed, and maybe just a white surface in front of you, then you open your eyes. Seeing is not the same as not-seeing, so it has to feel different. If it was a different color then the input to your brain would be different, so that has to feel different too. Vision is a spatial sense - we have a 2-D array of rods and cones in our retina feeding into our brain, so (combined with persistence of vision) we experience the scene in front of us all at once in a spatial manner, completely unlike hearing which is a temporal sense with one thing happening after another... etc, etc.

It seems to me that when you start analyzing it, everything about the quale of vision (or hearing, or touch, or smell) has to be the way it is - it is no mystery - and an artificial brain with similar senses would experience it exactly the same way.


Yep that's a cogent, serious stance, and it sounds a lot like illusionism (famously argued by Daniel Dennett) or functionalism if you ever wanted to check out more about it.

It's a serious stance, but the really interesting thing to me here is that it's not a settled fact. What's quite surprising and unique about this field is that, unlike physics or chemistry where we generally agree on the basics, in consciousness studies you have some quite brilliant minds totally deadlocked on the fundamentals. There is absolutely no consensus on whether the problem is 'solved' or 'impossible', and it's definitely not a matter of people not taking this seriously enough or making rash judgments or simple errors.

I find this fascinating because this type of situation is pretty rare or unique in modern science. Maybe the fun part is that I can take one stance and you another and here there's no "right answer" that some expert knows and one of us is "clearly" wrong. Nice chatting with you :)


Correct - the vast majority of people vastly underestimate the complexity of the human brain and the emergent properties that develop from this inherent complexity.


>It is ultimately a hardware problem.

I think it's more an algorithm problem. I've been reading how LLMs work and the brain does nothing like matrix multiplication over billions of entities. It seems a very inefficient way to do it in terms of compute use, although efficient in terms of not many lines of code. I think the example of the brain shows one could do far better.


exactly, the brain - what a concept! over here you have broca's area, there, wernicke, then Bowman's crest, sector 19, and undiscovered country.

if you put the brain in the shape of a tube you'd have a really long err, well, let's say it's not a good idea to do that. the brain gives me goosepimples, my brain too


Humans grow over years with plenty of self guided study. It's far more than a hardware problem.


Quantum compute is my guess. Being able to switch entire models at atomic speeds will give the perception of intelligence at least. There is still a lot there that will need to be figured out between now and then.


I remember reading about memristors when I was at University and the hope they could help simulate neurons.

I don't remember hearing much about neuromorphic computing lately though so I guess it hasn't had much progress.


It’s not the level of computing we might hope for, but there has been some progress in developing memristors :)

https://journals.plos.org/plosone/article?id=10.1371/journal...


That's my non-expert belief as well. We are trying to brute force an approximation of one aspect of how neurons work at great cost.


> And the final kicker: the human brain runs on like two dozen watts. An LLM takes a year of running on a few MW to train and several kW to run.

I mean, you could argue that if you take into consideration all the generations (starting from the first amoeba) that it took to get to a standard human brain today, then the total energy used to "train" that brain is far greater. But I get your point and I do agree with you that our current hardware paradigm is probably not what's going to give us "god in a box".


Try explaining to someone who's only ever seen dial-up modems that 4k HDR video streaming is a thing.


Dial-up modems can transfer a 4K HDR video file, or any other arbitrary data.

It obviously wouldn't have the bandwidth to do so in a way that would make a real-time stream feasible, but it doesn't involve any leap of logic to conclude that a higher bandwidth link means being able to transfer more data within a given period of time, which would eventually enable use cases that weren't feasible before.

In contrast, you could throw an essentially unlimited amount of hardware at LLMs, and that still wouldn't mean that they would be able to achieve AGI, because there's no clear mechanism for how they would do so.


From modern perspective it's obvious that simply upping the bandwidth allows streaming high-quality videos, but it's not strictly about "more bigger cable". Huge leaps in various technologies were needed for you to watch video in 4k:

- 4k consumer-grade cameras

- SSDs

- video codecs

- hardware-accelerated video encoding

- large-scale internet infrastructure

- OLED displays

What I'm trying to say is that I clearly remember reading an old article about sharing mp3s on P2P networks, and the person writing the article was confident that video sharing, let alone video streaming, let alone high-quality video streaming, wouldn't happen in the foreseeable future because there were just too many problems with that.

If you went back in time just 10 years and told people about ChatGPT they simply wouldn't believe you. They imagined that an AI that can do things that current LLMs can do must be insanely complex, but once technology made that step, we realized "it's actually not that complicated". Sure, AGI won't surface from simply adding more GPUs into LLMs, just like LLMs didn't emerge from adding more GPUs to "cat vs dog" AI. But if technology took us from "AI can tell apart dog and cat 80% of the time" to "AI is literally wiping out entire industry sectors like translation or creative work while turning people into dopamine addicts en masse" within ten years, then I assume that I'll see AGI within my lifetime.


There's nothing about 4K videos that needs an SSD, an OLED display, or any particular video codec, and "large-scale internet infrastructure" is just a different way of saying "lots of high-bandwidth links". Hardware graphics acceleration was also around long before any form of 4K video, and a video decoding accelerator is such an obvious solution that dedicated accelerators were used for early full-motion video before CPUs could reasonably decode them.

Your anecdote regarding P2P file sharing is ridiculous, and you've almost certainly misunderstood what the author was saying (or the author themselves was an idiot). That there wasn't sufficient bandwidth or computing power to stream 4K video at consumer price points during the heyday of mp3 file sharing didn't mean that no one knew how to do it. It would be as ridiculous as me today saying that 16K stereoscopic streaming video can't happen. Just because it's infeasible today doesn't mean that it's impossible.

Regarding ChatGPT, setting aside the fact that the transformer model that ChatGPT is built on was under active research 10 years ago, sure, breakthroughs happen. That doesn't mean that you can linearly extrapolate future breakthroughs. That would be like claiming that if we develop faster and more powerful rockets, then we will eventually be able to travel faster than light.


so planes that don't flap their wings can't fly


Exactly why I cringe so hard when AI-bros make arguments equating AI neurons to biological neurons.


There are some tradeoffs in the other direction. Digital neurons can have advantages that biological neurons do not.

For example, if biology had a "choice" I am fairly confident that it would have elected to not have leaky charge carriers or relatively high latency between elements. Roughly 20% of our brain exists simply to slow down and compensate for the other 80%.

I don't know that eliminating these caveats is sufficient to overcome all the downsides, but I also don't think we've tried very hard to build experiments that directly target this kind of thinking. Most of our digital neurons today are of an extremely reductive variety. At a minimum, I think we need recurrence over a time domain. The current paradigm (GPU-bound) is highly allergic to a causal flow of events over time (i.e., branching control flows).



