>There's nothing "omniscient" or "dim-witted" about these tools
I disagree in that that seems quite a good way of describing them. All language is a bit inexact.
Also I don't buy that we are no closer to AI than ten years ago - there seems to be a lot going on. Just because LLMs are limited doesn't mean we can't find or add other algorithms - I mean look at AlphaEvolve for example https://www.technologyreview.com/2025/05/14/1116438/google-d...
>found a faster way to solve matrix multiplications—a fundamental problem in computer science—beating a record that had stood for more than 50 years
I figure it's hard to argue that that is not at least somewhat intelligent?
> I figure it's hard to argue that that is not at least somewhat intelligent?
The fact that this technology can be very useful doesn't imply that it's intelligent. My argument is about the language used to describe it, not about its abilities.
The breakthroughs we've had are because there is a lot of utility in finding patterns in data, which humans aren't very good at. Many of our problems can be boiled down to this task. So when we have vast amounts of data and compute at our disposal, we can be easily impressed by results that seem impossible for humans.
But this is not intelligence. The machine has no semantic understanding of what the data represents. The algorithm is optimized for generating specific permutations of tokens that match something it previously saw and was rewarded for. Again, very useful, but there's no thinking or reasoning there. The model doesn't have an understanding of why the wolf can't be close to the goat, or how a cabbage tastes. It's trained on enough data and algorithmic tricks that its responses can fool us into thinking it does, but this is just an illusion of intelligence. This is why we need to constantly feed it more tricks so that it doesn't fumble with basic questions like how many "R"s are in "strawberry", or that it doesn't generate racially diverse but historically inaccurate images.
>The machine has no semantic understanding of what the data represents.
How do you define "semantic understanding" in a way that doesn't ultimately boil down to saying they don't have phenomenal consciousness? Any functional concept of semantic understanding is captured to some degree by LLMs.
Typically when we attribute understanding to some entity, we recognize some substantial abilities in the entity in relation to that which is being understood. Specifically, the subject recognizes relevant entities and their relationships, various causal dependences, and so on. This ability goes beyond rote memorization, it has a counterfactual quality in that the subject can infer facts or descriptions in different but related cases beyond the subject's explicit knowledge. But LLMs excel at this.
>feed it more tricks so that it doesn't fumble with basic questions like how many "R"s are in "strawberry"
This failure mode has nothing to do with LLMs lacking intelligence and everything to do with how tokens are represented. They do not see individual characters, but sub-word chunks. It's like expecting a human to count the pixels in an image it sees on a computer screen. While not impossible, it's unnatural to how we process images and therefore error-prone.
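To make the sub-word point concrete, here is a minimal sketch. It assumes a made-up vocabulary and a greedy longest-match splitter, which is a simplification: real tokenizers typically use learned merge rules, and the pieces below are hypothetical rather than taken from any actual model.

```python
# Toy vocabulary: hypothetical sub-word pieces, not taken from any real tokenizer.
VOCAB = {"str": 0, "aw": 1, "berry": 2, "s": 3, "t": 4, "r": 5,
         "a": 6, "w": 7, "b": 8, "e": 9, "y": 10}


def tokenize(text, vocab=VOCAB):
    """Split text into sub-word pieces by greedy longest match (a simplification)."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary entry matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return pieces


pieces = tokenize("strawberry")
ids = [VOCAB[p] for p in pieces]

print(pieces)  # the sub-word chunks
print(ids)     # what the model actually receives: integer IDs, not letters

# Counting letters requires going back to characters, which the IDs do not expose.
print("".join(pieces).count("r"))
```

With this toy vocabulary the word arrives as three opaque IDs, so the letter count is not directly visible in the input; it has to be recovered from what the model has learned about each piece.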
You don't need phenomenal consciousness. You need consistency.
LLMs are not consistent. This is unarguable. They will produce a string of text that says they have solved a problem and/or done a thing when neither is true.
And sometimes they will do it over and over, even when corrected.
Your last paragraph admits this.
Tokenisation on its own simply cannot represent reality accurately and reliably. It can be tweaked so that specific problems can appear solved, but true AI would be based on a reliable general strategy which solves entire classes of problems without needing this kind of tweaking.
This is a common category of error people commit when talking about LLMs.
"True, LLMs can't do X, but a lot of people don't do X well either!"
The problem is, when you say humans have trouble with X, what you mean is that human brains are fully capable of X, but sometimes they do, indeed, make mistakes. Or that some humans haven't trained their faculties for X very well, or whatever.
But LLMs are fundamentally, completely, incapable of X. It is not something that can be a result of their processes.
These things are not comparable.
So, to your specific point: When an LLM is inconsistent, it is because it is, at its root, a statistical engine generating plausible next tokens, with no semantic understanding of the underlying data. When a human is inconsistent, it is because they got distracted, didn't learn enough about this particular subject, or otherwise made a mistake that they can, if their attention is drawn to it, recognize and correct.
LLMs cannot. They can only be told they made a mistake, which prompts them to try again (because that's the pattern that has been trained into them for what happens when told they made a mistake). But their next try won't have any better odds of being correct than their previous one.
>But LLMs are fundamentally, completely, incapable of X. It is not something that can be a result of their processes.
This is the very point of contention. You don't get to just assume it.
> it is because it is, at its root, a statistical engine generating plausible next tokens, with no semantic understanding of the underlying data.
Another highly contentious point you are just outright assuming. LLMs are modelling the world, not just "predicting the next token". Some examples here[1][2][3]. Anyone claiming otherwise at this point is not arguing in good faith. It's interesting how the people with the strongest opinions about LLMs don't seem to understand them.
OK, sure; there is some evidence potentially showing that LLMs are constructing a world model of some sort.
This is, however, a distraction from the point, which is that you were trying to make claims that the described lack of consistency in LLMs shouldn't be considered a problem because "humans aren't very consistent either."
Humans are perfectly capable of being consistent when they choose to be. Human variability and fallibility cannot be used to handwave away lack of fundamental ability in LLMs. Especially when that lack of fundamental ability is on empirical display.
I still hold that LLMs cannot be consistent, just as TheOtherHobbes describes, and you have done nothing to refute that.
Address the actual point, or it becomes clear that you are the one arguing in bad faith.
You are misrepresenting the point of contention. The question is whether LLMs' lack of consistency undermines the claim that they "understand" in some relevant sense. But arguing that lack of consistency is a defeater for understanding is itself undermined by noting that humans are inconsistent but do in fact understand things. It's as simple as that.
If you want to alter the argument by saying humans can engage in focused effort to reach some requisite level of consistency for understanding, you have to actually make that argument. It's not at all obvious that focused effort is required for understanding or that a lack of focused effort undermines understanding.
You also need to contend with the fact that LLMs aren't really a single entity, but are a collection of personas, and what you get and its capabilities depend to a large degree on how you prompt it. Even if the entity as a whole is inconsistent between prompts, the right subset might very well be reliably consistent. There's also the temperature setting, which artificially injects randomness into the LLM's output. An LLM itself is entirely deterministic. It's not at all obvious how consistency relates to LLM understanding.
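A rough sketch of the temperature point, assuming the model's forward pass returns fixed scores (logits) for a given input and that randomness enters only at the sampling step. The logits and function names here are made up for illustration and are not any provider's API.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng):
    """Pick a token index: argmax at temperature 0, otherwise a random draw."""
    if temperature == 0:
        # Greedy decoding: the same logits always give the same token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.5, 0.1]  # hypothetical scores for three candidate tokens
rng = random.Random(0)

greedy = [pick_token(logits, 0, rng) for _ in range(5)]
sampled = [pick_token(logits, 1.0, rng) for _ in range(5)]

print(greedy)   # identical picks every time
print(sampled)  # picks may vary from draw to draw
```

Under these assumptions, any run-to-run variation comes from the sampler rather than from the scores themselves.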
Feel free to do some conceptual work to make an argument; I'm happy to engage with it. What I'm tired of are these half-assed claims and incredulity that people don't take them as obviously true.
I imagine if you asked the LLM why the wolf can't be close to the goat it would give a reasonable answer. I realise it does it by using permutations of tokens, but I think you have to judge intelligence by the results rather than the mechanism. Otherwise you could argue humans can't be intelligent because they are just a bunch of neurons that find patterns.
Actually I think the Chinese room fits my idea. It's a silly thought experiment that would never work in practice. If you tried to make one you would judge it unintelligent because it wouldn't work. Or at least in the way Searle implied - he basically proposed a look up table.
We have had programs that can give good answers to some hard questions for a very long time now. Watson won Jeopardy! back in 2011, but it still wasn't very good at replacing humans.
So that isn't a good way to judge intelligence. Computers are so fast and have so much data that you can make programs that answer just about anything pretty well; an LLM does the same thing, just more automatically. But it still doesn't automate the logical parts, only the lookup of knowledge. We don't know how to train large logic models, just large language models.
LLMs are not the only model type though? There's a plethora of architectures and combinations being researched.. And even transformers are starting to be able to do cool sh1t on knowledge graphs, also interesting is the progress on autoregressive physics PDE (partial differential equation) models.. and it can't be too long until some providers of actual biological neural nets show up on openrouter (probably a lot less energy and capital intensive to scale up brain goo in tanks compared to gigawatt GPU clusters).. combine that zoo of "AI" specimens using M2M, MCP etc. and the line between mock and "true" intelligence will blur, escalating our feeble species into ASI territory.. good luck to us.
> There's a plethora of architectures and combinations being researched
There was a plethora of architectures and combinations being researched before LLMs, and it still took a very long time to find the LLM architecture.
> the line between mock and "true"intelligence will blur
Yes, I think this will happen at some point. The question is how long it will take, not if it will happen.
The only thing that can stop this is if intermediate AI is good enough to give every human a comfortable life but still isn't good enough to think on its own.
It's easy to imagine such an AI being developed: imagine a model that can learn to mimic humans at any task, but still cannot update itself without losing those skills and becoming worse. Such an AI could be trained to perform every job on earth as long as we don't care about progress.
If such an AI is developed, and we don't quickly solve the remaining problems to get an AI that can progress science on its own, it's likely our progress entirely stalls there, as humans will no longer have a reason to go to school to advance science.
This approach to defining “true” intelligence seems flawed to me because of examples in biology where semantic understanding is in no way relevant to function. A slime mold solving a maze doesn’t even have a brain, yet it solves a problem to get food. There’s no knowing that it does that, no complex signal processing, no self-perception of purpose, but nevertheless it gets the food it needs. My response to that isn’t to say the slime mold has no intelligence, it’s to widen the definition of intelligence to include the mold. In other words, intelligence is something one does rather than has; it’s not the form but the function of the thing. Certainly LLMs lack anything in any way resembling human intelligence, they even lack brains, but they demonstrate a capacity to solve problems that I don’t think it is unreasonable to label intelligent behavior. You can put them in some mazes and LLMs will happen to solve them.
>LLMs lack anything in any way resembling human intelligence
I think intelligence has many aspects from moulds solving mazes to chess etc. I find LLMs resemble very much human rapid language responses where you say something without thinking about it first. They are not very good at thinking though. And hopeless if you were to say hook one to a robot and tell it to fix your plumbing.
While it's debatable whether slime molds showcase intelligence, there's a substantial difference between their behavior and that of modern AI systems. The organism was never trained to traverse a maze. It simply behaves in the same way as it would in its natural habitat, seeking out food in this case, which we interpret as "solving" a human-made problem. In order to get an AI system to do the same we would have to "train" it on large amounts of data that specifically included maze solving. This training wouldn't carry over to any other type of problem, for which we would also need to train it specifically.
When you consider how humans and other animals learn, knowledge is carried over. I.e. if we learn how to solve a maze on paper, we can carry this knowledge over to solve a hedge maze. It's a contrived example, but you get the idea. When we learn, we build out a web of ideas in our minds which we can later use while thinking to solve other types of problems, or the same problems in different ways. This is a sign of intelligence that modern AI systems simply don't have. They're showing an illusion of intelligence, which as I've said before, can still be very useful.
My alternative definition would be something like this. Intelligence is the capacity to solve problems, where a problem is defined contextually. This means that what is and is not intelligence is negotiable in situations where the problem itself is negotiable. If you have water solve a maze, then yes the water could be said to have intelligence, though that would be a silly way to put it. It’s more that intelligence is a material phenomenon, and things which seem like they should be incredibly stupid can demonstrate surprisingly intelligent behavior.
LLMs are leagues ahead of viruses or proteins or water. If you put an LLM into a code editor with access to error messages, it can solve a problem you create for it, much like water flowing through a maze. Does it learn or change? No, everything is already there in the structure of the LLM. Does it have agency? No, it’s a transparently deterministic mapping from input to output. Can it demonstrate intelligent behavior? Yes.
That's an interesting way of looking at it, though I do disagree. Mainly because, as you mention, it would be silly to claim that water is intelligent if it can be used to solve a problem. That would imply that any human-made tool is intelligent, which is borderline absurd.
This is why I think it's important that if we're going to call these tools intelligent, then they must follow the processes that humans do to showcase that intelligence. Scoring high on a benchmark is not a good indicator of this, in the same way that a human scoring high on a test isn't. It's just one convenient way we have of judging this, and a very flawed one at that.
I keep on trying this wolf, cabbage, goat problem with various permutations: let’s say just a wolf and a cabbage, no goat mentioned. At some step the goat materializes in the answer. I tell it there is no goat, and yet it answers again and the goat is still there.