When you train a model on data made by humans, it learns to imitate, but that imitation is ungrounded. Once you train the model interactively, it can learn from the consequences of its own outputs. This grounding by feedback constitutes a new learning signal that does not simply copy humans, and it is a necessary ingredient for pattern matching to become reasoning. Everything we know as humans comes from the environment; it is the ultimate teacher and validator. That is the missing ingredient for AI to be able to reason.
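As a toy sketch of the difference between the two signals (purely illustrative; the guessing task, the reward function, and every name here are invented for the example, not any real training pipeline):

```python
# Toy sketch: imitating human demonstrations vs. learning from environment
# feedback on a tiny guessing game. Nothing here resembles real training code.
import random

TARGET = 7  # the environment's hidden "right answer"

def environment_reward(guess: int) -> float:
    """The environment grades the output; this is the grounding signal."""
    return 1.0 if guess == TARGET else 0.0

# Imitation: the model mirrors whatever humans wrote, right or wrong.
human_demonstrations = [3, 5, 5, 9]           # the humans never said 7
imitation_guess = random.choice(human_demonstrations)

# Feedback: the model tries outputs and keeps what the environment validates.
scores = {g: environment_reward(g) for g in range(10)}
feedback_guess = max(scores, key=scores.get)

print("imitation:", imitation_guess)   # stuck copying the demonstrations
print("feedback: ", feedback_guess)    # discovers 7 from consequences alone
```

The point isn't the code; it's that the second signal can exceed the demonstrations, while the first never can.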
Yeah, but this doesn't change how the model functions; it just turns reasoning into training data by example. It's not learning how to reason - it's learning how to pretend to reason about an ever wider variety of topics.
If any LLM appears to be reasoning, that is evidence not of the intelligence of the model, but rather of the lack of creativity in the question.
Humans are only capable of principled reasoning in domains where they have expertise. In domains where we lack formal training, we don't actually do full causal reasoning; we use all sorts of shortcuts similar to what LLMs are doing.
If you consider AlphaTensor or the other models in the Alpha family, they show that feedback can train a model to superhuman levels.
It’s the process by which you solve a problem. Reasoning requires creating abstract concepts and applying logic to them to arrive at a conclusion.
It’s like asking what’s the difference between deductive logic and Monte Carlo simulation. Both arrive at answers that can be very similar, but the process is not similar at all.
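A throwaway illustration of that contrast (plain Python, nothing to do with how any model is built): both routes land near 1/6 for the chance that two dice sum to 7, but one enumerates the abstract sample space exactly while the other only samples and counts.

```python
# Two routes to "what is the chance two fair dice sum to 7?"
import random

# Deductive: enumerate the abstract sample space and count exactly.
exact = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == 7) / 36
# -> 6/36 = 0.1666...

# Monte Carlo: no enumeration at all, just roll a lot and tally frequencies.
trials = 100_000
hits = sum(random.randint(1, 6) + random.randint(1, 6) == 7 for _ in range(trials))
estimate = hits / trials

print(f"deduction: {exact:.4f}   simulation: {estimate:.4f}")
# Similar numbers, entirely different processes for arriving at them.
```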
If there is any form of reasoning on display here, it’s an abductive style of reasoning, one that operates in a probabilistic semantic space rather than a logical, abstract space.
This is important to bear in mind, and it explains why hallucinations are so difficult to prevent. There is nothing in the process to put guard rails around, because the model is literally computing the probability of each next token given the tokens seen so far and the distribution of tokens it was trained on. It has nothing to draw upon other than this - and that’s the difference between LLMs and systems with richer abstract concepts and operations.
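To make that concrete, here is a deliberately crude caricature (a hand-built bigram table standing in for a trained network; the vocabulary and counts are made up): the only operation available is "given the tokens so far, which token is probable next", and fluent-looking output falls out of nothing more than that.

```python
# Minimal caricature of next-token prediction: a bigram table standing in for
# a trained network. There is no separate layer of abstract checking, only
# conditional token probabilities.
import random

bigram_counts = {
    "the": {"cat": 4, "dog": 3, "moon": 1},
    "cat": {"sat": 5, "ran": 2},
    "dog": {"ran": 4, "sat": 1},
    "sat": {"down": 6},
    "ran": {"away": 6},
}

def next_token(prev: str) -> str:
    counts = bigram_counts[prev]
    total = sum(counts.values())
    probs = {tok: c / total for tok, c in counts.items()}   # P(next | prev)
    return random.choices(list(probs), weights=list(probs.values()))[0]

tokens = ["the"]
while tokens[-1] in bigram_counts:
    tokens.append(next_token(tokens[-1]))
print(" ".join(tokens))  # e.g. "the cat sat down" - fluent, but only statistics
```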