Yes, of course. We often don't know our real motivations either, our justifications are often self-serving and made up, and more generally there's a lot we don't know about how the brain works.
But we do at least have some memories of our own thought processes and some experience observing our own reactions to things. You can get better at introspection. Attempting to understand and explain your own mistakes is useful.
An LLM won't be able to do this sort of learning without changing its training or architecture somehow.
Actually, I just added a long paragraph showing how even memories and real-time sense data are a kind of pleasant fabrication. Also, every time you remember something, you're not really recalling it. Your brain rewrites the memory each time you "remember", often introducing false details or inconsistencies. It's one of the reasons human memory is so fallible and so often inadmissible.
That said, I don't disagree with your overall point. Attempted introspection is useful. My point is just not to dismiss LLM "introspection" simply because it's rationalization; it can be extremely useful too.
The part I see missing for an LLM is learning from feedback about whether its justifications "work" for explaining its own actions. Keep in mind that an LLM is not human, and its justifications often shouldn't look like human justifications. But it's going to be scored on how human its justifications look, so you will never get that. (With the exception of things explicitly trained in, like knowing its own cut-off date - that's quite a non-human justification for not knowing something.)
Also, it's easy to come up with examples showing that we do remember what we were thinking, even if we didn't say it out loud. A simple example might be running an errand. Of course, sometimes we do forget, which is why a shopping list is useful.
An LLM cannot do that at all. It writes it down or it's lost. It can only pretend to remember an internal thought it had.
>An LLM cannot do that at all. It writes it down or it's lost.
I mean, this is likely a solvable architecture issue at some point, once we have the computing power to perform 'excess' computation and re-learn quickly. Consciousness, in my mind, is a form of 'writing it down' and very rarely the primary thought itself; that is, consciousness is reflection.
We see improvements in GPT behavior when we allow a kind of externalized consciousness via reflection. It's able to 'see' its thoughts written down, reflect on whether they are correct, and then make corrections to improve the quality of its final answer.
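To make that concrete, here's a minimal sketch of such a reflect-and-revise loop. The `generate` function is a hypothetical stand-in for whatever completion API you're using, and the prompt wording is purely illustrative, not a recipe from any particular system:

```python
# Minimal sketch of a reflect-and-revise loop. `generate` is a hypothetical
# placeholder for any LLM completion call; prompts are illustrative only.

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM completion endpoint."""
    raise NotImplementedError

def answer_with_reflection(question: str) -> str:
    # First pass: the model writes its reasoning down where it can "see" it.
    draft = generate(f"Question: {question}\nThink step by step, then answer.")

    # Second pass: the model reads its own externalized thoughts and critiques them.
    critique = generate(
        f"Question: {question}\nProposed reasoning and answer:\n{draft}\n"
        "List any mistakes or unsupported steps in the reasoning above."
    )

    # Third pass: revise in light of the critique.
    revised = generate(
        f"Question: {question}\nOriginal attempt:\n{draft}\n"
        f"Critique:\n{critique}\nWrite a corrected final answer."
    )
    return revised
```

The point is that the "reflection" only ever operates on what got written down; nothing internal to the first pass survives except the text it emitted.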
The biggest hold-up for LLMs is the amount of effort needed to retrain them and ensure they haven't gone off the rails. Currently they take months to train. If Nvidia somehow makes the hardware a million times more powerful, as they want to, will that bring training time down to minutes or hours?
>its own cut-off date - that's quite a non-human justification for not knowing something.)
That sounds very human to me... "I worked in the X industry from date Y to Z" is something people say all the time to put a qualification date on the information they have about something.
Yes, there's no physical reason it couldn't be fixed. However, I think it's less a matter of computing power, and more one of architecture and training.
The inference-time code could be changed to keep a log of which neurons were activated the most when it generated a token. But then, how would we train it to use that log and report it accurately? Training on Internet documents isn't enough.
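The logging half is easy enough to sketch. Below is a toy illustration using PyTorch forward hooks on a small Hugging Face model; hooking the MLP outputs and taking the top-5 activations per generated token is purely an assumption made for the example, not an established introspection method:

```python
# Sketch: log the most-activated MLP units per forward pass during generation.
# Assumes a Hugging Face-style causal LM (GPT-2 here, just for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

activation_log = []  # one entry per hooked layer per forward pass

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output shape: (batch, seq_len, hidden). Look at the last position,
        # i.e. the token position currently being predicted.
        last = output[0, -1]
        topk = torch.topk(last.abs(), k=5)
        activation_log.append((layer_idx, topk.indices.tolist()))
    return hook

handles = [
    block.mlp.register_forward_hook(make_hook(i))
    for i, block in enumerate(model.transformer.h)
]

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=3, do_sample=False)
print(tok.decode(out[0]))
print(activation_log[:5])  # which units fired hardest, layer by layer

for h in handles:
    h.remove()
```

The hard, unsolved part is the second half of the question: there's no training signal that teaches the model to consult such a log and describe it faithfully.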
Chain-of-thought reasoning helps as a workaround. However, beware that even then, the justifications aren't a reliable explanation. There's an interesting paper on this:
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
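For anyone unfamiliar with the workaround itself: chain-of-thought just means prompting the model to write its reasoning out before answering, roughly like this (the prompt below is an illustration, not taken from the paper):

```python
# Illustrative chain-of-thought prompt. The paper's finding is that the
# written-out reasoning can be a post-hoc story: biasing features added to the
# prompt can change the model's answer without ever being mentioned in the chain.
prompt = (
    "Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 more than "
    "the ball. How much does the ball cost?\n"
    "A: Let's think step by step."
)
```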