If you expect "the right way" to be something _other_ than a system which can generate a reasonable "state + 1" from a "state" - then what exactly do you imagine that entails?
That's how we think. We think sequentially. As I'm writing this, I'm deciding the next few words to type based on my last few.
Blows my mind that people don't see the parallels to human thought. Our thoughts don't arrive fully formed as a god-given answer. We're constantly deciding the next thing to think, the next word to say, the next thing to focus on. Yes, it's statistical. Yes, it's based on our existing neural weights. Why are you so much more dismissive of that when it's in silicon?
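To make the "state + 1 from a state" framing concrete, here's a toy sketch of the generation loop. The bigram table stands in for an LLM's learned weights, and the corpus, function names, and sampling scheme are all invented for illustration; the only point is that output is produced one step at a time, by feeding each state back in to get the next one.

```python
# A minimal sketch of "state -> state + 1" generation, using a toy bigram
# model instead of a neural network. The corpus and sampling scheme are made
# up for illustration; an LLM does the same loop with a learned distribution
# over a far larger context.
import random
from collections import defaultdict

corpus = "we are constantly deciding the next thing to think the next word to say".split()

# Count which word follows which (the "learned weights" of this toy model).
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def step(state):
    """Given the current state (last word), sample a plausible next word."""
    candidates = follows.get(state)
    if not candidates:
        return None  # dead end: nothing ever followed this word in the corpus
    return random.choice(candidates)

# Generate: repeatedly feed the current state back in to get state + 1.
state = "the"
output = [state]
for _ in range(8):
    state = step(state)
    if state is None:
        break
    output.append(state)

print(" ".join(output))
```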
Finite-state machines are a limited model. In principle, you can use them to model everything that can fit in the observable universe. But that doesn't mean they are a good model for most purposes.
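A toy example of that limitation, with an invented 3-bit counter: the explicit state table is a perfectly valid FSM, but it only works because the system is tiny, and the same behaviour is captured far better by one line of arithmetic.

```python
# A 3-bit counter written as an explicit finite-state machine. The state
# table is exhaustive and correct, but it already needs 8 entries for 3 bits;
# a 64-bit counter would need 2**64 states, while the "better model" is a
# single line of arithmetic.
transitions = {n: (n + 1) % 8 for n in range(8)}  # explicit FSM: state -> next state

def fsm_step(state):
    return transitions[state]

def arithmetic_step(state, bits=3):
    return (state + 1) % (1 << bits)  # same behaviour, no enumerated states

state = 6
for _ in range(4):
    assert fsm_step(state) == arithmetic_step(state)
    state = fsm_step(state)
print(state)  # 2: wrapped around as expected
```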
The biggest limitation with the current LLMs is the artificial separation between training and inference. Once deployed, they are eternally stuck in the same moment, always reacting but incapable of learning. At best, they are snapshots of a general intelligence.
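A rough sketch of that split, using a deliberately silly one-parameter "model" (all names here are made up, not any real training API): the parameters only move during training, and whatever the model "sees" at inference lives in the context and evaporates with it.

```python
# Toy illustration of the training/inference separation described above.
class ToyModel:
    def __init__(self):
        self.weight = 0.0        # updated only during training
    def train_step(self, example, lr=0.1):
        self.weight += lr * (example - self.weight)   # weights change here
    def infer(self, prompt_context):
        # At inference the weights are frozen; the only "memory" is whatever
        # sits in the prompt context, and it disappears when the context does.
        return self.weight + sum(prompt_context)

model = ToyModel()
for x in [1.0, 1.0, 1.0]:
    model.train_step(x)          # training: the snapshot being baked in

print(model.infer([2.0]))        # reacts to the context...
print(model.infer([]))           # ...but nothing it "saw" at inference persists
print(model.weight)              # parameters untouched by inference
```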
I also have a vague feeling that a fixed set of tokens is a performance hack that ultimately limits the generality of LLMs, and that hardcoded assumptions make tasks that build on those assumptions easier and seeing past them harder.
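Here's a toy illustration of that worry, with an invented five-word vocabulary: anything the vocabulary anticipated encodes cleanly, and anything it didn't gets flattened into `<unk>` (real tokenizers shred it into sub-pieces instead, but the baked-in assumption is the same).

```python
# A toy "fixed set of tokens": the vocabulary is frozen at training time, so
# anything outside it is squashed into <unk>. The vocab is invented for the
# example.
VOCAB = {"<unk>": 0, "the": 1, "next": 2, "word": 3, "to": 4, "say": 5}

def encode(text):
    return [VOCAB.get(w, VOCAB["<unk>"]) for w in text.split()]

print(encode("the next word to say"))      # [1, 2, 3, 4, 5] - fits the assumptions
print(encode("the next jpeg to decode"))   # [1, 2, 0, 4, 0] - detail lost at the boundary
```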
> As I'm writing this, I'm deciding the next few words to type based on my last few.
If that were all, you could have written this as a newborn baby. You're determining these words based on a lifetime of experience. LLMs don't do that: every instance of ChatGPT is the same newborn baby, while a thousand clones of you could all end up vastly different.
We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato’s concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
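One way to make "measure distance between datapoints in a more and more alike way" concrete is to compare the pairwise-distance structure of two models' embeddings of the same inputs. The sketch below correlates distance matrices over random stand-in embeddings; it is not the mutual-nearest-neighbour alignment metric the paper itself uses, just the general shape of such a measurement.

```python
# Rough sketch of measuring representational alignment between two models:
# embed the same items with both, then compare their pairwise-distance
# structure. The embeddings here are random stand-ins, not real model outputs.
import numpy as np

rng = np.random.default_rng(0)
n_items = 50
emb_vision = rng.normal(size=(n_items, 512))    # stand-in for a vision model's features
emb_language = rng.normal(size=(n_items, 768))  # stand-in for a language model's features

def pairwise_distances(x):
    # Euclidean distance matrix between all rows of x.
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * x @ x.T
    return np.sqrt(np.maximum(d2, 0.0))

def alignment(a, b):
    # Correlate the upper triangles of the two distance matrices:
    # values near 1.0 mean the two models space the datapoints the same way.
    iu = np.triu_indices(len(a), k=1)
    return np.corrcoef(pairwise_distances(a)[iu], pairwise_distances(b)[iu])[0, 1]

print(alignment(emb_vision, emb_language))        # near 0 for unrelated random features
print(alignment(emb_vision, emb_vision * 3 + 1))  # ~1.0: same geometry up to scale and shift
```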