The tokens are also necessary to store information, or at least to off-load it from the neuron activations.

E.g. if you ask an LLM to "think about X and then do Y", and the "think about X" part is silent, the LLM has a high chance of:

a) just not doing that, or

b) thinking about it but then forgetting, because the capacity of its 'RAM' (the neuron activations) is unknown, but probably less than a few tokens' worth.

Actually, has anyone tried to measure how much non-context data (i.e. new data generated from the context data) an LLM can keep "in memory" without writing it down?


I don’t think commonly used LLM architectures have internal state that carries over between inference steps, so shouldn’t that be none? Unless you mean the previously generated tokens up to the context limit, which is well defined.


Sorry, I meant the information that is inferred (from scratch on every token) from the entire context and is then reduced to that single token. Every time a token is generated, the LLM looks at the entire context, does some processing (which, critically, generates new information derived from the context), and then the result of all that processing is reduced to a single token.

My conjecture is that the LLM "knows" some things that it does not put into words. I don't know what those are, but it seems wasteful to drop the entire state on every token. I even suspect there is something like a "single logic step" worth of conclusions drawn from the context. Though I may be committing the fallacy of thinking in symbolic terms about something that is ultimately statistical.
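
To make that "drop the entire state" step concrete, here is a minimal toy sketch. The random weights and the running-average "attention" stand in for a real trained transformer; they are my own simplification, chosen only to show the information flow, not to produce meaningful output. The point is that the hidden activations are recomputed from the whole context on every step, and only the sampled token id survives into the next step.

  import numpy as np

  rng = np.random.default_rng(0)
  VOCAB, D = 50, 16                      # toy vocabulary and hidden size

  # Random fixed weights stand in for a trained transformer; only the
  # information flow matters here, not output quality.
  W_embed = rng.normal(size=(VOCAB, D))
  W_out = rng.normal(size=(D, VOCAB))

  def forward(token_ids):
      """Recompute hidden activations for the whole context from scratch.
      A real transformer mixes positions with attention; this toy just
      running-averages embeddings so the last state depends on every token."""
      h = W_embed[token_ids]                                  # (len, D)
      return h.cumsum(axis=0) / np.arange(1, len(token_ids) + 1)[:, None]

  context = [3, 17, 42]                  # prompt token ids
  for _ in range(5):
      hidden = forward(np.array(context))     # rich intermediate state...
      logits = hidden[-1] @ W_out
      context.append(int(logits.argmax()))    # ...collapsed to one token id
      # `hidden` is discarded here; only the appended token reaches the
      # next iteration, which recomputes everything from the context.
  print(context)

Everything in `hidden` that did not influence the chosen token is gone by the next step, which is exactly the sense in which the state is dropped on every token.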


Correct, there's no internal state, but CoT techniques simulate one by giving the model space to generate tokens that represent intermediate thoughts.
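
A minimal sketch of that idea, with a hypothetical generate() stub standing in for any actual LLM call (the stub and the example question are mine, not a specific API): the only difference between the two prompts is whether the intermediate reasoning is emitted as tokens, and therefore lands back in the context, or has to survive inside the activations of a single forward pass.

  def generate(prompt: str) -> str:
      """Hypothetical stub; a real implementation would call an LLM here."""
      return "<model output>"

  question = "Think about the prime factors of 91, then name the largest one."

  # Without CoT: any 'thinking' has to fit in the activations of each
  # forward pass and is thrown away as soon as the answer token is emitted.
  direct = generate(question + "\nAnswer with a single number.")

  # With CoT: the intermediate steps are written out as tokens, which are
  # appended to the context, so every later step can re-read them.
  with_cot = generate(question + "\nWrite out your reasoning step by step, "
                                 "then give the final answer.")

In the second case the reasoning tokens are just more context, so they persist exactly as long as the context window does.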