The tokens are also necessary to store information, or at least to off-load it from the neuron activations.

E.g. if you ask an LLM to "think about X and then do Y", and the "think about X" part is silent, the LLM has a high chance of:

a) just not doing that, or

b) thinking about it but then forgetting, because the capacity of its 'RAM' (the neuron activations) is unknown, but probably less than a few tokens' worth.

Actually, has anyone tried to measure how much non-context data (i.e. new data generated from the context data) an LLM can keep "in memory" without writing it down?


I don’t think commonly used LLM architectures have internal state that carries over between inference steps, so shouldn’t that be none? Unless you mean the previously generated tokens up to the context limit, which is well defined.


Sorry, I meant the information that is inferred (from scratch on every token) from the entire context and is then reduced to that single token. Every time a token is generated, the LLM looks at the entire context, does some processing (which, critically, generates new information derived from the context), and then the result of all that processing is reduced to a single token.

My conjecture is that the LLM "knows" some things that it does not put into words. I don't know what those are, but it seems wasteful to drop the entire state on every token. I even suspect there is something like a "single logic step" worth of conclusions drawn from the context. Though I may be committing the fallacy of thinking in symbolic terms about something that is ultimately statistical.
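
To make that "drop the entire state" step concrete, here is a minimal toy sketch. The random weights and the running-average "attention" stand in for a real trained transformer; they are my own simplification, chosen only to show the information flow, not to produce meaningful output. The point is that the hidden activations are recomputed from the whole context on every step, and only the sampled token id survives into the next step.

  import numpy as np

  rng = np.random.default_rng(0)
  VOCAB, D = 50, 16                      # toy vocabulary and hidden size

  # Random fixed weights stand in for a trained transformer; only the
  # information flow matters here, not output quality.
  W_embed = rng.normal(size=(VOCAB, D))
  W_out = rng.normal(size=(D, VOCAB))

  def forward(token_ids):
      """Recompute hidden activations for the whole context from scratch.
      A real transformer mixes positions with attention; this toy just
      running-averages embeddings so the last state depends on every token."""
      h = W_embed[token_ids]                                  # (len, D)
      return h.cumsum(axis=0) / np.arange(1, len(token_ids) + 1)[:, None]

  context = [3, 17, 42]                  # prompt token ids
  for _ in range(5):
      hidden = forward(np.array(context))     # rich intermediate state...
      logits = hidden[-1] @ W_out
      context.append(int(logits.argmax()))    # ...collapsed to one token id
      # `hidden` is discarded here; only the appended token reaches the
      # next iteration, which recomputes everything from the context.
  print(context)

Everything in `hidden` that did not influence the chosen token is gone by the next step, which is exactly the sense in which the state is dropped on every token.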


Correct, there's no internal state, but CoT techniques simulate one by giving the model space to generate tokens that represent intermediate thoughts.
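
A minimal sketch of that idea, with a hypothetical generate() stub standing in for any actual LLM call (the stub and the example question are mine, not a specific API): the only difference between the two prompts is whether the intermediate reasoning is emitted as tokens, and therefore lands back in the context, or has to survive inside the activations of a single forward pass.

  def generate(prompt: str) -> str:
      """Hypothetical stub; a real implementation would call an LLM here."""
      return "<model output>"

  question = "Think about the prime factors of 91, then name the largest one."

  # Without CoT: any 'thinking' has to fit in the activations of each
  # forward pass and is thrown away as soon as the answer token is emitted.
  direct = generate(question + "\nAnswer with a single number.")

  # With CoT: the intermediate steps are written out as tokens, which are
  # appended to the context, so every later step can re-read them.
  with_cot = generate(question + "\nWrite out your reasoning step by step, "
                                 "then give the final answer.")

In the second case the reasoning tokens are just more context, so they persist exactly as long as the context window does.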