
To me it feels like whatever 'proof' you give that LLMs have a model behind them, beyond 'next token prediction', it would make no difference to the people who don't 'believe' it. I see this happening over and over on HN.

We don't know how reasoning emerges in humans. I'm pretty sure multimodality helps, but it isn't required for reasoning: other modalities just mean other forms of input, hence more (if somewhat different) input. A blind person can still form an 'image'.

In the same sense, we don't know how reasoning emerges in LLMs. For me the evidence lies in the results rather than in how it works, and the results are evidence enough.




The argument isn't that there is something more than next token prediction happening.

The argument is that next token prediction does not imply an upper bound on intelligence, because a better next-token predictor pulls increasingly more of the world described in the training data into itself.
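
To make concrete what 'next token prediction' means here, it is roughly this loop (a minimal Python sketch; `model` and `tokenizer` are hypothetical stand-ins, not any real library's API):

  def generate(model, tokenizer, prompt, max_new_tokens=50):
      # Autoregressive decoding: the model only ever predicts the
      # next token, and each prediction is appended to its own input.
      tokens = tokenizer.encode(prompt)
      for _ in range(max_new_tokens):
          logits = model(tokens)                 # scores over the vocabulary
          next_token = int(logits[-1].argmax())  # greedy pick, for simplicity
          tokens.append(next_token)
      return tokenizer.decode(tokens)

The point of the argument is that nothing in this loop caps how good the scoring step can get.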


> The argument isn't that there is something more than next token prediction happening.

> The argument is that next token prediction does not imply an upper bound on intelligence, because a better next-token predictor pulls increasingly more of the world described in the training data into itself.

Well said! There's a philosophical rift appearing in the tech community over this very issue, semi-neatly dividing people into naysayers, 'disbelievers', and believers.


I fully agree. Some people fully disagree, though, on the 'pulling in the world' part, let alone the 'intelligence' part, both of which are in fact impossible to define.


The reasoning emerges from the long-distance relations between words that the parallel attention mechanism of transformers picks up. It's why they were so much more performant than the earlier RNNs and LSTMs, which used similar tokenization.
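
For the curious, the core of that parallelism fits in a few lines. A minimal numpy sketch of scaled dot-product self-attention (toy shapes for illustration, not any particular library's implementation):

  import numpy as np

  def self_attention(X):
      # Every position scores against every other position in one
      # matrix product, so long-distance word relations are captured
      # in a single parallel step, rather than the sequential
      # recurrence of an RNN/LSTM, where state is threaded token by token.
      d = X.shape[-1]
      scores = X @ X.T / np.sqrt(d)                      # (seq, seq)
      w = np.exp(scores - scores.max(-1, keepdims=True))
      w /= w.sum(-1, keepdims=True)                      # softmax over keys
      return w @ X                                       # (seq, d)

  rng = np.random.default_rng(0)
  X = rng.standard_normal((4, 8))  # 4 tokens, 8-dim embeddings (toy sizes)
  out = self_attention(X)

(Real transformers add learned Q/K/V projections and multiple heads; this strips that away to show the all-pairs interaction.)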



