
The point is that LLMs can’t backtrack after deciding on a token. So the probability that at least one token along a long generation leads you down the wrong path does indeed increase as the sequence gets longer (especially since we typically sample from these things), whereas humans can plan their outputs in advance, revise/refine, etc.



Humans can backtrack, but the probability of a "correct" output is still (1-epsilon)^n. Not only can any token introduce an error, but the human author will not perfectly catch errors they have previously introduced. The epsilon ought to be lower for humans, but it's not zero.

But more to the point, in the deck provided, LeCun's point is _not_ about backtracking per se. The highlighted / red text on the preceding slide is:

> LLMs have no knowledge of the underlying reality
> They have no common sense & they can't plan their answer

Now, we generally generate from LLMs by sampling forward, left to right, but it isn't hard to use essentially the same structure to generate tokens conditioned on both the preceding and following sequences. If you ran generation for tokens 1...n, and then ran m iterations of re-sampling internal token i based on (1..i-1, i+1..n), it would sometimes "fix" issues created in the initial generation pass. It would also sometimes introduce new issues into spans that were fine after the original generation. Process-wise, it would look a lot like MCMC at generation time.
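To make that concrete, here's a minimal sketch of such a resampling pass. The `conditional_probs` function is a toy stand-in (an assumption, not any real model's API) for anything that can score a token given both left and right context, e.g. a masked LM; the loop just keeps re-drawing interior tokens, Gibbs/MCMC-style:

    import random

    VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]

    def conditional_probs(left, right):
        # Toy stand-in: uniform over a tiny vocabulary. A real version would
        # query a model for P(token | left context, right context).
        return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

    def resample_pass(tokens, iterations=100, rng=random):
        tokens = list(tokens)
        for _ in range(iterations):
            i = rng.randrange(1, len(tokens) - 1)  # pick an interior position
            probs = conditional_probs(tokens[:i], tokens[i + 1:])
            candidates, weights = zip(*probs.items())
            tokens[i] = rng.choices(candidates, weights=weights, k=1)[0]
        return tokens

    print(resample_pass(["the", "cat", "sat", "on", "the", "mat"]))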

The ability to "backtrack" does _not_ on its own add knowledge of reality, common sense, or "planning".

When a human edits, they're reconciling their knowledge of the world and their intended impact on their expected audience, neither of which the LLM has.


> LLMs have no knowledge of the underlying reality
> They have no common sense & they can't plan their answer

If his argument rests entirely on this, then it's not fully correct:

- GPT style language models try to build a model of the world: https://arxiv.org/abs/2210.13382

- GPT style language models end up internally implementing a mini "neural network training algorithm" (gradient descent fine-tuning for given examples): https://arxiv.org/abs/2212.10559


This is false. Standard decoding algorithms like beam search can "backtrack" (abandon a partial hypothesis in favor of a better-scoring one) and are widely used in generative language models.

It is true that exact search over all possible sequences has runtime exponential in the length of the sequence, so heuristics (such as a finite beam width) are used to keep the runtime practical, and this limits the "backtracking" ability. But this limitation is purely for computational convenience's sake and not something inherent in the model.
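For anyone unfamiliar with it, here's a self-contained beam-search sketch (not any particular library's API; the toy `next_token_probs` stands in for the model's next-token distribution). The point is that keeping several partial hypotheses and dropping the low-scoring ones is a limited form of backtracking relative to greedy decoding:

    import math

    def next_token_probs(prefix):
        # Toy distribution; a real decoder would call the model here.
        return {"a": 0.5, "b": 0.3, "<eos>": 0.2}

    def beam_search(beam_width=3, max_len=5):
        beams = [((), 0.0)]  # list of (tokens, log-probability)
        for _ in range(max_len):
            candidates = []
            for tokens, score in beams:
                if tokens and tokens[-1] == "<eos>":
                    candidates.append((tokens, score))
                    continue
                for tok, p in next_token_probs(tokens).items():
                    candidates.append((tokens + (tok,), score + math.log(p)))
            # Keep only the top-scoring hypotheses; the rest are abandoned.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams

    for tokens, score in beam_search():
        print(tokens, round(score, 3))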


To extend the non-fiction book example, the author doesn't write it in a one-shot forward pass, they go back and edit.


I could be wrong, but I think the only probabilistic component of an LLM is the statistical word-fragment (token) selection at the end of each step. Assuming this is true, one could theoretically run the generation multiple times, making different fragment choices. This (while horribly inefficient) would allow a sort of backtracking.
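A minimal sketch of that idea, with a toy `logits_for` standing in for the model's output layer (an assumption, not a real API): the only randomness is the draw from the final token distribution, so re-running the draw yields alternative continuations that could be compared afterwards.

    import math, random

    def logits_for(prefix):
        # Toy logits over a tiny vocabulary; a real LLM produces these each step.
        return {"yes": 2.0, "no": 1.0, "maybe": 0.5}

    def sample_token(logits, temperature=1.0, rng=random):
        scaled = {t: v / temperature for t, v in logits.items()}
        z = sum(math.exp(v) for v in scaled.values())
        probs = {t: math.exp(v) / z for t, v in scaled.items()}
        return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

    # "Run the program" several times and keep the distinct outcomes.
    samples = {sample_token(logits_for("prompt")) for _ in range(20)}
    print(samples)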


Do you know of any work on holistic response quality in LLMs? We currently have the LLM equivalent of the HTML line-break and hyphenation algorithm (greedy, one line at a time), when what we want is the LaTeX version of that algorithm (which optimizes over the whole paragraph).


Nothing is stopping you from doing the same with LLMs; you can have it go back over its response and check it for accuracy.
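As a rough sketch of that loop (the `llm` function here is purely hypothetical, standing in for whatever generation call you use; no specific vendor API is implied):

    def llm(prompt: str) -> str:
        # Hypothetical stand-in: plug in your actual model call here.
        return "stub response"

    def draft_and_revise(question: str, rounds: int = 2) -> str:
        answer = llm("Answer the question:\n" + question)
        for _ in range(rounds):
            critique = llm("Question: " + question + "\nDraft answer: " + answer
                           + "\nList any factual or logical errors in the draft.")
            answer = llm("Question: " + question + "\nDraft answer: " + answer
                         + "\nCritique: " + critique + "\nWrite a corrected answer.")
        return answer

    print(draft_and_revise("What year did the first moon landing happen?"))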


What is stopping the LLMs from doing the same?


Sure they can. I even think GPT-3 does this by emitting a backspace token.


no



