
The token limit is 100% an artificial limitation. When ChatGPT first released last November, I took the opportunity to try pasting 3k-line codebases into it to get it to walk me through them, and it worked perfectly fine; putting that same code into the OpenAI tokenizer tells me it's ~33k tokens, way above the limits today. The reason they do this is that every token takes up ~1 MB of video memory, and that adds up real quick. If you had infinite video memory, there would be no "fundamental limit" to how long an LLM could output.

OpenAI then has two limits on inputs. The first, artificial one ensures that people don't get overzealous inputting too much; otherwise they'd hit the second, hard limit of how much VRAM their cards have. To the LLM itself there is no difference between characters from the chatbot and from the human; the only hard limiter is the total number of tokens. I tried this out by inputting a 4k-token string into ChatGPT as many times as I could, and it failed on the 20th input, meaning the hard limit is around 80k tokens. Converting this to VRAM gives us ~80 GB, which is exactly how much memory the Nvidia A100 card has.
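A quick back-of-the-envelope version of that arithmetic, using this comment's own rough assumptions (~1 MB of VRAM per token, 4k tokens per paste), not any published numbers:

    # Reproduces the estimate above; both figures are assumptions from this comment.
    tokens_per_paste = 4_000        # size of the repeated input string
    pastes = 20                     # it failed on the 20th paste
    vram_per_token_mb = 1           # assumed ~1 MB of VRAM per token

    context_tokens = tokens_per_paste * pastes              # 80,000 tokens
    vram_gb = context_tokens * vram_per_token_mb / 1_000
    print(context_tokens, "tokens ->", vram_gb, "GB")       # 80000 tokens -> 80.0 GB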




> When ChatGPT first released last November I took the opportunity to try pasting 3k-line codebases into it to get it to walk me through them and it worked perfectly fine.

A common technique to work around context-length limits is to simply keep only the most recent context that fits within the limit. It can be hard to notice when this happens, because oftentimes the full context isn't actually necessary. However, specific details from the truncated portion are genuinely lost. For example, if you ask the model to list the filenames back in the same order and the context was truncated, it would start from the first non-truncated file and drop the others.
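A minimal sketch of that kind of truncation, using OpenAI's open-source tiktoken tokenizer; the function name and the 4096-token window here are just illustrative assumptions, not what ChatGPT actually does internally:

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def truncate_to_window(conversation: str, max_tokens: int = 4096) -> str:
        # Keep only the most recent max_tokens tokens of the conversation.
        tokens = enc.encode(conversation)
        if len(tokens) <= max_tokens:
            return conversation
        # Everything before this point is simply dropped, which is why early
        # details (e.g. the first filenames you pasted) silently disappear.
        return enc.decode(tokens[-max_tokens:])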

> If you had infinite video memory there would be no "fundamental limit" to how long a LLM can output.

Well, you've certainly got me there. One of the big limits with the transformer architecture today is that the memory usage grows quadratically with context length due to the attention mechanism. This is why there's so much interest in alternatives like RWKV <https://news.ycombinator.com/item?id=36038868>, and why scaling them is hard <https://news.ycombinator.com/item?id=35948742>.
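For concreteness, here's a toy numpy version of vanilla attention; the (n, n) score matrix is where the quadratic memory comes from (a sketch, not how any production model is actually implemented):

    import numpy as np

    def naive_attention(Q, K, V):
        # Q, K, V: (n, d) for n tokens
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)   # (n, n) matrix -- the quadratic part
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V              # (n, d)

    # At n = 32,000 tokens, a 32k x 32k float32 score matrix is ~4 GB per head.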


FlashAttention has memory linear in sequence length. https://github.com/HazyResearch/flash-attention
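Roughly, it streams over blocks of keys/values with an online softmax, so the (n, n) score matrix is never materialized. A toy single-query sketch of that idea (not the actual fused CUDA kernel):

    import numpy as np

    def online_softmax_attention(q, K, V, block=64):
        # q: (d,), K and V: (n, d); extra memory is O(block), not O(n^2)
        m, l = -np.inf, 0.0                  # running max and normalizer
        acc = np.zeros(V.shape[1])           # running weighted sum of values
        for start in range(0, K.shape[0], block):
            Kb, Vb = K[start:start + block], V[start:start + block]
            s = Kb @ q / np.sqrt(q.shape[0])        # scores for this block
            m_new = max(m, s.max())
            scale = np.exp(m - m_new) if np.isfinite(m) else 0.0
            p = np.exp(s - m_new)
            l = l * scale + p.sum()
            acc = acc * scale + p @ Vb
            m = m_new
        return acc / l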


> The token limit is 100% an artificial limitation.

My understanding is that the token limit is an immutable property of the neural network once it has been trained, so it is definitely not an artificial limitation - unless you're suggesting OpenAI trained the NN with a higher token count and then released it with a limit that only allows smaller inputs? Which I guess is plausible, but I'm not sure why they'd do it, as they'd still be "executing" the same NN for every input, so it wouldn't save any compute.


I think you misunderstood the token limit. The chat interface doesn't block your input once the buffer fills up; it simply takes the final n tokens of everything you've shared. Plenty of users have observed this.

It will still function, but it loses context on anything you previously shared above the cutoff. And if you ask it about that earlier content, it will do its best to hallucinate a plausible answer about what might have been in your buffer before the cutoff.

Separately, you may have found a physical hard limit via a bug that crashes the system, but that's not what's meant by a token limit in LLMs. The token limit is a limitation of the architecture of the LLM itself.


> When ChatGPT first released last November I took the opportunity to try pasting 3k-line codebases into it to get it to walk me through them and it worked perfectly fine
Are these private codebases, or open source ones? If public, would you mind sharing a link to the ChatGPT session(s)?


The token limit depends on how positions were encoded during training; usually it's some sort of sin/cos function, which has the problem that inputs longer than what the model was trained on cause accuracy to plummet. It's also very likely that it just took the last part of your input as context.
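For reference, a sketch of the sin/cos encoding from "Attention Is All You Need"; the formula itself extends to arbitrary positions, but the model never learned to use positions beyond the lengths it saw during training:

    import numpy as np

    def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
        # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
        # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
        positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
        dims = np.arange(0, d_model, 2)[None, :]         # (1, d_model/2)
        angles = positions / np.power(10000, dims / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe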



