> When ChatGPT was first released last November I took the opportunity to try pasting 3k-line codebases into it to get it to walk me through them, and it worked perfectly fine.

A common technique for working around the context-length limit is to simply keep only the most recent portion of the context that fits within the window. It can be hard to notice when this happens, because often the full context isn't actually needed, but specific details from the truncated part are genuinely lost. For example, if you ask the model to list the filenames back in the same order and the context was truncated, it will start from the first non-truncated file and silently drop the earlier ones.
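
To make that concrete, here is a minimal sketch of the kind of trailing-window truncation I mean (count_tokens and the 4096-token budget are illustrative stand-ins, not any particular vendor's API): messages are kept from newest to oldest until the budget runs out, and everything older is silently dropped.

    def count_tokens(text: str) -> int:
        # Crude stand-in for a real tokenizer: roughly one "token" per whitespace word.
        return len(text.split())

    def truncate_context(messages: list[str], max_tokens: int = 4096) -> list[str]:
        # Keep only the most recent messages that still fit in the budget;
        # anything older than that is dropped without warning.
        kept, total = [], 0
        for msg in reversed(messages):      # walk newest -> oldest
            total += count_tokens(msg)
            if total > max_tokens:
                break
            kept.append(msg)
        return list(reversed(kept))         # restore chronological order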

> If you had infinite video memory there would be no "fundamental limit" to how long a LLM can output.

Well, you've certainly got me there. One of the big limits with the transformer architecture today is that the memory usage grows quadratically with context length due to the attention mechanism. This is why there's so much interest in alternatives like RWKV <https://news.ycombinator.com/item?id=36038868>, and why scaling them is hard <https://news.ycombinator.com/item?id=35948742>.
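
To see where the quadratic growth comes from, here is a bare-bones sketch of standard attention in plain NumPy (single head, no batching, random data just to illustrate the shapes): the score matrix holds one entry per pair of positions, so its memory grows as n².

    import numpy as np

    def naive_attention(q, k, v):
        # Plain softmax attention; q, k, v each have shape (n, d).
        scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, n): memory grows as n^2
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v                               # (n, d)

    n, d = 8192, 64
    q = k = v = np.random.randn(n, d).astype(np.float32)
    # The (n, n) score matrix alone is 8192 * 8192 * 4 bytes ~ 268 MB here,
    # and it quadruples every time the sequence length doubles.
    out = naive_attention(q, k, v)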




FlashAttention's memory usage is linear in sequence length: https://github.com/HazyResearch/flash-attention
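
For contrast, a small sketch of the same computation through PyTorch's scaled_dot_product_attention, which can dispatch to a FlashAttention-style fused kernel on supported GPUs (the shapes are illustrative, and this assumes a CUDA device is available):

    import torch
    import torch.nn.functional as F

    n, heads, d = 8192, 8, 64
    q = torch.randn(1, heads, n, d, device="cuda", dtype=torch.float16)
    k = torch.randn(1, heads, n, d, device="cuda", dtype=torch.float16)
    v = torch.randn(1, heads, n, d, device="cuda", dtype=torch.float16)

    # The fused kernel computes softmax(q k^T / sqrt(d)) v in tiles,
    # so peak memory stays linear in n rather than materializing the (n, n) matrix.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)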



