The token limit is 100% an artificial limitation. When ChatGPT first released last November I took the opportunity to paste 3k-line codebases into it and get it to walk me through them, and it worked perfectly fine; pasting that same code into the OpenAI tokenizer tells me it's ~33k tokens, way above today's limits. The reason they do this is that every token takes up ~1 MB of video memory, and that adds up very quickly. If you had infinite video memory there would be no "fundamental limit" to how long an LLM can output.
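If you want to reproduce the token count yourself, here's a minimal sketch using the tiktoken library (the file name `codebase.py` is just a placeholder, and cl100k_base is my assumption for the encoding the ChatGPT-era models use):

```python
import tiktoken

# Placeholder path: whatever file(s) you want to paste into the chat.
with open("codebase.py", "r", encoding="utf-8") as f:
    text = f.read()

# cl100k_base is the encoding used by the GPT-3.5/4 family of models.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode(text)
print(f"{len(tokens)} tokens")
```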
OpenAI then has two limits on inputs. The first, artificial one ensures that people don't get overzealous and input too much, otherwise they'd hit the second, hard limit of how much VRAM their cards have. To the LLM itself there is no difference between characters from the chatbot and from the human; the only hard limiter is the total number of tokens. I tried this out by inputting a 4k-token string into ChatGPT as many times as I could, and it failed on the 20th input, meaning it accepted roughly 19 × 4k ≈ 76k tokens and choked somewhere short of 80k, putting the hard limit at around 80k tokens. Converting that to VRAM at ~1 MB per token gives roughly 80 GB, which is exactly how much memory the Nvidia A100 card has.
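For anyone who wants to sanity-check that arithmetic, here's the back-of-the-envelope version (the ~1 MB/token figure is my own assumption from above, not anything OpenAI has published):

```python
# Rough estimate of the hard context ceiling from the repeated-paste experiment.
tokens_per_paste = 4_000
pastes_accepted = 19              # the 20th paste was rejected
mb_per_token = 1                  # assumed VRAM cost per token

accepted_tokens = tokens_per_paste * pastes_accepted   # ~76k tokens went through
ceiling_tokens = tokens_per_paste * 20                 # it choked before ~80k
vram_gb = ceiling_tokens * mb_per_token / 1000         # ~80 GB, roughly an A100's memory
print(accepted_tokens, ceiling_tokens, vram_gb)
```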