> Proof: if it was false, you could do what GPT4 does with 10 param model and a good prompt.
This is oh so very much a strawman. There is rapid progress in AI. For my domains, the first useful model (without finetuning or additional training) was GPT-3, which was released in 2020 and had 175B parameters.
We've had three years of optimization on the models, as well as a lot of progress on how to use them. That means we need fewer parameters today than we did in 2020. That doesn't imply there isn't a hard lower bound somewhere. We just don't know where or what it is.
My expectation is that we'll continue to do better and better, to the point where e.g. a 2030 1B parameter model will be competitive with a 2020 200B parameter model, and a 2030 200B parameter model will be much better than either. If a hard lower bound exists, after some amount of progress we'll hit it (or, more accurately, asymptotically converge to it).
I don't use local LLMs for coding, but for things related to text (it is a large LANGUAGE model, after all). For that, 7B parameter models became adequate sometime in 2023. For reference, in 2020, they were complete nonsense. You'd get cycles of repeating text, or just lose coherence after a sentence or two.
With my setup, local models aren't anywhere close to fast enough for real-time use, and for coding I need real-time use. It wouldn't surprise me if that domain needed more parameters, just based on what I've seen, but I could be proven wrong. If you buy me an H100, I can experiment with it too. As a footnote, many LARGE models work horribly for coding too; OpenAI did a very good job with GPT there (and I haven't used it enough to know firsthand, but I've heard from people who've used Bard that Google did too).