As someone who doesn’t know much about how these models work or are created, I’d love to see some kind of breakdown of what % of GPT-4’s power is due to how it’s modelled (layers or whatever) vs. the training data and the computing resources associated with it.
This isn't precisely knowable now, but it might be something academics figure out years from now. Of course, the first principle of 'garbage in, garbage out' would put data integrity very high; the LLM code itself is supposedly not even 100k lines; and the HW is extraordinarily advanced.
So the ordering is probably data, HW, LLM model.
This also fits the general ordering of how many people's work each piece embodies:
data = all human knowledge
HW = the integrated complexity of most of the world's technologists
LLM = a small team
It still requires that small team to figure out what to do with the first two, but it only happened now because the HW finally got good enough.
Turing, Shannon, et al. would almost certainly have invented LLMs nearly 100 years ago if they'd had access to the first two.
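For a sense of how small that "under 100k lines" model piece really is, here's a minimal sketch of a single GPT-style decoder block, assuming a nanoGPT-like layout (pre-norm, causal attention, 4x MLP). The width and head count are placeholder values, and this is not any lab's actual code; the point is only that the architecture itself fits on a page, while stacking a few dozen of these plus an embedding and an output head is essentially the whole model.

    # Minimal sketch of one GPT-style decoder block (assumed nanoGPT-like layout).
    # Placeholder sizes; illustrative only, not any production codebase.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Block(nn.Module):
        def __init__(self, d_model=768, n_heads=12):
            super().__init__()
            self.n_heads = n_heads
            self.ln1 = nn.LayerNorm(d_model)
            self.qkv = nn.Linear(d_model, 3 * d_model)   # fused query/key/value projection
            self.proj = nn.Linear(d_model, d_model)      # attention output projection
            self.ln2 = nn.LayerNorm(d_model)
            self.mlp = nn.Sequential(                    # 4x expansion feed-forward
                nn.Linear(d_model, 4 * d_model), nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )

        def forward(self, x):
            B, T, C = x.shape
            q, k, v = self.qkv(self.ln1(x)).split(C, dim=-1)
            # reshape to (batch, heads, seq, head_dim) and run causal self-attention
            q, k, v = (t.view(B, T, self.n_heads, -1).transpose(1, 2) for t in (q, k, v))
            att = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            x = x + self.proj(att.transpose(1, 2).reshape(B, T, C))  # residual around attention
            x = x + self.mlp(self.ln2(x))                            # residual around MLP
            return x

    x = torch.randn(2, 16, 768)
    print(Block()(x).shape)  # torch.Size([2, 16, 768])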
That’s true now, but maybe GPT-6 will be able to tell you how to build GPT-7 on an old laptop, and you’ll be able to summon GPT-8 with a toothpick and three cc’s of mouse blood.
But creating a base model is out of reach. You need on the order of hundreds of millions of dollars (if not a billion) to get anywhere close to GPT-4.
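For rough intuition, here's a back-of-envelope on the compute alone. Every number below is an assumed ballpark (total training FLOPs, GPU throughput, utilization, $/GPU-hour), not an official figure for any particular model:

    # Back-of-envelope training-compute cost; all inputs are assumptions.
    total_flops = 2e25           # assumed total training compute for a GPT-4-class run
    peak_flops_per_gpu = 312e12  # A100 bf16 peak throughput, FLOP/s
    utilization = 0.35           # assumed realistic utilization for a large distributed run
    cost_per_gpu_hour = 1.50     # assumed bulk $/GPU-hour

    effective_flops = peak_flops_per_gpu * utilization
    gpu_hours = total_flops / effective_flops / 3600
    compute_cost = gpu_hours * cost_per_gpu_hour

    print(f"{gpu_hours:,.0f} GPU-hours, ~${compute_cost / 1e6:,.0f}M for compute alone")
    # -> roughly 50M GPU-hours and tens of millions of dollars, before salaries,
    #    data acquisition/cleaning, failed runs, and serving infrastructure.

Even under these assumptions, compute alone lands in the tens of millions, and everything around it is how you get to hundreds of millions.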