As someone who doesn’t know much about how these models work or are created, I’d love to see some kind of breakdown of what % of GPT-4’s power is due to how it’s modelled (layers or whatever) vs. the training data and the computing resources associated with it.
This isn't precisely knowable now, but it might be something academics figure out years from now. Of course, the first principle of 'garbage in, garbage out' would put data integrity very high; the LLM code itself is supposedly not even 100k lines; and the HW is extraordinarily advanced.
So the ordering is probably data, HW, LLM model.
This also fits the general ordering of how many people's work each piece embodies:
data = all human knowledge
HW = the integrated complexity of most of the world's technologists
LLM = a small team
It still requires that small team to figure out what to do with the first two, but it only happened now because the HW finally got good enough.
Turing, Shannon, et al. would almost certainly have invented LLMs nearly 100 years ago if they'd had access to the first two.
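For a sense of how small that "under 100k lines" model piece really is, here's a minimal sketch of a single GPT-style decoder block, assuming a nanoGPT-like layout (pre-norm, causal attention, 4x MLP). The width and head count are placeholder values, and this is not any lab's actual code; the point is only that the architecture itself fits on a page, while stacking a few dozen of these plus an embedding and an output head is essentially the whole model.

    # Minimal sketch of one GPT-style decoder block (assumed nanoGPT-like layout).
    # Placeholder sizes; illustrative only, not any production codebase.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Block(nn.Module):
        def __init__(self, d_model=768, n_heads=12):
            super().__init__()
            self.n_heads = n_heads
            self.ln1 = nn.LayerNorm(d_model)
            self.qkv = nn.Linear(d_model, 3 * d_model)   # fused query/key/value projection
            self.proj = nn.Linear(d_model, d_model)      # attention output projection
            self.ln2 = nn.LayerNorm(d_model)
            self.mlp = nn.Sequential(                    # 4x expansion feed-forward
                nn.Linear(d_model, 4 * d_model), nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )

        def forward(self, x):
            B, T, C = x.shape
            q, k, v = self.qkv(self.ln1(x)).split(C, dim=-1)
            # reshape to (batch, heads, seq, head_dim) and run causal self-attention
            q, k, v = (t.view(B, T, self.n_heads, -1).transpose(1, 2) for t in (q, k, v))
            att = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            x = x + self.proj(att.transpose(1, 2).reshape(B, T, C))  # residual around attention
            x = x + self.mlp(self.ln2(x))                            # residual around MLP
            return x

    x = torch.randn(2, 16, 768)
    print(Block()(x).shape)  # torch.Size([2, 16, 768])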
That’s true now, but maybe GPT-6 will be able to tell you how to build GPT-7 on an old laptop, and you’ll be able to summon GPT-8 with a toothpick and three cc’s of mouse blood.
But creating a base model is out of reach. You need on the order of hundreds of millions of dollars (if not a billion) to get anywhere close to GPT-4.
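For rough intuition, here's a back-of-envelope on the compute alone. Every number below is an assumed ballpark (total training FLOPs, GPU throughput, utilization, $/GPU-hour), not an official figure for any particular model:

    # Back-of-envelope training-compute cost; all inputs are assumptions.
    total_flops = 2e25           # assumed total training compute for a GPT-4-class run
    peak_flops_per_gpu = 312e12  # A100 bf16 peak throughput, FLOP/s
    utilization = 0.35           # assumed realistic utilization for a large distributed run
    cost_per_gpu_hour = 1.50     # assumed bulk $/GPU-hour

    effective_flops = peak_flops_per_gpu * utilization
    gpu_hours = total_flops / effective_flops / 3600
    compute_cost = gpu_hours * cost_per_gpu_hour

    print(f"{gpu_hours:,.0f} GPU-hours, ~${compute_cost / 1e6:,.0f}M for compute alone")
    # -> roughly 50M GPU-hours and tens of millions of dollars, before salaries,
    #    data acquisition/cleaning, failed runs, and serving infrastructure.

Even under these assumptions, compute alone lands in the tens of millions, and everything around it is how you get to hundreds of millions.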