Fine-tuning is the most important part, but it's an area of intense research right now and things change fast. I hope the process gets streamlined, because these smaller models can only compete with the big ones when they are fine-tuned.
Yes, but fine-tuning requires a lot more GPU memory than inference, so it's much more expensive, more complicated, and out of reach for most people. To fully fine-tune a >10B model you still need multiple A100s / H100s. Let's hope that changes with quantized fine-tuning, forward-pass-only methods, etc.
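For the curious, here's a minimal sketch of what quantized (QLoRA-style) fine-tuning looks like with the Hugging Face peft and bitsandbytes stack. The model name and hyperparameters are placeholders, not a recipe; the point is that the base weights are loaded in 4-bit and only small adapter matrices are trained, which is what brings the memory requirement down from multiple A100s to a single GPU.

```python
# Sketch of QLoRA-style quantized fine-tuning (model name and settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder >10B base model

# Load the frozen base weights in 4-bit so they fit on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Only the low-rank adapter matrices are trained; the 4-bit base stays frozen.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total params
```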
What exactly do you mean when you say the smaller models can compete with the larger ones once they are fine-tuned? What about when the larger models are fine-tuned? Are they then out of reach of the fine-tuned smaller models?
They're probably referring to fine-tuning on private/proprietary data that is specific to a use case. Say a history of conversation transcripts in a call center.
Larger models, like OpenAI's GPT, don't have access to this by default.
Smaller models are generally more efficient for inference and don't necessarily need the latest GPUs. Larger language models tend to perform better across a wider range of tasks. But for a specific enterprise use case, either distilling a large model or using a large model to help train a smaller one can be quite helpful in getting things to production, where you may need cost-efficiency and lower latency.
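If it helps, the core of distillation is simple: the small student is trained to match the softened output distribution of a frozen large teacher, usually mixed with the ordinary label loss. A rough PyTorch sketch (temperature, weighting, and the training-step comments are illustrative assumptions):

```python
# Minimal knowledge-distillation loss: student mimics the teacher's soft targets.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Typical training step (teacher frozen, only the student is updated):
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits, batch_labels)
```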