Fine-tuning is the most important part, but it's an area of intense research right now and things change fast. I hope the process gets streamlined, because these smaller models can only compete with the big ones when they are fine-tuned.
Yes, but fine-tuning requires a lot more GPU memory than inference, so it's much more expensive, more complicated, and out of reach for most people. To fully fine-tune a >10B model you still need multiple A100s / H100s. Let's hope that changes with quantized fine-tuning, forward-pass-only methods, etc.
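For the curious, here's a minimal sketch of what quantized (QLoRA-style) fine-tuning looks like with the Hugging Face peft and bitsandbytes stack. The model name and hyperparameters are placeholders, not a recipe; the point is that the base weights are loaded in 4-bit and only small adapter matrices are trained, which is what brings the memory requirement down from multiple A100s to a single GPU.

```python
# Sketch of QLoRA-style quantized fine-tuning (model name and settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder >10B base model

# Load the frozen base weights in 4-bit so they fit on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Only the low-rank adapter matrices are trained; the 4-bit base stays frozen.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total params
```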
What exactly do you mean when you say the smaller models can compete with the larger ones once they are fine-tuned? What about when the larger models are fine-tuned? Are they then out of reach of the fine-tuned smaller models?
They're probably referring to fine-tuning on private/proprietary data that is specific to a use case. Say a history of conversation transcripts in a call center.
Larger models, like OpenAI's GPT, don't have access to this by default.
Smaller models are generally more efficient for inference and don't necessarily need the latest GPUs. Larger language models tend to perform better across a wider range of tasks. But for a specific enterprise use case, either distilling a large model or using a large model to help train a smaller one can be quite helpful in getting things to production, where you may need cost-efficiency and lower latency.
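If it helps, the core of distillation is simple: the small student is trained to match the softened output distribution of a frozen large teacher, usually mixed with the ordinary label loss. A rough PyTorch sketch (temperature, weighting, and the training-step comments are illustrative assumptions):

```python
# Minimal knowledge-distillation loss: student mimics the teacher's soft targets.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Typical training step (teacher frozen, only the student is updated):
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits, batch_labels)
```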