
Fine-tuning is the most important part, but it is under intense research right now and things change fast. I hope they can streamline this process, because these smaller models can only compete with big models when they are fine-tuned.



Yes, but fine-tuning requires a lot more GPU memory and is thus much more expensive, complicated, and out of reach for most people. To fine-tune a >10B model you still need multiple A100s / H100s. Let's hope that changes with quantized fine-tuning, forward-pass-only methods, etc.


The OpenLLM team is actively exploring those techniques for streamlining the fine-tuning process and making it accessible!


You can fine-tune medium-sized models (roughly 3B-60B) on a single GPU with QLoRA.
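
For anyone curious what that looks like in practice, here's a minimal QLoRA sketch with transformers + peft + bitsandbytes. The model name, LoRA rank, and target modules below are illustrative assumptions, not recommendations:

    # QLoRA sketch: 4-bit quantized base model + small trainable LoRA adapters.
    # Model name and hyperparameters are placeholders for illustration.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "meta-llama/Llama-2-13b-hf"  # any ~3B-60B causal LM

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                       # quantize base weights to 4-bit NF4
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],     # only these adapter matrices are trained
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of the full model
    # ...then train with transformers.Trainer or trl's SFTTrainer as usual

Since only the small adapter matrices carry gradients and optimizer state, the memory footprint stays within a single consumer or prosumer GPU for these model sizes.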


What is the $ cost of a fine tune though? $500?


Can you fine tune on an M2 with adequate memory?


What exactly do you mean here that the smaller models can compete with the larger ones once they are fine-tuned? What about once the larger models are fine-tuned? Are they then out of reach of the fine-tuned smaller models?


They're probably referring to fine-tuning on private/proprietary data that is specific to a use case. Say, a history of conversation transcripts in a call center.

Larger models, like OpenAI's GPT, don't have access to this by default.


OpenAI’s API has fine-tuning options for the older GPT models: davinci, curie, babbage, and ada.
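
For reference, the legacy flow looks roughly like this with the pre-1.0 openai Python client (file name and base model are placeholders; data is JSONL with prompt/completion pairs):

    # Legacy OpenAI fine-tuning sketch (pre-1.0 openai library).
    # Assumes OPENAI_API_KEY is set in the environment.
    import openai

    # Upload JSONL training data: one {"prompt": ..., "completion": ...} per line
    train_file = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

    # Kick off a fine-tune job against one of the older base models
    job = openai.FineTune.create(training_file=train_file.id, model="davinci")
    print(job.id)  # poll with openai.FineTune.retrieve(job.id) until it completes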


It doesn't for the newer (relevant) ones. Fine-tuning them is expensive and slow, because they are large.


Smaller models are likely more efficient for inference and don't necessarily need the latest GPUs. Larger language models tend to perform better across a wider range of tasks. But for a specific enterprise use case, either distilling a large model or using a large model to help train a smaller one can be quite helpful in getting things to production, where you may need cost-efficiency and lower latency. A rough sketch of the second option is below.
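
Here is one hedged illustration: use a large model to label domain data, then fine-tune a smaller model on the resulting pairs. The model names, prompt, and transcripts are all assumptions for the sketch:

    # Sketch: generate synthetic (prompt, completion) pairs with a large model,
    # then fine-tune a smaller model on them. Pre-1.0 openai client assumed.
    import json
    import openai

    transcripts = [
        "Customer asks about a refund for a duplicate charge...",
        "Customer reports being locked out of their account...",
    ]

    with open("distilled_train.jsonl", "w") as f:
        for text in transcripts:
            resp = openai.ChatCompletion.create(
                model="gpt-4",
                messages=[{"role": "user",
                           "content": f"Classify the intent of this call transcript:\n{text}"}],
            )
            label = resp.choices[0].message.content
            # Each line becomes training data for the smaller model
            f.write(json.dumps({"prompt": text, "completion": label}) + "\n")

The smaller model trained on this data only has to handle the narrow enterprise task, which is where it can match the large model at a fraction of the serving cost.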


Is there a good checklist or framework for deciding between fine-tuning and using a vector DB to pull relevant context into the prompt?



