
I am very open to believing that. I'd love to see some examples.



I agree; I think they need an example or two in that blog post to back up the claim. I'm ready to believe it, but I need something more than "diverse customer tasks" to understand what we're talking about.


You can fine-tune a small model yourself and see. GPT-4 is an amazing general model, but it won't perform best at every task you throw at it out of the box. I have a fine-tuned Mistral 7B model that outperforms GPT-4 on a specific type of structured data extraction. Maybe if I fine-tuned GPT-4 it could beat mine, but that costs a lot of money for what I can now do locally for the cost of electricity.
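
For context, here's roughly what that workflow looks like: a minimal LoRA fine-tuning sketch for Mistral 7B using Hugging Face transformers and peft. The toy dataset, prompt format, and hyperparameters below are illustrative assumptions, not the actual setup described above.

    # Minimal LoRA fine-tuning sketch for a structured-extraction task.
    # The dataset, prompt format, and hyperparameters are illustrative
    # assumptions, not the commenter's actual setup.
    from datasets import Dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "mistralai/Mistral-7B-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # Mistral ships no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    # LoRA freezes the base weights and trains small low-rank adapters,
    # which is what makes this feasible on a single consumer GPU.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

    # Hypothetical extraction pairs: raw text in, JSON out. A real run
    # would use a few hundred to a few thousand of these.
    examples = [{"text": (
        "Extract the fields as JSON.\n"
        "Input: Invoice #123 from Acme Corp, total due $40.\n"
        'Output: {"invoice": "123", "vendor": "Acme Corp", "total": 40}')}]
    train_ds = Dataset.from_list(examples).map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
        remove_columns=["text"])

    trainer = Trainer(
        model=model,
        train_dataset=train_ds,
        args=TrainingArguments(output_dir="mistral-extraction-lora",
                               per_device_train_batch_size=1,
                               num_train_epochs=3,
                               learning_rate=2e-4),
        # mlm=False gives the standard causal-LM (next-token) objective
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained("mistral-extraction-lora")  # adapter weights only

The saved adapter is tiny relative to the 7B base model, which is a big part of why iterating on a task like this locally is so cheap.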


Well, it's pretty easy to find examples online. Here's one using Llama 2, not even Mistral or any fancy techniques: https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...


They're quite close in the arena rankings: https://chat.lmsys.org/?arena


To be clear, Mixtral is very competitive, but Mistral, while certainly way better than most 7B models, performs far worse than GPT-3.5 Turbo.


Apologies, that's what I get for skimming through the thread.



