I am very open to believing that. I'd love to see some examples.

turnsout · on Dec 20, 2023

I agree. I think they need an example or two in that blog post to back up the claim. I'm ready to believe it, but I need something more than "diverse customer tasks" to understand what we're talking about.

bugglebeetle · on Dec 20, 2023

You can fine-tune a small model yourself and see. GPT-4 is an amazing general model, but it won’t perform best at every task you throw at it out of the box. I have a fine-tuned Mistral 7B model that outperforms GPT-4 on a specific type of structured data extraction. Maybe if I fine-tuned GPT-4 it could beat it, but that costs a lot of money for what I can now do locally for the cost of electricity.

GaggiX · on Dec 20, 2023

Well, it's pretty easy to find examples online. This one uses Llama 2, not even Mistral or fancy techniques: https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...

shiftpgdn · on Dec 20, 2023

They're quite close in arena format: https://chat.lmsys.org/?arena

TOMDM · on Dec 20, 2023

To be clear, Mixtral is very competitive. Mistral, while certainly way better than most 7B models, performs far worse than GPT-3.5 Turbo.

shiftpgdn · on Dec 20, 2023

Apologies, that's what I get for skimming through the thread.