I agree. I think they need an example or two in that blog post to back up the claim. I'm ready to believe it, but I need something more than "diverse customer tasks" to understand what we're talking about.
You can fine-tune a small model yourself and see. GPT-4 is an amazing general model, but out of the box it won’t be the best at every task you throw at it. I have a fine-tuned Mistral 7B model that outperforms GPT-4 on a specific type of structured data extraction. Maybe a fine-tuned GPT-4 would beat my model, but that costs a lot of money for something I can now do locally for the cost of electricity.
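
If you want to check a claim like this on your own task, a rough way to compare two models on structured extraction is field-level exact match against hand-labeled records. The sketch below is only an illustration, not the evaluation behind the numbers above: the function name `field_accuracy`, the sample records, and the assumption that each model returns a JSON object as text are all hypothetical.

```python
import json

def field_accuracy(predictions, references):
    """Fraction of labeled fields that a model reproduced exactly.

    predictions: raw model outputs (strings that should contain a JSON object)
    references:  dicts holding the hand-labeled fields for each input
    """
    correct = total = 0
    for pred_text, ref in zip(predictions, references):
        try:
            pred = json.loads(pred_text)
        except json.JSONDecodeError:
            pred = {}  # unparseable output scores zero for every field
        if not isinstance(pred, dict):
            pred = {}  # valid JSON, but not an object
        for key, expected in ref.items():
            total += 1
            correct += pred.get(key) == expected
    return correct / total if total else 0.0

# Hypothetical labeled data and model outputs, for illustration only.
references = [{"name": "Ada", "year": 1843}, {"name": "Alan", "year": 1936}]
model_a_outputs = ['{"name": "Ada", "year": 1843}', '{"name": "Alan", "year": 1936}']
model_b_outputs = ['{"name": "Ada", "year": "1843"}', 'not json']

score_a = field_accuracy(model_a_outputs, references)
score_b = field_accuracy(model_b_outputs, references)
print(score_a, score_b)
```

Exact match is strict on purpose: a year returned as a string instead of a number counts as a miss, which is usually what you want if the output feeds another program.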