OpenAI has the biggest appetite for large models. GPT-4 is generally a bit better than Gemini, for example, but that's not because Google can't compete with it. Gemini is orders of magnitude smaller than GPT-4 because if Google ran a GPT-4-sized model every time somebody searched on Google, they would cease to be a profitable company. That's how expensive inference on these ultra-large models is. OpenAI still doesn't really care about burning through billions of dollars, but that can't last forever.
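To put rough numbers on that (back-of-envelope figures, all assumed for illustration rather than sourced): Google is commonly estimated to handle around 8.5 billion searches a day, and GPT-4-class inference has been priced at cents per query. Even at an optimistic $0.01 per query, the math is brutal:

```python
# Back-of-envelope sketch: every input here is an assumption, not measured data.
searches_per_day = 8.5e9   # rough public estimate of daily Google searches
cost_per_query = 0.01      # assumed $ per GPT-4-class query (optimistic)

daily_cost = searches_per_day * cost_per_query  # ~$85M per day
annual_cost = daily_cost * 365                  # ~$31B per year

print(f"Daily inference cost:  ${daily_cost / 1e6:,.0f}M")
print(f"Annual inference cost: ${annual_cost / 1e9:,.0f}B")
```

For scale, Alphabet's entire 2023 net income was roughly $74B, so that one line item would plausibly consume a huge fraction of it.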
This, I think, is the crux of it. OpenAI is burning money at a furious rate. Perhaps this is the classic tech-industry hypergrowth strategy, but the problem with hypergrowth strategies is that they tend to skip the step where you figure out whether the market will tolerate your product being priced appropriately rather than sold at a loss.
At least for the use cases I've been directly exposed to, I don't think the market will tolerate it. These tools need to stay priced about where they are right now; it wouldn't take much of a rate hike for end users to decide that not using the product makes more financial sense.
They have. Anthropic's Claude 3.5 Sonnet is superior to GPT-4o in every way; it's even better than OpenAI's new o1 model at most things (coding, writing, etc.).
OpenAI went from GPT-4, which was mind-blowing, to GPT-4o, which was okay, to o1, which is basically built-in chain-of-thought.
No new Whisper models (granted, advanced voice chat is pretty cool). No new DALL-E models. And nobody is sure what happened to Sora.
OpenAI had a noticeable head start with GPT-2 in 2019. They capitalized on that head start with ChatGPT in late 2022, and, relatively speaking, they plateaued from that point onward. They lost that head start 2.5 months later with the announcement of Google Bard, and since then they've been only slightly ahead of the curve.
It's pretty undeniable that OpenAI's lead has diminished greatly since the GPT-3 days. Back then, they could rely on marketing their models' coherence and the "true power" of larger models. But today we're starting to see 1B models that are indistinguishable from OpenAI's most advanced chain-of-thought models. From a Turing-test perspective, I don't think the average person could tell an OpenAI response from a Llama 3.2 response.