I always have issues with TTS models that don't let you send large chunks of text. Seems this one doesn't resolve that either; there's always a limit of like 2-3 sentences.
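The usual workaround is to split the text into sentence-sized chunks client-side and stitch the audio back together. A minimal sketch; `tts_generate` here is a hypothetical stand-in for whatever per-chunk call the model actually exposes:

```python
import re

def chunk_sentences(text: str, max_sentences: int = 2) -> list[str]:
    """Split text into groups of at most `max_sentences` sentences."""
    # Naive splitter; a real one (e.g. nltk's punkt) handles abbreviations better.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]

def synthesize_long_text(text: str, tts_generate) -> bytes:
    """Feed each chunk to the (hypothetical) TTS call and concatenate the audio."""
    audio = b""
    for chunk in chunk_sentences(text):
        audio += tts_generate(chunk)
    return audio
```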
That's simply not true; it's not just "max thinking budget o3", just like o1-pro wasn't "max thinking budget o1". The specifics are unknown, but they might be doing multiple model generations and then somehow picking the best answer each time. That's a gross simplification, of course, but some people assume that's roughly how it works.
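For what it's worth, the kind of thing people are guessing at is best-of-n sampling: draw several independent answers and keep the one a scoring function (e.g. a reward model) ranks highest. A rough, purely speculative sketch; `generate` and `score` are hypothetical stand-ins, not anything OpenAI has documented:

```python
def best_of_n(prompt: str, generate, score, n: int = 8) -> str:
    """Speculative best-of-n sampling: sample several candidate answers,
    then return the one the scorer likes best."""
    candidates = [generate(prompt) for _ in range(n)]  # n independent generations
    return max(candidates, key=lambda answer: score(prompt, answer))
```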
> "We also introduced OpenAI o3-pro in the API—a version of o3 that uses more compute to think harder and provide reliable answers to challenging problems"
Sounds like it is just o3 with a higher thinking budget to me
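The main "thinking budget" knob the public reasoning models expose in the API is `reasoning_effort`; whether o3-pro is just that turned up, or something more like the best-of-n guess above, isn't documented. A minimal sketch with the OpenAI Python SDK, with the model name assumed to be whatever your account can access:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# `reasoning_effort` is the documented knob for reasoning models;
# whatever o3-pro adds on top of this is not public.
response = client.chat.completions.create(
    model="o3",  # substitute whichever reasoning model you have access to
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(response.choices[0].message.content)
```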
Free users don't have this model selector, and probably don't care which model they get, so 4o is good enough. Paid users at $20/month get access to more and better models, like o3. Paid users at $200/month get the best models, which also cost OpenAI the most to run, like o3-pro. I think they plan to unify them with GPT-5.
I switch to o1-pro on occasion, but it is slow enough that I don't use it as much as some of the others. It is a reasonably effective last resort when I'm not getting the answer quality that I think should be achievable. It's the best available reasoning model from any provider by a noticeable margin.
Sounds like o3-pro is even slower, which is fine as long as it's better.
o4-mini-high is my usual go-to model if I need something better than the default GPT-4 du jour. I don't see much point in the others and don't understand why they remain available. If o3-pro really is consistently better, it will move o1-pro into that category for me.
The article mentions that spell- and grammar-checking AI was used to help write it. I think there is a spectrum here, with spell and grammar checking on one end and the fears the article raises (AI replacing our need to think) on the other. If we had a dial to manually adjust what the AI works on, that might help solve the problems mentioned here. The issue is that all the AI companies are trying so hard to reach AGI that they make their interfaces general-purpose, without controls like this.
I replaced Cursor with continue.dev. It lets me run AI models locally and connects to VS Code through a plugin instead of replacing VS Code with a whole new IDE, and it's open source.
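For context on the "locally" part: one common setup is pointing continue.dev at Ollama, which serves an OpenAI-compatible API on localhost. A minimal sketch of hitting such a local model directly; the port is Ollama's default and the model name is just an example of something you'd have pulled locally:

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible endpoint locally; the api_key is ignored
# but the client requires a non-empty string.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3",  # whichever model you've pulled locally
    messages=[{"role": "user", "content": "Explain what this regex matches: ^\\d{4}-\\d{2}$"}],
)
print(response.choices[0].message.content)
```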
I recommend looking at SWE-bench to get an idea of what breakthroughs this product accomplishes: https://www.swebench.com/. They claim to have tested SOTA models like GPT-4 and Claude 2 (I would like to see it tested on Claude 3 Opus), and their score is 13.86%, as opposed to 4.80% for Claude 2. The benchmark is about solving real-world GitHub issues. So for those saying they tried models in the past and it didn't work for their use case, maybe this one will be better?