Hacker News | bachittle's comments

I always have issues with TTS models that don't let you send large chunks of text, and it seems this one doesn't resolve that either. There's always a limit of around 2-3 sentences.


That's just for their demo.

If you want to run it without size limits, here's an open-source API wrapper that fixes some of the main headaches with the main repo: https://github.com/travisvn/chatterbox-tts-api/
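
For what it's worth, the usual workaround with any length-limited TTS endpoint is to split the text into sentence-sized chunks, synthesize each chunk, and concatenate the audio. Here's a minimal Python sketch of that pattern; the endpoint URL and request payload are assumptions for illustration, not the actual chatterbox-tts-api schema, so check that repo's README for the real interface.

    # Minimal sketch: chunk long text, synthesize each piece, concatenate audio.
    # The URL and JSON payload below are hypothetical -- adjust to whatever
    # TTS endpoint you're actually running.
    import re
    import requests

    def chunk_text(text, max_chars=400):
        """Greedily pack whole sentences into chunks of at most max_chars."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        chunks, current = [], ""
        for s in sentences:
            if current and len(current) + len(s) + 1 > max_chars:
                chunks.append(current)
                current = s
            else:
                current = (current + " " + s).strip()
        if current:
            chunks.append(current)
        return chunks

    def synthesize(text, url="http://localhost:8000/tts"):
        # Naive byte concatenation assumes a headerless format like raw PCM;
        # for WAV/MP3 you'd stitch the clips together with an audio library.
        audio = b""
        for chunk in chunk_text(text):
            resp = requests.post(url, json={"text": chunk})
            resp.raise_for_status()
            audio += resp.content
        return audio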


it's the same model as o3, just with thinking tokens turned up to the max.


That's simply not true; it's not just "max thinking budget o3", just like o1-pro wasn't "max thinking budget o1". The specifics are unknown, but they might be running multiple generations and then somehow picking the best answer each time. Of course that's a gross simplification, but some assume that's how it works.
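
To make the speculation concrete: the pattern being described is essentially best-of-n sampling with some kind of reranker or verifier on top. A toy sketch, purely illustrative since nobody outside OpenAI knows what o1-pro/o3-pro actually do; generate() and score() here are hypothetical stand-ins for a model call and a scoring model:

    # Hypothetical best-of-n loop; not OpenAI's actual implementation.
    def best_of_n(prompt, generate, score, n=8):
        # Sample n independent candidate answers, then keep the one the
        # scoring model (a reranker/verifier) rates highest.
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda answer: score(prompt, answer))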


> "We also introduced OpenAI o3-pro in the API—a version of o3 that uses more compute to think harder and provide reliable answers to challenging problems"

Sounds like it's just o3 with a higher thinking budget to me.


> That's simply not true, it's not just "max thinking budget o3"

> The specifics are unknown, but they might...

Hold up.

> but some assume that they do it this way.

Come on now.


Good luck finding the tweet (I can't), but at least one OpenAI engineer has said that o1-pro was not just 'o1 thinking longer'.


This one? Found with Kagi Assistant.

https://x.com/michpokrass/status/1869102222598152627

It says:

> hey aidan, not a miscommunication, they are different products! o1 pro is a different implementation and not just o1 with high reasoning.


That's a rather crappy product naming scheme.


I also don't have that tweet saved, but I do remember it.


Free users don't have this model selector, and probably don't care which model they get, so 4o is good enough. Paid users at $20/month get more, better models, like o3. Paid users at $200/month get the best models, which also cost OpenAI the most to run, like o3-pro. I think they plan to unify them with GPT-5.


That doesn't help much when we're asymptotically approaching GPT-5. We're probably going to be at GPT-4.9999 soon.


Not necessarily true. GPT-4.1 was released after GPT-4.5-preview. Next model might be GPT-3.7.


I'd be curious what proportion of paid users ever switch models. I'd guess < 10%.


I switch to o1-pro on occasion, but it's slow enough that I don't use it as much as some of the others. It's a reasonably effective last resort when I'm not getting the answer quality that I think should be achievable, and it's the best available reasoning model from any provider by a noticeable margin.

Sounds like o3-pro is even slower, which is fine as long as it's better.

o4-mini-high is my usual go-to model when I need something better than the default GPT-4 du jour. I don't see much point in the others and don't understand why they remain available. If o3-pro really is consistently better, it will move o1-pro into that category for me.


If you're not at least switching from 4o to 4.1, you're doing it wrong.


4o is better than 4.1 for a lot of non-coding / general-research tasks.


The article mentions that spell- and grammar-checking AI was used to help form the article. I think there is a spectrum here, with spell and grammar checking on one end and the fears the article raises (AI replacing our need to think) on the other. If we had a dial to manually adjust what the AI works on, that might help solve the problems mentioned here. The issue is that all the AI companies are trying so hard to achieve AGI that they keep their interfaces general, without controls like this.


I replaced Cursor with continue.dev. It lets me run AI models locally and hooks into VS Code through a plugin instead of replacing VS Code with a whole new IDE, and it's open source.
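
If you haven't tried it, the local-model setup is mostly a small config entry pointing the extension at something like Ollama. Roughly this shape, though treat the file location and field names as assumptions and check the continue.dev docs, since the schema has changed across versions:

    {
      "models": [
        {
          "title": "Local Llama 3 (Ollama)",
          "provider": "ollama",
          "model": "llama3"
        }
      ]
    }

That goes in the extension's config file (historically ~/.continue/config.json), with Ollama serving the model locally.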


I recommend looking at SWE-bench to get an idea of what breakthroughs this product achieves: https://www.swebench.com/. They claim to have tested SOTA models like GPT-4 and Claude 2 (I would like to see it tested on Claude 3 Opus), and their score is 13.86%, as opposed to 4.80% for Claude 2. The benchmark is about solving real-world GitHub issues. So for those saying they tried models in the past and it didn't work for their use case, maybe this one will be better?

