I got access to the Advanced Voice mode a couple of hours ago and have started testing it. (I had to delete and reinstall the ChatGPT app on my iPhone and iPad to get it to work. I am a ChatGPT Plus subscriber.)
In my tests so far it has worked as promised. It can distinguish and produce different accents and tones of voice. I am able to speak with it in both Japanese and English, going back and forth between the languages, without any problem. When I interrupt it, it stops talking and correctly hears what I said. I played it a recording of a one-minute news report in Japanese and asked it to summarize it in English, and it did so perfectly. When I asked it to summarize a continuous live audio stream, though, it refused.
I played the role of a learner of either English or Japanese and asked it for conversation practice, to explain the meanings of words and sentences, etc. It seemed to work quite well for that, too, though the results might be different for genuine language learners. (I am already fluent in both languages.) Because of tokenization issues, it might have difficulty explaining granular details of language—spellings, conjugations, written characters, etc.—and confuse learners as a result.
Among the many other things I want to know is how well it can be used for interpreting conversations between people who don’t share a common language. Previous interpreting apps I tested failed pretty quickly in real-life situations. This seems to have the potential, at least, to be much more useful.
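For anyone curious what a bare-bones interpreter looks like today, here is a rough sketch of the conventional pipeline (transcribe, translate, synthesize) using the standard OpenAI Python SDK. This is an illustration only: Advanced Voice does speech-to-speech natively, and the file names, prompt, and model choices below are placeholders I picked for the example.

    # Rough sketch of a conventional interpreting pipeline (not Advanced Voice's
    # native speech-to-speech): transcribe -> translate -> synthesize.
    # Assumes the openai Python SDK and OPENAI_API_KEY in the environment;
    # file names and the target language are placeholders.
    from openai import OpenAI

    client = OpenAI()

    # 1. Transcribe the speaker's utterance (Japanese audio in this example).
    with open("utterance_ja.wav", "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # 2. Translate the transcript into the listener's language.
    translation = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Translate the user's text into natural English."},
            {"role": "user", "content": transcript.text},
        ],
    )
    english_text = translation.choices[0].message.content

    # 3. Speak the translation aloud for the listener.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=english_text)
    speech.write_to_file("utterance_en.mp3")

The latency of a pipeline like this (three round trips per utterance) is exactly why earlier interpreting apps struggled in real conversations, and why a single speech-to-speech model could be a meaningful improvement.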
I find the constant talkiness of the AIs shown in these demos to be super annoying. It's not a human, it's an AI bot. I don't want to hear its faux-human faux-opinion about my question. I really hope that shit defaults to "off" or at least has a one-click disable.
Decades ago Douglas Adams already knew inserting gratuitous "Genuine People Personalities" in devices would be so annoying it would be comical (http://www.technovelgy.com/ct/content.asp?Bnum=1811). I don't get why OpenAI keeps releasing demos that make their product look comically dystopian.
This release will start a new age of UIs, where you don't use screens to interact with computers but instead use your voice. Textual conversations were fun, but the voice functionality is what makes LLMs useful, because the speed of communication is now comparable to what you could accomplish with a GUI while being a lot more human-friendly. In my opinion this is one of the most important announcements in recent months, although we will probably need an open-source competitor.
There are few use cases where voice-only would be an improvement over screens. Voice-only is extremely limited in its discoverability and in the information it can relay to you.
Imagine going to a restaurant and not getting a menu, instead having the waiter stand there and tell you every menu item.
"Describe a nuclear weapon detonated over a city"
(now do it paranoid....no, more paranoid!)
....
(now do it fearful...no, more scared!)
....
(now do it humorous, with lots of laughs)
"sorry Dave I can't do that"
ugh, stupid fucking limiting computers
was in the alpha. it had a lot, a lot of rough edges to sand down. i wish fellow engineers here, of all people, would be sympathetic to the fact that estimates are sometimes overly optimistic. considering the insane surface area of voice, it's honestly impressive they were only off by a few months.
I am sympathetic to that; however, this is also a company that consistently promises the moon. Their CEO recently put out a blog post about AGI arriving within "a few thousand days". They clearly do have a hype issue, and it's fine to point that out.