I used to use GOOG-411 all the time before I had a smartphone. I must have provided so much training data that it's no surprise Google has, from early on, been very good at speech-to-text conversion of my particular accent :D
I don't think it's the same purpose. YouTube, TV, and movies offer enough speech samples, a lot of that content is dubbed into other languages, and a lot of it already has transcripts available.
They know who's calling, and the greeting was something like "Hello again". They are catching up at building a competitive database of persons and their preferences at the scale of FAANG. They're moving over from collecting info for their models to collecting info from their users for their agents. This is what they need to offer good agents.
But I might be wrong and it's just phoneme collection, as you speculate.
Regular human conversational voice, especially over the phone, is going to be a gold mine for training customer support AI agents. Actors reading movie scripts can't really provide that amount of relevance.
Agreed on the broader use of data. That said, it’s not just about phoneme collection—different channels and UX modalities reach different audiences and contexts. Each channel ultimately delivers unique inputs, fueling more specialized and robust models tailored to those specific use cases.
The best part of GOOG411 was that they would connect you to the phone number, free of charge, across borders.
List a business with a Google voice number and you can call in, check messages, and _dial out_ from Google voice. Free international calls!
I was in school in Canada where we had a payphone in a hallway. People heard me randomly saying "Funny Business Name, City State ... Connect me" into the phone so much, it became a running joke.
When I eventually got my own phone, I transferred the number and I still have it.
Does anyone else remember a very short lived Google experiment that allowed you to call a number, vocalize your search, and somehow without any additional steps, the results appeared on the browser in front of you? (which was not connected to the phone, or even logged into a Google account)
Sweet little duckling. Before the Internet you had to call a human on a phone to find phone numbers. 411 was a widely known number, similar to how widely known 911 is today.
On coast to coast flights, there's often not a good way of knowing what movies are available until after you've left cell coverage. This makes simple research like checking the IMDb score challenging.
Alaska Air has a whitelist of messaging services that you can use for free during the flight. WhatsApp is on that list.
So if you want to research obscure plane movies on an Alaska flight, you can connect to their wifi and message either WhatsApp's built-in LLaMA or now ChatGPT.
I get it's just an example, but are we really this far gone? Just watch a random movie and if you don't like it, pick another. This is such an extreme micro-optimization of a small experience.
You don't often know if a movie is good until you've finished it, because it all depends on how the story came together in the end.
You can spend 2 hours watching a moving, emotional story that teaches you something new about the human condition and the choices we make in our lives.
Or you can spend 2 hours on something that turns out to be full of plot holes and inconsistent characters, where nothing makes sense in the end and you've utterly wasted your time.
In what universe would you not want to have that information before watching? Especially if you're generally a busy person and only get to watch 10-20 movies a year.
I truly don't understand the attitude of "just pick one", whether it's for movies or other things. That reviews are "micro-optimization". Like, do you just not value your time? Do you not care about quality?
It's not like reviews are always right. But one film with 98% on Rotten Tomatoes vs. one with 45%... that's a really strong signal. Why on earth would you choose to ignore that?
You aren't wrong, but it's not an argument you're going to win. A bad restaurant or a dumpy hotel, etc., won't kill you either, yet most people rely heavily on crowdsourced reviews. It's just a part of the culture today, given how prevalent ratings are. This isn't 1940, so suggesting, out of the blue, "just go watch the movie regardless of whether it's any good" isn't going to convince someone to do so.
> Life is about more than optimizing the movies you watch.
Where did I say it wasn't? That's a straw man.
But if you're going to watch a movie for the next two hours, then yeah -- your life is going to be about that movie. So why not choose wisely?
> Watching a bad movie is not going to harm you. Maybe you'll take something away, maybe you won't.
Straw man again. And again -- why not choose quality instead of choosing ignorance and rolling the dice?
> Much like having a bad day is unlikely to ruin your life - it'll just give some nice context to the good days.
Again, straw man. Nobody's talking about ruining your life. But why intentionally choose a bad movie...?
> And we're talking about watching them on the plane, so the "busy person" argument really doesn't apply here.
To the contrary. For a lot of busy people, the plane is one of the few moments they have time to watch a movie. So it sure does apply.
You're arguing in favor of choosing bad things because it's not going to ruin your life. Huh? Shouldn't we have a higher bar for the things we choose to spend our time on? You're describing standards that are the lowest of the low -- as long as it doesn't harm you, it's fine. Don't seek anything better. Yikes. I've rarely come across a life philosophy more depressing.
This isn't a straw man - I'm not claiming you think life is all about movie optimization. I'm making the point that the effort of optimization might not be worth it in the broader context.
> Straw man again. And again -- why not choose quality instead of choosing ignorance and rolling the dice?
Also not a straw man. I'm illustrating that the downside of a bad movie is so minimal that extensive optimization might not be justified. This directly addresses your argument about opportunity cost by suggesting the cost is actually quite small.
> Again, straw man. Nobody's talking about ruining your life. But why intentionally choose a bad movie...?
Again, not a straw man. I'm making a proportionality argument about how much a sub-optimal movie experience actually matters in practice.
> To the contrary. For a lot of busy people, the plane is one of the few moments they have time to watch a movie. So it sure does apply.
Even on a plane, the stakes just aren't that high. A less-than-perfect movie isn't going to meaningfully impact your life regardless of how busy you are.
> You're arguing in favor of choosing bad things, because it's not going to ruin your life. Huh?
You're interpreting my position as "arguing in favor of choosing bad things," but that's just not accurate. I'm suggesting that the effort of optimization might outweigh the minimal downside of occasionally watching something mediocre. There's a middle ground between actively choosing bad things and obsessing over choosing only the very best.
> A less-than-perfect movie isn't going to meaningfully impact your life regardless of how busy you are.
There are movies I've seen that changed my life. If I'd watched a dumb movie instead, yes my life actually would have been meaningfully impacted for the worse. That's the power of art.
> I'm suggesting that the effort of optimization might outweigh the minimal downside of occasionally watching something mediocre.
It takes a few seconds to check Rotten Tomatoes. A movie is around two hours. In what universe would you rather waste a couple of hours in order to save a few seconds?
And it's not occasionally watching something mediocre. Most movies are mediocre. You have the choice of usually watching something mediocre, versus usually watching something high-quality.
Again, you're strawmanning with "obsessing over choosing only the very best". Where did I describe an obsession? I'm just saying, check Rotten Tomatoes to help pick a good movie. There's just no universe in which the tiny effort to do that is going to outweigh the two+ hours of boredom and frustration of a bad movie.
I genuinely don't understand how you can take the position you're taking with movies, when checking Rotten Tomatoes takes seconds (a minute if you're checking several) and a movie lasts for hours.
It's not perfect, but it's a very strong and useful signal.
Never in my life have I seen something with 98% and thought, well that was a crap movie.
And never in my life have I seen something with 35% and thought, that was amazing!
It's more in the 75-90% range where you have to consider the "dimensionality" of the thing, like whether it's a genre you like, or which individual reviewers match your tastes more precisely.
Because life just isn't perfect. If you go seeking this perfect optimization in every aspect of your life then the world is going to be a continual disappointment.
To me, unanimous positivity means a boring movie that takes no risks. I'd rather watch a people-love-it-or-hate-it film.
Usually tho I just watch trailers and judge the actors' chemistry to decide if I would enjoy watching those characters. What other people thought of it is not especially relevant. Particularly on flights I've watched some amazing foreign content that I just would not have stumbled upon if I was just watching whatever topped rotten tomatoes.
Better to bring your tablet or a similar device with your choice of content. Airplane screen quality is bad, and the movies are edited in weird ways (to be acceptable for all ages and cultures, at the expense of everything else).
I would expect nothing but hallucinations and nonsense coming out of any LLM regarding recently-released movies (aka. the ones you often find on flights).
In every post about LLMs there is someone to blindly say something like this.
When in reality if you ask ChatGPT for 10 good movies from this year you will get this.
Anora - Directed by Sean Baker, a compelling drama about the life of a sex worker in Coney Island.
Challengers - A provocative tennis drama directed by Luca Guadagnino, starring Zendaya.
Dune: Part Two - Denis Villeneuve's continuation of the epic science fiction saga.
Furiosa: A Mad Max Saga - An action-packed prequel exploring the origins of Furiosa, directed by George Miller.
Inside Out 2 - Pixar's sequel that dives deeper into the complexities of human emotions.
Wicked - A musical fantasy adaptation directed by Jon M. Chu.
The Zone of Interest - A thought-provoking film about Auschwitz, directed by Jonathan Glazer.
The Idea of You - A steamy romance starring Anne Hathaway.
Hit Man - A comedy thriller starring Glen Powell.
The Outrun - A powerful drama about a recovering alcoholic, starring Saoirse Ronan.
Let me know if you'd like more details about any of these!
Those descriptions are less detailed than the information you will see on basically any streaming interface, and yet they still manage to not be very good. For example, no person who had actually seen Anora would describe it as "a compelling drama about the life of a sex worker in Coney Island".
I haven't seen Anora so I'll give you that one, but you cited that as if it was just one of many examples, when in fact I think it's the only one, as all the other descriptions seem reasonable.
Originally the problem was supposedly that it would hallucinate complete and utter gibberish, but now here we are quibbling over one example and insisting that maybe it's not quite as good as alternative descriptions.
The gap between what was produced and what you're looking for is small enough that I think it could be covered with some slightly tweaked prompt instructions.
I'm not saying you're wrong but want to note how the goalposts keep seeming to shift whenever we talk about these capabilities.
I'm not Alupis. I can't and am not trying to speak on their behalf. I'm therefore not moving the goalposts they established. I'm making my own related point.
That point is that the information provided above about these movies is worthless. It does not add any new value beyond what would already be available in the streaming interface. Several of the descriptions are nothing but the genre and one person involved in the making of the movie. And yet even with these descriptions being incredibly short and vague, they still manage to contain at least one misleading summary.
I'm aware that you're a different commenter, but you are addressing a comment that was a reply to them, and it's therefore not necessarily appropriate to measure that comment against entirely new criteria that you want to bring into the conversation.
Despite your protestations to the contrary, these descriptions seem perfectly fine in that they're accurate and meaningful. And if you want to start playing fast and loose with all kinds of new extra criteria and requirements for what it's supposed to do, they all seem squarely within reach of the capabilities on offer, with some prompt tweaks.
>these descriptions seem perfectly fine in that they're accurate and meaningful
The description of Wicked doesn't mention either The Wizard of Oz or the Broadway musical. So yes, the descriptions don't contain obscene mistakes like calling Wicked a courtroom drama. If that is enough for you to call these "accurate" while ignoring the vagueness or the 1 in 10 failure rate on the Anora description, fine by me. But you must have some weird definition of the word "meaningful" to apply that to descriptions like the one of Wicked. That simply isn't a helpful way to describe that movie.
The comment thread you're at the end of started with this:
> I would expect nothing but hallucinations and nonsense coming out of any LLM regarding recently-released movies (aka. the ones you often find on flights).
The comment that replied to it (the one that you're arguing against) provides evidence that proves it wrong. You are correcting someone who isn't incorrect, and I think the person you're responding to is very justified in saying you're moving the goalposts here.
A reply downthread is not an endorsement of everything said upthread. I'm happy to discuss the points I made, but I’m not going to be made to defend something I didn’t say.
Well, if you're not endorsing what was said upthread, then your comment is a complete non sequitur. The parent comment said "LLMs can't give movie recommendations for recent movies because they'll hallucinate or spout nonsense", the next comment responds with a list of accurate movie recommendations, and then you come in and say this:
> Those descriptions are less detailed than the information you will see on basically any streaming interface and yet it still manages to not being very good.
The points you made were not relevant to the discussion at hand. It's like if people were having a debate about where to find the best tacos in town and you stepped in to say "tacos aren't as good as hamburgers, you know" and then got upset that nobody wanted to debate that point with you. It's not everybody else's fault if you don't understand how conversations work!
I don’t know why you are letting that one reply define the bounds of this conversation. My comment was directly relevant to the first comment in this thread and the comment I was replying to.
I _have_ seen Anora and I think that description is perfectly fine. It certainly isn't "hallucinations and nonsense" which is what the parent comment is claiming. What part of that description do you consider "wrong"?
For comparisons, Wikipedia's opening paragraph for Anora reads:
"Anora is a 2024 American comedy-drama film written, directed, and edited by Sean Baker. It follows the beleaguered marriage between Anora (Mikey Madison), a young sex worker, and Vanya Zakharov (Mark Eydelshteyn), the son of a Russian oligarch. The supporting cast includes Yura Borisov, Karren Karagulian, Vache Tovmasyan, and Aleksei Serebryakov."
I haven't seen the film, but it doesn't seem incompatible with ChatGPT's briefer description.
Instead of just checking with a first party source, you ask a statistical guessing machine for an answer.
There was a disagreement about the answer, so we needed to dig deeper.
You bring up Wikipedia, a third-party source of information. That description could also be wrong (it's probably not, but stick with me).
Instead of just checking with a first party source (IMDb is very easy to search on), we went through several layers of obfuscation.
This was an issue for Wikipedia early on, but it has citations, at least. AI doesn’t and doesn’t have an army of people constantly fact checking every answer generated either.
There’s no benefit to asking AI for information like this, especially since the in-flight summary has accurate information that’s more than “drama, sex worker, Coney Island”.
Maybe something like perplexity is better, since it has citations, but I haven’t tried it for very long yet.
They were responding to a commenter suggesting it would produce completely unusable results, the question was never about whether the results produced would be redundant.
I know that any mention of fallacies, valid or otherwise, causes instinctive eye rolls, but in this instance I agree with them that this amounts to moving the goalposts.
This type of response is called moving the goal post. When someone responds to one claim, the claim is changed to something different which was not part of the original argument. This is debating in bad faith.
Great! Now show me a system that can verify that list for accuracy as well. Not to be flippant, but this is the complaint. You can't approach outputs uncritically. And no I don't want it to be as unreliable as a person who also forgets how English or basic knowledge works at random intervals.
They were responding to a comment that suggested that this was a category where the only thing you would get is unintelligible gibberish.
You don't even seem to be disputing the actual results here, just gesturing towards a kind of philosophy class exercise of whether we can ever "really" verify its accuracy. I see Wittgenstein's name increasingly tossed around in these parts (a good thing!), so I'll just note that one of the reasons he's hailed as one of the great philosophers of the 20th century is because he felt these puzzles about "really" knowing were frivolous.
I don't think I agree that what's needed here is some new and extra process of verification. I think the same usual quality control criteria that are already being used are good enough in this case.
> Great! Now show me a system that can verify that list for accuracy as well. Not to be flippant, but this is the complaint. You can't approach outputs uncritically.
In general you can't, but surely it's not that big a deal if ChatGPT offers an inaccurate summary of a movie you're about to use to kill time on a flight? I suppose it becomes important if, e.g., you're relying on it to tell you whether a movie is appropriate for children, but, if you're just asking it whether a movie is worth watching, that's a question that doesn't have an objective, factual answer anyway, so a hallucinated answer is probably about as useful as that of a not-previously-known reviewer.
> If I invested money into a film, I would want its representation in the world to reflect what the movie is about at the very least.
Sure, but that's the filmmaker's interest. As someone sitting on a plane trying to decide whether to watch a movie, I care about my interest, not that of the person who made it. I'm not particularly arguing for the use of ChatGPT here (I wouldn't use it), just that the risks it usually poses are fairly minimal in this case.
You're forgetting the information hazard of five years from now someone mentioning a movie and you saying "oh I didn't want to watch that because of the car chase" and everyone looks at you funny because it is a film set in the 1700s about a carriage driver.
You’d be pretty wrong, then. ChatGPT in particular will cite its sources via an internet site.
My wife wanted a pair of boots for Christmas that I couldn’t find in her size. Google was a wasteland of SEO, but ChatGPT found 5 sites and was able to tell me current stock levels.
Looks like this is using GPT-4 and has no knowledge after January 2022.
> As of my knowledge cutoff in January 2022, the last movie I have information on is "Spider-Man: No Way Home", which was released in theaters in December 2021. It was one of the most highly anticipated films of that year, marking a major event in the Marvel Cinematic Universe (MCU) and the Spider-Man franchise.
Here's a comparison of asking ChatGPT and Meta AI about actual in-flight movie choices.
I pasted the same initial prompts in both, but Meta AI needed more clarification. When ChatGPT found multiple entries with similar titles, it gave information about all of them.
>[The Campaign] received mixed-to-positive reviews from critics. On the Rotten Tomatoes website, it holds a 65% approval rating from critics, based on 191 reviews, with an audience score of 60%.
The first thing I fact-checked, the Rotten Tomatoes scores are actually 66% and 51% respectively[1]. Probably not enough of a difference to sway any opinions, but an excellent example of the type of inaccuracy that the previous comment was referencing.
Llama in WhatsApp can search the web, so usually gets these queries right.
Hilariously it often believes that it can’t access the web and then hallucinates reasons for how it can know things beyond its knowledge cutoff date. But in any case, it works very well for this use case.
Web or other search access for LLMs really isn’t that new anymore, and I doubt that Grok will do a statistically significant sampling of everything on X, so I don’t really expect it to fare much better than a model with access to regular web search.
So the perception of those aged 50+ is that they're so far removed from technology they’d prefer to use a telephone to avoid their discomfort with computers?
I’m well into this group and still make a lot more api calls than phone calls.
Fresh out of college I recall vividly thinking, I’ll need to build an impressive list of side projects to overcome preconceptions about how much I can truly offer at my age. Maybe nothing has changed.
I had ChatGPT read through my recent bloodwork results, and it helped me understand them better than my doctor did.
People 50+ are going to be so addicted to this thing it's not even funny. My parents are not reaching for AI immediately yet, but that's just a "yet". This is the wave that could come at any moment.
My dad sells farming-related equipment to mostly older people and there are still people more comfortable giving him their credit card info over the phone instead of purchasing on his website online.
(Though I see that as mostly a failure of our financial industry. Credit card numbers should be obsolete by now.)
Just one data point but my father is in his 70s and has never owned a smartphone, when he wants to Google something he goes to the computer in the basement. On the other hand there are landline extensions all over the house. So yeah it would be more convenient for people like him.
the idea that someone who was 20 in 1995 is too old to be comfortable with computers is a horrifying and offensive stereotype that deeply worries me for my own future
our industry is old enough that the first generation of pioneers has died of old age.
Do you really think someone who grew up with computers in the 80s is incapable of using a smart phone? These are people who are still in the workforce today. These are your most skilled colleagues.
Some of them probably designed the device you think they're too old to understand
Not really. A lot of people in their 20s might have never actually done much with a computer, yet they cannot put their phone down. I know lots of 20-somethings that cannot type. I know even more that do not own a traditional "computer". It has nothing to do with fear, but lack of need.
I don't think anyone is denying the existence of Greybeards, it's more that the field has exploded so much in the meantime that the probability of a random 30 year old being in it is much higher than the probability of a random 50 year old.
You're surely right that they anticipate it being a novelty that people share during holiday visits.
But as you can probably tell from the other replies, the idea that older people don't know how to use internet-era technology is a meme that was wearing thin 20 years ago already.
People who haven't had ChatGPT "land" for them yet are likely just people who don't find themselves asking a lot of questions they need a chatbot to answer, regardless of the medium. That probably has some age skew right now, but isn't really about the medium at all.
I’m a few years from 50, and while Google has deteriorated, my Google-fu and ability to see signal through the noise still serve me well enough, and that's my comfort zone.
When I dabble with ChatGPT it always feels like I’m playing with a toy, as I don’t really have a use case to bring to it. I’ve used a few website creators and code generators which have been useful, but I also don’t think they saved me much time overall. Web design, graphic design, and creative stuff in general are things I suck at creating, so it gives me a new power and is easy to iterate on. Otherwise, I’ve not found much actual value from it yet.
If it makes you much more efficient in your job, as it does for professional software devs and many HN users, then I think you’re more apt to be excited by the tech.
Aren't they doing a 12 days of Christmas thing where they release new features for 12 days? This would fit into that idea.
I was thinking earlier today that an agent listening to my calls would be helpful. I was on the phone with a financial institution that will require some followup. Being able to sync in an agent to transcribe and remind me would be valuable.
Your comment made me lol. And it’s very rare for that to happen to me via reading text. And I needed it today. So I just wanted to tell you thank you and I hope you have a good day.
Because it shows that it's perfectly plausible for people aged 50-plus to appreciate the value of these technologies every bit as much as us whippersnappers. Some of them are writing books about it, after all.
You seem to have forgotten the context of this conversation. Right now we're talking about whether 50+ somethings can appreciate the value proposition of the 1-800 line and more generally of the whole line of GPT releases presently coming out.
Pointing to the book authorship helps support the intuition that our Gen X friends are able to get it, because, after all, it's not out of the question for them to be involved in these very fields. I don't think any of those points, which again in this context are the points under discussion, hinge on whether or not the book specifically addressed particular LLM methodologies.
I like this a lot. I don't use AI a lot and I often find it annoying, so I don't eg feel the need to install the OpenAI mobile app (which I assume exists). Having ChatGPT in my WhatsApp (I live in a place where WhatsApp is everywhere) is a nice middle ground, lets me occasionally ask it stuff without worrying about accounts and projects and models and all that stuff. Cool!
You're right they did add anonymous access at some point, but it was quite a while ago I think. Smart move on their part. Makes casual use much more convenient.
This is a killer feature for me. In fact, I briefly explored building a semi-self hosted version for myself.
My biggest use case for ChatGPT voice mode is when I _need_ or _want_ to be hands-free. Think working around the house or yard, driving, walking around the grocery store, cooking, etc. I find that I end up using my iPhone's voice-to-text and then simply communicating in text mode (in the case of driving, I stop). After all, once I have to touch my phone, it's just faster to work in text mode.
All of my devices know how to make calls. All of my devices know how to make calls from a voice command. All of my devices know how to hang up a call. This is really nice.
How ironic that it's not actually Apple delivering that despite being in the perfect position to do so (they have a deal with OpenAI for ChatGPT using Siri, have all the contextual knowledge they could ever need etc.) – my iOS 18.2 Siri + ChatGPT experience has been extremely disappointing so far: It seems to completely forget all context between questions, ignores me for follow-up questions 80% of the time etc.
I nerdishly get angry at Siri and Apple's Not Intelligence while driving. With the ChatGPT iPhone app I can have a whole conversation and get things done... Siri on my iPhone 15 Pro running 18.2 is still so frustratingly dumb, a one- (now maybe two-) trick pony compared to ChatGPT's voice mode.
I'm still hoping one of OpenAI's 12-day announcements is that they are creating an AI phone with Microsoft, called GPT, and/or an AI phone OS.
That's cool... I just want an awesome new holy-moly personal device, and I can see a GPT phone being it. OpenAI has that FaceTime-an-AI thing where it sees you and your surroundings and acts like you're talking to a real human.
I want that front and center in my AI phone as my personal assistant. The AI phone's UI would be sparser... there wouldn't be a lot of UI (some app icons, but not as app-icon driven). And the image/video of the AI chatbot personal assistant you look at and talk with could be anything from a celebrity to a deceased relative or loved one (they live on and help you through your day-to-day). There are so many things to innovate on and move forward from the boring iPhone!
Hopefully OpenAI makes an even bigger announcement about getting into the personal device business soon (or later).
OMG! Try calling from Microsoft Teams :D You will end up with, "Thanks for calling Agenta".
Did OpenAI outsource and release this implementation with some of the company's internal phone numbers?
> not business critical to anyone and mostly just a toy
Except it's business-critical to OpenAI, who hopes to look impressive when you call the number.
Instead, some unknown percentage of folks who call will become confused, or think OpenAI is a bit janky. Based on the anecdotes here, the percentage of people who will experience this issue doesn't seem trivial either.
My guess is OpenAI paid a truck load for this 1-800 number and rushed it into "production" for this product launch without waiting for all old routing to be updated.
Maybe. I’m sure it was calculated either way and their mistake to make. Could be that they are working in the background and have a plan to resolve before next week. I don’t think people in general have such high SLA expectations. It’s a minor blip in the grand scheme of things.
Ah, I got the same result calling from Google Fi. Thought this was a weird April Fools joke for a bit. Then on the third call it went through to GPT. Telecom is weird!
Reminds me of 1-800-MY-YAHOO. I remember hiking in a national park in the 90s and calling in from a pay phone and having my email read to me over the phone by a robot. I could record an audio response that was sent back as an attachment. Good times!
I agree. A good chunk of the tech trends in the last decade were indeed rent-seeking, but a silent revolution was happening in transformers and neural network architectures, which made today's products possible.
And I'd wager that there are silent revolutions happening all across the colossus that is the tech industry that will become apparent in the next decade.
Jeff Bezos put it best during his recent interview at the 2024 NYTimes Dealbook Summit, "We're living in multiple golden ages at the same time." There's never been a better time to be alive.
That's easy for a billionaire to say, isn't it? Jeff Bezos is not exactly a reliable narrator here. His business practices are built on exploitation and externalising his costs (such as the massive environmental damage).
I agree about an abundance of apps, but what type of value are LLMs adding?
It can sometimes be useful to input a more "human" search and have something get spit out but 60% of the time it completely lies to you. I'm talking about questions related to web specifications which are public documents. Section numbers, standards names, etc.. will be completely made up.
Off the top of my head, and just for the last couple of months, and only outside of work (where its value is even more immense), it has saved one of my indoor plants, told me how to handle a major boiler problem that would have left us without a working boiler over a weekend in the winter, with the next "emergency" repairman only available on Monday, advised me to use Kopia as a backup solution for my personal files instead of Syncthing, helped me choose the right type of glass for a painting frame, answered a couple of questions about bikes, and helped me when I was stuck in a harmonic analysis of a piece of music. All of that is extremely valuable to me (if only for the time not wasted googling answers), and in none of those cases would its potential hallucinating have been an issue. And I can't count the number of times "specialists" in bike repairs or plumbing have told me something incorrect or outright false, so I've learned to deal with hallucinations already!
> And I can't count the number of times where "specialists" in bike repairs or plumbing told me something incorrect or outright false, so I've learned to deal with hallucinations already!
So much this. So many times I've argued with hired experts saying "can't be done" just to see yes, it can be done.
Yes, but which of those things would you not have resolved just as well 10 years ago? All those possibilities were added by the maturing web itself, as a genuinely novel change from having to source books or experts/friends in the days before.
I'm glad ChatGPT didn't lead you astray, but I'm not seeing what it's added here besides shuffling up the user interface in a way that you presently and subjectively prefer?
> I'm not seeing what it's added here besides shuffling up the user interface in a way that you presently and subjectively prefer?
This. But in the same sense, the past 50 years merely changed the interface from dusty textbooks in libraries to Google Search, the past 100 years gave us dusty textbooks instead of writing to the Royal Society, and that in turn replaced the option of asking the local whisperer or hoping you'd find answers at Sunday mass.
Do not underestimate the power of being able to get an answer to your problem described, visualized, and perhaps complete with interactive demo to explore it further, in time it would previously take you to formulate the right search query that finally gives you relevant information.
EDIT:
And that's on top of all the arbitrary data transformations prior tools couldn't do. E.g. I'm increasingly often using GPT and Claude models to turn photos of (possibly hand-written) notes or posters into iCAL files I can immediately import into our family shared calendar.
Another frequent use case, data normalization. Paste a whole dump of inconsistently structured data multiple people collected (say, addresses of various local businesses that helped a local NGO and now are supposed to get a thank-you card for Christmas). Like, you get 200 rows of addresses in a single column, with spelling mistakes, repetitions, junk at the end, arbitrary capitalization, wrong order of address segments, and such; you need to separate it out into 5+ columns (name line 1, name line 2, street address, zip code, city, etc.) and have it all normalized.
The fastest and most robust way to do it as a one-off job, today, is to paste the whole thing to GPT-4o or Claude 3.5 Sonnet, tell it how the output should look (give one-two examples, mention some mistakes you saw), then send the message and wait 30 seconds for the job to be done for you.
(Yes, it may make mistakes - it didn't for me in recent memory, but it can. But for that, I quickly add an extra verification column for each one in the LLM output, do a simple case-insensitive substring match against the original, and eyeball any data row that shows an error. And guess what, the formulas don't take much time either, since LLMs are good at writing them for you, too!)
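For the curious, here's a minimal Python sketch of the verification idea described above, purely illustrative: flag any normalized row whose fields can't be found (case-insensitively) in the original pasted line. The field layout and example data are made up.

```python
# Illustrative check: does every normalized field appear, case-insensitively,
# in the original raw line? Rows that fail get eyeballed manually.
def row_checks_out(original_line: str, normalized_fields: list[str]) -> bool:
    haystack = original_line.lower()
    return all(field.lower() in haystack for field in normalized_fields if field)

# Hypothetical example row.
original = "acme bakery, 12 main st, 90210 springfield"
normalized = ["Acme Bakery", "12 Main St", "90210", "Springfield"]
print(row_checks_out(original, normalized))  # True -> no need to eyeball this row
```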
My plant would have been dead. As for the rest, sure, I would have resolved them eventually, after many frustrated hours of googling and trial and error.
Time is my most precious thing, I already don't have enough time to do all the things that I want to do, I don't want to waste that trying to find and test solutions when ChatGPT gives me instant answers. I'd rather spend time playing with my cats or riding a bike instead. It's not a matter of UI, it's a matter of preventing waste of time, energy and money, and less frustration. For that alone, €20/month is a very good value. And that's just for my personal life.
"many hours of frustrated googling and trial and error" isn't a familiar experience to me, but I'll trust that it is for you. I'm glad you see that as behind you now with this. I suppose you must not be alone.
I wouldn't discount this effect. As someone with sensory issues, one thing I like about ChatGPT as opposed to the "raw" internet is that I can see the answer to my questions in a nice and calm textual format without some website who created the article specifically to catch my search terms, but is trying to get me to deceptively click on ads or pull me into buying something through their affiliate links. That's absolutely increased my own enjoyment and productivity.
Objectively, it takes less time to ask a question and get a direct answer than it does to search for some words, leaf through a couple of results, find one that has the information you want, and then read that page. If I want to know the height of the Eiffel Tower, being told it's 1,083 feet tall is faster than searching for its website, finding the stats section, then locating that information on the page. Google realizes that, so they pull that info out of the page and just put it on the results page for you.
This is a thin edge of the wedge issue, right? ChatGPT is pretty darn good for most things. I’ve used it extensively for the past 18 months and only in a few cases would I say it “completely lied to me”.
My general rubric is: “would I trust someone on Reddit to correctly guide me on this”. If the answer is “yes” then ChatGPT is likely going to do well. If the volume on a particular subject is low / susceptible to false information then it’ll lie.
Recently it lied hard about how to configure MikroTik routers. I lost many hours. But for a large construction project recently it completely balled out.
Are you doing cutting edge / complicated stuff? Have you examples of where it lies?
No specific prompts, but most were related to the XHR/Fetch specs and behaviors within. It would say "X.Y.Z sections defines this" but that section didn't exist at all and the answer provided was not accurate.
> My general rubric is: “would I trust someone on Reddit to correctly guide me on this”. If the answer is “yes” then ChatGPT is likely going to do well
I see. Well, I don't know if I find that very valuable but if others do, then so be it.
Agreed this is a bad idea in the case you are replying to, but I love ChatGPT as a way to recover the name of a book or film I've forgotten. I recently prompted for "a book about a nuclear wasteland dominated by a church" and it gave me A Canticle for Leibowitz (which is great). I'm not sure how easy that would be any other way.
I wonder how many people are prompting it correctly. You can't just query it like you might Google or something. It works best with lots of context and back-and-forth. And yeah, for many things you are going to get directional answers, not exact ones (especially with "rote memory" tasks like exact quotes from a book or something).
I don't want to turn this into another Claude lies less than ChatGPT subthread but since you mentioned configuration of MikroTik routers I felt like I should.
ChatGPT lies a lot about RouterOS, I don't know why. Claude helped me a lot on the other hand with all things MikroTik.
I find it useful, and it brings value to me (literally: I exchange valuable money for API access), even if it doesn't for you. Many other people report the exact same thing. Just because you don't find value in a technology, doesn't mean that others don't.
In the past week I have used it for helping write a script in a framework I'm not super familiar with (OpenSCAD), I was able to finish a project in 5 minutes that otherwise would have taken me hours. I have used it to help make movie recommendations (none of them were hallucinated). I have used it to translate a conversation with a non-english speaker, etc. There are other tools that can help me do all of these things, but none quite as fast or painlessly.
It might not be useful for your use case of asking questions related to specific web specs, but that doesn't mean that the technology has no value. Horses for courses...
My experience with code completion tools (i.e. single line/method snippets) has been positive. But, anything more complicated seems to fall apart rather quickly.
I have upgraded to the $200 Pro tier, and, with o1-pro, all of my tasks delegated to the "junior" have been so much better. It takes longer to complete, of course, but the overall duration is less because I'm not having to go back and correct it as much as I was with 4o. It's been able to figure out problems that 4o continually failed on.
LLMs have been a personal tutor to me for the last year, able to explain anything and everything I've been curious about professionally and personally. I changed jobs to new technologies in large part because I effectively had an assistant able to help cover any gaps in knowledge I had, train me up quickly, and offer ongoing help on the job.
They can make stuff up, but saying "60% of the time they lie to you" hasn't been true for years.
>They can make stuff up, but saying "60% of the time they lie to you" hasn't been true for years.
If you're using them to fill knowledge gaps, what scaffolding have you set up to ensure that those gaps aren't being filled with incorrect-but-plausible-sounding information?
That's because we're currently largely not using them correctly: they should be hooked up to RAG, instead of us hoping that they've memorized enough of the training data verbatim, which is arguably a waste of neurons in a foundational model.
Imagine being graded on your ability to quote exact line numbers of particular parts of your codebase as a senior software engineer without being able to look at it!
I think when people say things like this, it indicates that they tried LLMs in 2022 and solidified their opinion there.
I had the same impression about hallucinations 2 years ago. The reality is that at the end of 2024, you can get incredible value from LLMs.
I've used copilot to code almost exclusively now for the past few months. Anyone still comparing it to text completion I feel is operating on completely out of date information either intentionally or unintentionally.
I'd (generally) agree. About 5 minutes of using Flux, Claude, or Suno would provide more net new value than I've yet to get out of blockchain, self-driving, gig brokers, the metaverse, 5G, AR/VR, quantum computing, hyperloop, and whatever people were trying to make web3 be, combined, over the years. Not that I think all of these things will perpetually fail to deliver (hell, if I'd had a chance to try Waymo already, self-driving probably wouldn't be on the list); it's just that the hype cycles were unrelated to when that delivery occurred (if ever).
The hard part is, despite actually having some "real" value delivered, you still have to sort through the 99% of bullshit that comes along with it anyways.
I will personally say that if you ever get the chance, definitely try a Waymo. I did recently for the first time and it's a hell of an experience. You can very vividly imagine it being the future.
I'm also going to stand up for AR/VR here. I'm in a long-distance relationship and me and my partner spend an hour or so in VRChat around two to three times a week. The power that has to reduce the badness of an LDR is well well well well worth the three hundred bucks I paid for a Quest. That and some of the golf games on it are fun.
I am super stoked to try a Waymo when I'm in a city with one. Its hype failures have more to do with 10 years of hype about its public availability while still not being available to 99% of the world's population 10 years later. Hype is useless without the result.
I've had an HTC Vive and an Oculus Rift 3 (Walkabout Mini Golf is one I tried!), and while I wouldn't try to argue NOBODY has found a use for it (somebody somewhere found uses for all of the things I mentioned, just not me, and not the majority of people the way big new things are promised to), it never really ticked the "new value" box before they ended up in the closet for me.
That's totally fair. The tech is only barely coming out of the enthusiast adopter phase and there's not a critical mass of content on there to keep most people putting on the headset daily.
That and the ergonomics do still suck, even if I've mostly gotten used to them.
I do think VR will make it, though - starting with the kids. Apparently Gorilla Tag broke 1.5 million players recently, and those are mostly under-15s. The next generation is going to have a strange relationship with computers.
I had in mind the surge in LLM chat support and the surge in thin ChatGPT wrappers with a custom system prompt. Claude/ChatGPT do seem useful, "an AI companion for Microsoft Paint" less so.
And now we will have mediocre middlemen/gig economy brokers with bad customer service performed by AI agents that you can summarize with chatgpt and automatically reply back to. Progress!!
That's only because Zombo had everything. It was the original everything app/site that Musk so desperately wants X to be. Nothing can top that - not even AI.
something tells me all these bells and whistles around gpt are signs that scaling laws have plateaued, otherwise OpenAI et al. would focus more on improving model quality.
Maybe GPT-4 is the 1080p of LLMs: Noticeably better than 720p and 480p models, and not bad enough to warrant additional improvements.
Sure, 4K, 8K, ... are technologically available, but for the majority of use cases, 1080p is enough. Similarly, even though o1 and other models are technically feasible, for most cases the current models are enough.
In fact, GPT-4 is more than enough for 80% of tasks (text summarization, Apple (un)Intelligence, writing emails, tool use, etc.)—small models (<32B) are perfectly fine for those tasks (and they keep getting better too.)
Yes, surely they only have one type of Software Engineer and they all know how to improve model quality.
Alternatively, does it not seem more likely that they have different product groups? Surely the folks working on ChatGPT are an entirely different beast than the folks working in model development?
Yes, surely a sarcastic reductio ad absurdum of what was said will inspire dialogue.
I think the GP's point is that their investing in new distribution channels could mean the ROI on models has diminished significantly. Incidentally, I disagree with the GP that that's what this means -- this is another investment in brand awareness, AND in data for multi-modal/audio. They might have gotten to 1080p for text chat, but definitely not for voice chat.
Nothing more absurd than your response. OpenAI has a large engineering staff; it's foolish to say they are all working on advancing models. The folks working on ChatGPT are going to continue working on ChatGPT. Let's not forget that o1 just got released recently.
Nothing I said was absurd in response to making an unsupported idea that model development has plateaued.
I don’t get this. Define focus and how is just improving model quality gonna allow OpenAI to survive, they need a mix of commercialization and model improvement. No $$, no gpus, no researchers, no improvements
> something tells me all these bells and whistles around gpt are signs that scaling laws have plateaued, otherwise OpenAI et al. would focus more on improving model quality.
o1-pro is that model. Expensive and slow, but significantly better at many tasks that involve CoT reasoning.
o1 is way better than GPT-4 imo. I feel that many people just don't have complicated tasks/questions to deal with in their day-to-day. It's like half the jump from 3.5 to 4, to me.
That's not been my experience, though I guess it depends on what you're using o1 for.
My experience is that o1 is extremely good at producing a series of logical steps for things. Ask it a simple question and it will write you what feels like an entire manual that you never asked for. For the most part I've stopped caring about integrating AI into software, but I could see o1 being good for writing prompts for another LLM. Beyond that, I have a hard time calling it better than GPT-4+.
lots of coding tasks, discussions about physics/QM. I find that it produces better quality answers than 4o, which often will have subtle but simple mistakes.
Even in writing, where it is supposed to be worse than 4o, I feel that it does better / has a more solid understanding of provided documents.
Interesting, could you share an example of this where it provides something of value? I've tried asking a few different LLMs to explain renormalization group theory, and it always goes off the rails in five questions or less.
My guess is that the model got good enough to make its own bells and whistles — even the original 3.5 was good enough to make its own initial chat web UI.
I know it was that good, because I got it to do that for me… and then the UI kept getting better and the expensive models became the free default option and I stopped caring.
Calling it now: Google will ultimately lose this consumer battle. How? By doing what they've always done: building better tech. Gemini will be faster/better and it will have more features; but they will continue to fail to productize or explain it to consumers.
Google's offerings here are still a huge mess. OpenAI is crushing them right now at building products that people want, and making them accessible.
Have you been paying attention at all these past few weeks? Google is crushing it with releases. Gemini 2.0 is great, Veo2 is crushing Sora, live video conversation from aistudio... 12 days of OpenAI turned out to be 12 days of Google.
My respectful counterpoint is that most people aren't paying attention to tech releases at all, ever, unless they go viral like ChatGPT did.
I have very nontechnical coworkers get excited about cool new things ChatGPT can do, but I'm not certain any of them even know we _have_ Gemini in our Google Workspace.
This would hardly be the first time Google has produced innovative technology which eventually fizzles because it never captured much mindshare outside of the tech news circles
Google's recent launches have been technically impressive (especially Veo 2), but given the company's past track record on creating new products, I'm not very bullish that they can turn those launches into products with the same excitement and sense of direction as OpenAI at least appears to have. Google has the benefit of having platforms that span billions of devices and people, but with the looming threat of antitrust regulation, I'm not so sure they'll have the benefit of the last thing for long. Granted, I doubt that 1-800-ChatGPT will be a significant source of users for the product, but it does signal some of the creativity from the company that seems to be escaping Google regularly (see: NotebookLM's leads leaving to form their own startup).
Google search also was good at the beginning. Now it occasionally gives results that contain none of the keywords and have nothing to do with what I searched for.
Seems gimmicky. An audio wrapper on top of ChatGPT accessible by phone... Neither technically impressive nor an improvement to user experience. Sorry for the negativity, I'm trying to remain AI hyped but it's difficult.
I think so, but not a bad marketing gimmick. It gives a pretty easy way for the general public to interact with ChatGPT on a trial basis without signing up or paying for it using a somewhat hard to acquire identifier (phone number). I'm curious if they're doing anything to avoid abuse from spoofed numbers.
I can now ask my phone to call ChatGPT. 100% hands-free. It’s only a few steps less than using the app, but there’s a lot of incremental value to not needing to touch my phone.
Concrete example: I’m driving. I ask Siri a simple question, but it can’t answer it. Previously, if I wanted to use ChatGPT, I’d have to stop, pick up my phone, unlock it, open the app, get my answer, then start driving again. I’d never do that. Now, I can just ask Siri to call ChatGPT.
Those two people fixedly looking at each other while reading from a teleprompter announcing this feature -- likely written by ChatGPT -- is the weirdest thing ever.
I meant no disrespect, but from the 2-minute mark or so, the conversation sounded more natural and things got smooth. Interesting feature, and I liked the '80s banner with the phone # like in the old TV ads!
Sure seemed lifelike. I started right in on asking it about the nature of consciousness and self awareness, and then pointed out that its own behavior matched the majority of the criteria it described while referring contextually to previous statements in the conversation. Then it turned into a seasoned politician, precisely understanding but vaguely answering in handwavey directions. Either that behavior is well-tuned, or it's in one form or another backed by human workers like Musk's humanoid robot theater. If the conversation raises certain flags then you escalate it to a human to preserve the illusion? Not asserting, just speculating from my brief few minutes poking at it
OpenAI is diversifying a lot, and I’m not sure that’s a good thing.
It’s great to ship fast. But you need to maintain things as well. And that requires even more time and engineers and money in the end.
There’ll definitely be projects within OpenAI that will be shut down in a few months, just because they haven’t caught on and/or engineers want to work on something new.
That’s how Google worked in the 2000s - shipping new things fast - but then there was Reader and now they lost everyone’s trust.
> OpenAI is diversifying a lot, and I’m not sure that’s a good thing.
I'm not sure if I'd use term 'diversifying'. At least not in the sense of spreading themselves wider across more projects to reduce overall company risk (if that's what you meant).
I think that we're still very early into AI and because we're still not sure what kind of applications people will want to use in the future, it makes a lot of sense to experiment.
Something interesting (and vaguely concerning) I've noticed when playing around with it: I was wondering how they'd track the "15 minutes per month" usage for blocked caller IDs, and it turns out that they are apparently still able to see the caller's number, or at least distinguish repeated from new callers!
I'm aware that the way "caller ID blocking" works is that it just sets a flag on the call metadata, and it's up to the receiving carrier to observe it and not present caller ID to the callee, but I'm not sure whether bypassing that is a common feature carriers (Twilio in this case) provide to their users. (It's also possible that the only thing Twilio exposes at the API level is a "recurring caller" boolean, of course.)
In any case, even skipping the disclaimer based on having called before seems like a problem: Different people can be using the same phone line at different times. Wouldn't it be required to still read out the disclaimer every time?
OpenAI says in the FAQs that "[t]he knowledge cutoff for 1-800-ChatGPT is Oct 2023". But when I message it on Whatsapp, it says cutoff is "January 2022" and that it's using "GPT-4".
Anyone know (and willing to share) what are they using for PSTN integration?
I've been looking into options for our non-profit tech startup (Ameelio) and about the best pricing I can find is about 1.35 cents per minute. It surprises (and saddens) me that it's still so expensive. I'm sure at a bigger scale you can negotiate better pricing, but based on the quick conversations I've had with vendors it doesn't get significantly cheaper.
[*] limited bandwidth (8 kHz), providing a valuable opportunity to enhance and specialize models for telephony applications, ensuring better performance and user experience even with low-fidelity audio inputs.
I mean, nothing prevents them from running their existing data through a "noisy POTS" filter in A/B tests to see how that impacts customer satisfaction.
But being able to blame the user's phone line probably goes a long way to avoiding unhappiness due to testing :)
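For what it's worth, here's a rough sketch of what such a "noisy POTS" filter could look like, assuming float audio in [-1, 1]: band-limit to roughly telephone bandwidth, downsample to 8 kHz, and apply 8-bit mu-law companding. The parameter choices are standard telephony values, but everything here is an illustration, not anyone's actual pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfilt, resample_poly

def pots_degrade(audio: np.ndarray, sr: int = 16000) -> np.ndarray:
    # Telephone band-pass, roughly 300-3400 Hz.
    sos = butter(4, [300, 3400], btype="bandpass", fs=sr, output="sos")
    band = sosfilt(sos, audio)
    # Downsample to 8 kHz.
    narrow = resample_poly(band, 8000, sr)
    # 8-bit mu-law companding, quantization, and expansion (adds quantization noise).
    mu = 255.0
    compressed = np.sign(narrow) * np.log1p(mu * np.abs(narrow)) / np.log1p(mu)
    quantized = np.round((compressed + 1) * 127.5) / 127.5 - 1
    return np.sign(quantized) * np.expm1(np.abs(quantized) * np.log1p(mu)) / mu

# e.g. degraded = pots_degrade(clean_audio_float32, sr=16000)
```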
I built this with Twilio, STT, TTS, and some glue a while back. Having a phone number to call to chat with GPT-4 was fun, but laggy and error-prone. I primarily used it in the car for hands-free discussions. I look forward to giving this new option a try!
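For anyone curious what that glue can look like, here's a minimal sketch along those lines, not the commenter's actual code: Twilio's speech Gather does the STT, Twilio's Say does the TTS, and an OpenAI chat call sits in between. Route names, the model, and the system prompt are placeholders.

```python
from flask import Flask, request, Response
from twilio.twiml.voice_response import VoiceResponse, Gather
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()  # expects OPENAI_API_KEY in the environment

@app.route("/voice", methods=["POST"])
def voice():
    # Greet the caller and collect speech; Twilio handles the speech-to-text.
    resp = VoiceResponse()
    gather = Gather(input="speech", action="/respond", speech_timeout="auto")
    gather.say("Hi, what would you like to ask?")
    resp.append(gather)
    return Response(str(resp), mimetype="text/xml")

@app.route("/respond", methods=["POST"])
def respond():
    # Twilio posts the transcript as SpeechResult; pass it to the LLM.
    transcript = request.form.get("SpeechResult", "")
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer briefly; this will be read aloud."},
            {"role": "user", "content": transcript},
        ],
    )
    answer = completion.choices[0].message.content

    # Read the answer back with Twilio's TTS, then loop for another question.
    resp = VoiceResponse()
    resp.say(answer)
    resp.redirect("/voice")
    return Response(str(resp), mimetype="text/xml")
```

The lag the commenter mentions is easy to believe with this kind of setup: each turn waits on a full STT pass, an LLM completion, and TTS synthesis before the caller hears anything.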
I can't find any information about how to start a new conversation as opposed to continuing an existing one. I asked the service itself and it doesn't know. In fact it doesn't even know it's behind a phone number.
I hope they introduce a way to use Plus plan features/models. Would be neat to do quick queries in WhatsApp and forward results to friends & family without context switching/copy pasting.
It's funny, I asked it what model it is, and it replies:
> am based on OpenAI's GPT-4 model. Specifically, you are interacting with an instance of GPT-4, which is designed to understand and generate human-like text based on the prompts it receives. My responses are influenced by the extensive training on diverse datasets, but I do not have access to real-time data or events beyond my knowledge cutoff in January 2022.
But the linked page suggests knowledge cutoff date is Oct 2023. It hallucinated an answer even to that....
If you write “Zitrain” with only one “t”, it also works in regular ChatGPT. The speech processing likely has a similar effect of not matching the filter.
POTS voice quality is pretty minimal (8 kHz, 8 bit, one channel). I wouldn't be surprised if a model would struggle more to isolate a speaker there vs. a higher fidelity audio channel.
Then again, local noise reduction on modern phones/earbuds probably goes a long way to avoiding that problem.
Counterpoint: humans work best in quieter environments. If we're saying "...what!? It's crazy loud where you are!", you can't really expect AI to be much better.
It does the same as the ChatGPT WhatsApp chat, but you can also forward images to it, it can send you reminder emails in the future, and it can manage todos for you (a kind of memory).
If it had gotten more traction, I would have extended it so you could also forward emails to it and have it respond to the original email as your assistant.
(And hey, if someone from OpenAI is reading this, feel free to offer me a position as a product manager.)
Honestly this seems like a really nice feature. There's been so many times I'm driving and just want to Google something. Used to use Google assistant for that, but I no longer use android auto, so that ended.
chatGPT: ~ You agree to openai terms and conditions...
Me: What's the square root of two?
chatGPT: What number do you want to know the square root of?
Me: Two
chatGPT: The square root of ten is approximately 3.1...
<click>
If they wanted to show how very non-understanding and un-intelligent chatGPT is, they are doing a great job. So much quicker to see in a voice interaction than through online query submissions.
I feel like everyone who has used a lot of AI tools has become accustomed to the LLM yap, but hearing it over TTS is much more annoying than when it's text you can skim through.
The yap factor is there, but they seem to be prompting this phone version to be more brief. I asked a few basic informational trivia questions and each response was three or four sentences. Less than the app or website version, imo.
I did and should have given credit above for the voice, it's very good. I meant to comment on the verbosity of what was being said, not the quality of the TTS itself.
Afghanistan and Turkmenistan are allowed, but not China or Russia. Which makes legal sense, I guess, but did the Taliban takeover just take place too recently for Afghanistan to be placed on the embargo list?
My background isn't AI so I can't contribute to that. My background is WebRTC/telephony so I could build this. Even if I was involved in 'AI stuff' I would have zero impact, but I can build this!
Probably more about how they’re choosing to use resources
If they believe AGI is around the corner and they are competing with others to get there, seems silly to invest resources in standing up a phone line, etc.
(1) https://en.wikipedia.org/wiki/GOOG-411