Hacker News

So this really does happen, then? Because I used to be convinced it wasn't a coincidence when I saw online ads for some niche, uncommon topic I had recently talked about out loud.



This matches the audio signature of the TV ad - basically, it's like Shazam, but for TV ads.
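As a toy illustration of that idea (not the actual system; Shazam-style fingerprinting builds constellation maps of spectral peaks with robust hashing, and all parameters below are illustrative assumptions), a minimal sketch might pair the loudest FFT bin of consecutive frames and count hash overlaps:

```python
# Toy Shazam-style audio fingerprinting sketch (illustrative only).
import numpy as np

def fingerprint(samples, frame=1024, hop=512):
    """Hash the loudest frequency bin of consecutive frames into a set."""
    hashes = set()
    prev_peak = None
    for start in range(0, len(samples) - frame, hop):
        window = samples[start:start + frame] * np.hanning(frame)
        spectrum = np.abs(np.fft.rfft(window))
        peak = int(np.argmax(spectrum))
        if prev_peak is not None:
            # Pairing consecutive peaks makes the hash survive small time shifts.
            hashes.add((prev_peak, peak))
        prev_peak = peak
    return hashes

def match_score(clip, reference):
    """Fraction of the clip's hashes that also appear in the reference."""
    fp_clip, fp_ref = fingerprint(clip), fingerprint(reference)
    return len(fp_clip & fp_ref) / max(len(fp_clip), 1)
```

The key property is that matching against a known catalog of ad fingerprints is vastly cheaper than transcribing open-ended speech: it's a set lookup, not a language model.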

It's currently not economically possible to listen to users' conversations, transcribe them to text, and serve ads based on that. It would cost orders of magnitude more in processing power than the extra sales would bring in.

This might change in the future, of course.


Yeah, my understanding was that it was audio-fingerprinting TV ads, not transcribing anything, but I wouldn't be surprised if they were trying to vacuum up other stuff as well. That said, I think basic low-accuracy transcription should be feasible on-device, especially with all the neural-engine hardware making inference more efficient.


Wouldn't cost that much if the transcription is done on-device.


This would be immediately obvious in a cursory analysis of performance. On-device transcription is not only computationally infeasible, it would also require model capabilities far beyond what is currently SOTA.

Google had (and has afaik) significant challenges implementing multiple wake-word detection for precisely this reason.

Transcribing a couple of words accurately on-device without a major performance penalty (so that it can be running in the background always) is just _barely_ coming out now.


I would have to take your word for it but my phone is able to transcribe speech with no problem and no internet connection.

Of course running it 24/7 in the background would ruin my battery, you would have to be smarter than that.
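One way to be "smarter than that" (a sketch under assumed parameters, not any vendor's actual pipeline) is to gate the expensive recognizer behind a cheap energy-based voice-activity check, so the device idles cheaply whenever nobody is speaking:

```python
# Toy energy-based voice-activity gate (real systems use trained
# wake-word/VAD models; the frame size and threshold here are assumptions).
import numpy as np

def frames_to_transcribe(samples, frame=1600, threshold=0.02):
    """Return indices of frames whose RMS energy crosses the threshold.

    Only these frames would be handed to the heavy ASR model; silent
    frames cost just one RMS computation each.
    """
    active = []
    for i in range(0, len(samples) - frame + 1, frame):
        rms = np.sqrt(np.mean(samples[i:i + frame] ** 2))
        if rms > threshold:
            active.append(i // frame)
    return active
```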


Which phone/app? I would be very surprised if a manufacturer has an entirely on-device real-time ASR model, maybe I'm behind.


rewind.ai has entered the chat.


There's this weird narrative I see that "computers just aren't powerful enough" to do things I remember them already doing on Pentium 1 class machines in the 90s.


> It's currently not economically possible to listen to users' conversations, transcribe them to text, and serve ads based on that.

Anecdotally, I believe Meta does something like that, because I consistently get ads on Instagram about topics I discuss with a friend on WhatsApp, sometimes entirely via audio messages. Though I might be wrong and could have leaked the topics in text messages, among other possibilities.

I think it could be economically feasible. They could use a model optimized for their ad topics, which could be orders of magnitude faster than general-purpose speech recognition. Low accuracy probably wouldn't be an issue, since they can fine-tune a user's topics of interest via interactions with the ads (e.g. click rate, time spent before scrolling).
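The feedback loop described here can be sketched as a toy: a fixed keyword-to-topic table stands in for the topic-optimized model, and ad clicks adjust per-topic weights. Everything below (the keyword list, the 1.2x/0.9x update factors, the class itself) is a hypothetical illustration, not a description of Meta's actual system:

```python
# Toy feedback-weighted topic targeting (hypothetical illustration).
from collections import defaultdict

# Stand-in for a compact, topic-specific keyword spotter.
KEYWORDS = {"sneakers": "footwear", "running": "footwear", "mortgage": "finance"}

class TopicScorer:
    def __init__(self):
        # Every topic starts with neutral weight 1.0.
        self.weights = defaultdict(lambda: 1.0)

    def topics_from_tokens(self, tokens):
        """Score topics from (possibly noisy) recognized tokens."""
        scores = defaultdict(float)
        for tok in tokens:
            topic = KEYWORDS.get(tok.lower())
            if topic:
                scores[topic] += self.weights[topic]
        return dict(scores)

    def feedback(self, topic, clicked):
        # Reinforce topics whose ads the user engages with; decay the rest.
        self.weights[topic] *= 1.2 if clicked else 0.9
```

The point of the sketch is that low spotter accuracy is tolerable: misrecognized tokens that lead to ignored ads get their topics decayed by the feedback step.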



