Embedded audio ML engineer here (albeit mostly outside of speech). A modern MEMS microphone typically draws 0.8 mA in full performance mode at 1.8 V. Basic voice activity detection, which is the first step of a continuous listening pipeline, can be done in under 1 mA. Basic keyword spotting is likely doable in 10 mA, but it only runs on the audio that the voice activity module triggered on. Let's say that is 4 hours per day. Then basic speech recognition, for buying phrases and categorization, would maybe cost 100 mA. But say only 10% of the 4 hours = 0.4 hours have keywords triggered.
That would give a total battery budget of (1.8*24)+(10*4)+(100*0.4) = 123 mAh per day, where the 1.8 mA is the mic plus voice activity detection running around the clock. A typical mobile phone battery is 4000 mAh. People do not expect it to last many days anymore... So I would say that this is actually in the feasible range. And this is before considering the very latest in low power hardware, like MEMS mics with 0.3 mA current consumption or lower, MEMS microphones with built-in voice activity detection, or the low power neural processing units (NPUs) that some microcontrollers now have.
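For anyone who wants to play with the numbers, here is a minimal Python sketch of the same back-of-the-envelope estimate. The currents and duty cycles are the rough guesses from the comment above, not measurements, and the names (`STAGES`, `daily_budget_mah`) are made up for illustration.

```python
# Rough daily battery budget for an always-listening pipeline.
# All figures are the ballpark assumptions from the comment, not measured data.

BATTERY_MAH = 4000.0  # assumed typical phone battery capacity

# (stage name, current draw in mA, active hours per day)
STAGES = [
    ("mic + voice activity detection", 0.8 + 1.0, 24.0),  # always on
    ("keyword spotting", 10.0, 4.0),                       # only on detected voice
    ("speech recognition", 100.0, 0.4),                    # 10% of the 4 voice hours
]


def daily_budget_mah(stages):
    """Sum current (mA) times active time (h) over all stages, giving mAh per day."""
    return sum(current_ma * hours for _, current_ma, hours in stages)


total_mah = daily_budget_mah(STAGES)
battery_share = total_mah / BATTERY_MAH

for name, current_ma, hours in STAGES:
    print(f"{name}: {current_ma * hours:.1f} mAh/day")
print(f"total: {total_mah:.1f} mAh/day ({battery_share:.1%} of battery)")
```

With these inputs the total comes to roughly 123 mAh per day, about 3% of a 4000 mAh battery. Changing the hours or currents in `STAGES` shows how sensitive the estimate is to each guess.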
This is amazing, thanks for doing the math. Didn’t realize the tech was feasibly there already, off the shelf. I mean, my Apple Watch can detect me saying “Hey Siri” all day with its puny battery.
If big tech isn’t doing this, then it sounds like a huge startup idea worth $$$. I hope someone on here, in the spirit of HN, runs with it and blows the top off this topic once and for all: either it’s monetizable, or it exposes the FAANG patent sharks that come out to play and silence them for infringing on their shady microphone tech.
Hah, that's another great argument against this being a real thing: where are the startup pitches?
If this targeting technique works and is feasible and legal and in demand by advertisers, why isn't there a competitive group of startups all trying to do it better than each other and sell the results?
Now the conspiracy theory has grown to include "dozens of companies compete at this, all of them secretively operating in a marketplace that is entirely invisible to the outside world."
Another question that comes to mind now: would this sort of technique run afoul of wiretapping laws in various states? One is not listening for a wake word to provide a direct response but rather to... idk. Just a random thought.
Thank you for taking the time to post this informative response. As a sibling comment said, I didn't realize it was so feasible. When posting my original comment, I was thinking orders of magnitude more power would be needed to make this work.