Do these devices record all the time or only after the trigger word (they would ...

jen_h · on Oct 21, 2019

The device has to record all the time in order to "listen" for the wake word.

It's got a small couple-second buffer (enough to store "Amazon" or "Computer" or "Alexa" or "Echo") where it takes what it hears and compares it with its internal model for a match.

If there's no match, the buffer is overwritten with the next bit of noise. Once the device gets a wake word match, it transmits the statement that follows to home base to transcribe and handle.
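A toy sketch of the buffer-and-match loop described above. It assumes text tokens stand in for audio frames and that a `<silence>` marker ends a command; real devices match acoustic features against a trained model, and every name here is hypothetical.

```python
from collections import deque

WAKE_WORD = "alexa"   # hypothetical; real devices match audio, not text
BUFFER_FRAMES = 3     # stands in for "a couple of seconds" of audio


def process_stream(frames, wake_word=WAKE_WORD, buffer_frames=BUFFER_FRAMES):
    """Return the utterances that would be sent to the backend."""
    buffer = deque(maxlen=buffer_frames)  # oldest frames are overwritten
    sent = []
    current = None  # None while idle, a list while capturing a command

    for frame in frames:
        if current is not None:
            if frame == "<silence>":      # toy end-of-utterance marker
                if current:
                    sent.append(" ".join(current))
                current = None
                buffer.clear()
            else:
                current.append(frame)
            continue

        buffer.append(frame)
        if wake_word in buffer:           # stand-in for the local model match
            current = []

    if current:                           # stream ended mid-command
        sent.append(" ".join(current))
    return sent


frames = ["the", "news", "is", "on", "alexa", "what", "time", "is", "it",
          "<silence>", "private", "chat"]
result = process_stream(frames)
```

In this sketch, anything heard before the wake word or after the end of the command only ever lives in the small rolling buffer and is never added to `sent`.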

Legogris · on Oct 21, 2019

From what I understand, it's slightly worse: the local matching is quite promiscuous, so it is likely to trigger on false positives. It then forwards the audio to the remote backend, where the actual match is confirmed, and the data sent includes the entire buffer, including a couple of seconds before and after.
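A minimal sketch of that two-stage idea, assuming each stage produces a confidence score between 0 and 1. The threshold values and function name are made up for illustration and do not come from any vendor documentation.

```python
LOCAL_THRESHOLD = 0.5    # permissive on-device cutoff (assumed value)
REMOTE_THRESHOLD = 0.9   # stricter server-side cutoff (assumed value)


def handle_candidate(local_score, remote_score):
    """Return (uploaded, acted_on) for one wake-word candidate."""
    # Stage 1: the device uploads whenever its own cheap model is fairly sure.
    uploaded = local_score >= LOCAL_THRESHOLD
    # Stage 2: the backend only acts if its larger model agrees.
    acted_on = uploaded and remote_score >= REMOTE_THRESHOLD
    return uploaded, acted_on


false_positive = handle_candidate(local_score=0.6, remote_score=0.3)
true_positive = handle_candidate(local_score=0.95, remote_score=0.97)
ignored = handle_candidate(local_score=0.2, remote_score=0.99)
```

The interesting case is `false_positive`: the backend rejects it, but the audio has already left the device by then.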

jen_h · on Oct 21, 2019

You don’t really even need real false positives. Case in point: my grandma has trouble saying “Alexa,” so we set it to “Amazon.” She listens to the news 24/7, and Amazon is always all over the news, so there are a lot of stored news snippets on our account.

javagram · on Oct 20, 2019

They record only after the trigger word. It's the same as Android phones and iPhones that have "OK Google" or "Hey Siri" enabled.

In practice we know that trigger words for all these devices occasionally misfire. (There was also an issue a while ago with one type of Google Home device that shipped with a faulty physical button, which caused it to turn on at intervals as if the user had pressed the button to start speaking.)

jMyles · on Oct 20, 2019

The more important question is: how do we know whether these devices (or a particular subset of them) record all the time or only after the trigger word?

nodamage · on Oct 21, 2019

Regardless of intent, we know that all three major voice assistant services (Siri, Alexa, Google Assistant) experience false positives and end up accidentally recording conversations when the device thinks the trigger word was spoken, but actually was not.

skeletonjelly · on Oct 20, 2019

By

a) viewing what they store via their log tools (though this isn't guaranteed to show everything, i.e. if they are recording everything, they could hide it)

b) monitoring outbound network connections

kortilla · on Oct 20, 2019

Neither of those things is indicative. Secret recordings could be hidden from the logs and bundled with normal voice queries in the network calls.

jka · on Oct 20, 2019

... while also considering the possibility of faulty software updates, bugs, and network attackers -- in an environment where hardware, network protocols, and APIs are proprietary and inscrutable.

And would we know if they had been recording unnecessarily?

cactus2093 · on Oct 20, 2019

Well they don't have big hard drives, so you can be confident they're not recording everything to disk that then could be unintentionally accessed or sent out later.

And you can look at network traffic (e.g. from wifi router stats) to be pretty confident they're not constantly live-streaming audio up to the cloud.

Of course most people will not actually do this monitoring themselves, but there are enough of these devices out there that if a significant number started recording constantly, somebody would notice pretty quickly. And that would be terrible PR for the company involved, so I think Google, Amazon, and Apple have a pretty strong incentive not to do this.

kadoban · on Oct 21, 2019

How much of a hard drive would they need to do speech-to-text and upload that periodically in with other legitimate traffic?
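A back-of-the-envelope sketch for that question. All figures are assumptions chosen for illustration: roughly 150 spoken words per minute, about 6 bytes per word of plain text, and uncompressed 16 kHz 16-bit mono audio for comparison.

```python
WORDS_PER_MINUTE = 150     # assumed conversational speaking rate
BYTES_PER_WORD = 6         # assumed average word plus a space, plain text
SAMPLE_RATE_HZ = 16_000    # assumed sample rate for raw speech audio
BYTES_PER_SAMPLE = 2       # 16-bit mono


def transcript_bytes(hours_of_speech):
    """Size of a plain-text transcript for the given hours of speech."""
    return int(hours_of_speech * 60 * WORDS_PER_MINUTE * BYTES_PER_WORD)


def raw_audio_bytes(hours_of_audio):
    """Size of uncompressed audio for the given number of hours."""
    return int(hours_of_audio * 3600 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


text_per_day = transcript_bytes(24)    # nonstop talking, worst case
audio_per_day = raw_audio_bytes(24)
ratio = audio_per_day / text_per_day
```

Under these assumptions a full day of nonstop speech is about 1.3 MB as text versus about 2.8 GB as raw audio, so storage would not be the limiting factor; the on-device compute for transcription is a separate question this sketch does not address.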

The PR angle isn't that reassuring to me either; they've already absorbed some pretty bad PR hits on these devices and they're still going strong.

swebs · on Oct 21, 2019

I'd imagine using something like Wireshark to see how much data it transmits at any given time would be a good start.
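As a rough sketch of what to do with such a capture: assuming the outbound packets from the device have already been exported as `(timestamp_seconds, size_bytes)` pairs (the export step is not shown, and the function names and threshold are hypothetical), upload volume can be bucketed per minute and the sustained-traffic windows flagged.

```python
from collections import defaultdict


def upload_bytes_per_window(packets, window_seconds=60):
    """Sum outbound bytes into fixed-size time windows."""
    totals = defaultdict(int)
    for timestamp, size in packets:
        totals[int(timestamp // window_seconds)] += size
    return dict(totals)


def busy_windows(packets, threshold_bytes_per_second, window_seconds=60):
    """Return indices of windows whose average upload rate meets the threshold."""
    limit = threshold_bytes_per_second * window_seconds
    per_window = upload_bytes_per_window(packets, window_seconds)
    return sorted(w for w, total in per_window.items() if total >= limit)


# Hypothetical capture: idle chatter, one burst, then idle again.
packets = [(0, 500), (30, 500), (61, 200_000), (90, 100_000), (130, 400)]
per_window = upload_bytes_per_window(packets)
flagged = busy_windows(packets, threshold_bytes_per_second=2_000)
```

This only catches sustained audio-sized uploads. As noted upthread, it would not reveal small payloads such as text bundled in with legitimate queries.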