How to listen to (and delete) everything you've ever said to Google (theguardian.com)
217 points by nkurz on Oct 20, 2015 | 88 comments



The first thing the article says is that users are probably aware Google stores all of these recordings. I, for one, was not aware. Am I really in the minority?


I was surprised and dismayed to discover that Skype does something similar.

Skype stores your voice mails and video messages forever[1]. This is something that they started doing 2-3 years ago and few people seem to be aware of it. Unlike Google, they don't provide any way to delete them.

[1] Details: Clicking on Preferences -> Privacy -> Delete history (OS X) or Options -> Privacy Settings -> Clear history (Windows) pretends to delete the voice/video messages but it merely hides them from your view. If you re-install Skype on the same computer or run Skype on a different computer, all those "deleted" voice mails and video messages re-appear.


That is not good. The least they could do is be explicit and clarify you're deleting the local copy (which I hope is what they're doing).


Google, Siri, Cortana and Amazon Echo all store your voice recordings. They are explicit about this in the fine print. Google and Amazon allow you to listen to these recordings yourself[1][2]. Not sure about Siri or Cortana.

[1] https://history.google.com/history/audio
[2] https://www.amazon.com/gp/help/customer/display.html?nodeId=...

What's more, you may get to listen to the recordings, but you're not the only one: https://news.ycombinator.com/item?id=9101875


Siri as of 2013: http://www.wired.com/2013/04/siri-two-years/ "Whenever you speak into Apple’s voice activated personal digital assistant, it ships it off to Apple’s data farm for analysis. Apple generates a random numbers to represent the user and it associates the voice files with that number. This number — not your Apple user ID or email address — represents you as far as Siri’s back-end voice analysis system is concerned.

Once the voice recording is six months old, Apple “disassociates” your user number from the clip, deleting the number from the voice file. But it keeps these disassociated files for up to 18 more months for testing and product improvement purposes.

“Apple may keep anonymized Siri data for up to two years,” Muller says “If a user turns Siri off, both identifiers are deleted immediately along with any associated data.”"


Google asks you if you want to store your data to improve their model. Presumably they augment their model of you with the voice data and corrections you supply (voice-typed words are blue in android, and selecting one brings up a dropdown of alternate words that would be stupid not to record...).

I guess I always knew this data was being recorded, but honestly I would rather that Google not have released it via this interface. My Google account is not as well-secured as Google's backend, and if someone compromises my account I don't want them to be able to download recordings of my voice saying things (I probably have enough google searches on voice to make a soundboard for anything).

But, I can also appreciate that few users will appreciate the privacy implications of choosing to personalize the google voice model without a service like this, so...


> I guess I always knew this data was being recorded, but honestly I would rather that Google not have released it via this interface. My Google account is not as well-secured as Google's backend, and if someone compromises my account I don't want them to be able to download recordings of my voice saying things (I probably have enough google searches on voice to make a soundboard for anything).

I would argue, though, that the majority of people want to keep control of their data as much as possible. The responsibility to take care of it and its security is just a corollary of that.


>I would argue, though, that the majority of people want to keep control of their data as much as possible

This seems like a nonsensical statement in 2015. What basis do you have for it? Most people hand over control of their data without a care in the world.


> This seems like a nonsensical statement in 2015. What basis do you have for it? Most people hand over control of their data without a care in the world.

Have you watched the news lately? At least in Europe that is most definitely what users want and what the laws are giving them the right to and the courts are enabling them to do.

Unless of course, you are trying to derail this argument by meaning whenever something isn't saved on a hard drive in my safe it's out of my control. Which would be complete BS and irrelevant to the discussion (which is about "is Google trying to give control over their data to the users").


Fyi,

Skype, Google, YIM, AIM, etc. all store large portions of your communications far longer than you realize and their "clear" mechanisms frequently are local-only. [i.e. Skype, reinstall, things reappear]

> The first thing the article says is that users are probably aware Google stores all of these recordings. I, for one, was not aware. Am I really in the minority?

Nah. Most people are convinced the deletion stuff actually works and/or people wouldn't engage in this sort of behavior and are surprised when they find out.

The reason people don't care about privacy is they don't really comprehend what they've lost and how extensive it is.


I was not aware of this until a couple days ago, when I stumbled upon that (OP) article. Surprisingly, though, my privacy settings were already configured to not keep these recordings.


I was not aware of Google's policy for this issue, but I knew that Apple keeps all recordings for at least 2 years... so I was thinking that they did about the same.


Last I read, that was anonymized data. Or have things changed?


Anonymized voice recordings? I can't recall if the last article that I read about that a while ago mentioned it, but either way, I would say that it is pretty much impossible to completely anonymize a voice recording.


Nor would anonymising a voice recording (presumably by doing some sort of frequency-domain transform) be of much use to them for the sort of thing they're using them for - training speech recognition on a wide variety of different voices.


It's virtually impossible to anonymize voice, except by converting the voice to text and then synthesizing a new voice from that text.


Even if not "completely", there is a difference to me as a consumer


The data has a common ID, but that ID isn't linked to your Apple ID, and you can change the ID at any time.


Of course not. Most people don't even think about this stuff.


Well, I don't know. A typical HN reader should not be surprised by this. You know technology, you know how Google operates, it should be fairly obvious that they store everything, forever.

When I saw the article, my reaction was "hey, thanks for letting me know what the link actually is". I was not surprised at all by it. I sort of expected it.


I assumed they did something to improve matching, but I never imagined they stored the 'raw' (as in actual voice, albeit probably compressed) recordings.


So they store Google hangout video chats forever?


You might know something when thinking about it but never have been aware of it. "Oh yeah, I guess all this time they must have been doing that."


If you've kept up with internet privacy news the last few years, from things like the Snowden or Ashley Madison scandals, how is it not your assumption that everything that is exposed to the internet is stored and indexed somewhere forever?


There's a psychological difference between knowing it in the abstract and actually being confronted with and able to play back that history.

At least for this, there's a way to delete it.


How do we know for "sure" that it is deleted?


There are laws against retaining user data that the user has explicitly requested be deleted, and Google spends a lot of engineering time making sure that is really the case. That is non-trivial, e.g. with failing drives that you can't access anymore: you still have to delete the data from them, meaning you must know, at every point in time, which physical drive a user's data is saved on. I think 30 days is the guarantee they make.


As per my previous examples: on the Ashley Madison website, users would pay to get their information "deleted", and it ended up not being deleted at all, just hidden.


Perhaps that's a fair point if you are the kind of user that keeps up with all that you've detailed. To say that users of Google's voice-controlled features are probably aware of it is another thing entirely, and highly unlikely.


I'm a bit surprised myself, more at the potential storage cost than the creepiness. Even at a low bit-rate, that's 10 MB/hr, which would add up quickly if many people are actually using Google voice search.


Google says Google was doing a bit over 3 billion queries a day. Let's say 1% are voice (which is prolly way high). 30M voice queries at 5 seconds each is 42K hours. At 10MB per hour that's 420GB per day. Seems rather trivial.
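For anyone who wants to check that estimate, here is a minimal sketch of the same arithmetic. Every input is the guess from the comments above (3 billion queries a day, 1% voice, 5 seconds per query, 10 MB per hour of audio), not a measured figure, and the function names are made up for illustration. It uses decimal units (1 GB = 1000 MB).

```python
# Back-of-the-envelope storage estimate for voice queries.
# All defaults are the thread's guesses, not real Google numbers.

def daily_voice_hours(queries_per_day=3_000_000_000,
                      voice_fraction=0.01,
                      seconds_per_query=5):
    """Hours of audio recorded per day under the assumed inputs."""
    voice_queries = queries_per_day * voice_fraction
    return voice_queries * seconds_per_query / 3600


def daily_voice_storage_gb(mb_per_hour=10, **kwargs):
    """Gigabytes of audio stored per day (1 GB = 1000 MB)."""
    return daily_voice_hours(**kwargs) * mb_per_hour / 1000


if __name__ == "__main__":
    print(f"{daily_voice_hours():,.0f} hours of audio per day")
    print(f"{daily_voice_storage_gb():,.0f} GB per day")
```

With those inputs it comes out to a little under 42K hours and a little under 420 GB per day, so the rounding in the comment holds up. Changing `voice_fraction` is the quickest way to see how sensitive the result is to the 1% guess.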


Especially compared to YouTube, which was estimated to be growing at roughly 200TB per day 3 years ago:

https://sumanrs.wordpress.com/2012/04/14/youtube-yearly-cost...


Every time I use voice search on my mobile (the only place I use it), I get a toast popup that says 'saving audio to $email_address'. So I knew about this.

I use an Android phone. I have, however, noticed that voice search on the desktop doesn't issue this warning.


Do people generally like that this UI exists, or not?

To everyone who says "wow that's creepy, I wish I had known, God knows what else they have-" do you feel better or worse seeing this UI available, and having the option to turn off services or remove specific records?

I'm honestly curious, because Google is obviously trying to react to the "transparency" criticism. I also wonder who would be offended by a UI that gives you a method to stop using the services you find creepy. Seriously, you can opt out, it explains how pretty easily.


Google knowing a bit about me & my voice is a little scary, but hearing my 8yr old daughter ask about the cosmos & other scientific questions on Google is, to me, priceless.


Any chance you are willing to share the content of some of those questions (not the recording, just the questions)?

I've seen little kids ask me and other adults about what a cordless phone was (even "why does it have so many buttons?" from the young owner of an iPhone who failed to grasp the idea of a screen without touch input), why they can't interact with the cover of a magazine by poking at it, and why we can't just take pictures of bacteria with a smartphone camera instead of looking through a microscope. I'm curious what a child would ask Google and what the answers would be given the vastly different expectations each generation has of every day technology.

I have had really bad experiences with voice recognition going all the way back to early Dragon Naturally Speaking versions, and I can't imagine asking Google voice search a question, so instead I always just text search for relevant topics or go on the science exchanges/forums.


@akiselv: Some of the questions my daughter has asked:

> how does an earthquake happen
> when was the first Christchurch earthquake
> what are the well known constellations
> how big is cat VY Canis Majoris
> what is the biggest star
> how old is Pluto
> what is Fusion
> how was the sun formed

This is from one session.


And your daughter is only 8? Damn, nice job on the parenting. Might I ask what you've done as a parent to help get her to the point where she's asking these sorts of questions?


Early on I did show her how to use Google Now to ask questions, like how far away the Moon is and the like. That's about all the parenting I have done in this regard. She has been watching way too many How Stuff Works videos lately - I want to see if that spills over into her being more curious about things.


Very cool. Have you by any chance considered the Amazon Echo or teaching her how to use Wolfram Alpha?


I don't own the Echo, might be something I would consider buying at some point.

I do have access to WolframAlpha but haven't yet taught her how to use it - maybe it's a bit early for her, I'll give it a go.


parenting done right, I'm envious now :D

keep up the good work


Other comments say that these recordings are unlinked from your account after some time (6 months for Apple) or when you turn this feature off. In a previous HN post [1] we learned that third parties can listen to these anonymized recordings to improve the system, and there are actual humans listening to them...

What just jumped to my mind is that, leaving aside how unique your voice can be, or how identifiable your behaviour and patterns are (hint: a lot), there are a lot of searches that reveal unmistakable and straight personal info. What about:

"Hey phone, text [name of my special other] [naughty and quirky stuff]"

I wouldn't care too much if someone listened to me ask for a Starbucks, but I certainly don't want anyone to listen to that.

[1] https://news.ycombinator.com/item?id=9101875


I don't like it either, but most of this stuff already exists with them in email, texts, IMs, etc.


Playing with Maps one evening led me to find that at one point I had enabled some kind of tracking, and every single day of my movement was graphed in Maps. Where I went and how long I was there was recorded by Google.

I should do something with this data...


There's a neat site where you can turn your location data into a heatmap: https://theopolis.me/location-history-visualizer/



I just had a look at the information Google has been keeping on me, and noticed that when I visited the visitor center for Zion National Park earlier this year, a photo of my car's license plate was associated with the event. Any ideas what's going on here?


If the photo was taken with an Android device, and uploaded to Google Photos, most likely Google associated your location with the GPS coordinates in the EXIF data of the photo.

I was at Zion a few weeks ago, and did not see any equipment on the road or in the parking lot that was automatically capturing license plate data, if that's what you're asking.


Did you take the photo to remember your tag # for parking purposes?

That's what it would have been for me.


Where did you see the photo exactly? Was it a close-up of your license plate?


http://arianvp.github.io/GoogleStalker/

I built this once. I have no idea if this still works. All data is processed locally by the way with the html5 file api


I downloaded this data from Google and mapped every GPS location from 2014 using R.


I just came here to point this out. You can see it on the website. I found out that last year on my birthday I was at the Cheesecake Factory.


Yea you should "delete" that data on your google account.


The title is misleading; as the article reveals, you can't delete the recordings, all you can do is ask Google to not tie them to your identity. (and hope that they actually do that...)


You can delete them (or at least, request that Google delete them); either by selecting the checkbox next to each recording, or by choosing "Delete options" from the vertical-line-of-three-dots icon in the top right. What it looks like you can't do is ask Google not to keep anonymized recordings of voice searches etc that you make in the future.


Sorry, didn't realise that. I did visit the page but the checkboxes weren't there as I had no recordings.


So you have no recordings, and you claim Google doesn't allow you to delete recordings...

I guess you're technically right, Google doesn't let you delete things that don't exist.


seems more reasonable then to leave them as-is and regularly delete them :)


Even then, you don't really delete the recordings, you just delete the link between your account and the recording. As the article points out, the only way to stop Google from having any voice data on you (anonymized or not) is to stop using their voice features.


Hmm good point. I overlooked that fact.


Remember that as the AOL search history leaks showed us, the anonymization is a fig leaf and it's not terribly challenging to re-associate a complete search query history with a person -- even without identifying the voice.


Didn't the AOL dump still have a per user id for each record?


"Turning voice Activity off doesn’t stop Google storing your recordings, but it means they get kept with an anonymous identifier, and can’t be easily linked back to your account."

"Off" is the new "on". The Direct Marketing Association's definition of "not tracking" is similar.


Hmmm, all off as it was. Thanks past me for doing that.


Mine all say "paused" under "Activity Controls" [1]

Out of interest, why does Google use the term "paused" rather than "off"?

It makes me suspect that they are still saving it, but just hiding the fact that they are doing so from me. Or am I just paranoid?

Update:

The end of the article outlines this a bit better:

  Turning voice Activity off doesn’t stop Google storing
  your recordings, but it means they get kept with an 
  anonymous identifier, and can’t be easily linked back to 
  your account. If you want to stop Google recording your 
  voice at all, well, there’s only one solution: stop 
  talking to it.

[1] https://www.google.com/settings/accounthistory?continue=http...


I suspect a bit of it is psychology, too. If it's "Paused", it implies that it's usual to be enabled, both as peer pressure and for features/usability.


"This call may be monitored or recorded for quality assurance or training purposes. Your information is confidential and protected by the law." Call centers have been doing that for a long time, and our voices are stored and used globally. I wish they had a delete button too.


Delete? There's this thing called soft-deletes. Things which get stored on Google (and Facebook for that matter) will never ever be deleted when you, the person who added the content, decide to delete them.

They can of course free up storage space by actually deleting stuff which was already soft-deleted, but given the price per gigabyte these days I bet this will never ever happen...
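For readers who haven't run into the term, here is a minimal sketch of what a soft-delete usually looks like. It is not Google's, Facebook's, or Skype's actual implementation; `Recording`, `RecordingStore`, and all the method names are hypothetical and exist only to illustrate the pattern. The record is flagged and hidden from the user, and only a separate purge step removes it from storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Recording:
    recording_id: int
    owner: str
    deleted_at: Optional[datetime] = None  # None means the row is "live"


class RecordingStore:
    """Toy in-memory store illustrating soft-delete vs. real deletion."""

    def __init__(self):
        self._rows = {}

    def add(self, recording):
        self._rows[recording.recording_id] = recording

    def soft_delete(self, recording_id):
        # Only sets a flag; the data itself stays in storage.
        self._rows[recording_id].deleted_at = datetime.now(timezone.utc)

    def visible_to_user(self, owner):
        # What the user's history page would show: live rows only.
        return [r for r in self._rows.values()
                if r.owner == owner and r.deleted_at is None]

    def purge(self):
        # The separate hard-delete pass; returns how many rows were removed.
        doomed = [rid for rid, r in self._rows.items()
                  if r.deleted_at is not None]
        for rid in doomed:
            del self._rows[rid]
        return len(doomed)

    def stored_count(self):
        return len(self._rows)
```

After `soft_delete`, the recording disappears from `visible_to_user` but `stored_count` is unchanged until someone runs `purge`. Whether and when that purge runs is exactly the part a user can't observe from the outside.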


I'm more surprised they managed to even get Material-UI to pages like this. Their UI/UX team is doing something right.


While I knew that Google saves my location (I explicitly told it that it could), that fucking map scared me, my god.

Not sure how it got all those locations. Some of them make sense: I used Google Maps on my iPhone or I carried my Android tablet with me.

Some of them seem to be based on sources unknown to me. I do not like being tracked when I don't know how it is done.


Decided to listen to some of my recordings. What seemed off to me was that it also records myself saying "OK Google". Meaning that the mic is always on. Do they store audio outside of me calling for Google?


They buffer the audio in a ring buffer constantly, so that the audio can be checked for a wake word. If the wake word triggers, then they send the contents of the buffer. The buffer shouldn't be super big (e.g., one or two seconds). They use it to train their wake word model.
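A rough sketch of the ring-buffer idea described above, assuming audio arrives as a stream of small frames. This is not Google's code: `WakeWordBuffer` and `toy_detector` are made-up names, and the "detector" here just matches string tokens so the example stays runnable without any audio library.

```python
from collections import deque


class WakeWordBuffer:
    """Keeps only the most recent frames; old ones fall off automatically."""

    def __init__(self, max_frames, detector):
        # deque(maxlen=...) behaves as a ring buffer: appending to a full
        # deque silently discards the oldest item.
        self._frames = deque(maxlen=max_frames)
        self._detector = detector  # hypothetical callable: frames -> bool

    def push(self, frame):
        """Add one frame; return the buffered frames if the wake word fired."""
        self._frames.append(frame)
        if self._detector(self._frames):
            captured = list(self._frames)
            self._frames.clear()
            return captured  # this is what would be sent upstream
        return None


def toy_detector(frames):
    # Stand-in for a real wake word model: fires on "ok" followed by "google".
    return len(frames) >= 2 and frames[-2] == "ok" and frames[-1] == "google"


if __name__ == "__main__":
    buf = WakeWordBuffer(max_frames=3, detector=toy_detector)
    for token in ["private", "chatter", "more", "ok", "google"]:
        sent = buf.push(token)
        if sent is not None:
            print("would send:", sent)
```

The point of the bounded buffer is that anything older than `max_frames` is already gone by the time the detector fires, which is why hearing "OK Google" in a recording doesn't by itself mean everything before it was kept.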


As someone generally critical and untrusting of Google, I assumed if you asked Google to delete your voice recordings, they actually did delete it. Kinda amazed that they don't.


I did not know they were keeping this crap, but it's kind of neat I can hear myself getting directions to the bakery on my wedding day to pick up the cake (donuts).


Now if only they would let me see all my past Google searches and known web sites they think I've visited, then that would be interesting.



https://history.google.com/history/

Search history, and chrome browsing history, both from the desktop and Android.


If you're using Chrome, have it authed to your google account, and have browsing history enabled in your sync settings, it's all here:

https://history.google.com/history/

I find it invaluable.


Amazon does the same thing with Echo recordings.


Link?


Amazon's pretty clear about handling / management of Echo/Alexa voice data.

FAQ: https://www.amazon.com/gp/help/customer/display.html?nodeId=...

View Dialog History: https://www.amazon.com/gp/help/customer/display.html?nodeId=...


Thank you! :-)


One way to think about this is you record a piece of voice and put it in Google drive. Sounds less creepy to me.


Thanks for this. Good to know.


QuestionsIAskMyPhoneWhileDrunk.ogg


Geez, Google is creepy.

This might be the tipping point that makes me finish my switch off Google and move my email to something else.

Fortunately, I haven't used this, but God only knows what else they're saving forever that I haven't managed to avoid or turn off.



