It's not surprising to me that they keep it, but they do not need to keep person...

misnome · on April 20, 2013

Hazarding a guess, history associated with a user could help accuracy or help determine usage that causes problems? I mean, sure, don't keep the user data forever (and I imagine you can't, depending on local laws), but as long as it's useful? Six months seems a reasonable timescale if you have to identify, track and test problems and then implement fixes, and six months per user probably doesn't give you a whole load of statistics on its own.

blueprint · on April 20, 2013

It does not need to be personalized; it only needs to be associated with a given data set. How it's applied to the personal device in question is an implementation problem, not a matter of whether anonymization should have been implemented.

drivebyacct2 · on April 20, 2013

Presumably. Google even desires to keep the data personalized if you opt in and allow them.

glitch · on April 21, 2013

Indeed. The more data points, the better. At least for speech recognition personalization for Google Voice Search, it is opt-in: http://support.google.com/android/bin/answer.py?hl=en&an...

Not so long ago, the primary goal of the 1-800-GOOG-411 service was to record people's voice queries for the purpose of bettering their voice recognition technology. After they got sufficient data samples, they discontinued the service.

http://www.infoworld.com/t/data-management/google-wants-your...

> "Google had stated that the company originally implemented GOOG-411 to build a large phoneme database from users' voice queries. This phoneme database, in turn, allowed Google engineers to refine and improve the speech recognition engine that Google uses to index audio content for searching."

drivebyacct2 · on April 21, 2013

Fascinating. I had no idea that's what that service was being used for. Interestingly, it was more or less retired as soon as Android blew up and voice search was everywhere.