It's not surprising to me that they keep it, but they do not need to keep personally identifying information for six months in order to do that. Seems a little improper, and also unnecessary, for employees to have access to that kind of thing if they're working on improving Siri.
Hazarding a guess: history associated with a user could help improve accuracy, or help pinpoint usage that causes problems? I mean, sure, don't keep the user data forever (and I imagine you can't, depending on local laws), but why not keep it as long as it's useful? Six months seems a reasonable timescale if you have to identify, track, test, and fix problems, and six months of data per user probably doesn't give you a whole lot of statistics on its own.
It does not need to be personalized; it only needs to be associated with a given data set. How it's applied to the personal device in question is an implementation problem, not a question of whether anonymization should have been implemented.
Not so long ago, the primary goal of the 1-800-GOOG-411 service was to record people's voice queries in order to improve Google's voice recognition technology. Once they had collected enough samples, they discontinued the service.
> "Google had stated that the company originally implemented GOOG-411 to build a large phoneme database from users' voice queries. This phoneme database, in turn, allowed Google engineers to refine and improve the speech recognition engine that Google uses to index audio content for searching."
Fascinating. I had no idea that's what that service was being used for. Interestingly, it was more or less retired as soon as Android blew up and voice search was everywhere.