I'm not actually trying to imply that this is what PRISM does (no one has made that claim). I'm just saying that, at government scale, storing every voice call ever made, forever, is not even particularly expensive.
So let's add bandwidth: the most expensive estimate I've seen is $0.019/GB <http://blogs.howstuffworks.com/2011/04/07/what-does-a-gigaby.... Let's assume the original audio is captured using G.711 (64 kbps). So that's 438 * 10^9 minutes * 60 seconds/minute * 64000 bits/second / (8 bits/byte * 1024^3 bytes/GB) * $0.019/GB = $3.72mln.
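For anyone who wants to check the arithmetic, here's the same calculation as a few lines of Python. The inputs are just the assumptions stated above (438 * 10^9 call minutes, G.711 at 64 kbps, $0.019/GB), not measured values:

    # Back-of-envelope check of the bandwidth cost above.
    call_minutes = 438e9
    bitrate_bps = 64_000                      # G.711
    total_bytes = call_minutes * 60 * bitrate_bps / 8
    gigabytes = total_bytes / 1024**3
    print(f"{gigabytes:.2e} GB -> ${gigabytes * 0.019 / 1e6:.2f}mln")
    # 1.96e+08 GB -> $3.72mln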
Facility: The NSA's Utah facility is projected to cost $1.5–$2bln <http://en.wikipedia.org/wiki/Utah_Data_Center> and will contain 100,000 square feet of data center space <http://nsa.gov1.info/utah-data-center/>. A 42U rack takes about 7 square feet; assume 25% floor occupancy and 1 PB per rack. That's $2bln/facility / 100,000 ft^2/facility * (7/0.25) ft^2/PB * 40 PB = $22.4mln.
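Same sanity check in Python; the 1 PB/rack and 25% occupancy figures are the assumptions from the paragraph above, not facility specs:

    # Rough check of the facility amortization.
    facility_cost = 2e9
    floor_ft2 = 100_000
    cost_per_ft2 = facility_cost / floor_ft2      # $20,000 per ft^2
    ft2_per_pb = 7 / 0.25                         # 28 ft^2 of floor per PB (1 PB per 42U rack)
    print(f"${cost_per_ft2 * ft2_per_pb * 40 / 1e6:.1f}mln for 40 PB")
    # $22.4mln for 40 PB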
I don't have a good estimate of the personnel involved, but I doubt it'd be out of the ballpark of the other numbers here. You could have every rack maintained and operated by its own PhD-level researcher for less than $10mln/year including all overhead and benefits (40 racks at roughly $250k/year fully loaded).
Commercial speech compression algorithms are hamstrung by the need to add only milliseconds of delay: they can only compress over a 'window' of tens of milliseconds. You could almost certainly do a much better job compressing speech in batches of minutes or tens of minutes: there is far more redundancy to remove. So if the spooks wanted to store massive amounts of speech data, they may have invested in such algorithms.
Storing voice (audio) data is not what the article describes. I'd imagine you transcribe the audio to text and search that. Storing text is incredibly easy. Besides, you can throw away 99.9% of the data almost immediately.
I'm actually curious how much text data this would be per day; number of call minutes * average number of words per minute. I'd be surprised if that wouldn't fit in a reasonable cluster.
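A quick sketch of that estimate, assuming the 438 * 10^9 call minutes upthread is an annual figure, speech runs about 150 words/minute, and a word of plain text plus whitespace averages about 6 bytes (all assumptions, not figures from the article):

    # Hypothetical sizing of daily transcript volume.
    minutes_per_day = 438e9 / 365
    bytes_per_day = minutes_per_day * 150 * 6
    print(f"{bytes_per_day / 1e12:.1f} TB of raw transcript text per day")
    # 1.1 TB of raw transcript text per day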
You underestimate the CPU power needed to do this. The Netherlands has a population of 16 million; by comparison, Google Voice has about 1.4 million users. That is an order of magnitude difference. On top of that, Google only transcribes voicemail, not all calls. What is the ratio of call minutes to voicemail minutes?
Transcribing all voice calls to text in the Netherlands could easily be two orders of magnitude more computationally demanding than what Google Voice does.
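To make that concrete, a rough illustration; the calls-to-voicemail minutes ratio here is a guess purely for illustration:

    # Rough illustration of the scaling argument.
    nl_population = 16e6
    google_voice_users = 1.4e6
    calls_to_voicemail_ratio = 20                 # assumed
    scale = (nl_population / google_voice_users) * calls_to_voicemail_ratio
    print(f"~{scale:.0f}x the transcription workload of Google Voice voicemail")
    # ~229x, i.e. roughly two orders of magnitude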
I'm sorry, but do we really think that machine transcription of millions of cell phone conversations is worth anything? How can anyone believe that after using Google Voice?
So you use a hybrid approach. The text transcription can be fed into programs that look for specific phrases, build up social networks, etc. Then, for anyone you decide you actually want to monitor, you keep the audio as well as the machine transcription.
The machine transcription remains incredibly valuable for broad surveillance even though it is highly imperfect.
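A minimal sketch of what that hybrid pipeline might look like, assuming a simple phrase match over imperfect transcripts plus a who-called-whom graph; the phrase list, phone numbers, and function names are all invented for illustration:

    from collections import defaultdict

    WATCH_PHRASES = {"wire transfer", "meet at the border"}   # hypothetical

    call_graph = defaultdict(set)       # caller -> set of callees
    flagged_calls = []

    def ingest(call_id, caller, callee, transcript):
        """Record the social edge and flag the call if any watch phrase appears."""
        call_graph[caller].add(callee)
        text = transcript.lower()
        if any(phrase in text for phrase in WATCH_PHRASES):
            flagged_calls.append(call_id)   # only these calls get human review / audio retention

    ingest("c1", "+311234", "+315678", "...about the wire transfer tomorrow...")
    print(flagged_calls)                # ['c1']
    print(dict(call_graph))             # {'+311234': {'+315678'}}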
A lot of people underestimate how much storage it would take to hold all voice data.