This is a great start. I'm glad Google has the wisdom to devote a team to just this goal. I look forward to end-user tools they may develop to further accelerate this progress.
I agree, mildweed. Especially about the end-user tools.
It looks good to do this, and it is good. It's all win, and clearly it sets them apart.
That being said, in many cases, their answer is "There's an API for that." Why can't it be exported into a file format that will work in (at least some) other programs right off the bat? Surely there exists at least some suitable existing file format for the kind of data stored in Calendar, for instance.
I don't know why anyone would leave Google Apps for MS Office, but if they wanted to get their info out of Google Calendar, reading "There's the iCal API" doesn't actually sound that liberating to an end user.
Someone could make a tool that uses the API to convert the data, but it'd be nice if they provided data export into an existing format whenever one exists.
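To make the "existing format" point concrete, here is a rough sketch of what such a converter tool might emit: a plain iCalendar (.ics) file, the RFC 5545 format most calendar programs can import. The event dictionaries, the `to_ics` name, and the PRODID string are all made up for illustration; this is not Google's API or export code, and it skips details such as folding long lines.

```python
from datetime import datetime, timezone


def escape_text(value: str) -> str:
    """Escape the characters RFC 5545 treats specially in TEXT values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def format_utc(dt: datetime) -> str:
    """Render a timezone-aware datetime in iCalendar's UTC form."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def to_ics(events, stamp=None):
    """Build an .ics document from hypothetical event dicts.

    Each event needs 'uid', 'summary', and timezone-aware 'start'/'end'.
    Long lines are not folded at 75 octets, which a real exporter should do.
    """
    stamp = stamp or datetime.now(timezone.utc)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//export sketch//EN",  # placeholder product id
    ]
    for ev in events:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{ev['uid']}",
            f"DTSTAMP:{format_utc(stamp)}",
            f"DTSTART:{format_utc(ev['start'])}",
            f"DTEND:{format_utc(ev['end'])}",
            f"SUMMARY:{escape_text(ev['summary'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Writing the returned string to a file ending in `.ics` should give something other calendar programs can open directly, which is the "right off the bat" experience being asked for.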
( see the disclaimer above; yes, they're the only ones doing this, it's awesome, etc. )
Hopefully each system will export data in an easily parsed structure, perhaps XML or something. Most of the APIs in place are a little convoluted. A common, standard format and technique across all Google services would be best.
There are inherent privacy issues with any cloud service, and even more with a company as big as Google.
Still, I'm VERY impressed with how easy they make it to take your data and leave. Can you imagine Facebook helping you migrate to another social network?
Their infrastructure makes it ridiculously inefficient to support on-demand erasure.
In a system designed to never lose data, where most things are stored immutably in a log-structure, a delete feature is an unwelcome opportunity for massive failure. It's much better to delete things passively, even if it means waiting indefinitely and just not copying them in the next system migration.
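A toy illustration of that idea, assuming nothing about how Google's storage actually works: in an append-only log, a delete is just another appended record (a tombstone), and the old value only disappears when live data is copied into a fresh log. The `AppendOnlyLog` class and its method names are hypothetical.

```python
class AppendOnlyLog:
    """Toy append-only key/value log; entries are never modified in place."""

    def __init__(self):
        self._entries = []  # list of (op, key, value) tuples

    def put(self, key, value):
        self._entries.append(("put", key, value))

    def delete(self, key):
        # A delete only appends a tombstone; earlier entries stay where they are.
        self._entries.append(("del", key, None))

    def get(self, key):
        # The most recent entry for a key decides what readers see.
        for op, k, value in reversed(self._entries):
            if k == key:
                return value if op == "put" else None
        return None

    def raw_contains(self, value):
        """True if the value is still physically present anywhere in the log."""
        return any(v == value for _, _, v in self._entries)

    def migrate(self):
        """Copy only live data into a new log, passively dropping deleted values."""
        live = {}
        for op, key, value in self._entries:
            if op == "put":
                live[key] = value
            else:
                live.pop(key, None)
        new_log = AppendOnlyLog()
        for key, value in live.items():
            new_log.put(key, value)
        return new_log
```

After `put("u1", "foobar")` followed by `delete("u1")`, readers no longer see the value, but it is still sitting in the old log. Only the log returned by `migrate()` is free of it, and the old log still has to be retired at some point.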
If this worries you, assume nothing you type can ever be deleted.
I own a very, VERY small web application. It takes input from users and, potentially, saves it to the database, then performs operations on it.
Suppose a user types "foobar" into my application, waits a while, and then tells me, "Hey, I want you to totally erase any evidence that I ever typed 'foobar'." Well, um, I don't think it is physically possible for me to do so.
Minimally, "foobar" is now present in my database. I can zap that fairly easily. Foobar might also be present at a few places in memcached, which are difficult for me to calculate but theoretically accessible to me. I suppose if they're theoretically accessible I could, with significant effort, call them up again, which means that with significant effort I can delete them. OK, zap.
Then I have backups of my database. And here's where delete starts to become a matter of "Uh oh, now we're talking hard." I can't just blow away information in a database backup -- I'd have to unpack it, load it into a database (binary dumps = do not work on me with ad hoc tools!), blow away the record, then save. This would have to be done very carefully to avoid nasty effects if two or more people wanted to blow away data at the same time, since maintaining ACID guarantees is pretty difficult when you've got multiple independent copies of the same database running around. At this point, I'm already strongly inclined to say "If it ever hits the database backups, it potentially stays until doomsday."
Then we have server backups. My server is a VPS. If you were in the database backup saved on disk at T1, when a server backup happened (which freezes an image of the VPS in time so that I can return to exactly the same state as T1), then even if I blow away the backup there is ANOTHER opaque backup in an even more opaque data structure. So I'd have to spin up another VPS from the image (for each image I have), deal with the pains of moving that to the present day/time, load each backup on the VPS, blow away your record, refreeze the database, then refreeze the VPS.
All of this unfreezing and refreezing presents many, many opportunities for me to corrupt other users' data, which is the reason I have the backups in the first place. Corrupting data is unacceptable.
Except, wait, I'm not the only one who has copies of my images! Slicehost also has redundant backups of the images because I can't lose an image if they lose the box the image is on! So I'd need to somehow gain access to their backups (which I have no control over) to spin up the VPS to spin up the backup to nuke the record to backup the DB to snapshot the VPS to remake the backup of the hard drive containing hundreds of images from people who are not me.
EXCEPT WAIT! It is conceivable that, without notice to me, Slicehost has moved from owning their own backup media into a cloud storage solution. Which, for reliability, would duplicate the backup machine multiple times! So now we need to access each of the copies of the backup machine to spin up each of the images to spin up the database so we can nuke the record to save the database to snapshot the image to back up the snapshot to replicate the backup machine which persists the backups of the snapshots containing the backups of the database which holds your record.
So, yep, that's where I am. You probably did not read my privacy policy, but I'm pretty sure it says something to the effect of "I make no guarantees about being able to delete your data." That is as much as I am going to say of the matter. As soon as you give me the data to hold for a nanosecond it could very well be out of my control to ever delete again.
Liberation is good - Google is good - so are privacy agreements.
A better name for the site could be "Grab a Copy of Your Data!" - much more accurate. Liberate implies that you are actually getting your data back - which you're not...you're getting...a.....copy.
Immediate deletion is hard. Retention policy is easy. At my company, the backups are blown away after 90 days. I presume Amazon S3 will take time to age out the data after I'm through with it, but they too will eventually delete it.
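For what it's worth, the retention side really is only a few lines. Here is a minimal sketch, assuming backups are tracked as (name, creation time) pairs; the `split_by_retention` name and the sample data are made up for illustration, and the 90-day default just mirrors the policy mentioned above.

```python
from datetime import datetime, timedelta


def split_by_retention(backups, now, max_age=timedelta(days=90)):
    """Partition (name, created_at) pairs into names to keep and names to expire.

    Nothing inside a backup is ever edited; whole backups simply age out.
    """
    keep, expire = [], []
    for name, created_at in backups:
        if now - created_at > max_age:
            expire.append(name)
        else:
            keep.append(name)
    return keep, expire


# Hypothetical backup inventory.
backups = [
    ("db-2009-05-01.dump", datetime(2009, 5, 1)),
    ("db-2009-08-01.dump", datetime(2009, 8, 1)),
]
keep, expire = split_by_retention(backups, now=datetime(2009, 9, 15))
```

The design choice is that a user's record gets deleted from the live database today and then simply falls out of every backup within the retention window, without anyone having to unpack and rewrite a dump.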
Very cool move, especially coming from a big company like this. It was indeed smart to "design" this part of Google as a stand-alone team instead of delegating data I/O to every other team.
The name sounds scary though. Reminds me of the whackos belonging to the "Earth Liberation Front" who toppled several AM radio towers in Washington a few weeks ago. Their stated reason: "AM radio waves cause adverse health effects including a higher rate of cancer, harm to wildlife, and that the signals have been interfering with home phone and intercom lines."
Which are themselves references to any number of leftist revolutionary movements, because remember kids, Communism is cool.
Like Che T-shirts, I severely doubt anybody ever thought through "Hey, I wonder what an all-red color scheme and the words 'Liberation Front' say about our values here". Hint: it isn't "Don't be evil."