Why Google Takeout is sooo bad

cloudking · on June 17, 2024

The team behind Takeout genuinely care about making it easy to export and transfer your data out of Google. The problem is that team doesn't control any of the underlying services and resources that Takeout needs to pull data from. Every product at Google is run by a separate team, and essentially is a separate business unit within Google, with their own resources and priorities. Making sure that their products work reliably with Takeout is not a high priority for most teams, integrating is more of a compliance checkbox, and once it's "done" they will move engineering resources onto other features. That's why Takeout can be unreliable.

DaoVeles · on June 17, 2024

It is a shame it works like this. For any service my first questions is "How easy is it to get out?"

bentcorner · on June 17, 2024

I agree, IMO it's this kind of integration that really allows products from organizations to feel like a complete offering. Some companies do this really well and some don't. This is where company leadership can make a real difference.

kwanbix · on June 17, 2024

If they are, they failed miserably. For once, there is no way to use a download manager.

Waterluvian · on June 17, 2024

I can only imagine the office politics going on when the Takeout team has to beg, borrow, and steal access, docs, etc.

dmix · on June 17, 2024

What do you think the implications of this are on Google's AI strategy? Not so much this particular data but the structure of Google as isolated units.

cloudking · on June 17, 2024

I think this problem exists for any feature that needs to span multiple Google products, the organizational structure inherently makes it difficult for these features to be reliable and successful. Regarding the AI strategy, you can see it already causing challenges as each team is integrating Gemini technology into their products separately, instead of being a cohesive top-down vision and strategy. It's why Gemini can't access data from all the Google products, and each product that does support it has it's own integration.

I will caveat that though, I am bullish on Google and AI in general given the incredible talent and vast amount of data Google has access to. I think eventually they will make a technological breakthrough that puts them back in the leader position - they had it with transformers and just didn't know how to turn it into a product.

CuriouslyC · on June 17, 2024

Google has shown that they lack product vision and a cohesive strategy. The engineering is top notch though, and the GCP hugging face/AI training integration is for real, if they can build community trust GCP will absolutely eat AWSs lunch as AI apps proliferate.

As for a breakthrough to leadership, Demis' approach of using successively more complex video game environments to develop AGI is absolutely the right path, so I wouldn't be surprised if DeepMind generates a prototype "AGI" first, but I would be VERY surprised if goog successfully capitalized on that.

tomcam · on June 17, 2024

I tried to use GCP for about 10 years and never found it to be robust enough for my business. I enjoyed it far more than AWS

CuriouslyC · on June 17, 2024

GKE and Cloud Run are very robust at this point, and I consider them best in class. Cloud SQL is fairly mature but I still prefer AWS RDS.

voiceblue · on June 17, 2024

AI is a top priority company-wide at Google right now, so it’s relatively easy to get a ball rolling.

lolinder · on June 17, 2024

I'd be less concerned about them getting a ball rolling than them getting one ball rolling in one direction. If AI goes anything like most things Google then in short order I expect them to have 2-3 different AI strategies that are all in conflict with each other.

voiceblue · on June 17, 2024

More rolling balls means more wins and more promotions all around!

Flimm · on June 17, 2024

My issue with Google Takeout is that it is very difficult to download the generated archives on a normal Internet connection. If a download gets interrupted for any reason, I cannot resume the download, I have to re-enter my Google credentials and start again. It can take days to finish downloading a 10GB download, as I constantly monitor the downloads, restart them and re-enter connections. That would take me a couple of hours using BitTorrent, with no manual interventions required. This problem cannot be blamed on other teams within Google.

Dah00n · on June 17, 2024

What connection are you downloading over? It sounds very annoying.

bmacho · on June 17, 2024

I am almost sure that downloads are resumable using wget. I'll probably check it in the following days

jimbobthrowawy · on June 17, 2024

Web browsers' and wget use the same HTTP features for resuming downloads (range-requests) so if it works in one it should work in the other. I've never had enough data in a google takeout that it would fail partway though.

bmacho · on June 17, 2024

In my experience web browsers don't use range requests for resuming downloads, but just start downloads over. Or at least I don't know how to make them resume.

23B1 · on June 17, 2024

So what you're saying is, Google's own internal structural incompetence means they cannot be trusted with data of any type.

Huh!

arp242 · on June 17, 2024

Any organisation of this size with so many different products will have these kind of problems. This is not really a "Google problem".

That's probably a good reason to avoid organisations of these sizes unless you have a good reason not to, but ... that applies to any organisation of comparable size.

benterix · on June 17, 2024

I agree to some extent only. Any big organization faces this issue, so you're right this is not just Google problem.

However, they way they approach it varies considerably, and I'd expect Google engineers to address it in a way that minimizes problems resulting from, say, changes in product A causing problems in product B. I'd bet some work on that has already been done because these folks aren't stupid but apparently that's not enough.

23B1 · on June 17, 2024

bandwagon fallacy

masneyb · on June 17, 2024

I've come to the conclusion that the best way to avoid lock in with vendors like this is to use a photo organizer on your computer, and use that organizer to upload your media to $BIG_CLOUD so that you have a copy of all your media on your network.

If you go this route, then you'll need to organize your media in whatever photo manager you use, and then again on $BIG_CLOUD. Yes, your photo manager will sync some things like titles, comments, and tags as you upload new media, however not all things are synced, such as the event(s) that you want your media to show up in. Also if you make a change in your local library to media that's already been published to $BIG_CLOUD, then those changes will not be reflected there.

Personally I use Shotwell under Linux: https://wiki.gnome.org/Apps/Shotwell and I wrote a program that generates a static HTML site based on my library: https://github.com/masneyb/shotwell-site-generator. When I make a change to my media library in shotwell, then the static site is regenerated to reflect the most recent version of my site. This also makes it super easy to backup my photos to $BIG_CLOUD (like Amazon S3) for redundancy, while retaining full control of my media.

I have my generated site on a password protected website that my family has access to. When I need to share photos with friends, I'll upload them to a photo hosting service like Google Photos or Flickr.

neilv · on June 17, 2024

Agreed, this can be a big win, and I did something like this while I learning photojournalism on the side, and generating a lot of images.

Combined with the tech (RAID array, backups, sharing script), it also helped to have a manual practice of culling photos.

I didn't cull as selectively as I might pick photos to cold-submit to a publication. But if I had several almost identical images from the same event in my archive, I'd try to delete all but one of them.

Reducing space requirements to 1/4 has home IT benefits: maybe don't need that NAS or bigger drives yet, backups run 4x faster, backups might fit on a single backup medium or much less expensive one, can afford that second big local drive for a little extra RAID-mirroring protection, etc.

It's also good encouragement to be a little more judicious about pressing the button on the camera that makes more culling work. :)

eichin · on June 17, 2024

Sure if you're (like me) a weirdo that uses an actual camera :-) Most people use phone cameras that go directly to $BIG_CLOUD and everything else is More Work.

(Personally I use rsync (FolderSync) from the phone to a home server and pull things into my (kphotoalbum-oriented) workflow that way. But it's more about personal control and paranoia than being something that would be useful to random people for whom vendor lock-in is actually a threat...)

masneyb · on June 17, 2024

These days, I only use a phone camera (Pixel 6 Pro) and it is configured to not upload my photos to the cloud. Every few days, I plug in a USB cable, move all of my photos off the phone to my computer, curate the photos (i.e. delete ones I don't want), and add them to my photo manager.

Curation is key to avoid having a mess down the road since I may take 10-15 photos of the same scene, and only save the 1 or 2 that turned out the best.

1oooqooq · on June 17, 2024

didn't google kill google photos once?

philsnow · on June 17, 2024

they "killed" Picasa Web Albums, does that count?

datenyan · on June 16, 2024

The institution I work for is a completely Google-based shop, with an Enterprise agreement with Google for data retention and various other requirements.

Even for us, Google Takeout is a complete mess that fails all the time, or straight up corrupts files that need to be exported. It doesn't surprise me at all that the service sucks for general users, but the fact that it's terrible for Enterprise customers really tells you all you need to know about Google.

Similarly to the OP, we also run into issues with logs wherein the Google Admin Console also just straight up doesn't provide detailed information about what went wrong, and questions via our relationship manager often get passed around for what feels like years.

Thanks, Google.

ajb · on June 16, 2024

If you're on enterprise, you can use the Legal Discovery feature (vault) instead - it worked at my previous employer when all the HR files mysteriously vanished. I think they maintain it properly because otherwise they might get summoned in a case. The downside is that it does a format conversion to word format (and presumably Excel for sheets), and doesn't export in the same directory structure (although I bet the metadata exists in the dump somewhere)

datenyan · on June 18, 2024

Oooh, this sounds promising - thank you so much!

JZL003 · on June 17, 2024

The past few weeks it won't work repeatedly, I don't understand

Also a trick to download which avoids using wifi is to rent a tiny VPS. Then do takeout -> email link (not google drive). When you go to download it, open chrome devtools -> networks tab. Click "download" and find the networks row which is for that download (is preceded by a 301 redirect). Right click -> "Copy as cURL" and then can copy into that VPS

It's a way of spoofing the connection with cookies and everything so you download over the fast VPS connection. Then can use `rclone sync` or whatever to copy to a backup place

JZL003 · on June 17, 2024

oh my bad, this is pretty similar. But you don't need an extension on firfox or chrome, just inspecting the networks tab. I can find my notes if other are interested to find the exact URL, but I think it was a gstorageapi url. If you clear the log right before you press download, there aren't that many network requests

treflop · on June 17, 2024

Google Takeout predates the GPDR by like 6 years.

I’m no Google proponent but I hate people who spread outright wrong information that they could just double check.

sanchez_0_lam · on June 17, 2024

Ok, sorry for that. Will correct this in post.

herf · on June 17, 2024

I think Takeout is a heroic effort, but it's pretty clear most of these datasets would be much more useful in "streaming" form, like you would use for a database migration. For instance, you almost always would want to trial a new service for a few months before "moving" to it entirely, and you might run two things like this in parallel for a long time.

If you want to download a snapshot every so often, you should be able to give it a token representing the last timestamp, and have it give you all the things that have been added since then - most datastores can do this. Then, if the service supports it, it should also indicate changes to old data (including deletes) - but this requires some additional state so it probably wouldn't work for every case.

totetsu · on June 16, 2024

My free “forever” unlimited university google account was deactivated after ten years or more of using it as and endless google photos backup. Takeout is really sooo bad, it was failing repeatedly and the deletion day was coming up fast. Wasted so much time.

undopamine · on June 17, 2024

This was the turning point for me as well. Decided I'd use my own hardware for primary copies.

crazygringo · on June 17, 2024

Takeout is not really a great solution for backup, since it's not incremental. You're going to be wasting a looooot of space.

I use dedicated tools for backing up my most important Google data -- rclone for my Drive, and gmvault for my Gmail. And I'd use gphotos-sync if I used Google Photos (but I don't). And around once a year I use Takeout for the rest -- stuff like Calendar, Contacts, etc.

Takeout doesn't really fit the author's use case. It's intended for migrating your data to another service, not for regular backup.

philsnow · on June 17, 2024

Have you pointed borgbackup or similar at it? i.e. extract the archive to a specific directory, let borg create an archive of it, and then a month later do the sage thing and see if the incremental size is egregiously large? I would expect the overwhelming bulk of data to be media, and those will consume (nearly) zero incremental space with borgbackup or some other deduplicating backup system.

crazygringo · on June 17, 2024

I don't know what the point would be? You still have to perform the entire Takeout, on a disk that already has a previous Takeout, so you always need double the space, and you always need to spend days (?) downloading terabytes (?) of data.

Once you've downloaded the entire new Takeout, there's no reason to deduplicate -- just delete the old Takeout.

philsnow · on June 17, 2024

Ah right, it would still take a long time to download.

My use case is a have a local NAS that i use for backup but i also want things backed up offsite, so i mirror the backups to b2 (and soon to glacier).

I would download and extract the takeout archive locally, then run borg with the NAS as the borg repo. It tries to dedup and only store incremental data in the Borg repo.

If the takeout data consistently has enough of the same “shape”, the b2/s3 storage would only grow by roughly my incremental takeout archive size, rather than storing 200 more GB every time I export a takeout.

So yah, it would use a lot of space locally and temporarily, but the idea for me is to minimize cloud storage but also being able to extract files from older takeout archives.

63stack · on June 17, 2024

The reason for deduplication and incremental backups is that you can recover accidentally deleted photos.

You don't need to keep the previous backup on the disk, it's enough to have it on the backup destination (at least in the case of borg).

Gigachad · on June 17, 2024

I don't mind that it isn't incremental, because it massively reduces the complexity and risk. I download a zip file and that just contains all my data. I don't need to keep all the past copies to build up the final dataset, that one zip just works. There is also no software managing it. If someone hacks my google account and deletes my data. Nothing is going to automatically sync that deletion to my local copy.

sanchez_0_lam · on June 17, 2024

You're right, that it's not the best. It would be better to have an incremental thing. For me it's just the easiest thing to work with.

But, as stated in the blogpost, I'll check gphots-sync in near future.

The problem with that kind of approach is that I need to setup my own infrastructure of some kind to run that sync app on it. With Google Takeout I can just do that to external vendor.

jawns · on June 16, 2024

I have followed this guide a few times to successfully back up on Glacier:

https://gunargessner.com/takeout

One of the tricky things is making sure that you're not duplicating things you've already backed up. The way I handle it is through a combination of date ranges and tags.

xhrpost · on June 17, 2024

The article says it costs $.40/mo to store 400GB on Glacier. Looking at current Glacier pricing of $.0036/GB, that should actually be $1.44/mo. I realize an additional $1.04/mo isn't much but the number is off by a factor of 3.6x so I'm just trying to understand. Also, if you're willing to spend a tad more, the 400GB could be stored on Backblaze B2 for $2.4/mo but egress is free!

kiwijamo · on June 17, 2024

It depends on the region and which Glacier product you choose. I think your pricing of $0.0036/GB is the cheapest I've seen. However it is for the 'Flexible Retrieval' option. 'The Deep Archive' option is $0.00099 and thus 39.6 cents a month. This rounds up to the $.40 figure provided in the article.

xhrpost · on June 17, 2024

Ah OK, I wasn't aware of the Deep Archive option, it's not on the price list of the main Glacier page but it is linked to. Thanks for the details.

sanchez_0_lam · on June 17, 2024

That seems nice! Will try this approach.

maxlin · on June 17, 2024

Been dealing with this a while, it's hard to be consistent as google doesn't give any way to do this automatically so at best you can schedule it to provide you downloads for 1 year every 2 month and then remember to download them in time, also it's not just one download but might be over 5 if there's more data. And also remember to set it up again in one year, dark patterns galore.

One good thing I noted in Takeout though, is it converts your google docs stuff to locally-viewable formats easier than other options (and of course a lot better than you'd get by just syncing in Drive as they are just links to the gdocs in that case!).

Another issue where google makes transferring stuff harder is just transferring stuff to another Google Drive. I had to do this from company account to my own (with a valid reason, I was a cofounder and we were shutting down). There was no built-in way so I programmed my own recursive copier that retains google docs as google docs and files as files, an also manually manages to retain comments and where they apply to in the docs. It still wasn't possible to retain edit history which sucks, considering if these were just normal files doing be most basic folder copy would do all of this.

sib · on June 17, 2024

Addressing the goal of the author vs the symptoms of Google Takeout, I've found the best way to handle a reasonably large photo library (~3TB) is to manage it locally with a DAM (digital asset management) platform and then back that up automatically and into whatever cloud(s) I want for offsite backup or sharing.

The backup process puts both the underlying photo assets plus the DAM's DB of metadata into backup automatically and continuously. I could (always) be more paranoid, but so far this has worked. And I have successfully restored from the cloud backup after a motherboard death on my primary local machine.

I think using a 3rd party cloud system as the primary system and source of truth is making life harder unnecessarily.

ilrwbwrkhv · on June 17, 2024

Many products from Google are facing this trend. Search sucks. Pixel doesn't receive calls. Gemini asks me to do the work. YouTube has bugs in comments and the app. Gmail doesn't clear spam count when it's emptied. Building on the Google cloud platform ties me to someone from India who approves my app but doesn't understand basic English. The list goes on and on. If we had done our jobs and broken up monopolies, these companies which are no longer going up would have crashed and burned much faster clearing the ground for better products and services.

zer0zzz · on June 17, 2024

One thing Google isn’t the best at is retaining metadata in the photo files for Google photos. Takeouts will just throw up a JSON with the meta data rather than put it in the exif portion. I’ve definitely found that Apple photos and iCloud retain this information much more consistently than googles takeout.

Gigachad · on June 17, 2024

They also mangle live photos for some reason. When you export them, they become two files, one the photo, and one the 3 second video part.

zer0zzz · on June 17, 2024

yeah, seriously. I am remembering that bit too. How horrible.

This, and the fact that Google Photo's web UI is a poor alternative to a good native photo manager is probably a large part of why I pretty much don't really bother with Google's ecosystem of apps and devices at all anymore.

BXlnt2EachOther · on June 16, 2024

> Great idea! Of course Google, as other BigTech companies was coherced to implement it mainly due to GDPR restrictions

I don't think this can be correct. Takeout was released June 28, 2011. GDPR was passed April 14, 2016 and went into effect two years later.

I seem to be luckier than sibling comments; my gmail+photos tasks have been reliable. Three big differences from the author: I have closer to 40GB than 200GB of photos, I download the zips from Google rather than having it post 200GB+ to OneDrive, and I unzip my backups when they happen and move them to network storage, rather than rely on them to be 100% automated. I do recall one manual run failing and I was notified.

I do wish some pieces were a little more intuitive, for example how the Drive vs Docs/Sheets/Slides overlap is handled.

I know anti-Google sentiment is very high here, and disclosure that I worked there in the past, but IMO Takeout is a user-friendly effort and they're well ahead of other companies, big tech and especially smaller and non-tech, there. I might be selling Facebook short; they do have an exporter but I haven't used it. [Edit: of course my opinion would be different if I ran into broken exports like a couple sibling comments report.]

philsnow · on June 17, 2024

Once upon a time within Google, there were people who were pushing for Takeout specifically for data portability. The thinking was, if they couldn't hold people's data hostage, they would be better incentivized to make their products the best on the market.

> they're well ahead of other companies

Agreed; it has never had a great UX but I'm glad it exists. I would like very much for it to be automate-able, as it is, it's pretty ADHD-hostile. I have to remember to start the process, remember to go get the files, and remember to deal with the files after they're done downloading (even on "gigabit fiber", it still takes long enough to force a context switch to something else and thus another opportunity to forget to deal with it).

crazygringo · on June 17, 2024

> Takeout was released June 28, 2011. GDPR was passed April 14, 2016 and went into effect two years later.

Exactly. It really bothers me that people make these assumptions that all companies are bad and must be forced to do things that help users. And in this case, the author straight-up lies.

Google did NOT do this in response to legal regulations. They launched Takeout before the GDPR was even first proposed, let alone passed. They did it on their own, precisely in response to user desires not to be locked in. You're more likely to start using Google if you always know you can get your data out.

sanchez_0_lam · on June 17, 2024

Thanks for clarifying the GDPR error.

Also, to be clear - I don't see Google as inherently bad. My opinion here is just about Google Takeout.

liquidgecka · on June 17, 2024

My last five or so takeout runs just failed with no reasoning why. When I try to save to google drive it just fills up the 2TB storage allocation I have without ever making actual files, or worse, making dozens of oddly names files of incorrect sizes.

I filed like five support requests and never got any actual replies.

My next step is going to be moving off google for most things, but amespecially for photo storage (80% of my usage) which I was relying on google takeout for a backup solution.

As a former Googler i have pretty much given up on google being useful these days. :-/

jchw · on June 17, 2024

> Of course Google, as other BigTech companies was coherced [sic] to implement it mainly due to GDPR restrictions - so that any user can take the data he/she owns and move out to other place.

Well, that's just simply false. Google Takeout was in development and released much earlier than GDPR was published. It was an internal project of a weird Google team known publicly as the "Google Data Liberation Front". I wasn't actually at Google during any of this, but I recall Google Takeout from ages ago, and it's hard to forget, in part because it seemed so novel for the time, and also the name "Data Liberation Front" stuck out as being surprisingly bold.

(Of course, Takeout has probably expanded greatly due to fears of regulation, but according to Wikipedia it was in development for four years upon being released in 2011, and I don't think Google was as worried about data portability regulation by that point in their history.)

--

That aside, I was actually genuinely curious to see how bad the process would be for me, as I've been gradually de-googling myself for years now. I chose to export my Google Photos library, which is apparently around 60 GiB. It isn't growing, because I no longer use Google Photos. I opted to just export them to 10 GiB tgz chunks to download directly. About an hour or so later, I got an e-mail with the download links. I'm currently downloading them. The download speed is tolerable: it's between 10 MiB/s and 30 MiB/s, jumping around a bit depending on the part file (probably random chance to some degree.)

(Edit: I did indeed finish downloading the batch with no problems. Seems fairly complete. The formats are documented, not sure how much stuff supports them.)

So at the very least this seems like a "YMMV" situation. For me, this works perfectly well, which is great news considering I didn't really have another contingency plan for this.

Granted, I'm not sure what I'll do with the exported data yet.

sanchez_0_lam · on June 17, 2024

Thanks for clarifying the GDPR thing. I'll update the article.

As for the way it works for me - I've tried multiple times and it always fails - but I backup to other cloud.

Havoc · on June 17, 2024

Google is making me increasingly nervous about trusting them with data. eg the GCP source repo product I was using for personal source is getting canned

Probably won’t get rid of Google entirely but definitely need to do something to derisk this.

SoftTalker · on June 17, 2024

I don't trust any cloud provider. You have zero recourse if they change strategy, change policy, or simply fuck up. You are inconsequentially tiny in the scope of their universe, they just simply don't have to care.

trog · on June 17, 2024

It's not a solution to all of Takeout's problems, but if you just care about backing up email within a Google Workspace, check out Got Your Back:

https://github.com/GAM-team/got-your-back

It is, I believe, written by a Google employee. It's a really great CLI tool that allows you to dump mailboxes to disk as a bunch of text files. You can also restore them to another account. Lots of flexibility as well.

I gave up on Takeout and Google's own Data Migration tool, both of which I found just too flaky and inconsistent - my experience was not dissimilar to that in the OP.

tonymet · on June 17, 2024

it works better if you Trigger google takeout to put the archive into your google drive account, and download that archive onto a VM for further processing.

I do regular copies from google takeout to AWS Glacier Deep Archive (via S3) , it's cost effective and works well. I use rclone for the reading and writing.

Just spin up a VM for the duration of the processing. You can use the intermediate step to filter, transform, prune , chunk or edit files as well.

davchana · on June 16, 2024

My first struggle with then newly launched Google Takeout was, if you export all of your google keep notes, if you have any note without title, the whole takeout will fail, or will give you damaged zip file. I opened it with 7-zip, and then found that all notes with titles were there, and first note without title had a filename Error: something. So, the loop was failing at encountring first note without title.

omnifischer · on June 17, 2024

Pretty sure EU/Fed should some how make transfers from all FAAMG interoperable or just works but in my experience take out always worked.

crazygringo · on June 17, 2024

Wait, when did Google Takeout start supporting a) backing up to other cloud providers and b) backing up on a regular basis?

Last I used it, probably a year ago, those options were not there. I'm 99% sure. And I can find lots of forum posts over the past decade complaining that Takeout doesn't support uploading to other cloud providers, and that it doesn't support any kind of automation.

When did Google add this?

abirkill · on June 17, 2024

I can't speak to the cloud provider support, as I don't use it, but they have supported scheduled backups since at least June 2019, as I have locally-stored backups from scheduled Takeout jobs going back that far.

Bear in mind that the scheduling support is extremely basic -- the only option available is to schedule six exports, one every two months. You can't change the frequency or the number.

Also, you can't pick when the schedule starts, so if you want backups every two months indefinitely, you have to remember to schedule the next set of backups two months after the final backup of the previous scheduled job finished.

It's better than nothing, but only just.

cypherpunks01 · on June 17, 2024

Takeout for Gmail has usually worked well for me, albeit not exported too many massive amounts this way.

My main use case is backing up Gmail accounts of users in our GSuite org, before deleting an account.

1. Reset password and login as user 2. Initiate Gmail Takeout, export to Google Drive 3. Return to admin, and delete user account, using the option to xfer their entire Drive to admin user

Ferret7446 · on June 20, 2024

Google Takeout isn't really intended for backup, is it? The idea is you take out all of your data at once and migrate to another platform.

If you're doing recurring backups, there's likely a better solution (e.g., file syncs from Drive/Photos).

trogdor · on June 17, 2024

My experience with Google Takeout has been overwhelmingly positive. I use it regularly, and I extract huge amounts of data. In my experience, extractions rarely fail.

A few weeks ago I used Google Takeout to download more than 40TB of Google Drive data, compressed to 50GB zip files. It worked perfectly.

plorg · on June 17, 2024

I migrated my Fitbit account to Google not considering the effect it would have on my data exports. I had been using the "legacy export" feature to dump out CSVs maybe once a week and copy the new data into a spreadsheet I had setup. When I migrated my account all the data export now goes through Takeout, take it or leave it. 7 years of data is about 2GB uncompressed, 170MB compressed. It will sometimes take days to export, sometimes just an hour and it will tell me there was an error, or occasionally as much as a week. The files are almost always the same (mod new data), but it will almost always appear in my email as 8 archives, then when I go to download them there will be a random number of archives listed on the landing page, between 1 and 5. It's not super crucial, and it's easy enough to digest with a script, but it's wild how janky the process is.

rstat1 · on June 17, 2024

Its worked well enough for me both times I've used it. First time it wasn't happy about giving me 10+GB of data in one file, but once I set it to break it up in to multiple files, it was fine. No corruption or anything.

Apreche · on June 16, 2024

It’s bad by design. Not because there is a conspiracy and they twirled their mustaches and made it bad on purpose. It’s bad by design, because what incentive is there for any Google employees to make it better?

It won’t get them promotions, raises, or even kudos if they make takeout really awesome. It doesn’t generate revenue. It only costs Google to have it. As long as it meets the minimum to satisfy regulators, it won’t be touched.

GauntletWizard · on June 17, 2024

It was good and good by design. I knew several of the people in the Data liberation front commercial, and they were serious about making it a worthwhile experience. https://youtu.be/QP4NI5o-WUw

Software tends to rot, though, and software at Google double so.

rty32 · on June 17, 2024

Linked YouTube video was from 12 years ago.

These guys who initiated the effort could mean well and put all their efforts into it. That doesn't invalidate the parent's point a tiny bit -- nobody is incentivized to make it good today.

Yes software rots, but some much more than others. Look at Gmail, Search and Android. See the difference? Some generate revenue, others don't.

asvitkine · on June 16, 2024

People can absolutely get promoted for improving products, even if those products don't make the company money.

The problem is more likely a lack of prioritization at the leadership level.

JAM1971 · on June 17, 2024

> The problem is more likely a lack of prioritization at the leadership level.

Great. If you're in leadership why would you prioritize a feature making it easier for people to leave your platform when you could instead prioritize a new feature that might generate value for the company?

There's no incentive to make Takeout any better than it is today.

vitiral · on June 17, 2024

False. If there's an army of people saying it's impossible to leave Google then fewer will use it.

SR2Z · on June 17, 2024

> People can absolutely get promoted for improving products, even if those products don't make the company money.

Yeah, but there better be a high-up patron for these products because Google is notoriously stingy with promo.

Source: quit Google right after L3->L4 because another company was willing to offer me an extra $200k/yr and L5. I've since been promoted at THAT job, and am now looking again because they gave me a raise to the bottom of the next band and that's dumb.

Aeolun · on June 17, 2024

> gave me a raise to the bottom of the next band and that's dumb

Why is that dumb? By your own account they’re already paying you at least 200k/year.

There’s a limit to how many large raises you can give if you intend to give x% for the rest of time.

ZephyrBlu · on June 17, 2024

Not sure why that was your takeaway. Much more likely that he feels that it's dumb he was promoted and put at the bottom of the band by default, regardless of his performance. How large the raise is shouldn't have any impact on this.

SR2Z · on June 17, 2024

> There’s a limit to how many large raises you can give if you intend to give x% for the rest of time.

The tl;dr for this is that if the company makes getting paid a market rate and promoted internally more difficult than just interviewing, they should expect people to just leave.

I'm really not sure how the economics of this work out. Obviously Google has a much easier time swapping engineers in and out (it's responsible for basically everything that people hate about the company both internally and externally) but there are still specific teams where engineers leaving represents significant knowledge loss.

Hell, companies that DON'T follow the same engineering practices that let Google hotswap engineers still do this and there's no way it doesn't have significant hidden costs for them.

Aeolun · on June 17, 2024

Yeah, I didn’t mean to say it’s actually impossible, but if you are at 200k+ and more likely towards 300k/year given the information, you are already at the top of the market for most positions (that I’m aware of).

If you still expect 200k raises in such a position you are likely in for a bad time (nothing is guaranteed though, you might get lucky).

SR2Z · on June 17, 2024

> are already at the top of the market for most positions (that I’m aware of).

This is not even remotely close to true for high-paying SWE jobs in the Bay, and the part of me that grew up in the middle of nowhere still has a hard time believing it.

Staff engineers make between $500k to $1M a year. My first year at Google I was sat in a row of absurdly high-level engineers, many of whom made even more.

Aeolun · on June 17, 2024

> My first year at Google

At Google is already not the norm is it?

jiveturkey · on June 17, 2024

> coherced [sic] to implement it mainly due to GDPR restrictions

Google Takeout existed long before Google Cloud existed, and before GDPR existed. At any rate, it was novel at the time, meaning other vendors were not so compelled. So this is an obviously incorrect statement.

I suppose the article author no longer has the original photos? Because the obvious solution is to start from the originals, and store them in the multiple places.

Lastly, this article is not why. It's how. That's too bad because I am more interested in the why.

twp · on June 17, 2024

I was briefly involved with Takeout at Google.

The manager in charge at the time (a) hated the project and didn't want to take responsibility for it because it was a steaming pile of poop and (b) Google Takeout being a steaming pile of poop served Google's interests of keeping users locked in

senectus1 · on June 17, 2024

yeah, I've just finished exporting all my families photo's. and importing into selfhosted immich.

its almost as good as photo's, better in some ways.

Next will be emails. though thats going to be a while away.. hosting emails sucks.

lupire · on June 16, 2024

All of the Table of Contents links are anchored incorrectly.

sanchez_0_lam · on June 17, 2024

Right, that seems bad. I'll fix this.

prirai · on June 17, 2024

I need a google takein service, seriously.