Another fun one. Go into Facebook, download your data, and look in the ads folder. There's a very helpful "advertisers_who_uploaded_a_contact_list_with_your_information.html" page that lists every advertiser who somehow obtained your contact data and ran targeted ads at you. There are a lot of sketchy companies on that list that I never had any kind of relationship with, yet they somehow managed to obtain my email or other contact data. It's a bit of an enlightening experience as to just how widely your personal data is being shared, even if you're somewhat careful about who you give it to.
The Facebook dump also includes email addresses and phone numbers that you have deleted from your account. After I removed my contact information from my account, I was curious why advertisers were still able to target me with these uploaded lists; I guess that explains it.
Facebook is currently involved in multiple legal battles in Europe about the GDPR, including at least one where it's the instigator. Facebook has different views about what's allowed under the GDPR compared to the people in charge of enforcing the GDPR.
That first link isn't the same thing as what GP is describing, at all. That's just a list of advertisers whose advertising dragnet you've been caught in (e.g. you're in one of their targeted demographics).
Apparently I'm doing something right because mine is empty.
"You have no available activity to show at this time."
Maybe it's because I've never had a habit of linking phone apps, web apps, etc. to Facebook. I tend to keep different accounts completely isolated, which is why Google really bugs me: I don't like that my YouTube, Gmail, etc. are all the same account.
You're absolutely correct.
It's only against the ToS to create another account if any of yours have been banned. Having multiple accounts is allowed for work/private purposes.
> Many people have more than one Google Account, like a personal account and a work account. Uses like that are fine.
> You're absolutely correct. It's only against the ToS to create another account if any of yours have been banned. Having multiple accounts is allowed for work/private purposes.
Huh. What happens if you already have multiple accounts, and one of them gets banned?
Any organized efforts to export and donate data? Data is not data in some important senses unless it's aggregated. A "watch-them-watch-us" dynamic can't exist if one side is aggregated and the other isn't.
Mind linking a project if you find it? Currently working on a tool that needs text conversations for data training and would be happy to donate if we can use the data as well.
Not only that - Facebook allows you to create "lookalike" audiences from e-mail lists. So as a marketer, if you can get your hands on a good e-mail list, you're golden.
I'm very curious about something: people here rave about Takeout, but does anyone actually use the thing?
I tried the Takeout option about a year ago when I wanted to switch from Google Photos to Apple Photos, and it was an absolute mess. It exported random zipped folders, all broken up (I think there were 14 of them all together), many images were duplicates, and somehow it managed to corrupt Apple-format pictures: I would end up with two files with the same name, one something like 24KB and another 1.2MB, etc.
If it weren't for the option (which they have since taken away) of syncing photos to your Google Drive and copying them out that way, I have no idea what I'd have done; I'd have lost about 120GB of photos and videos going back to 2008.
So seriously, do all of you just use Takeout and store the archive away without actually opening it up and trying to use it?
I think the breaking up is a ZIP limitation (the classic format maxes out at 4GB without ZIP64).
I'm exporting it regularly (although I certainly don't have 120GB of photos there). You can choose an option for regular exports (every two months) and a delivery method (e.g. Google Drive). Then I have a script that runs daily, mounts Google Drive, moves the Takeout archive locally if it's present, and removes it from Google Drive (so it doesn't take up space).
Then indeed, inside you have a mess, with some data in HTML, some in JSON, etc. But well, at least you can parse it... I have a library which I'm using as an API to various data exports, including archived Takeouts (so I don't even have to unpack them for access).
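For anyone wanting to replicate this, a minimal sketch of that daily job using rclone instead of a mount (the remote name "gdrive", the "Takeout" folder, and the paths are assumptions about your setup):

    #!/bin/bash
    # Hypothetical daily cron job: pull a finished Takeout archive out of
    # Google Drive, then delete it there so it stops eating Drive quota.
    # Assumes an rclone remote named "gdrive"; names/paths are illustrative.
    set -euo pipefail
    DEST="$HOME/backups/takeout"
    mkdir -p "$DEST"
    # rclone move = copy to the destination, then remove the source on success.
    rclone move gdrive:Takeout "$DEST" --include "takeout-*"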
"Then indeed, inside you have a mess, with some data in HTML, some in JSON, etc. But well, at least you can parse it..."
How is some regular schmuck who wants to move his data out of Google to another service supposed to determine what they actually have to parse? The user simply uploads pictures into the system but gets garbage out?
The scary part here is that Google makes it extremely easy to suck in the data, but for an average user it's extremely difficult to get it back out, and Takeout is absolutely not a good solution.
They seem to use standardized formats where possible: vCard for contacts, mbox for email, image files for photos, etc. I'll grant you that they do some not-nice things (like separating photo metadata into JSON sidecar files), but I'm curious what format for search or timeline activity would be useful for a "regular schmuck"?
If said person wants to view the data on their own time, HTML seems adequate. And JSON seems ideal if they plan on sending this data to a new service that ostensibly supports parsing Google's takeout.
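On the JSON-sidecar point: each exported photo typically comes with a small JSON file that holds the capture time, among other things. A minimal sketch for stamping that time back onto the image files, assuming the usual "photo.jpg" / "photo.jpg.json" pairing, GNU touch, and jq (field names may differ across export versions):

    #!/bin/bash
    # Restore file modification times from Takeout's JSON sidecars.
    shopt -s globstar  # enable ** recursive globbing in bash
    for meta in Takeout/Google\ Photos/**/*.json; do
        img="${meta%.json}"                  # photo.jpg.json -> photo.jpg
        [ -f "$img" ] || continue            # skip album-level metadata files
        ts=$(jq -r '.photoTakenTime.timestamp // empty' "$meta")
        [ -n "$ts" ] && touch -d "@$ts" "$img"   # epoch seconds -> mtime
    done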
I think a big part of the problem is that even if Takeout is using standard formats, none of the competing services or software platforms are set up to ingest those formats.
Like, mbox is fine for opening in a desktop client, but if you move from Gmail to Fastmail or Outlook or whatever, mbox might as well be a ClarisWorks spreadsheet file.
I just finished going through the Takeout process for photos last night: 39 2GB archives. The request took a couple of days to process, then I got an email with a bunch of links that they said were good for 7 days. I planned to load them into my Synology download manager, but the Takeout system seems to rely heavily on browser state, so Synology couldn't download them. Each took 5 minutes to download at home over WiFi, and the Takeout interface demands authentication every 10 minutes. It also reloads the page and resets my position in the download list on every click.
I've uploaded and expanded about half the archives on my Synology, and it's currently indexing everything, so I can't comment on the photo issues you've mentioned quite yet.
Overall, I'm happy there exists a mechanism to get my photos, but the quality of the experience is truly awful.
More likely, they just want to make sure they're legally covered. There is no added benefit for the company in making that experience better.
Source: I work on a product engineering team at a larger company. Why would we upend our roadmap for something that doesn't provide explicit value to our product offerings? We've got a list a mile long of improvements and new capabilities to deliver on, and no engineer likes jumping through legal hoops anyway.
I don't see how your statement contradicts your parent comment's. Looks like Google just did the bare minimum to cover themselves legally (as you said). But that shouldn't mean that they can't make the experience better.
As for no engineer liking working through legal hoops, my observations disagree. Many absolutely don't care. Give them a well-described Jira ticket and they'll happily chip away at it for a year if it's necessary.
My brother passed away very suddenly a couple of years ago. Google Takeout and similar services were a godsend for extracting and archiving everything from his accounts. As far as I know, everything exported fine, though he didn't use Google Photos.
Gzipped tarballs are an option now with what looks like a 40GB max.
$ ls -lh takeout*
-rw-r--r-- 1 ben ben 39G Sep 26 18:29 takeout-20200925T172738Z-001.tgz
-rw-r--r-- 1 ben ben 38G Sep 26 19:20 takeout-20200925T172738Z-002.tgz
-rw-r--r-- 1 ben ben 35G Sep 26 20:06 takeout-20200925T172738Z-003.tgz
I use this one every other month -- download everything and throw it in cold storage. Mainly keeping it for the scary possibility of being kicked out of my account.
I just used it to pull over 350GB of photos. Not fun. They really make it hard to reimport photos elsewhere. I wish their app just had a "download all" option.
As I wrote elsewhere previously: I agree it's a mess, but not the end of the world. Extract the files without preserving directory structure (unzip's -j flag). Now you just have everything in a single folder. Then delete the metadata files by file extension. This took all of a few seconds for each zip.
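Concretely, something like this (standard Info-ZIP tools; filenames illustrative):

    # Flatten every archive into one folder, then drop the JSON metadata files.
    mkdir -p photos
    for z in takeout-*.zip; do
        unzip -j -o "$z" -d photos   # -j: junk directory paths, -o: overwrite
    done
    rm -f photos/*.json              # delete the metadata files by extension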
Shameless self-plug: as part of ongoing privacy research, my colleagues and I developed a website that parses personal data exports/takeouts from Google, Twitter, Instagram, and Facebook and visualizes the data in a treemap and a timeline. We aim to increase awareness of personal data and the effects of online behaviour.
The data is not uploaded; it's parsed entirely in the browser.
For the Google Takeout, make sure not to include data from Photos, Gmail, YouTube, and Drive, as they make the export too big. Also select "JSON" for "My Activity".
I do a full takeout every month or so, both for myself and also my parents. It's largely so that we don't lose anything if Google does one of its awesome sudden account disablings.
I have this run every few months and then download and backup the archive. I like to think it will be helpful in case of the dreaded 'locked out of google for no reason and no recourse' situation.
The easiest way to figure out the ethics of a company is to see if their export tool is easy to use. If it is, it's safe to stay with them, because they're not doing it just for legal compliance. If not, run away from that company as fast as possible.
I periodically use the Takeout service and selectively copy to the 1TB of storage I have on OneDrive (which comes with an Office 365 subscription). Having my digital stuff backed up on two cloud providers is enough for my risk tolerance. I used to make a 3rd backup in Dropbox but don’t do that anymore.
I once tried to download my Google Photos, and it became super messy: I got tons of different folders, all with duplicates and other stuff, and it was pretty unusable.
One of the positives of the Google+ shutdown was that Google Takeout saw a major overhaul around February of 2019. Third-party tools (Alois Bělaška's Friends+Me was invaluable https://blog.friendsplus.me/) still proved very useful, and gave capabilities missing from Google's offerings.
Yes! I do the backup OneDrive backup too. (Not a typo, I meant to say backup twice.) And I actually do it for my live Google Drive as well.
I rsync my Google Drive folder to network storage, without deletion, and I have OneDrive syncing that. So I have a local and cloud backup of everything that's ever been in Google Drive.
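For anyone curious, the "without deletion" part is just rsync's default behaviour (you'd have to pass --delete to mirror removals); a sketch with made-up paths:

    # Archive-mode copy; no --delete, so anything ever synced stays on the NAS
    # even after it's removed from Google Drive.
    rsync -av "$HOME/GoogleDrive/" /mnt/nas/gdrive-archive/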
It's really hard to download from takeout.google.com these days. I have 86GB of data in my Google account and am trying really hard to export it.
- The export size is around 176GB, mostly photos.
- There's an option to move it to OneDrive or Box, but 100GB on Google becomes 200GB on OneDrive: images are copied into multiple folders to recreate the albums (note that Google Photos automatically creates albums for family and trips).
- Tried using 2GB zips to split the files. We have to click and download 100+ zip files, and even if one file is corrupt we're done. All this happens in a modal window, and we can't download more than 5-8 files at a time.
- Split it into 5GB zip files. Now the download count is manageable, but the network keeps dropping and we have to download again. We can only retry 3 times, making the entire set useless.
- No options to separate videos and photos.
- We only have a week to take out and test the whole thing.
TL;DR: it's designed to make sure that we don't actually take the files out...
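One partial workaround, offered as a sketch rather than a tested recipe: copy the authenticated download URL and session cookies out of the browser's network tab, then let curl resume across network drops (this may still hit Takeout's server-side limits on retries and re-downloads):

    # -C -      resume from wherever the previous attempt died
    # --retry   automatically retry transient network failures
    # $URL and cookies.txt come from your authenticated browser session.
    curl -C - --retry 10 --retry-delay 5 -b cookies.txt -o takeout-001.zip "$URL"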
I'm not sure why you're getting downvoted. I went to download my photos today from Google Photos after reading about the end of the unlimited storage. The interface is incredibly annoying. It split them into 2GB zip files, which is fine, but then it takes multiple clicks to get each download to start. Oh, and then it logs you out every 5 minutes so you have to re-enter your password.
It's pretty clear they were not super concerned about making it a user-friendly process.
If you say something negative about Google you’ll reliably get downvotes from Android fanboys even if it’s completely accurate. (Apple has similar fans, although they seem to be less defensive now that the company is doing so well)
It’s not worth worrying over except as a reminder that rating systems need to handle bad-faith voting.
This doesn't mirror my experience at all, and I think blaming it on fanboys isn't fair. I used Google Takeout recently, had it set to split into 50GB tar.gz chunks, and it worked perfectly fine. I wasn't logged out once, and downloaded the archive at around 90MB/s (720Mbps). It was a very smooth data export experience. I'm very much not a Google/Android fanboy and avoid Google's services wherever possible. The OP is being downvoted because their experience isn't representative.
EDIT: It does seem that there are more people than I expected for whom the experience isn't as good as the one I had, so maybe Google should test this with (and make it work better on) internet connections that aren't as good as their office lines.
I just finished the process last night for about 80GB of photos and GP's comment represents my experience very well. It took hours of tedious clicking and logging in dozens of times to get through it. Miserable.
I’ve also used it and the original poster’s experience rang truer than yours. Most people do not have gigabit internet connections and manually downloading the default smaller chunks is annoying.
Yeah, it's been a real hit-or-miss process for me, too. The process is way too manual, and increasing the chunk size from 2GB to 10GB causes a lot more failures of the individual downloads for me.
And I don't particularly want to hear "get a better network connection". My connection works just fine for everything else.
That said, if you're able to download everything, it's reasonably well-organized, although as stated elsewhere, there's a lot of duplication of data.
But can't you just download larger chunks over whatever connection you have? Or is it common for internet connections in (I assume) the US to be so bad that you can't download a 20GB or 50GB file without errors?
It’s really a question of how well it resumes: if you’re trickling in a 20GB file and get a couple of lost packets at 10GB, was that earlier transfer wasted? Most people use wireless, so it’s not hard to hit a transient failure that will disrupt a TCP connection but won’t last long in absolute terms.
Not sure why you're getting downvoted; I just wrote about the same issue I had about a year ago. It's an absolute mess for photo exports. I don't think people here actually use this service, or they use it without actually looking at what's in there.
Before downvoting, just try to use the new interface and see for yourself. I used Takeout about 2 years back and it was okay. Of course, I had just a few gigs at that time.
I have tried spinning up an Amazon EC2 instance to download everything and copy it to an S3 bucket. But it logs me out every few minutes, disrupting the downloads, and it won't allow downloading the same file multiple times. If one zip fails, the whole set is useless.
Those aren’t good solutions, they’re workarounds - and if you think about them even slightly you’ll realize they’re not very good:
1. The interface requiring multiple downloads prevents automation or simply waiting out a large transfer, and not having a robust automated retry mechanism ensures wasted time and increases the odds of data loss.
2. Few people have a high-speed free WiFi network nearby. You’re not getting better results at Starbucks or the local library, and Google’s campus networks require logins even if you are one of the few people who lives near one.
3. Setting up a VPS and running downloads from a web app requires money and skills most people don’t have, especially if you care about not accidentally leaking your personal data. If you have enough data to matter, you’ll also hit many providers quota limits or bandwidth charges. If you navigate all of those challenges, you still haven’t solved the problem of getting it home - at best you can now use rsync to remove the manual component of the second transfer.
GoogleGuest is an unsecured SSID at every campus, if you have one nearby. It generally reaches the parking lots (which are conveniently empty right now)
https://about.google/intl/en_us/locations/ ... not as isolated as I thought, waiting patiently for the geo-data people to work out an approximate census of people who live within a 50-mile radius of a google office. :-)
Even though I love using Google services, having my account blocked and losing access to everything from domains to personal photos to all the important docs in Drive is a concern that has been haunting me since I started seeing an uptick in these stories. Maybe it's just that I'm noticing them more, but in any case, reducing my Google dependence and having a strategy for an account-block scenario has become a need, given the large impact it would have on my life.
I really wish Google Takeout had an API to request takeouts weekly, or whatever works, instead of the bimonthly (once every two months) option currently offered. Then one could keep data in Google services without serious concern, since at most the loss would be a week's worth of data. Also, on this note: does anyone have any recommendations for domain registrars? I can't keep my domains on Google, even though it has a great UX.
I don't use Google a lot, but I have an actively used Gmail account. I never give its address to anyone; all incoming mail is forwarded from other addresses, and I don't send any mail from that account either.
I am somewhat worried that their great algorithms will one day decide there is some violation of their ToS and close the account. I had that issue even with a paid (well, a voucher that came with a PC, but still) Microsoft OneDrive account that I used in an atypical way (no sharing, all contents encrypted).
In the past I used IMAP for backing up my messages, but over time my scripts for doing so have fallen into disrepair... Would Takeout be a way to do somewhat regular backups? Or might that trigger their algorithms into deciding you're not a good customer? Has anybody read the ToS for whether anything is mentioned about Takeout?
Hey, I have about 17TB of data. Is it even worth doing Takeout? Are there any programmers out there who might want to take on directly transferring this data to a Google One unlimited account? I have seen some third-party companies, but I just don't trust them. I need someone who will write a script to run the download and upload offsite, somewhere with a fat pipe and fast data transfer.
"You have Advanced Protection switched on, which means that it could take days or even weeks for your files to be ready to download, but we'll email you when they're ready."
Same. Do you know a good way to add metadata to songs? I've been carrying around a library since I was a teenager, and when I moved it from Google Play to iTunes, a bunch of the metadata was missing.
I've used Takeout. Nice for them to provide a bundle. That said, I'd love the option to export my Messages content, and that's something I haven't seen anywhere. Has anyone successfully done this for Google Messages? I don't mean backing up to Google Drive, I mean a full-on export.
Yes it does (I've done this). However, until recently you could NOT export YouTube videos from any "sub" YouTube accounts. I.e., I have one Google account with 4 YouTube accounts under it, and Google support confirmed that you could only export the YouTube uploads for the primary account. As of about 6 months ago this was fixed, though, and you can now take out YouTube videos for all "child" accounts as well (I have confirmed this).
Although it is a bit janky (as is all of Takeout), since you need to use a separate YouTube/Takeout interface to do so.
I just deleted it all; it was easier. Now I just keep everything on my machine. No more takeout--home-cooked meals. I can see why that doesn't appeal to everyone, though.
I wasn't using Google for anything significant, so it was pretty easy to delete it. If I care about a picture, I'll print it and keep it in a book. I would never trust a corp that's only been in the data business for 10-15 years, especially one known for stealing your information for advertisers' gain.
You don't have to "trust" them. In fact you shouldn't. That's why you encrypt locally first. And if they lose your data you still have it locally. It is good to also have two local copies.
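A minimal sketch of what "encrypt locally first" can look like with stock tools (gpg in symmetric mode; filenames are illustrative):

    # Bundle and encrypt before anything leaves the machine.
    tar -czf - photos/ | gpg --symmetric --cipher-algo AES256 -o photos.tar.gz.gpg
    # Later, to restore:
    gpg -d photos.tar.gz.gpg | tar -xzf -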
Once the export is done and I see something I don't want to exist anymore, where do I request that data be permanently removed (GDPR style)? Asking as I honestly don't know.
> If you do not normally deal with data protection requests, please forward this email to your Data Protection Officer, or relevant member of staff. Please note that you have 30 days to comply with this request.
Options are GDPR which applies to European Union residents and CCPA which applies to Californians. Is there anything for people within the US who are residents of the other 49 states?
Exactly. Unfortunately, GDPR failed to realize that most of the data companies hold about us is inferred. Data takeouts usually provide all the data you willingly provided (uploads, reviews, likes, playlists, etc.), but the most interesting information is missing. How often did I watch each video? When did I open each e-mail? There's so much data that companies collect about our behaviour that is never given back to us.
It's entirely possible to collect information to identify a unique human without it being considered PII - combine it all together and maybe add a sprinkle here and there (perhaps public domain info, buying "anonymized" info) and boom, you know who it is.
Yet, if you're audited, it's just a series of IDs and numbers, nothing identifying there...Right?
If you ask Spotify for your data dump, you'll notice that in a lot of the JSON files the information is encrypted such that you can't understand it (it's just numbers). It's impossible to say whether it's actually stored like this or if they encrypt it before they provide the archive to you.
Metadata is almost impossible to legislate against and, as far as I can see, is entirely legal to collect and use as you see fit.
How many people in the world are on Hacker News, named "lopis", use Firefox 65, have an IP address in $country, use this screen resolution, etc., etc.?
You are identifiable by combining multiple factors, even if no individual factor is enough to identify you on its own.
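The back-of-the-envelope version of that argument: each attribute contributes some bits of identifying information, and the bits add up. Around 33 bits is enough to single out one person among roughly 8 billion (the per-attribute budgets in the comments below are made-up illustrations):

    # How many bits uniquely identify one human out of ~8 billion?
    awk 'BEGIN { printf "%.1f bits\n", log(8 * 10^9) / log(2) }'   # ~32.9
    # Illustrative budgets: browser+version ~5 bits, country ~5, screen
    # resolution ~4, timezone ~3, installed fonts ~10 ... it adds up fast.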
If I understand it correctly, the EU 'personal data' (PD) concept is much wider than the US 'Personally Identifiable Information' (PII) concept. You are touching one of the differences here.
For GDPR purposes, data is PII if it can be used in combination with any other data to identify an individual. Doesn’t matter if the individual data points are not themselves identifying.
One thing I've been curious about is whether AI and algorithms that can take a huge amount of anonymous data and "identify" a user (but not explicitly) only identify in the sense that the output of the AI was only possible by correlating individuals with enough granularity. I'm almost certain the answer is yes. I'm not clear on whether GDPR addresses that issue or not.
> For GDPR purposes, data is PII if it can be used in combination with any other data to identify an individual.
By that definition, all data is PII. There is no information available on this planet that has not been influenced by people.
I'm not trying to be obtuse. I worry about this problem a lot. Obviously we need to keep companies from doing stupid stuff like storing the first digit of a Social Security number (can't identify someone by that!) and then the second digit (also not uniquely identifying!), etc.
On the other hand, what if I have web log files that only store URL, timestamp, and status code? Is that OK? If I get hits for two specific pages within a couple of minutes of each other, and there's only one person on the planet who would know about both those pages, I know they were visiting my site at that time.
People influence the world around them and it feels like privacy laws are trying to prevent companies from understanding that influence. At the same time every other incentive is pushing those companies to understand more.
> By that definition, all data is PII. There is no information available on this planet that has not been influenced by people.
I think that is a step too far. For example, it seems quite clear that a dataset of daily average temperatures from the top of Everest is not personally identifying information.
Black hair = PII, address = PII, drives a black BMW = PII: any of this information together with other information could be used to identify an individual, and that is exactly the issue. It is like saying that one brick is a house just because multiple bricks can make a house. If you gather enough data, you can potentially point to a specific individual, just like with unique PC fingerprinting: gather enough data points so that the fingerprint is unique.
AFAIK according to the GDPR, knowing each individual fact is fine. Only the combination is PD.
Hence, installing a camera that counts black-haired people, another that counts people entering some location, and a third that counts people driving BMWs is perfectly fine. Merging the 3 recorded tapes to identify a person is not. Giving the 3 tapes to someone else is only OK if you somehow guarantee they won't do the merge.
Privacy laws are mainly aimed at allowing those whose data is being used to be aware of this, understand what is used for which purpose, and to elect to control this should they object.
My GDPR requests have usually included inferred data. I'm not sure if it was Facebook's or Tinder's that showed a giant list of categories they thought I fit into, which was, by the way, hilariously wrong (I'm a 30 y/o single male and I was categorised as a single mom, for example).
GDPR does cover inferred data. Source doesn't matter. Only whether this is data about a specific identifiable person and whether it's covered by the list of protected types of data.
Right answer: you can go to [1] and [2] and [3] and ask them to delete your information. It's important to retain written confirmation that they have removed your information: if a copy is ever found online (in a data breach or otherwise), you would be able to exercise legal rights as a result of their GDPR breach. I would encourage people who upload data to leave "fingerprints" in their accounts, such as certain photos, emails, and other data that you have ONLY created on this service (for example, email your own Gmail account a unique email; if it's ever leaked, you know where it came from).
It's the same way Spotify's GDPR tool does NOT give you all the information they store, yet if you ask via their DPO (usually privacy@) you get a lot more data; a rather sneaky way of hiding their true data collection.
ALWAYS use email or a physical letter, and ALWAYS get a reply from the organization when exercising your GDPR rights; your lawyer/legal authority will be very thankful ;)
AND NEVER EVER USE AUTOMATED TOOLS! The chances are, there is data that isn't included in them. For example, go ahead right this second and submit a SAR for "technical log information" to Google: this data is NOT included in their official tools, and you will be amazed how much they're storing!!
Google Takeout is a great solution to the problem of, how do we get Google programmers, without protest, to write functionality into their services to allow Big Brother to acquire a neat and tidy copy of all a user’s data?
I’m not complaining. I’m marveling at the clever solution to the problem of, “How do we get hundreds of product teams to support our legal obligations without feeling morally conflicted?”
Could you elaborate on that? Do you disagree that Google Takeout is used for government user-data requests, or that absent such functionality, programmers tend to balk at implementing such requests?
Google takeout is an inefficient tool for government requests, yes.
I'm sure it may get used but I highly doubt it's a primary tool in any way. PRISM revealed how much custom tooling is made specifically for governments and for giving back data to them, automatically addressing their requests etc. Takeout is slow, bulky, and its audience is the end user.
Legal processes take priority over virtually anything else; be assured that every large company has gotten and responded to valid legal process before these download tools were available.
Additionally, the scope of what's provided in response to legal process depends heavily on who and what was requested and so the software is far more complex than "download all the user's data". Investigators often aren't aware of what data these companies store and if it's not specifically requested then it's not provided. Lawyers basically copy and paste their last successful warrant/wiretap/whatever and send it to the judge because that's how the legal system works.
I don’t disagree with any of that. The Google Takeout interface makes it very easy for the user (or person responding to a search warrant) to pick and choose exactly what data to zip up.
Hey, I see you've got your tinfoil hat on, but seriously, what? Do you think it's really tough to find a group of 50 bean-counting programmers in all of Google who can't build this tool for Big Brother?
The functionality needs to be written into every product. Every PM would be aware of it, every software engineer would see the code. The story would leak, and articles would get written about the “pervasive privacy back doors written into everything at Google”.