Facebook's Download-Your-Data Tool Is Incomplete (privacyinternational.org)
266 points by Garbage on March 2, 2020 | 84 comments



This is a topic that I'm intimately familiar with, thanks to a bizarre set of circumstances (and a ton of reverse engineering). Story above, technical details below:

Part 1:

A couple years ago, I noticed that the number of photos I was tagged in kept going up and down, as a couple of people I knew would disable their accounts occasionally, and re-enable them a couple weeks later.

I manually saved the images from them, but wanted a way to automatically scrape any images I was tagged in, so I wouldn't need to do this manually.

I got myself a Facebook Graph API key and created a sample app with full account permissions, only to discover that Facebook won't let you export photos you're tagged in (that you didn't take). The numbers the API reports are wrong, and there's no indication that it's being purposely redacted.

As a result, I wrote a tool that crawls a profile given a set of authenticated cookies, and essentially clicks the download link automatically on every photo. This worked decently well for a couple years, and continues to work to this day.

Part 2:

I had some spare time on my hands in December 2019, and wanted to write a tool to browse chat logs from across a variety of services (Facebook Chat, Hangouts, SMS), such that you'd be able to click a name and see a chronological discussion, regardless of what service it was on. I downloaded the Facebook data dump, figuring that was the easiest way to get access to my Messenger data.

The Messenger dump revealed a few things that surprised me:

* The character encoding is messed up, and requires decoding as Latin1, then re-encoding as UTF-8

* Some messages are straight up missing, despite being in the UI. The dump is supposed to include attachments (images are included), but is missing audio messages / voice snippets, presumably among others.

* If a user has deleted their Facebook account, the username will appear solely as 'Facebook User', so now you need to figure out who you were actually talking to. Some conversations were very obvious, but others wasted a ton of time and involved dumb techniques (like finding Adium logs of the same chat from an old computer).
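A minimal sketch of the encoding repair from the first bullet, assuming the dump is JSON whose strings are UTF-8 bytes that were mis-read as Latin-1 (so "é" shows up as "Ã©"). In Python terms that means encoding each string back to Latin-1 bytes and decoding those bytes as UTF-8. The field names in the sample record are made up for illustration.

```python
import json


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake (or already correct): leave it alone.
        return text


def fix_all(value):
    """Recursively repair every string in a parsed JSON structure."""
    if isinstance(value, str):
        return fix_mojibake(value)
    if isinstance(value, list):
        return [fix_all(v) for v in value]
    if isinstance(value, dict):
        return {fix_all(k): fix_all(v) for k, v in value.items()}
    return value


# Hypothetical record; the real export's layout may differ.
raw = '{"sender_name": "Ren\\u00c3\\u00a9e", "content": "caf\\u00c3\\u00a9"}'
message = fix_all(json.loads(raw))
print(message)
```

Strings that cannot be round-tripped are returned unchanged, so running this over text that is already correct should be harmless in most cases.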

To identify certain conversations, I started scrolling back through certain Facebook posts (which I wrote), to figure out who had been at certain events with me (to narrow things down). I read a bunch of comment threads that didn't appear to make much sense to me, until I realized that anyone who deletes their account also has their comments removed, so basically all old comment threads are somewhat nonsensical if anyone in the conversation has since deleted their account. For comparison, deleting a reddit account changes the ownership of a comment/post to [deleted], which seems much more appropriate.

Presumably wall posts (including happy birthday messages) from people who have since deleted their accounts are also removed, which is exceedingly shitty - if someone sends you a greeting card and then dies several years later, it's not like the post office comes to your house to take your cards back in the middle of the night.

Part 3:

Because of this, I figured that the only way to mitigate future data loss on Facebook is to consistently archive things. Since the 'download your data' tool is basically useless, I started work on a tool that scrapes the site and "decompiles" pages into raw directed graph DB rows, which can be re-rendered into a new version of the site. It features a reasonably complete implementation of Facebook's TAO (https://www.facebook.com/notes/facebook-engineering/tao-the-...) on top of PostgreSQL, and works decently well - notably, it also maintains things like proper links to profiles and stores all assets offline.
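For anyone unfamiliar with TAO, here is a rough sketch of the data model as described in Facebook's public write-up: typed objects plus typed, time-ordered associations (edges) between them. This is not the tool's actual schema. The table names, column names, and the use of sqlite3 in place of PostgreSQL are all assumptions made so the example runs on its own; the function names loosely follow the operations named in the TAO write-up.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here so the sketch runs without a server.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE objects (
    id    INTEGER PRIMARY KEY,   -- the object's ID
    otype TEXT NOT NULL,         -- e.g. 'user', 'post', 'comment'
    data  TEXT NOT NULL          -- JSON blob of key/value fields
);
CREATE TABLE assocs (
    id1   INTEGER NOT NULL,      -- source object
    atype TEXT NOT NULL,         -- edge type, e.g. 'COMMENT'
    id2   INTEGER NOT NULL,      -- destination object
    time  INTEGER NOT NULL,      -- used to order edges newest-first
    PRIMARY KEY (id1, atype, id2)
);
""")


def obj_add(oid, otype, data="{}"):
    db.execute("INSERT INTO objects VALUES (?, ?, ?)", (oid, otype, data))


def assoc_add(id1, atype, id2, time):
    db.execute("INSERT OR REPLACE INTO assocs VALUES (?, ?, ?, ?)",
               (id1, atype, id2, time))


def assoc_range(id1, atype, limit=10):
    """Return destination IDs for one edge type, newest first."""
    rows = db.execute(
        "SELECT id2 FROM assocs WHERE id1 = ? AND atype = ? "
        "ORDER BY time DESC LIMIT ?", (id1, atype, limit))
    return [row[0] for row in rows]


def assoc_count(id1, atype):
    """Count edges of one type leaving an object."""
    return db.execute(
        "SELECT COUNT(*) FROM assocs WHERE id1 = ? AND atype = ?",
        (id1, atype)).fetchone()[0]


obj_add(100, "post", '{"text": "hello"}')
obj_add(201, "comment", '{"text": "first"}')
obj_add(202, "comment", '{"text": "second"}')
assoc_add(100, "COMMENT", 201, time=10)
assoc_add(100, "COMMENT", 202, time=20)

print(assoc_range(100, "COMMENT"))
print(assoc_count(100, "COMMENT"))
```

Re-rendering a page then comes down to walking edges from an object and formatting whatever is on the other end.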

Writing a bug-compatible "decompiler"/"recompiler" taught me several things about how the site works (or rather, doesn't). Here's a small list of errata I've discovered along the way:

* Objects can have multiple FBIDs

* FBIDs can contain comments/reactions

* Since there may exist multiple FBIDs for a given object, it's quite common for multiple comment threads to exist for a given item, such that commenters on one don't see the responses on the other (and vice versa). Several of my friends have confirmed finding disjointed discussions on their posts after discovering this bug.

* Facebook has several types of deprecated reactions that they store in the DB, which cannot traditionally be viewed from the site anymore. Sucks to be you if you reacted to something that way.

* Certain objects can get lost in their UI, with no easy way to find them. Uploading a photo in a post will put it in your Timeline Photos album, but uploading a photo as a comment to someone else's post will basically make it impossible to find again.

* The number of reactions/comments on a given post is often wrong - this isn't the traditional bug due to eventual consistency, but rather is due to not adjusting the counts for items when a person deletes their account. To a certain degree, this will show you how many people that interacted on something have departed the site.
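To make the multiple-FBID and wrong-count items concrete, here is a small sketch with invented data. It assumes you already have a mapping from each FBID to the object it really refers to; the names `alias_of`, `comments_by_fbid`, and `reported_counts` are hypothetical. It merges the split comment threads back into one and compares the count the site reports against the comments that can actually be seen.

```python
from collections import defaultdict

# Hypothetical scraped data: FBIDs 111 and 222 both resolve to the same post.
alias_of = {111: "post-A", 222: "post-A", 333: "post-B"}

# (timestamp, author, text) tuples, keyed by the FBID they were found under.
comments_by_fbid = {
    111: [(5, "alice", "first!"), (9, "bob", "nice")],
    222: [(7, "carol", "did nobody else comment?")],
    333: [(1, "dave", "hello")],
}

# Comment counts as displayed by the site (made-up numbers).
reported_counts = {"post-A": 5, "post-B": 1}


def merged_threads(alias_of, comments_by_fbid):
    """Combine threads from every alias of an object, in time order."""
    threads = defaultdict(list)
    for fbid, comments in comments_by_fbid.items():
        threads[alias_of[fbid]].extend(comments)
    return {obj: sorted(items) for obj, items in threads.items()}


def missing_counts(threads, reported):
    """Reported count minus visible comments, per object."""
    return {obj: reported[obj] - len(threads.get(obj, [])) for obj in reported}


threads = merged_threads(alias_of, comments_by_fbid)
print(threads["post-A"])
print(missing_counts(threads, reported_counts))
```

A positive difference for a post is a hint that some commenters have since left the site, per the last bullet.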


I'm glad to hear the last section of Part 2 works the way it does. In reverse, if a person deactivates their profile, all comments you made on their posts completely disappear from your controls. If they reactivate, your comments are back.

I used "Social Book Manager" in the Chrome store to (painfully, repeatedly) flush out all content from my account down to the last Like. Walk away and your account appears 100% empty until a friend reactivates, then all your stuff linked to their content magically reappears for you to try and delete again. So I sat on it for another year, periodically logging back in to flush out whatever hidden content had reappeared.

Having now (this Jan.) deleted my supposedly "empty" 10yr old account, it's good to know the hidden content is being removed properly, as my intent was to 100% scrub myself from their service and boy did I try.


When you interact with a piece of content owned by someone else (or transfer ownership of your content to someone else), you shouldn't expect to retain full ownership of your contributions. If I make a post and then delete it, all comments (from everyone) should be deleted because the parent object is now gone.

However, if Bob makes a post on Alice's Timeline (e.g. a Happy Birthday message), Bob shouldn't expect to fully control that object anymore - Alice's friends can now fully see that post (regardless of what Bob's privacy settings are set to). In fact, if you look under the hood at Facebook's DB, you'll see that the row that Bob created has ownership transferred to Alice (AUTHORED_BY/AUTHORED <-> Alice), with a secondary pointer to Bob that's used to draw the "Bob > Alice" text.[0] Bob should not be able to delete this anymore because he no longer owns the object. You can argue that Bob should maybe still be able to edit the content (and replace it with something new), and I wouldn't disagree, but Bob should not be able to delete the fact that the row ever existed.

Comments on someone else's post by a comment author that deletes their account should be transferred to a [deleted] account, because that maintains the integrity of the discussion for everyone involved - it just drops 'who' wrote a particular line of text. If you want to literally replace all of your comment text with something like '-', I'm not going to argue with that desire (or suggest you shouldn't be able to do it). However, it's insane that the rows themselves disappear, because there's no indicator that anyone else ever participated in that discussion (and thus most old comment threads appear schizophrenic).

[0] One hilarious side effect of replacing the object's ownership is that technically, all of the messages Alice received wishing her "Happy Birthday" appear to be written by herself. You have to find the additional TAO edges to reconstruct the true post author.
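A toy illustration of that reconstruction, under stated assumptions: `AUTHORED_BY` is the edge named above, while `ORIGINAL_POSTER` is a made-up name for the secondary pointer (the real edge type isn't given here), and the edge list is invented.

```python
# (source object, edge type, destination) triples; all values are invented.
edges = [
    (500, "AUTHORED_BY", "alice"),      # ownership was transferred to Alice
    (500, "ORIGINAL_POSTER", "bob"),    # hypothetical name for the secondary pointer
    (501, "AUTHORED_BY", "alice"),      # a post Alice really wrote herself
]


def true_author(post_id, edges):
    """Prefer the secondary pointer; fall back to the nominal owner."""
    owner = None
    original = None
    for src, atype, dst in edges:
        if src != post_id:
            continue
        if atype == "AUTHORED_BY":
            owner = dst
        elif atype == "ORIGINAL_POSTER":
            original = dst
    return original if original is not None else owner


for post_id in (500, 501):
    print(post_id, true_author(post_id, edges))
```

Reading only the ownership edge would attribute both posts to Alice, which is the side effect described in the footnote.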


> If I make a post and then delete it, all comments (from everyone) should be deleted because the parent object is now gone.

Why does the ownership start at the post level and not the comment? If you consider that comments shouldn't be removable, that the post is now the owner of its child, what about the higher hierarchy? Is the forum category creator the owner of the post too?

> However, it's insane that the rows themselves disappear, because there's no indicator that anyone else ever participated in that discussion (and thus most old comment threads appear schizophrenic)

If the cost of staying owner of what you post is a few comment threads that "appear schizophrenic", it doesn't seem like a high cost to pay.


There are some valid points to keeping comments for posts that have since been deleted, and I don't necessarily disagree with them. For the sake of simplicity, however, I'm assuming that all of the comments on a typical Facebook post aren't going to be relevant if the post itself is deleted. On Hacker News, however, if the URL for this post stopped working, I think it's pretty clear that the discussion below continues to hold merit.

The ideal solution to comment threads would be to replace the usernames of people who simply delete their account with [deleted], but keep the content unless someone explicitly overwrites it. The biggest issue isn't that the content is gone (any given comment), but that nobody knows how many things are missing. It would be much better (with no loss in privacy) to at minimum just replace those comments with null text and a null owner, to at least show that 3 rows used to be here, between these existing rows.
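A sketch of what that could look like as a data operation, with hypothetical names throughout: rows belonging to a departed account are never dropped. Either only the owner is nulled (rendered as [deleted]) or both owner and text are nulled, so the thread keeps its shape either way.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Comment:
    id: int
    author: Optional[str]   # None means the account no longer exists
    text: Optional[str]     # None means the text was wiped


def remove_account(thread: List[Comment], user: str,
                   wipe_text: bool = False) -> List[Comment]:
    """Anonymize a user's comments instead of dropping the rows."""
    result = []
    for c in thread:
        if c.author == user:
            c = replace(c, author=None, text=None if wipe_text else c.text)
        result.append(c)
    return result


def render(c: Comment) -> str:
    author = c.author if c.author is not None else "[deleted]"
    text = c.text if c.text is not None else "[removed]"
    return f"{author}: {text}"


thread = [
    Comment(1, "alice", "Who's coming Saturday?"),
    Comment(2, "bob", "I am"),
    Comment(3, "alice", "Great"),
]

for c in remove_account(thread, "bob"):
    print(render(c))
for c in remove_account(thread, "bob", wipe_text=True):
    print(render(c))
```

In both modes the thread still has three rows, so a reader can at least see that something used to sit between the two remaining comments.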


> There are some valid points to keeping comments for posts that have since been deleted, and I don't necessarily disagree with them

Then your argument isn't about parent-child relationship, but value of the actual content. I can assure you, even if you don't give it value, there's one.

The funny thing is, this is most probably why it's important to delete it if someone wants to delete his content. It's his value, not yours, not Facebook's, no one else's.

> It would be much better (with no loss in privacy) to at minimum just replace those comments with null text and a null owner, to at least show that 3 rows used to be here, between these existing rows.

I don't care about this.


Facebook is like a mail server and address book, that at any random time deletes emails you got from someone (when s/he quits Facebook), and deletes that person's name and phone number. Like a phone book, that sometimes self destructs one page here and there, and then you notice some time much later.

Better download & backup messages regularly, and also exchange email addresses or something else with the other person.


Great post! Thanks for all the details.

Do you have any plans to release this? I could understand if you don't want to, since it might violate fb's ToS or get blocked in some way. But certainly many others would find it useful.


There's a few issues that prevent me from currently releasing all of it:

The code is currently in two repositories - the first is a generic runtime I wrote that provides certain subsystems to a bunch of unrelated plugins (a small percentage of the total number of plugins relate to social networking), so I would either need to open source that repository as well, or port those plugins to something else. While I legally own the code for the first repository, it currently forms the basis of a company I founded (the social network plugins were written as a joke for personal reasons, and have no business use case - it was just easier to do this way), so I'd rather not release that just yet. I'd either need to replace the calls in those plugins (not too difficult, just annoying), or release those plugins without patching those calls with a disclaimer saying something like "drop replacement scheduling service here". Almost all of the runtime calls used by the social networking plugins are basically just for scheduling how frequently they should re-scrape things, and use almost none of the other features.

The second repository is in charge of actually viewing/browsing the data, and doesn't concern itself with repeatedly obtaining it. This repository will likely be open sourced soon, once I make the UI nicer and fix some bugs. It also contains some 'one and done' import code, for loading in data from things that just need to run once (importing from legacy chat systems, like Hangouts, Windows Phone's SMS DB, Adium, Pidgin, and a few IRC clients I no longer use anymore).

As for the social networking plugins in the first repo, these fall into one of two categories: the first are "things that access legitimate APIs, or otherwise just grab public data (e.g. YouTube)". The second category, which is mostly just Facebook, is "this logs in as you with your username/password and downloads a bunch of shit by actively impersonating you". The first category isn't really encumbered in any way, only the second one is.

In the case of Facebook, there's a few specific issues that I have to deal with. The first is that there's a bunch of exploits that it uses in order to scrape everything, and they could theoretically be patched if someone at Facebook sees what I'm doing (which is also partially why I'm not using my real name here). The second is that some of the code may or may not be correct - I've had situations while developing it where I made a basic assumption that turned out to be wrong (like objects having one primary key, comments only going two layers deep, or guessing the wrong author of a post by the username in the URL). Since this is mostly guesswork, I occasionally have to mark a bunch of rows as 'untrusted', and invalidate them if I find that an assumption I made turns out to be very wrong. This has happened a few times, and it wouldn't be nice to tell people that their DB dumps have a bunch of errors and they should just throw it all away. Scraping is hugely problematic if you don't fully understand what you're grabbing (as an example, if you just download the HTML for a given page, you can't necessarily grab the images later as all the URLs have expiry tokens in them, so the HTML scraper also needs to be aware of photos). There are certain object types that I haven't fully figured out how to decode, and I don't want to have tons of people constantly re-indexing the same URLs over and over again because some code I wrote is buggy.
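One way the 'untrusted' bookkeeping could work, as a hedged sketch rather than the actual code: every scraped row records which parsing assumptions it was produced under, so that when an assumption turns out to be wrong, only the affected rows are flagged and queued for re-scraping. The class, field, and assumption names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: str
    assumptions: frozenset   # names of the guesses used to parse this row
    trusted: bool = True


def invalidate(rows, broken_assumption):
    """Flag rows that relied on a bad assumption; return keys to re-scrape."""
    to_rescrape = []
    for row in rows:
        if row.trusted and broken_assumption in row.assumptions:
            row.trusted = False
            to_rescrape.append(row.key)
    return to_rescrape


rows = [
    Row("post/1", frozenset({"single_primary_key"})),
    Row("comment/7", frozenset({"single_primary_key", "two_level_comments"})),
    Row("photo/3", frozenset()),
]

# Suppose comments turn out to nest deeper than two levels.
print(invalidate(rows, "two_level_comments"))
print([row.key for row in rows if row.trusted])
```

Rows that never depended on the broken guess stay trusted, which avoids throwing away a whole dump over one bad assumption.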


I take an opposite tack, so I've enabled tagging approval and every time someone tags me, I remove it.

On top of that, I use a tool that goes through my activity log and deletes all my posts, comments etc. from a selected period of time. I remove everything older than six months, nothing on social media is worth saving, it's ephemeral and transient.


Please stop this. This is a huge violation of your friend's privacy. Just because a friend has shared something with you on facebook doesn't mean they expect it to be in someone's personal cambridge analytica forever.


I'd be curious to hear from other people with similar viewpoints, but I personally disagree. Cambridge Analytica was bad because it was a third party receiving data from others, and those people had zero reason to think that their data would ever be shared with a third party. In this case, all of the data I've obtained is either public, or was explicitly shared with me by those people in the first place. If you're arguing that I shouldn't be able to hold data that someone else wants removed, then I'd ask at what point we should be deleting your memories of events that someone else wants repressed.

To be clear, if I want to create the equivalent of a data-hoarder bunker, I don't think there's anything wrong with that. I do, however, agree that sharing things with people that couldn't see it themselves (expanding the original audience) isn't something anyone should ever do. I'd also like to point out that once someone removes something from Facebook (or any other service, for that matter), the authenticity of copies of that information is now debatable - I could scrape a bunch of stuff and show people, but you have no way of proving whether or not that information is authentic. For all you know, my script makes any posts%5 super racist.


People post on facebook assuming the privacy model of facebook, where they have revocable control of who sees what they share. What do you think your friends would say if you told them you're keeping a personal copy of their photos even if they try to delete them?


I've actually shown this project to several people with varying degrees of technical knowledge. The technical people have generally already considered the privacy implications of what they've shared on the internet, fully expecting most things to become public thanks to crappy code/policies. They're usually more interested in what a rewrite of TAO looks like, or how you decompile rendered frontend code back into accessible JSON.

As for the less technical people, they tend to be significantly more annoyed at Facebook when I tell them that their old comment threads / messages are likely incoherent junk.


The friend published on a public Internet platform and added metadata to tag the ancestor poster.

Where is the reasonable expectation of privacy?

If you share anything at all over Facebook, you should expect it to be in someone's personal dossier of you, forever. It's not always going to be just the people who know you and presumably bear you good will, either.

That's exactly why I don't post photos on Facebook.


> If you share anything at all over Facebook, you should expect it to be in someone's personal dossier of you, forever.

No, they shouldn't, and this is why we have privacy laws.


You think that everyone always obeys the law?

Please tell me how you expect to enforce such laws without intrusive surveillance on everyone's networks and storage that would entirely defeat the purpose of having them.

"Three can keep a secret, if two of them are dead."

Even if Alice sends a file to Bob that is entirely safe from Eve while in transit, Alice has no recourse whatsoever from within the channel if Bob then turns around and just hands it over to Eve once he recovers the plaintext. Facebook is an untrustworthy recipient. Once you hand over any data, you lose your absolute control over them.

They have the capability to hand over your data to untrusted third parties, there are no safeguards in place to prevent them from doing so, no surveillance in place to detect when they have done so, and no effective recourse for any individual who feels that they may have been injured by it. We have sufficient evidence to believe that Facebook has done it in the past (Cambridge Analytica being just one well-publicized example), without even needing to know whether it was intentional, accidental, or paid business. I therefore conclude that they are still doing it now, and will continue doing it in the future.

Anything you say or do or remember in front of Facebook can and will be used against you in the court of commerce. The machines will process it all to squeeze the pennies out, and your privacy is an issue only insofar as it may impede the flow of data you willingly hand over to them in the future.


Unless you're referring to something outside of GDPR, those laws explicitly exempt individuals not acting as processors.


How is this a violation of the friend's privacy?

The friend shared with Facebook, and Facebook intends to store the data forever. The friend shared via Facebook with the GP, and GP could go and read/view the data as often as they wished via Facebook.

Yes, GP shouldn't subsequently share this data with other parties, and GP should take care to secure this data. But why shouldn't GP store the data for their own use, though? Is this any different than archiving emails from the friend?


Facebook deletes the data when a user asks. OP is explicitly trying to retain a copy of people's data against their wishes.


When you share something, whether it's publicly or within a closed loop of social media, it is inherently not within your control anymore. You can NEVER create any kind of guarantee that clicking "delete" erases every potential copy of the data. Folks need to have that in the front of their mind when they create and share data on the internet. Full stop.

Now, morally, I could maybe understand where you're coming from. It's more of a "jerk move" than it is subverting a technical promise.

But I backup my Telegram messages occasionally through their data export tool. Are you proposing that I cross-reference my own backups with messages that get deleted from our chats? Same thing with WhatsApp.

Assuming I'm not commercially monetizing those backups, I consider it well within my rights to have a copy of conversations I've held with people in the past. And in fact, it may be by design that I don't want them to manipulate the "cloud copy" of our conversation in the future.


> someone's personal cambridge analytica

We're talking correspondence logs, not private investigators. Similarly, GDPR applies to a company tracking birthdays but not the one on your toilet. That your visitors can see all your friends' birthdays is not a personal Cambridge Analytica; that comparison doesn't make any sense.

Should I delete that we talked to each other today in a few days when it is no longer relevant? As it is, this will be here in perpetuity, doesn't that seem dangerous as well? (I'm genuinely curious where you'd draw the line since your example isn't in line with what virtually anyone else would feel.)


This is a semantics argument. Facebook considers "your data" things you uploaded to Facebook intentionally.

They do not consider "things people learned about you by watching you" or "things people said about you" your data, they consider it their (the watchers or speaker) data.

It's not all that different from how photography laws work in different countries. If someone takes a picture of you, is it their picture, or your picture?

EDIT: I'm in agreement this doesn't match up with GDPR. Outside of Europe, Facebook can treat "Your Data" differently, and the button can function differently in different places.


This is disingenuous.

If Alice wrote on Bob's wall, it should be a part of both Alice's and Bob's takeout. Similarly, if I am tagged in a photo, it should be a part of my takeout. Imagine if my email provider (Google Gmail) said you can only takeout the emails in your sent folder.

In fact, I'd argue if Alice has made their contact information (email, phone number, physical address) visible to Bob, it should be a part of Bob's takeout. Including Alice's location history (provided Alice shared it with Bob) would probably be pushing it a little, but only because it becomes difficult to argue for people with whom too many people share data. But anything that Bob is explicitly and manually tagged in and that is visible to Bob on Facebook should definitely be a part of Bob's download.

Facebook can't have it both ways: you can't make it easy (by default) to share and still use the data as a moat.


Following your metaphor, what about email between Bob and Carol which mentions Alice explicitly, using her name and email address? Should Alice be entitled to a copy of that private message between Bob and Carol?

Does information become yours if it is about you, even if you aren't a participant in its creation or distribution?

It's easy to argue that advertising distribution lists are private conversations between FB and advertisers, and hard to argue that they're not. Saying I'm entitled to a copy of that conversation if I'm mentioned becomes a hard position to take.


No, because you weren’t party to the conversation. This seems like a pretty rugged rather than slippery slope.


So to be clear, you agree that advertising distribution lists shouldn't be included in the takeout request? GP's point was that you're not a party to the conversation in that situation either.


I don’t think I’m asking for much here. If I have access to it on Facebook on my profile, in my message box, or somehow associated with my profile, it should be a part of my takeout is all I’m saying. How is this controversial?


Does you being tagged in a photo make it your photo? Or do you just want the metadata that says "you were tagged in person x's photo number #2345345"?


I don’t claim exclusive rights but that photo should be a part of my takeout.


Obviously not. The cases are where parties are explicitly included: my timeline, tagged in photos, etc. Mentioning someone w/o direct linkage doesn't include that communique in their dataset.


The posted story is about exporting the list of parties who UPLOADED your contact info to facebook. An advertiser who bought a list, and presses import.


"Here's a list of everyone with your phone number in their phone" would be a weird twist on transparency, privacy, and disclosure.


If facebook has it, why shouldn't we?


I am not saying this is the same thing. But.

Should you be able to ask for every photo ever taken of you? Let's say someone in public takes a photo; it's their private work, copyrighted, and never distributed to the public. You're in the background. Can you request a full copy of their image?


Honestly, that would be pretty cool. And, in an interesting version of this world, a company like Facebook could have enabled such a self-awareness. And done it with the intent of allowing people that kind of self-growth.

It's odd that the imagined objective data-collection about one's self that many people dream about actually exists (and many could use for self-improvement), and it is instead only available to those wishing to extort us. And that that kind of socialized agreement that all may contribute to each others' growth is now impossible, but not for technical, but for weird policy/economic reasons. I'd love to be able to purchase every photo ever taken of me at Facebook's market info rate - what, $200?


Even if one were to agree that this would be reasonable, would you say facebook would be liable if the tool failed in some cases? And how do you enforce that, one way or the other?

Algorithms that recognize people in photos aren't (and probably never will be) 100% accurate, especially in this context (people in the background of photos will often be partially obscured, poorly lit, maybe really small/far away, etc.).


If that photo was sold to corporations and used for some revenue-generating service I would say yes, I should be allowed to see or request them...isn't that what Facebook is doing with the data? Distributing it publicly?


I'm not sure what your point is. Are you trying to argue that all data Facebook has should be public? That sounds like the opposite of privacy to me.


If Alice is moving her phone book between Apple and Google, we do want her data export to include Bob's phone number. Anything else would be absurd.

If Alice is sharing her facebook profile with an online personality survey, we don't want her data export to include Bob's phone number. That would be a huge invasion of Bob's privacy.

Alas, it's impossible to know what Alice is planning to do with the data when she downloads it.


What you want is an egregious breach of privacy. Just because you have access to Alice's personal data does not mean it belongs to you.

In fact, I'd argue that EU would not be happy if the data people downloaded included information about anyone but them and only them.


What if you licensed or bought it from someone who legitimately obtained it from Alice?

There are lots of ways it can become your own data, if you subscribe to the idea that data can be owned.


Email is not a good analogue because it doesn’t have the concept of revocation or editing. Facebook posts can always be deleted... and Alice can always delete her post from your wall...


> I'd argue if Alice has made their contact information (email, phone number, physical address) visible to Bob, it should be a part of Bob's takeout.

What about Alice's friends-only posts? What about the comments on those posts? What about Charlie's posts, where Charlie is a friend of Alice and has set their posts to "visible to friends-of-friends"?


> If Alice wrote on Bob's wall, it should be a part of both Alice's and Bob's takeout.

What about copyright?


Covered under Facebook's ToS


as basch said, that's not how copyright works in photography. If you're tagged in a photo as a subject of that photo, it is not your photo, and if you take a copy of it then that's a breach of copyright. It belongs to the photographer.

Our expectations don't match with actual law here.


We’re not talking about copyright, though - we’re talking about privacy rights, and - even if we discard compelling fair use arguments - everyone who uses Facebook gives them a license to distribute the photos they upload to the site to others for various reasons. Copyright law doesn’t really come into play, and of course much data involved here isn’t even copyrightable.


It's an analogy; I'm not bringing up copyright law as something that impacts the situation. It's an illustration of a similar concept.

Some countries treat copyright as the priority in photography, others let privacy law take precedence.


I think this is a country- and thus law-specific case.

In Germany, AFAIK, if the picture has less than five people on it, it is considered a photo of you, and both you and the photographer have a copyright on it.

EDIT: Maybe not the entire copyright, but you definitely have a say as the subject in such a photo, including barring it from the public against the photographer's wishes, if you please.


this is true, and we obviously have Model Release Forms for a reason, but the point remains that just because it's a photo of you, it's not your photo.


If you explicitly and manually tag me in a photo and it shows up on my profile, it should be a part of my takeout. I don’t understand what you’re having difficulty understanding. I never said use machine learning to comb through all photos ever uploaded to Facebook and add them to my takeout. I don’t think what I’m saying is controversial at all.


Using the photo commercially may be a breach of copyright. Having a copy, I do not think is, and Facebook could likely fix that in their ToS.


The entire music industry fought global legal battles for decades claiming that making a copy of something that someone else holds copyright on, even for your own personal use, is a crime.

I don't think you can waive copyright with a ToS, either. I could be wrong, though.


You don't "waive" the copyright, you just give Facebook a worldwide, royalty-free, irrevocable, etc, etc license to do whatever they need with the photo, including adding it to the data download archives of the persons pictured.


"Facebook considers "your data" things you uploaded to Facebook intentionally."

Thought experiment: How is Facebook different from a file hosting service?

Do file hosting services provide access to analytics? Do the hosting companies take the position that any information about who is accessing the user's files belongs to the hosting company, not the user? Can users control how a file hosting service "promotes" their files?

Facebook has never been transparent about who is accessing "your data". Users have a limited view of who is looking at their profiles and posted content. A Facebook account amounts to a "personal website" for many people. It is their own set of globally accessible (but not necessarily "public") webpages, often displaying their own content, hosted on someone else's (Mark Zuckerberg's) public website, allegedly with per page "access controls". Free web hosting, with extremely limited analytics.

Imagine if a Facebook user could look at a log of every username and IP address that accessed her Facebook profile each day. Pre-Facebook, there were early "social networks" that revealed to each user the other users who had looked at their profile. It is interesting that Facebook has never done that. Perhaps having this information would be eye-opening for many Facebook users. Those accessing a user's Facebook profile and posted content may not necessarily be her "family", "friends", "colleagues" or "people [she] may know". Perhaps the reasons a user's profile is being accessed are not ones that the Facebook user might expect, nor agree with.


Facebook has my phone number.

I have never given facebook my phone number.

The only reason I know that they have my phone number is that, a couple of years back, they decided that getting me to enable two-factor auth would be much easier if they prefilled the phone number field with my phone number.

They probably got this phone number from the address books of any friend of mine with an Android phone (as Android required accepting all permission requests to use an app, until a couple of years ago).

When I go download my data from Facebook, the data doesn't show that they have a phone number linked to my account.

Under GDPR this is unacceptable.


Perhaps I'm just playing devil's advocate here, but is it possible that your phone number was just autofilled by your browser?


And, under GDPR, Facebook should receive a huge fine for it.

Just that, in practice, they won't.


It doesn't matter what facebook considers "your data" when it comes to GDPR


> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

Facebook might not consider it data, but the GDPR does.


This appears to be a legal argument, not a semantic one.

Facebook is required to have a tool like this by the GDPR, and the claim is that the tool they are providing does not comply with the law.


They have also made a very insidious change to their /ads/preferences page within the past month or two:

1. Entries on the list of companies that have uploaded your contact information to Facebook cannot be blocked by simply clicking an 'x' on their name. You must click for "View Controls" and then click on two additional buttons to not allow them to target (or exclude) you.

2. The list is in RANDOM order (as far as I, a user, can tell)

3. There is no ability to distinguish between blocked and unblocked entries from the main page. Users must click into each entry to check the status.

4. New entries are not highlighted or indicated in any way. Because of this, and combined with facts 2 & 3, users must now re-check each entry manually in order to be sure they are catching any new entries that arise.

I get new entries on a near-weekly basis despite now (but not in the past) using a dedicated Facebook-only email address and removing my phone number and all other identifying information from the site. Facebook is maintaining a shadow profile on me which includes data I scrubbed from my account and profile over a year ago, and they are still matching advertisers to my profile based on the scrubbed data. That scrubbed data DOES NOT appear in the "download your information" tool, either.


Yeah, I noticed this, too. Very frustrating. I had posted a helpful gist in the past to make this quick and easy (https://gist.github.com/bluetidepro/bfa60c1d63925180daf3dd53...), but it no longer works because of this frustrating change.


Lots of dark patterns in these interfaces. The Google one has an incredible amount of white space and drop downs to obscure content.


Are you talking about adssettings.google.com? That site is pretty good, it shows what it thinks you're interested in and turning them off is click -> turn off; plus the topics you've already turned off aren't part of the list of things it's currently targeting.


It is clear that they are striving to make the process of using the page to manage your privacy controls as frustrating and time-consuming as possible. It is certainly no accident.


SayIt's comment is very relevant so I'm sharing it here:

> These companies are like online governments. They have a great deal of influence over very many persons, and can affect their lives substantially. But they are not democracies. They are totalitarian, with one person at the head of the company able to make many powerful decisions unilaterally, if he wishes. [...] Giving you some control over your data does not change the fact that these are essentially digital dictatorships.

From: https://www.schneier.com/blog/archives/2020/03/facebooks_dow...


Granted, but could you imagine not having one person at the head of the company? What a sh*tshow that'd be.




Does anyone know of tools that make it easier to browse and analyze the data downloaded from Facebook or Google (offline)? For example visualizing the location timeline, statistical analysis etc. This is something I have been looking for for a while. Both as a tool to analyze my behavior, and to better understand and communicate to others how much these companies know about us.


I've been building tools for this at https://github.com/dogsheep

The unifying idea is to convert data dumps from these kinds of companies into SQLite databases, then query and visualize them using https://github.com/simonw/datasette and https://github.com/simonw/datasette-vega
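
As a rough illustration of that idea (not the dogsheep code itself), here is a minimal sketch using only Python's standard library. The JSON layout, the field names (`sender_name`, `timestamp_ms`, `content`) and the helper names are assumptions made up for the example; real export files differ and would need their own mapping.

```python
import json
import sqlite3

# Hypothetical export layout: a "messages" list whose items carry a sender,
# a millisecond timestamp, and optional text. Real dumps will differ.
SAMPLE_DUMP = json.dumps({
    "messages": [
        {"sender_name": "Alice", "timestamp_ms": 1575158400000, "content": "hi"},
        {"sender_name": "Bob", "timestamp_ms": 1575158460000, "content": "hello"},
        {"sender_name": "Alice", "timestamp_ms": 1575158520000},
    ]
})


def load_messages(conn, raw_json):
    """Insert every message from a JSON dump into a `messages` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "sender TEXT, timestamp_ms INTEGER, content TEXT)"
    )
    rows = [
        # Missing fields (e.g. a message with no text) become NULL.
        (m.get("sender_name"), m.get("timestamp_ms"), m.get("content"))
        for m in json.loads(raw_json).get("messages", [])
    ]
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def messages_per_sender(conn):
    """Return (sender, message count) pairs, sorted by sender."""
    return conn.execute(
        "SELECT sender, COUNT(*) FROM messages GROUP BY sender ORDER BY sender"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_messages(conn, SAMPLE_DUMP)
print(messages_per_sender(conn))
```

Once the data is in SQLite, passing a file path instead of `:memory:` would produce a database file that tools such as Datasette can open.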


I do not, but as the linked story mentions, the data they allow you to download is incomplete and the more "shocking" types of data are hidden in shadow profiles that Facebook do not allow users to see in any form.


After downloading my data, I’ve begun to worry not only that they know too much, but that what they ‘know’ about me is quite wrong.


> but that what they ‘know’ about me is quite wrong.

Isn't that a good thing?


When it's innocuous, such as Google thinking I'm interested in _women's_ fashion and showing me ads for women's clothing sites since I've indicated an interest in fashionable clothing, it's not a bad thing (this is actually what happens!). But imagine if Google's profile of me indicated an interest in something dark like fascism or racism, and they sold that data on, which later resulted in me losing job opportunities or credit. That would be a really bad thing, because their highly inaccurate algorithm had a negative impact on my life.

The slippery slope is slippery because there are a lot of downward steps, not because there's one big step. Companies already buy your credit score and financials before deciding whether to hire you. Imagine if they bought your entire Google profile and used that too? It's about as accurate as a personality test and there's no way to dispute it.

It's worse than Social Credit Scores because Google doesn't even admit they have these profiles of you, yet they sell the data as accurate.


I think you'd like the book Qualityland. This kind of scenario is a major plot point.


Do you believe that Google or Facebook sells personalized profiles of the form "jschwartzi is interested in tacos"?

Because that isn't how these companies work.


They might not right now, but they certainly could start doing so literally any day.

Also they can be compelled to legally hand over such data by governments.

Or it could be leaked due to a security vulnerability or rogue employee.


TBF it may be best to explain your working a bit better, considering you work at Google.


After having a bank account opening denied only because they didn't have any data on me, I share your worry.


And they do not comply with GDPR: https://ruben.verborgh.org/facebook/#history


That is both hilariously written and quite frustrating to read. Thank you for your perseverance and writing it up!

Is there an RSS feed or mailing list where updates are pushed to?



