Using GDPR to obtain one’s data as JSON (mazzo.li)
182 points by rostayob on May 10, 2021 | 133 comments



My favourite is still Art. 22, p. 1:

"The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her."

Which means that if e.g. a bank declines your mortgage application based on a decision from a fully automated system you have a right to have a human being review it.

Interestingly, this made banks reluctant to use AI and helped efforts to develop so-called "explainable AI".


Just did exactly that with Avant. Could see where I was initially approved but then their manager said I was "still on contract" and quashed my application.


doesn't sound like this worked in your favor, or did i get that wrong?


I read this as you do, but it's still a big win.

If you make a machine learning model explain itself to humans, you open the possibility to challenge the model's conclusion. It's not just the question of whether the model does its math right - it also means we, as society, can say, "I see your math and it works out, but because $POLICY, you're not allowed to take these inputs into account". It means adding control points to a previously opaque process.


you are right, and i absolutely agree. basing any decisions that substantially affect someone on machine learning alone should be illegal (even if it is the right decision)


I was denied, but now I know why instead of just guessing. I could see that my "willingness to pay" (unsure what that means) etc and income were all fantastic, their manager just didn't like that I was on contract instead of a permanent employee.


so arguably, in this case the machine learning algorithm was possibly right, and the human was wrong. on the face of it, not using machine learning here would not have changed the outcome.

what i am wondering though is whether this can't be used to everyone's benefit.

apart from needing a human to verify a decision, how about this: if the human and machine learning decisions disagree, then the consumer gets the right to appeal for a second human review?


Funny thing is that I specifically opted out of automated decision making, having been declined before. It wasn't a big deal, really. EU lenders are far warier of contract jobs than American ones.


oh there was an opt-out. that's interesting. how does that work? you obviously opted out before, but could you also opt out after they tell you what the automated decision would be? if so that would effectively be that kind of appeal that i suggested.


I applied for a loan a year ago, didn't opt out, got denied, and wondered why. Then I applied again ~4 months ago, opted out of automated decision making, and the subject access request showed the conversation.


What about a human reviewing the AI review and just nodding all along ? Does that fly ?


In that case there is a human being who is legally responsible for the decision that is made. I have no problem with that. If the person is "just nodding", with possibly illegal consequences, that person can be sued.


Why not sue the bank? It always works, AI or no AI.


Well, if all the decisions are given to the AI, it's harder to establish mens rea. They can just claim they never intended to discriminate against anyone but that it was only an "unfortunate glitch" in the system. Of course, this defense might not hold, but AI is a decent legal cover as it's more difficult to point to any single person knowingly doing anything illegal.


Yes, the Nodder has to have pro forma authority to override the AI's decision.


Some countries have pushed it further : for administrative decisions where a program was used (whether alone or helping a decision), all the steps of the algorithm resulting in the decision must be provided in clear and simple language (if a request is made).


But also banks don't have to explain to you why they have taken this or that decision.


It is about whose mistake or choice it is, if a bank denies loans to $ProtectedCategory because of a bug in its software it is one thing, if it is because employees have biases and discriminate applicants it is another.


I love this clause too. It's been making it possible to help people migrate to my app: https://github.com/TheLastProject/Catima/issues?q=is%3Aissue...

Of course I document my export format too so others can do the same with data from my app: https://github.com/TheLastProject/Catima/wiki/Export-format :)

The sad part is the allowed 30 days timeframe. Stocard really abuses this to make you wait as long for your data as they legally can make you wait: https://twitter.com/SylvieLorxu/status/1389343401435439112


I literally did this with GitHub; they told me to get lost and go to court to force it through.


Instead of taking them to court, send an email to your local Data Protection Authority and complain while CC'ing github. Results aren't guaranteed, but the effort involved is small.

List of local DPAs: https://edpb.europa.eu/about-edpb/about-edpb/members_en


I'd recommend reviewing the rules of the specific DPA. In Poland you have to provide quite sensitive data in the request (e.g. your full name and address) and it has to be digitally signed - you can't just e-mail them. All other electronic messages can be legally ignored.


Your experience is not unique. Even when data is returned, it's sometimes incomplete or (purposefully?) made near impossible to read. I’m working with a privacy lawyer and a couple other engineers on some tools that would allow users to exert their rights (under GDPR, CCPA, and similar laws) more effectively. If anyone is interested in learning more or being a test subject, feel free to reach out.


I find it hard to believe they'd say so in that tone.

That said, where does the GDPR stand when there is an API already in place that allows for data extraction?


It was more or less that tone; they accidentally sent an internal memo to me.

GDPR doesn't require me to become a user to get my data.


What data does github have about people who aren't users?


Somebody could make a GDPR request to find out.


Where it gets tricky:

1. They would be entitled to ask for a form of ID in order to verify your identity.

2. Even if they are confident that you are really John Doe, how do they know that a mention of "John Doe" in all the data they have really means you, and not someone else?

At some point it becomes too difficult and time-consuming, so the easiest option is to decline the request on the grounds of an exemption.


git commits, names, emails, photos. I could push a git repo that has your information in it via commits and not know they're processing that information at all.


> I could push a git repo that has your information in it via commits

Hmm, wouldn't that get you in trouble rather than github?


I think he is referencing that sites collect data even if you aren't a user. In that case, you could use the GDPR to get that data. That scenario makes no sense with github though.


Github still collects data for marketing purposes. And other people can push your commits which are your personal data without your knowledge.


>other people can push your commits which are your personal data without your knowledge

If i write a biography about you, can you GDPR your way to a copy of the book from amazon?


This should be covered by most FOSS/OSS licenses.


What do OSS licenses have to do with the data GitHub collects when you visit sites belonging to it?


This is not about what github collects about its own usage; it was about other people mirroring your commits on github and the fact that those commits contain PII.

It is reasonable to assume that FOSS licenses also cover the commit.

IMHO at this point the transaction is more comparable to you selling your personal diary on ebay, you have no right to force the buyer to destroy it under the GDPR


Mind sharing the memo?


That would be a privacy violation under GDPR :)


I have a feeling that the unnamed todo service is Todoist. They offer a free plan, but backing up data requires a Pro plan which is really not ideal:

https://todoist.com/help/articles/backups

I use the free plan, don’t reside in Europe, and recently wanted a backup. If you’re in the same boat, I recommend the following project — it has good documentation and immediately worked.

https://github.com/darekkay/todoist-export


YouTube will not give out data after suspending an account, which is likely a violation.


I was just thinking about this.

I've been putting off allowing whatsapp to share my data with facebook for a while now, but the latest news is that unless I give in to the extortion, I won't be allowed to send and receive messages to my contacts.

I'm going to request all my data from Whatsapp using GDPR before switching all my conversations to Telegram, I guess.


This won't work with WhatsApp because they do not store your messages in plaintext.

I had to use some unofficial software ( https://www.wazzapmigrator.com/ ) to extract and store the messages from an unencrypted iPhone backup. Then you have everything in an sqlite file.


You're right indeed.

I was just looking through my google drive and there's no trace of my whatsapp chat backups. Even after running another backup. And pages online confirm what you're saying.

I guess I'll have to use my GDPR rights.

Interestingly enough, there is only the possibility of exporting account information, but the information page about that procedure explicitly says that messages are not included.

That's relevant because, since there's no other procedure to export data, this means that Whatsapp is already not okay with GDPR procedures.

edit (2): I just sent an email to Whatsapp via their contact page (https://www.whatsapp.com/contact/?subject=messenger) asking for my data in accordance with GDPR. Let's see what happens.


Facebook doesn't hold your chat data. As soon as a message reaches its recipient(s) it is deleted. That's why Whatsapp nags you every year about making a google drive backup (it archives data from your phone to drive, rather than sending it from its servers to drive). So no GDPR request could resurrect your chats, and it is not a violation because they don't have such data.


The right to easily export your data still applies, Facebook has my messages on my phone storage and I should be able to easily export them without third party solutions.


> Facebook has my messages on my phone storage

No, you have your messages on your phone. You just don't have your messages in a way you want.


The GDPR asks for more than access, it also mandates an easy export. I have recently learned that this feature is actually implemented on a chat by chat basis.

Personally I would like a global export option too, but I am not going to die on that hill.


>this means that Whatsapp is already not okay with GDPR procedures.

No it doesn't, the data is local to your phone and not in their systems or servers. So you already have your data on a device you control. GDPR is not about local data inter-op.


hmmm, that's rather annoying as it still creates "whatsapp lock-in" :(


Companies aren't going to voluntarily make it easy to switch to the competition.


that's not why i want to export the data though in my case - i want to export it as a backup that doesn't depend on the continued existence of whatsapp.

If e.g: MSN messenger had the same strategy, you would one day find yourself without the ability to backup your messages.

not everybody values past conversations the same way - but i have a few that have emotional value to me and don't want to lose them if e.g: my phone gets stolen or whatsapp loses popularity and shuts down one day.

edit to add: the automatic backups IIRC are unencrypted. I don't appreciate google having that. Why can't i choose my own backup target if the functionality exists?


You can import your Whatsapp chats into telegram without asking Whatsapp for your data, or using any 3rd party apps.

https://telegram.org/blog/move-history


That's neat.

The annoying thing though, absolutely on whatsapp's side, is that I have to export chats one by one.

Shouldn't I be enabled to extract my own backups, on my own gdrive, to read my own chats ?


On Android (which I think you must be since you mention backup to Google Drive?) the otherwise encrypted database is an unlocked sqlite db while the app is running.

Spent way too long diving into that rabbit hole one time when I switched phones, activated WhatsApp without thinking about it on the new phone (and didn't restore, or switched back and forth, or something) and refused to go on with a history split between two devices. Eventually managed to merge the databases, put it on the new phone, and reinstall (since you can only restore during first-time startup...). Absolutely not worth the time it took, but I was stubborn.

I'm not saying that isn't shit, or that WA expects you to do that, just that if you want to, it can be done.


> the otherwise encrypted database is an unlocked sqlite db while the app is running.

this is news to me - i will definitely be using/chasing this lead to get my data, thank you!


A bit of a tangent, but I think some people often criticize GDPR because it doesn’t have perfect enforcement, or some people are annoyed by the cookie banners.

However, a positive aspect I don’t hear talked about enough is how it has had a chilling effect (in the most positive, pro-consumer way possible). I’ve noticed in my industry people are just much more careful about user data now, compared to it hardly being talked about before GDPR. Just the threat of those fines has scared C-levels enough to put at least some engineering resources on privacy and security where, in my experience, there were far fewer before.


I totally agree. I think the cookie thing is also deliberately overblown. Useful cookies (which are not used for tracking) don't need consent.


It's also a different directive from GDPR, but people like to lump them together for maximum outrage.


That's not accurate, at least as far as GDPR is concerned.

Only necessary ones don't need consent, but the bar for "necessary" is high: the software wouldn't be able to function without it and there's no way to implement the software without it. Think: "address" is necessary for "delivery".

Even then you still need consent to store the cookie under most versions of the "Cookie law", which is a complementary but different thing to GDPR.


> Even then you still need consent to store the cookie under most versions of the "Cookie law"

I don't think the cookie law is different from GDPR in that respect. IANAL, but from the EU directive itself [1]:

> Member States shall ensure that the use of electronic communications networks to store information or to gain access to information stored in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned is provided with clear and comprehensive information [...] and is offered the right to refuse such processing by the data controller. This shall not prevent any technical storage or access for the sole purpose of carrying out or facilitating the transmission of a communication over an electronic communications network, or as strictly necessary in order to provide an information society service explicitly requested by the subscriber or user.

I read that as having the equivalent "no consent required for strictly necessary data" get-out clause to the GDPR. Yes, strictly necessary is a high bar, but for cookies that clear that bar I think both GDPR & the cookie law let you off the hook.

[1]: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL...


> Only necessary ones don't need consent, but the bar for "necessary" is high: the software wouldn't be able to function without it and there's no way to implement the software without it. Think: "address" is necessary for "delivery".

Yup. That's literally the point. Phrased in an equivalent form: cookies that require consent are ones you don't actually need.

It's thus not GDPR's fault that a site opts to spam their users with a consent popup - it's their choice to include cookies that aren't required to provide the service.


You're assuming everyone agrees on "need". People disagree with governments all the time, so is it surprising that a website operator might consider a cookie to be necessary for the operation of their service, while the government views their needs differently?


If i can delete the cookie and nothing goes visibly wrong, it's obviously not needed.

Same with blocking a script that sets such a cookie. Most cookies are not needed for providing a service.

edit: see the article 29 data protection working party guidelines here: https://ec.europa.eu/justice/article-29/documentation/opinio...


I think they literally mean "technically need". This is objectively deducible.


I am a privacy lawyer that has spent far too many hours on cookie issues. It is disappointing that your correct answer was downvoted. It goes to show just how much misinformation is out there about GDPR.

The top comment in this thread demonstrates that as well, as the Data Protection Directive of 1995 had a functionally identical requirement allowing users to opt out of completely automated decisions for credit purposes.


Sure? https://www.iubenda.com/en/help/23672-gdpr-cookie-consent-ch...

I would claim the only way to make a webapp with login securely function in a usable manner is to use a session cookie with secure transport policy. Do you really need more than that?
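
To make "a session cookie with secure transport policy" concrete, here is a minimal sketch using only Python's standard library. The cookie name `session` and the helper `build_session_cookie` are made up for illustration; the attributes shown (Secure, HttpOnly, SameSite) are standard cookie attributes, but which ones a given app should set is a judgment call.

```python
from http.cookies import SimpleCookie
import secrets


def build_session_cookie(session_id: str) -> str:
    """Return a Set-Cookie header value for a login session cookie.

    The cookie name "session" is a hypothetical choice for this sketch.
    """
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["secure"] = True     # only sent over HTTPS
    cookie["session"]["httponly"] = True   # not readable from page scripts
    cookie["session"]["samesite"] = "Lax"  # limits cross-site sending
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


if __name__ == "__main__":
    # A random, unguessable identifier; it carries no personal data itself.
    print("Set-Cookie:", build_session_cookie(secrets.token_urlsafe(32)))
```

No expiry is set, so the browser treats it as a session cookie that goes away when the browsing session ends.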


not sure if you want to say that such a session cookie would require consent or not, but just to make it clear, it definitely doesn't.

See 3.2 in data protection working party recommendations: https://ec.europa.eu/justice/article-29/documentation/opinio...


Very true. Enforcement is never 100%. Crimes will continue to happen, some people will always get away with it. But that doesn't mean we should give up on making laws.


I feel like the cookie banners could be fixed with an iteration of the law that requires adherence to an HTTP request header like do not track.
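
For illustration, honouring such a header on the server side could look roughly like the sketch below. It assumes the existing `DNT` request header, where `1` means the user objects to tracking and `0` means the user allows it. The names `TrackingPreference`, `tracking_preference` and `may_set_tracking_cookies` are invented here, and the policy of treating a missing header as "no consent" is an assumption of this sketch, not something the current law or the header itself prescribes.

```python
from enum import Enum
from typing import Mapping


class TrackingPreference(Enum):
    OBJECTS = "objects"          # DNT: 1
    ALLOWS = "allows"            # DNT: 0
    UNSPECIFIED = "unspecified"  # header absent or unrecognised


def tracking_preference(headers: Mapping[str, str]) -> TrackingPreference:
    """Read the DNT request header into a three-state preference."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    value = normalised.get("dnt")
    if value == "1":
        return TrackingPreference.OBJECTS
    if value == "0":
        return TrackingPreference.ALLOWS
    return TrackingPreference.UNSPECIFIED


def may_set_tracking_cookies(headers: Mapping[str, str]) -> bool:
    """Hypothetical policy: track only when the user explicitly allows it."""
    return tracking_preference(headers) is TrackingPreference.ALLOWS
```

Keeping "unspecified" as its own state matters: it is the case where a site would still have to ask, rather than assume an answer either way.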


I think DNT could be rescued if it could be turned into a browser-wide consent UI. Currently, with its history of being set to 1 by default in some browsers, it doesn't really distinguish between "I don't consent" and "I haven't expressed an opinion", giving sites an excuse to ask you anyway.

Myself, I wish another GDPR iteration would instead mandate the shape and form of the initial consent popup, requiring it to fit to the following template (or something similar/equivalent):

  +------------------------------------------------+
  |     Allow additional data collection?      [X] |
  |                                                |
  | This site would like to use technical means    |
  | such as cookies and local storage to collect   |
  | data about you and your computer. This data is |
  | not necessary for the correct functioning of   |
  | this site, and does not impact the service     |
  | it provides.                                   |
  |                                                |
  | Do you consent to this opt-in data collection? |
  |                                                |
  | GDPR requires this message to be shown because |
  | the data collection requested is not necessary |
  | and may carry data privacy risks. Necessary    |
  | data collection does not require consent form. |
  |                                                |
  | [Learn purposes and]      [>I do not consent<] |
  | [configure consent ]                           |
  +------------------------------------------------+
With an explicit [>I do not consent<] button, pre-selected, in the "call to action" color, doing the same thing as [X] does, which is declining data collection described. Displayed in the same language website content is, and with specific regulations guarding against the common "dark pattern" bullshit. I'm sure Brussels has some webdevs that would be happy to provide standard templates and React components and whatnot, so that site authors could just plug in a stylesheet and a JSON blob to configure the [Learn purposes...] section.

The ultimate solution would be for member states' DPAs to get off their collective butts and start issuing fines for the current crop of blatantly illegal consent popups, but in the interim, it would be helpful to regulate the popups, so that they clearly communicate that a) they're requesting strictly unnecessary tracking that can be safely ignored, b) showing an annoying popup is a choice by the website owners, who decided to request consent for additional tracking.


>I think DNT could be rescued if it could be turned into a browser-wide consent UI.

This is kind of what I was getting at.

I think there would be subtleties to the user interactions though - I might say yes to marketing cookies if I knew not accepting them would lead to a degradation of service on some sites, but not for every site.

Driving a standard that can manage that kind of thing might be something regulators simply aren't up to.


Some people might reasonably prefer ads that are more tailored to them and would consider it a negative impact on the service.

Other sites might, for example, use your current location to display the local weather, which isn't required (you can type in your city in the search bar) but would be preferred by many.

A better solution would be to have the browser ask once, globally, on first install and then send the DNT after that. Any attempt to circumvent anything would be an instant fine.


You are not wrong. Browsers could standardize the information exchange and approval that happens for cookies and then implement a sane UI for that similar to how e.g. browser location tracking requires opting in. That should be a perfectly valid alternative for home grown UX that website developers add themselves and offer a better UX.

To do this you would need to provide the legalese for that in some standardized way so the browser can pop up some UI that allows users to review that and approve/reject that. It should simply refuse any kind of cookie until the user has approved. That approval should be removable as well. Part of that should also cover having a sane API around that so sites can decide if they need to fall back to displaying their popups for this. Browsers that support this could even start defaulting to block all forms of cookies until explicit permission is in place for a website, regardless of existing UI. Many users have extensions that do this.

Wouldn't be the worst idea. Of course the flip side is that it also makes it easier for users to say "no" a lot (I would). And there is the notion that this may be a grey area under the current legal text. And of course some browser vendors have vested interest in the whole cookie & tracking business (Google).


> I’ve noticed in my industry people are just much more careful about user data now

My company and all the sites that I use didn't decrease data collection one bit. They just added a consent form in place of the T and C. I would like to hear any counterexamples though, where some company actually stopped collecting data that they were collecting before GDPR.


Here's a high profile example of a company that stopped placing non-essential cookies: https://github.blog/2020-12-17-no-cookie-for-you/


I did this to a bunch of companies a while ago now, the results were incredibly boring with the exception of eBay who delivered the data by posting a USB drive to my house.


This might work for data that is stored but another regulation enforces storage of only that data that you need — making it inefficient for backup restore (systems can just wipe your data and call it a day instead of soft-deleting it).


since the author mentions whatsapp:

What's the best strategy for exporting _all_ whatsapp messages on a device to a format that is readable without whatsapp? - the export functionalities i've tried work with a message cap or other limitations.

Otherwise, they involve emailing yourself conversations one at a time.

I _think_ this is region dependent, but would like to hear from others.


I am the author -- I addressed this in another comment: https://news.ycombinator.com/item?id=27104896 .


I just replied to someone else about this [0] - the tl;dr is on Android at least you can grab its sqlite database unencrypted while the app's running.

It's readable without WA, though obviously proprietary in the sense of the structure of it, but it is just sqlite, you can then dump it out however you want.

[0] - https://news.ycombinator.com/item?id=27106454
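
As a sketch of "dump it out however you want": the snippet below writes every table of an arbitrary SQLite file to JSON with Python's standard library. It deliberately assumes nothing about WhatsApp's schema; it just enumerates whatever tables exist. The function names and file paths are placeholders, and BLOB columns are hex-encoded only because JSON has no binary type.

```python
import json
import sqlite3


def _jsonable(value):
    """Hex-encode BLOBs; other SQLite values are already JSON-friendly."""
    return value.hex() if isinstance(value, bytes) else value


def dump_sqlite_to_json(db_path: str, out_path: str) -> dict:
    """Dump every user table in db_path to out_path as {table: [rows]}."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        names = [
            row["name"]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        dump = {}
        for table in names:
            if table.startswith("sqlite_"):
                continue  # skip SQLite's internal bookkeeping tables
            quoted = '"' + table.replace('"', '""') + '"'
            rows = conn.execute(f"SELECT * FROM {quoted}").fetchall()
            dump[table] = [
                {key: _jsonable(row[key]) for key in row.keys()} for row in rows
            ]
    finally:
        conn.close()
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(dump, handle, ensure_ascii=False, indent=2)
    return dump
```

Something like `dump_sqlite_to_json("copied.db", "messages.json")` would be the call, run against a copy of the database rather than the live file.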


The title is misleading. It should be something like "machine readable format", since the law does not specify JSON.


The title was edited by the mods -- the original title simply stated "GDPR -- A Success Story".


I wrote my own comment fetcher for reddit, since there is a limit as data gets archived.

I still managed to get all the 8 years of comments.


Netflix’s GDPR dump was kind of painful to actually make use of. In particular I wanted to port my 15 years of ratings (over 1000 movies) to Letterboxd but the Netflix dump only gives title as the identifier of the media, and some titles are VERY indistinct.

Any other information, anything else, like Year of movie would have helped. Instead I spent literal hours adjusting my data in Letterboxd’s fantastic import tool.
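
To show why a title-only identifier hurts, here is a rough sketch of the triage step: given the rated titles from a dump and some catalogue of (title, year) pairs, it separates titles that match exactly one film from those that need manual review. The catalogue, the function name and the sample data are all invented; this is not how Letterboxd's importer works.

```python
from collections import defaultdict


def split_by_ambiguity(rated_titles, catalogue):
    """Split titles into uniquely matched, ambiguous, and missing.

    catalogue is an iterable of (title, year) pairs from some hypothetical
    film database; matching is by case-insensitive title only, because the
    title is the only identifier the dump provides.
    """
    by_title = defaultdict(list)
    for title, year in catalogue:
        by_title[title.casefold()].append((title, year))

    matched, ambiguous, missing = {}, {}, []
    for title in rated_titles:
        candidates = by_title.get(title.casefold(), [])
        if len(candidates) == 1:
            matched[title] = candidates[0]
        elif candidates:
            ambiguous[title] = candidates  # several films share this title
        else:
            missing.append(title)
    return matched, ambiguous, missing


if __name__ == "__main__":
    films = [("Alpha", 1999), ("Alpha", 2015), ("Beta", 2003)]
    print(split_by_ambiguity(["Alpha", "Beta", "Gamma"], films))
```

With a year in the dump, the "ambiguous" bucket would mostly disappear, which is the point of the complaint.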


This might be illegal. After all, the dump should contain your ratings of movies, which is not satisfied if you can not identify the movie.


The dump needs to contain the data that is associated with your user, and unless they store "user has rated Minions (2015) with 5" they don't need to tell you that. (Quite the opposite - it would be illegal for them to do so unless they also tell you what they've stored exactly the way they've stored it)

Do note that nobody actually knows* if what I'm saying is true, but that's just another problem of GDPR


If you're persistent you can get them to provide better data


I don't live in the EU so I feel like that might be pushing my luck.


Not entirely true. It hugely depends on the company. I have had no luck retrieving even my personal info from some companies regarding my interviews (I asked for internal email notes where I was the subject); they simply refused to provide almost anything. Some even quoted GDPR as the reason in their response, stating that the info I requested contains other people's details too. Some correctly provided the info I asked for, redacting other people's details, which is fair. But there are so many exemptions to GDPR (https://ico.org.uk/for-organisations/guide-to-data-protectio...) and companies can use one exemption or another to ignore your request.


It's sad that companies act so immorally that GDPR must exist.

But I guess it's good this guy found yet another way to recover lost data?


In that regard, which law isn't sad?


Very true. But GDPR is pure annoyance. I wish they would 100X consequences of offenders and quit wasting development time on the worthless permission prompts.


>GDPR is pure annoyance

Is it? I'm the marketing manager of a European app publisher with around 5 million monthly active users (and many times more if we account for the SDK that we license to other developers). We operate cloud services as part of our offering too. We find it very easy to comply with GDPR.

As for me, I'm very glad as an end user to be protected by GDPR. I've had my data deleted or unpublished about a dozen times since the law was enacted. And all the people around me are far more careful with how they share their data (and mine! e.g. switching to Signal vs Facebook Messenger, or ProtonMail vs Gmail...).


Permission prompts are the wrong way to approach a generally good feature. It would be much more useful if you could configure the permissions you want to give once and for all in your client (browser), and there would be a standard to communicate that transparently to the server.

I keep making the same settings over and over again with different websites. That should not be necessary.


Having browser-wide permissions is actually allowed for if the consent string is stored in the consensu.org cookie. But website owners choose to use their own cookies.


Meh, I want notifications on for whatsapp but I definitely don't want them on news sites. This is the wrong way to go about this IMO. Just a default, should definitely be overridable.


It's also sad that humans act so immorally that laws forbidding rape and murder must exist. Or maybe it's good that we can come together as a society and agree that these things are bad? Who knows


Well the big difference is that even without laws against rape and murder it is generally still a big exception. On the other hand tracking, carelessness with personal data, privacy violations etc. have been very much the norm with all companies (and arguably still are) even before the GDPR.


More than two things can be bad at once. What is the point of this hot-take?


>But I guess it's good this guy found yet another way to recover lost data?

He wanted a backup he could control in case he loses his account, he gets banned or the company dies.


Is it mission critical?

Sometimes a data loss can be quite liberating.


I am not the author, but it could be important (all your past TODO) even if is not life threatening.

Also, the service provided the active tasks as an export, so the same workflow could be copy-pasted and the SQL updated to include completed tasks too. The issue probably is that the priorities are different and implementing this would mean managers, designers and other "experts" would need to agree first.


> The data subject shall have the right to receive the personal data concerning him or her, which he or she has provided to a controller

IANAL, but this sounds to me like you are entitled to receive only the personal data of yours in a machine-readable format, not _everything_ you entered.


> " and have the right to transmit those data to another controller without hindrance from the controller to which the personal data have been provided "

puts this in context. Personal data is everything that is connected to the person requesting it.


I had GitHub claim my private git repos were not personal data. Because it was code, forgetting that my name and email are attached to every commit. Until there are massive fines for not following GDPR, it's pointless. GDPR was meant to be a serious threat, and it is in some regards, but it is ignored so many times with stuff like this, and the fines, honestly, aren't that much. I've seen GDPR violations result in no fine even after they admitted the violation in court.


As others stated, as long as data is associated with your account (regardless of what content) it is personal data.

When companies like github.com do not follow the GDPR and do that notoriously I am sure the fines can be raised by the administrators without changing any laws. As you can see here https://www.enforcementtracker.com/ there are actually some juicy fines already. I mean 50.000.000 EUR is not much for Google but still better than nothing considering "Insufficient legal basis for data processing" was not even a real issue before GDPR.

BTW this is also a nice GDPR fine reason: "Insufficient technical and organisational measures to ensure information security" (e.g. British Airways was fined with 22 million).

I never did that, but did you file a complaint at your national Data Protection Authority? https://ec.europa.eu/info/law/law-topic/data-protection/refo...


> I never did that, but did you file a complaint at your national Data Protection Authority? https://ec.europa.eu/info/law/law-topic/data-protection/refo...

Yeah, it got forwarded to the Dutch authority because they're based in Holland. The entire process took almost 2 years. Basically the Dutch authority didn't really care and didn't understand the technical facts of the matter. They just believed GitHub's legal team when it said code is not personal data, without knowing that each commit has my name and email, that the account has my photo and name, and that I was making both a personal data request and an export request. I couldn't file an appeal, because my national agency only forwarded the complaint and relayed the decision; it never made a decision of its own.

For me, the key takeaway was that I asked for all the information they had relating to me. They said no, and the Dutch authorities thought that was A-OK.


Mmh in this case the third bullet point from https://ec.europa.eu/info/law/law-topic/data-protection/refo... is relevant ... :/

> take legal action against the DPA - If you believe that the DPA has not handled your complaint correctly or if you aren’t satisfied with its reply or if it doesn’t inform you with regard to the progress or outcome within 3 months from the day you lodged your complaint, you can bring an action directly before a court against the DPA.


It's not that clear-cut.

There are debates as to what "relates to" (wording of GDPR) means, and that seems to be open to interpretation depending on context and data. Unless there is a definitive decision on this, I think it is reasonable to claim that source code is not personal data.


> I had GitHub claim my private git repos were not personal data. Because it was code, forgetting that my name and email are attached to every commit.

This brings to mind a few questions.

1. If a site stores multiple copies of a particular piece of personal data, let's say an email address, do they have to give you every instance when you ask for your data, or just tell you that they have your email address?

For example, if I use the email address as an account identifier, so it is the primary key in the Users table in my database and a foreign key in my Purchased table, do I have to send something that says your email address is in 1 row of one table and 13 rows of another table?

2. If I have to give back copies of your personal data that is in content you uploaded, what if that content contains personal information of other people, too?

If you had let others commit to your private GitHub repo, for example, their personal data would be in there. If GitHub has to give your commits to that repo in response to your GDPR request, do they have to filter them so that they only return the commits you committed?

What if I submitted an issue, you committed a fix, and in the commit message you thank me by email address for diagnosing the issue? Does GitHub have to remove my email address from the copy of the commit message when they respond to your GDPR request?

3. What about services that provide storage but don't process the content of that storage except to keep redundant copies or backups to protect you from hardware failure, such as Dropbox or Amazon S3? If I ask Amazon for personal information on me, do they have to figure out that you uploaded your contact list to S3 and my name, email, and phone number are there and tell me about it?


IANAL

1. How the data is stored is irrelevant. Again, personal data refers to data connected to your account. So if they store a history of when and where you logged in, they have to provide it. When you upload stuff, they have to provide it. When you star a repo, they have to provide it.

2. They have to filter out data that you ought not to see. A repository is a special case since it is not simply personal data. Think about giving a contractor temporary access to your repo etc. GDPR tries to enforce reasonable data compatibility between platforms ("Right to data portability"). This is orthogonal to personal data collection.

3. No. It's the responsibility of the services that use S3 to manage this. The operators of those services are the controllers in this case. They also have to ensure that Amazon does not process the data they store on AWS S3. They may even have to make this agreement part of the contract with the people they provide the service for.
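
To make point 1 concrete, here is a minimal sketch of an export that follows the account identifier through every table instead of reporting where the email address happens to be stored. The `Users` and `Purchased` tables come from the question above; the column names, the demo rows and the function names are assumptions made up for illustration, not anything GDPR prescribes.

```python
import json
import sqlite3


def export_personal_data(conn, email):
    """Collect every row linked to one account, whatever table it lives in."""
    # Return rows as mapping-like objects so they convert cleanly to dicts.
    conn.row_factory = sqlite3.Row
    user = conn.execute(
        "SELECT * FROM Users WHERE email = ?", (email,)
    ).fetchone()
    if user is None:
        return None
    purchases = conn.execute(
        "SELECT * FROM Purchased WHERE user_email = ? ORDER BY id", (email,)
    ).fetchall()
    return {
        "account": dict(user),
        "purchases": [dict(row) for row in purchases],
    }


def demo_connection():
    """Build a throwaway in-memory database with hypothetical demo data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Users (email TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE Purchased (
            id INTEGER PRIMARY KEY,
            user_email TEXT REFERENCES Users(email),
            item TEXT
        );
        INSERT INTO Users VALUES
            ('alice@example.com', 'Alice'),
            ('bob@example.com', 'Bob');
        INSERT INTO Purchased (user_email, item) VALUES
            ('alice@example.com', 'book'),
            ('bob@example.com', 'lamp'),
            ('alice@example.com', 'pen');
    """)
    return conn


if __name__ == "__main__":
    data = export_personal_data(demo_connection(), "alice@example.com")
    print(json.dumps(data, indent=2))
```

The requester gets the account record and the purchases tied to it, and nothing belonging to other accounts. A real export would also have to cover logs, uploads and anything else keyed to the account.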


It depends on how data is stored: many companies use user identifiers with attachable profile objects, so when you delete your account, what remains is just a random ID. Hard to say whether that is personal information anymore.
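
A rough sketch of the pattern described here, assuming profiles and records live in simple in-memory structures; the `users` and `orders` shapes and the function name are hypothetical. On deletion the profile is dropped and the remaining records are re-keyed to a fresh random ID.

```python
import uuid


def delete_account(users, orders, user_id):
    """Drop the profile and re-key the remaining records to a random ID."""
    # The profile (name, email, ...) is removed entirely.
    users.pop(user_id)
    # Remaining records keep their content but lose the link to the profile.
    anonymous_id = uuid.uuid4().hex
    for order in orders:
        if order["user_id"] == user_id:
            order["user_id"] = anonymous_id
    return anonymous_id
```

Whether the leftover records still count as personal data then depends on whether their content can be linked back to the person indirectly, which this sketch does nothing to prevent.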


From Article 4 of the GDPR:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

AFAIK a personal to-do list would be "relating to an identifiable natural person" as the database will have a relation from this data to the account, which will likely have a name, email address or other PII (directly or indirectly).

IANAL


If it is a free text form, it's safer to assume that the data is personal.


Even timestamps can count as personal data. For example when they can be used to track working hours.


Yeah, I think this is a massive loophole. In a B2B setting, there is no GDPR obligation on another company to export all the data that a customer (e.g. a doctor) has uploaded (e.g. data on their patients). We've had some companies quoting hundreds of pounds to do this and have had to get the competition authorities involved.


> In a B2B setting, there is no GDPR obligation

I think this is not correct.

AFAIK it's not transitive, but you can send a request to any company directly. I think there are even people trying this at random.

As long as it's data about you, you are allowed to request it. When you sign an agreement that your data may be transferred to another controller (and there is no other way this data may be transferred), you are totally entitled to ask this controller for your data.


That's when you have two separate controllers.

For subprocessors it's different: they should send you back to the top-level controller for the data request. The subprocessor might not even know which of the data it holds is actually connected to you, e.g. AWS is not going to figure out the schema of an RDS instance. But the controller is required to have an agreement with the subprocessor so that it cooperates in processing your requests.

(that's partly what all those data processing agreements that subprocessors and controllers have to sign are about)


There are some aspects of B2B that are excluded, AFAIK. For example, a B2B company doesn't need its employees' permission for the data processing. But for its clients, it does.


Of course they need the employees' permission if it is personal data (see also https://www.dickinson-wright.com/news-alerts/the-gdpr-covers...). At least under German law, if the times when I work are logged and processed, I have to agree to this as an employee. This is usually part of the employment contract. Also, companies are legally required to store legally relevant data, usually for 10 years, so there are of course exceptions. For the storage clause there is even a right to have access to the data blocked (when deletion is not possible because of a retention period).


In most countries there is a time limit measured in years on suing for breach of contract (for instance in England it's 6 years), so this can always be a legitimate reason to keep personal data for years after its use has ended: GDPR says that data must not be kept longer than necessary, but keeping records in case of legal action seems like a very 'necessary' reason.


> Without consent, there are only a number of other ways an employer can process data, and those are identified in the GDPR as “legitimate basis”, which include, in relevant part: (1) to perform an employment contract; (2) to comply with legal obligations; and (3) to further a legitimate interest of the employer.

You'll find nearly every time they use that.


> legitimate interest of the employer

I would claim this paragraph will side with the employee in dubious cases (see the article): "To use the legitimate interest allowance, employers must perform a privacy impact assessment balancing their legitimate interest against the employees’ privacy interests. The hard part, this must be documented to demonstrate that the employer’s legitimate interest does outweigh the employees’ rights. The next step that employers cannot overlook is that, even if the employer has a basis to process employee data, the employer must then provide notice to the employee that spells out exactly what data the employer is going to collect and what the employer is going to do with it."


There are some limits. For instance, your Internet Protocol address(es) are considered to be personal data, yet AFAIK the various routers that have to store them in the normal course of making Internet connections don't have to comply with the GDPR?


This “one neat trick” means that some high level employee probably had to do it by hand.

Making a request like this is a borderline unethical waste of someone’s time.


It puts companies on the spot to have internal tooling to deal with requests like this; which is not a bad thing. Nothing unethical about that; just the price of doing business.

The mindset change that needs to happen in our industry is that companies should build this into their products by default. "Download my data" should be a feature that is simply planned and built. Just like "permanently delete my data" is not optional either. It's not even that hard to build mostly. It's only hard if it catches you by surprise, which these days is poor planning more than anything else.

In Europe, and Germany especially, you can just expect people to do GDPR requests just because they can. We've had that happen right after GDPR became a thing. And you are legally required to be ready for that and respond in a timely fashion. If you want to do that manually, that's your problem. Small startups get away with that. At some point it becomes annoying and you just fix it properly. Up to you when you do that.


How much do you expect that “price of doing business” to cost? Probably at least $10k plus opportunity cost which for my small app business would be a serious kick in the nuts.

These regulations just further enable monopolies and make it more difficult for diverse entrepreneurs to get into the game.

I’m just saying to be mindful that you might be costing a small business an enormous amount by making these types of requests.


I'm CTO for a small startup. 10K seems like a wildly high estimate for such a simple feature that you could have known for years that you are required to offer. You do need a certain level of competence on your team to do it. But then perhaps lacking that competence, why should people trust you with their data?


Unethical to use your right to access your data? Interesting take.


It’s your right in Europe. It’s offensive that some would expect a non-European organization to comply.


As long as a company serves customers in the EU, it has to comply with EU law. This whole GDPR drama was caused largely by huge American companies like Google and Facebook doing whatever they liked.


I mean the company should automate this if this is wasting too much time.


I built an export tool like that for one of my apps and it's not that complicated. Keeping the exported files secure and authenticated was more work than actually generating them.
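
One common way to handle the "secure and authenticated" part, sketched here under assumptions (this is not necessarily what the app above does): hand out an expiring download token signed with HMAC and check it, together with the logged-in user, before serving the file. `SECRET_KEY`, the token layout and both function names are hypothetical.

```python
import hashlib
import hmac
import time

# Hypothetical secret; a real service would load this from configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def make_token(user_id, expires_at, key=SECRET_KEY):
    """Return 'user_id:expires_at:signature' for an export download link."""
    message = f"{user_id}:{expires_at}".encode()
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{user_id}:{expires_at}:{signature}"


def verify_token(token, requesting_user_id, now=None, key=SECRET_KEY):
    """Accept only an untampered, unexpired token presented by its owner."""
    now = time.time() if now is None else now
    try:
        user_id, expires_at, signature = token.rsplit(":", 2)
        expires_at = int(expires_at)
    except ValueError:
        return False  # malformed token
    message = f"{user_id}:{expires_at}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking signature bytes via timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    return user_id == str(requesting_user_id) and now < expires_at
```

The server would call `make_token` when the export is ready and `verify_token` on each download request, so a leaked link stops working after it expires and never works for another account.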



