
Do you know of a good place to read about the GDPR and backups?

My concern has been the account deletion provision. Does the GDPR expect us to be able to go back and modify past backups? Years-old tape archives?




Start by reading the actual regulation, provided here in an easy online format without the extraneous formatting: https://gdpr-info.eu/

If you want a shortcut, you can try the EnterpriseReady site which has a great overview specific to SaaS companies: https://www.enterpriseready.io/gdpr/

As always though, your best resource is to talk to a lawyer. Do not trust any internet comments about legal decisions for your business.


I get the impression the GDPR expects you to not keep data around forever, as a general best practice.

When you collect data, you have to tell your users at collection what your retention policy is (this is part of Right to Transparency). So, right there, you should probably have a retention policy, and "forever, always" isn't really a well-thought-out policy.

The Right to Erasure is not as far-reaching as some people seem to think it is. If the Legal Basis of the data collection is Consent, then that consent is revocable and processing (including storage) pretty much has to end as soon as consent is revoked. But if the Legal Basis of collecting the data is something else, and I really feel like 90% of the time in practice it's going to be Legitimate Interest, then the Data Controller gets to balance their own needs against the rights of the Data Subject when handling a Right to Erasure or Right to Object request. And you can probably make a good argument that you don't need to modify back-ups. Your argument is stronger if a) your restore-from-back-up procedure can ignore or delete the user's data during/after restore b) your data retention policy eventually deletes the back-up.


How do I remove data in a WORM store?


Is it truly a WORM store that cannot delete any data ever never? If so, you'll need to encrypt the data in a way that allows you to make records inaccessible.

If the WORM store rotates out old data (webserver logs, tape backups with retention and rotation, etc.) then you simply inform the user of that and that's it.


> If the WORM store rotates out old data (webserver logs, tape backups with retention and rotation, etc.) then you simply inform the user of that and that's it.

Can you point me to where that's allowed? What if retention is reasonably long (a year)? or not (10 years)?

> Is it truly a WORM store that cannot delete any data ever never? If so, you'll need to encrypt the data in a way that allows you to make records inaccessible.

So now I can't perform impromptu analysis of my own data in any computationally easy way? Security analysis? Analyzing shipping information to optimize in the future?


Acronis, a German corporation, is implementing the GDPR too [0], and they recommend that, if possible, you split backups per customer. If that is not practical, at least do your best to protect the data and don't keep it for unnecessary time frames. You should have a retention policy and encrypt your backups.

[0]: https://www.acronis.com/en-us/blog/posts/backups-and-gdpr-ri...

[A0]: http://www.gdprarticles.com/gdpr-articles/data-subject-right...

[A1]: GDPR Art. 5 §1 a, b, c and f, §2

[A2]: GDPR Art. 17 §1 b and c, §3 b and e

>So now I can't perform impromptu analysis of my own data in any computationally easy way? Security analysis? Analyzing shipping information to optimize in the future?

Any analysis will have to be done in a way that makes sure you're not exceeding the bounds of network security or otherwise going outside legitimate interest.

Analyzing shipping information is the same: as long as you do everything to make sure the data is pseudonymized or not otherwise at risk of leaking personal data, it's fine. Alternatively, you ask customers about it.
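To make the pseudonymization point concrete, here is a rough sketch of what it could look like for a shipping dump. It assumes a keyed hash (HMAC-SHA256) is an acceptable way to replace the customer identifier; the key, the field names and the `to_analytics_row` helper are all made up for illustration, and whether the result is "pseudonymized enough" is a question for your lawyer, not for this snippet.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would live outside the analytics system,
# so that analysts cannot map tokens back to customers.
PSEUDONYM_KEY = b"replace-with-a-secret-from-your-key-store"


def pseudonymize(customer_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Map a customer identifier to a stable token using HMAC-SHA256."""
    digest = hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def to_analytics_row(shipment: dict) -> dict:
    """Keep only the fields the analysis needs; replace the customer with a token."""
    return {
        "customer": pseudonymize(shipment["customer_id"]),
        "carrier": shipment["carrier"],
        "cost": shipment["cost"],
    }
```

The same customer always maps to the same token, so per-customer aggregates still work, while name and address never reach the analytics copy.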

>What if retention is reasonably long (a year)? or not (10 years)?

Use your own judgement of what is reasonable. Worst case, you get a letter from the EU asking you to reduce the retention timeframe, as long as you made an actual effort to implement the regulation.


My question wasn't so much about doing the analysis, but about being unable to do it without fetching keys and decrypting on a per-log-entry basis. Not only would this be insufferably slow, I've not seen a feature like this in any COTS software and quite frankly seems incredibly difficult to write properly and securely, specifically the key management portion.


> Not only would this be insufferably slow

Why do you think that's a given? It seems like an implementation detail with a couple of easy solutions such as caching or batching, and it should encourage better system design in many cases where the analysis doesn't require PII and thus it's better from a security perspective not to have access to it there to begin with.

There have been a ton of breaches over the years where reporting or test systems had data which they didn't even need but which had been loaded anyway since it was less work than subsetting the data.


> analysis doesn't require PII and thus it's better from a security perspective not to have access to it there to begin with.

Unless I'm pulling from a raw dump of shipping I've bought, which would contain the address so that it can be cross-checked if there is an issue and I didn't know ahead of time that I wanted to perform this analysis.


Handling delivery problems is normal and expected usage. As long as your lawyer is remotely competent, your ToS will cover that and no government on earth is going to disagree.

If you’re trying to do analytics, you don’t need PII - anonymized locations, sizes, bucketed prices, etc. will cover that and usually makes the process faster, too.
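As a rough illustration of the anonymized locations and bucketed prices mentioned above (field names, bucket width and postal-code prefix length are assumptions for the example, not recommendations):

```python
def bucket_price(amount: float, width: int = 10) -> str:
    """Return a price range label such as '20-29' instead of the exact amount."""
    low = int(amount // width) * width
    return f"{low}-{low + width - 1}"


def coarse_location(postal_code: str, keep: int = 2) -> str:
    """Keep only the leading characters of a postal code."""
    return postal_code.strip()[:keep]


def anonymize_order(order: dict) -> dict:
    """Reduce an order to coarse fields that are enough for shipping analytics."""
    return {
        "region": coarse_location(order["postal_code"]),
        "price_band": bucket_price(order["price"]),
        "weight_kg": round(order["weight_kg"]),
    }
```

Coarse values can still single people out in sparsely populated regions, so how coarse is coarse enough depends on your data.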

Look at it from a different perspective: does ignorance of food handling procedures or electrical wiring codes remove your obligation to follow safety regulations? This is the same thing for data: yes, it requires you to act as if you care about users’ privacy but that’s another way of saying that you’re no longer being subsidized by being allowed to fob the cost of negligence onto the users rather than being responsible. Everything which people have been talking about in this thread is already covered by accepted security best practices.


If you want this analysis you should plan for it. Mozilla does this for example. Any kind of profiling or monitoring goes through several layers to ensure the minimum amount of data necessary is collected.

If you want shipping analytics you'll have to decide that ahead of time. That way you reduce the risk for your customer in case you don't want to do this and if you do want it you still make an effort to reduce the data necessary.

You should keep in mind that the basic premise of the GDPR is that the shipping address isn't yours to begin with. It's personal data of your customer and ultimately belongs to them.

If they don't allow you to use it for analytics, tough luck.


> If you want this analysis you should plan for it.

Yes, I should be omniscient. Thanks for clearing that up.

> Any kind of profiling or monitoring goes through several layers to ensure the minimum amount of data necessary is collected.

Yes, because they need to collect it. It's not about looking at what they have.

> If you want shipping analytics you'll have to decide that ahead of time.

Again, I'm not omniscient. I can't figure out what my company will be doing in a year, and waiting another year to collect the data I already have could see me hemorrhaging money.

> You should keep in mind that the basic premise of the GDPR is that the shipping address isn't yours to begin with. It's personal data of your customer and ultimately belongs to them.

Which is an absolutely silly notion. It is the company's data, not the user's.

> If they don't allow you to use it for analytics, tough luck.

Which is silly. It's the company's data; they should be able to use it to improve their business.


>Yes, I should be omniscient. Thanks for clearing that up.

Not omniscient but being able to plan ahead does help a lot, yes.

> It's not about looking at what they have.

Yes, because they only collect what's necessary and if they don't have that they ask if it's necessary and collect it.

>I can't figure out what my company will be doing in a year, and waiting another year to collect the data I already have could see me hemorrhaging money.

Then simply ask your customers to hand over data with consent to use it for analytics, problem solved, no?

>Which is an absolutely silly notion. It is the company's data, not the users.

No. Under GDPR this is no longer the case. The data belongs to the user now because corporations have shown time and time again that owning the user data is too much responsibility for them.

You do not own the customer data anymore, the customers own it. And they can decide what you're allowed to do with it.

End of story.


> You do not own the customer data anymore, the customers own it. And they can decide what you're allowed to do with it.

Which is entirely silly and basically contrary to everything else, e.g. data retention regulations that assume the company owns the data.


It's perfectly in line with existing German Data Regulations (although they get a minor update too with the DSGVO coming along with the GDPR). Data retention laws in Germany supersede the GDPR. The GDPR itself also mentions that any regulation and law in your jurisdiction may supersede anything in it.

Even that data isn't owned by you. You are merely responsible for keeping it safe while you have to store it. Ultimately it's the customer's data. End of story.


Another point: what about "personal data" that isn't really? A webserver log, for instance, contains an IP, which is covered under the law as personal information, I believe. This could be part of carrier-grade NAT serving thousands (or even just regular NAT for 2 or 3 people); must I delete everyone's? Whose key would these be encrypted with in your solution?


Webserver logs should for most intents be covered under legitimate interest as part of securing your network. As long as you rotate your server logs, which is default for any distro installation (AFAIK), you don't have to delete those when a user requests them.


Most companies collect and centralize logs, making logrotate irrelevant. What prevents a company from having decade long rotations? Also, who decides what is a legitimate interest?


Even centralized logs can have rotation and retention.

The company will have to decide for themselves, primarily, if some interest is legitimate.

This means you weigh the data you collect from a single user against the continued function of the company, the greater good and all other users. The company should then be able to demonstrate this process to the regulatory body.

There is no nailed process but keeping logs for a short amount of time to ensure network security and keeping some logs longer for legal compliance will most certainly pass as legitimate interest.
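For centralized logs, "short for security, longer for compliance" could be sketched like this. The categories and retention periods are invented for the example; the actual numbers are a policy decision you'd have to be able to justify.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per log category.
RETENTION = {
    "security": timedelta(days=30),
    "compliance": timedelta(days=365),
}


def purge_expired(entries: list[dict], now: datetime) -> list[dict]:
    """Return only the entries still inside their category's retention window."""
    kept = []
    for entry in entries:
        # Categories without a documented retention period are not kept.
        limit = RETENTION.get(entry["category"], timedelta(0))
        if now - entry["timestamp"] <= limit:
            kept.append(entry)
    return kept
```

Running something like this on a schedule is what turns a written retention policy into one you can actually demonstrate.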

Network security benefits the user themself, the company and all other users by ensuring their data is secured against breaches. It goes beyond simple self-interest of the company and protects the users too.

Similarly, having an email address to contact a user can be legitimate interest. If you only send them informative mail, i.e. "Someone changed your password" and "We had a data breach" or even "Someone tried to login from Uganda using your password, check if that's alright please", it serves primarily to protect you, the customer and the relationship you build up.

IMO that means it's legitimate.

On the other hand, of course an adcorp could claim their personal tracking data is legitimate. The data collected does not benefit the user other than showing them ads and selling it to others. Of the three groups, only one benefits.

Or keeping a webserver log for 20 years including usernames and emails.

IMO that would mean it's not legitimate.

If you are wrong in what you think is legitimate, you get a sternly worded letter from your favorite regulatory body asking you to fix it.

If you think they are wrong about that, the best option is to write them back and explain why you think it's legitimate. You can work out a solution with them that satisfies both sides.


> Even centralized logs can have rotation and retention.

That was a response to the comment about the default installs in most distros, not the ability of centralized services to rotate logs. It was pedantic and I regret derailing the discussion with it.

> The company will have to decide for themselves, primarily, if some interest is legitimate.

Until a regulator comes and makes a separate decision, and you have to plead with them that you're not wrong even when they think you are.

> If you are wrong in what you think is legitimate, you get a sternly worded letter from your favorite regulatory body asking you to fix it.

From a regulatory body that has no real authority over me, except it might?

I think my biggest issue is that I don't deem data a company has on me _my_ data or that they have to explain everything they do with _their_ data about me. I was never under the impression that it was my data, and in fact, I assume anything I put on a computer I don't control or have a paid, contractual agreement around is public. I fundamentally don't agree with or understand the premise that the situation is otherwise.

(The biggest exception being that I do expect companies to honor their contractual obligations under their credit card processing agreements, but that's not really about _me_ or data about me.)


Encrypt data with per-user key. Drop key.


So, now I can't run analysis of my logs because they're encrypted?

Also, that data is still associated with the user. Is it to the letter of the law to keep it, even if it's unreadable?


I think you're starting to get the point of the GDPR, yes.


The point of the GDPR is that I can't analyze past shipping costs? To optimize how I ship in the future?

Also, how does this mesh with PCI retention rules? (Yes, they are not law, but it's still an awkward place to be.)


You can most certainly do that... unless you've been incompetent with your data organisation and scattered PII where it really has no business being. In which case, how about sorting out your poor data practices before worrying about how you can optimise your shipping costs?


Not PII, but personal data, which is more broadly defined in the GDPR than PII is.

Second, I think you're missing the context here. If I need to encrypt each log entry that pertains to a user, even if it doesn't contain PII, then ad hoc analysis is nearly impossible to do.


"Legitimate Interest" is pretty easy to claim if PCI requires it.


Ah yes, somebody being overly pedantic and trying to come up with examples to spread FUD.

You couldn't possibly anonymize the data and then do the analytics, unheard of.


You're assuming that I knew I wanted to do said analysis, or that I would never want to go back to an order's record for more information. (What was ordered, or perhaps the US county someone is in that I need to figure out from the shipping address.)

It's not that the analysis can't be done anonymously, but to do so requires foreknowledge of everything you would like to analyze.


A reasonable approach that we're following is to have a documented backup retention policy and a procedure to re-delete data for any users who have asked to be deleted when those backups are restored. That retention policy can be longer than 30 days as it's impossible or infeasible to delete individual user data from all the backups.

One easy way to do this is with an expiration policy on S3 objects. You need to have an independent backup of those deletion requests though.
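For the expiration policy, a sketch of the lifecycle configuration as a Python dict; the prefix, the 30-day value and the bucket name in the comment are placeholders, not what we actually use.

```python
def backup_expiration_rule(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`."""
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/')}-after-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


config = backup_expiration_rule("backups/", 30)

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-backup-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config,
#   )
```

If the bucket has versioning enabled, noncurrent versions need their own expiration rule as well, otherwise old backup versions stick around.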

If you have a years-old tape archive you probably have a massive legal team who is much better equipped to answer this question.


There is no need to go back and modify backups.

You decide on a timeframe for deletion of backups (i.e. X days). You keep a record of deletion requests you receive for X days. If you need to restore from a backup, you delete data again for the users that requested it.

Then you delete the backups and records of deletion (or the tables in it that contain personally-identified information) after X days.
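The steps above can be sketched roughly like this. `DeletionLog` is a made-up name, users are modelled as a plain dict keyed by user ID, and a real version would persist the log somewhere that is not part of the backups it is meant to correct.

```python
from datetime import datetime, timedelta, timezone


class DeletionLog:
    """Remembers deletion requests for as long as backups may still hold the data."""

    def __init__(self, backup_retention: timedelta) -> None:
        self.backup_retention = backup_retention
        self._requests: dict[str, datetime] = {}

    def record(self, user_id: str, requested_at: datetime) -> None:
        """Note that a user asked to be deleted."""
        self._requests[user_id] = requested_at

    def replay(self, restored_users: dict) -> dict:
        """After a restore, drop every user who has a deletion request on file."""
        return {
            user_id: row
            for user_id, row in restored_users.items()
            if user_id not in self._requests
        }

    def purge(self, now: datetime) -> None:
        """Forget requests older than the backup retention period.

        By then every backup that could contain the user's data is gone.
        """
        cutoff = now - self.backup_retention
        self._requests = {
            user_id: ts for user_id, ts in self._requests.items() if ts > cutoff
        }
```

Typical use would be `log.record(...)` when the request comes in, `log.replay(...)` as the last step of every restore, and `log.purge(...)` from the same scheduled job that deletes expired backups.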


> You decide on a timeframe for deletion of backups (i.e. X days). You keep a record of deletion requests you receive for X days. If you need to restore from a backup, you delete data again for the users that requested it.

All of which requires a good deal of development work.


Good question..

I would guess there is no requirement that data is wiped. Your file system doesn't wipe data, it just marks it as deleted. At some point it's a technicality; the important part is that you stop using the data.

I suspect intent matters more than technicality.



