Plausible Analytics Isn't GDPR Compliant

raverbashing · on Oct 23, 2020

I think the article might be reading too much into it

Is Plausible actually tracking users? I mean actually allowing you to get a user's history (or IPaddr history) on your website across multiple days? (or a subset of this?)

If it does, then yes, it is not compliant without the user agreeing. If it doesn't, then no.

markosaric · on Oct 23, 2020

Everything is isolated. There's no way for us nor for our customers to get visitor history across days, across websites or across devices. See https://plausible.io/privacy-focused-web-analytics

raverbashing · on Oct 23, 2020

Thanks for clarifying

donohoe · on Oct 23, 2020

Plausible Analytics is GDPR compliant, with one possible exception: the IP address. If they dropped its last 3 digits, that would probably be enough.

The blog post conflates general data points with PII. The IP address is considered PII.

While other info can be used for fingerprinting, it’s ok to use in some capacity as long as you don’t.

For background, I’ve done GDPR implementation in the past, am a privacy advocate in that sense, and have spent more time with lawyers on this subject than I’d care to admit.

(Pardon brevity/typos, on phone with unreliable connection)

nscmnto · on Oct 23, 2020

The IP address, on its own, should not be considered PII.

There was a ruling in Breyer vs. Germany that IP addresses can be considered PII – in certain circumstances.

The case was brought against an ISP, and the court ruled that the company had enough correlating data at its disposal to make an IP address de facto PII for any of its customers. The court limited its ruling, saying that with just an IP address alone, the protections associated with the directive wouldn’t apply.

fogihujy · on Oct 23, 2020

GDPR simply classifies "personal data" as any piece of information that can be used to identify an individual. A static IP used by one person could therefore be considered personal data while a public IP shared between thousands of people behind carrier-grade NAT would not.

The problem is that you can't tell the two apart and decide when it's safe to handle the IP.

magicalhippo · on Oct 23, 2020

Indeed. My dynamically allocated public IPv4 address, given to me by my cable company, has been the same for as long as I've lived here, over four years now.

Ironically, my IPv6 prefix can change several times a day...

yorwba · on Oct 23, 2020

IP addresses are never PII. PII means information about a person who can be identified. In that context, IP addresses are an identifier, not the information itself.

If you store IP addresses in your customer database, the information is that a person with that IP is one of your customers. This information is considered PII if it's possible to use the IP to identify the person the information is about, e.g. using a government database of everyone's IP address. If the data never reaches someone with access to such a database, it's not PII.

(This is a somewhat pedantic distinction, but it matters legally. Data protection law doesn't care about which identifiers are being used, but about the data associated with them and whether it tells you something about a specific identifiable person.)

mikehall314 · on Oct 23, 2020

I was under the impression that they did not store IP addresses, though I could be incorrect.

Their docs suggest as much https://docs.plausible.io/excluding/

"Most web analytics tools do this by excluding certain IP addresses from being counted. However, we do not store the visitors’ IP addresses in our database for privacy reasons"

markosaric · on Oct 23, 2020

We never store IP addresses in our database or logs. See the full details of our data policy: https://plausible.io/data-policy

frollo · on Oct 23, 2020

GDPR doesn't care about storage. Even if you just acquired personal information without processing it, you still had to be GDPR compliant.

In fact, the solution suggested above (only using a truncated IP address) would still require you to acquire and process the IP address and thus be subject to GDPR.

ukutaht · on Oct 23, 2020

Thanks for clearing this up. The general data points and metrics we store are not personal data.

IP address is the only piece of data that we touch that is considered PII under some regulations including GDPR.

The IP address is fully anonymized by hashing it together with a daily changing salt. Old salts are deleted so as to prevent re-identification: https://github.com/plausible/analytics/blob/master/lib/plaus...

According to GDPR Recital 26, anonymized data does not fall within the GDPR at all because data is no longer considered “personal data” following anonymization:

> The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.

corentin88 · on Oct 23, 2020

GDPR states “For data to be truly anonymised, the anonymisation must be irreversible”. So dropping 3 digits is clearly not enough to anonymize PII, it’s more pseudonymization.

dbbk · on Oct 23, 2020

How can an IP address without the last 3 digits possibly ever identify someone? That surface area is just way too large.

M2Ys4U · on Oct 23, 2020

By using other information to narrow the pool of possible people.

lez · on Oct 23, 2020

Aren't the biggest corporations doing the same on orders of magnitude larger datasets? They get away very well with merging data from quite a few acquired companies.

If small companies are called upon compliance with such vehemence, the big ones who know so much of us should be brought up, at least 100x times more.

M2Ys4U · on Oct 24, 2020

> Aren't the biggest corporations doing the same on orders of magnitude larger datasets? They get away very well with merging data from quite a few acquired companies.

Yes, and it's worth noting how few data points one needs to identify an individual.

>If small companies are called upon compliance with such vehemence, the big ones who know so much of us should be brought up, at least 100x times more.

Absolutely, no argument from me here.

that_guy_iain · on Oct 23, 2020

I am curious: how are you going to de-anonymise an IP back to something that could have 255 combinations (and that's just if you drop the last part of an IPv4 address)? Never mind that an IP alone is not PII. How can you reverse something that has many possibilities?
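For readers following the truncation argument, here is a minimal Python sketch of what "dropping the last part" of an address could look like. The function names and the choice of a /24 prefix for IPv4 and a /48 prefix for IPv6 are assumptions made for illustration, not anything Plausible or the GDPR prescribes. Note that a zeroed last octet leaves 256 candidate addresses, slightly more than the 255 quoted above.

```python
import ipaddress


def truncate_ip(addr: str) -> str:
    """Zero the host bits of an address (hypothetical helper).

    Assumption: keep the first 24 bits of IPv4 and the first 48 bits of IPv6.
    """
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def candidates(truncated_ipv4: str) -> int:
    """How many full IPv4 addresses share this truncated /24 form."""
    return ipaddress.ip_network(f"{truncated_ipv4}/24").num_addresses


print(truncate_ip("203.0.113.77"))   # 203.0.113.0
print(candidates("203.0.113.0"))     # 256
```

Whether 256 candidates is "enough" depends on what other data sits next to the truncated address, which is the point M2Ys4U raises above.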

donohoe · on Oct 23, 2020

>> IP alone is not PII

It is in Europe, despite some regional rulings (Germany?). It is not considered PII in the USA.

fmajid · on Oct 23, 2020

IP addresses are also explicitly considered PII by California’s CCPA.

https://leginfo.legislature.ca.gov/faces/billTextClient.xhtm...

(o) (1) “Personal information” means information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. Personal information includes, but is not limited to, the following: (A) Identifiers such as a real name, alias, postal address, unique personal identifier, online identifier, Internet Protocol address, email address, account name, social security number, driver’s license number, passport number, or other similar identifiers.

donohoe · on Oct 23, 2020

That was true once. Longer answer "it depends":

“[I]f a business collects the IP addresses of visitors to its websites but does not link the IP address to any particular consumer or household, and could not reasonably link the IP address with a particular consumer or household, then the IP address would not be ‘personal information.’”

Source: https://iapp.org/news/a/are-ip-addresses-personal-informatio...

fmajid · on Oct 25, 2020

You missed the paragraph:

"However, when the attorney general revised its draft regulations for a second time March 11, the guidance was struck without explanation."

that_guy_iain · on Oct 23, 2020

Just to be that guy. There is a slight difference between Personal Identifying Information and Personal Information.

that_guy_iain · on Oct 23, 2020

GDPR is EU law. So the regional rulings are extremely important for deciding what you think you can and can't do.

And I think we're missing the main point. How can it be reversed if there are hundreds of possibilities?

donohoe · on Oct 23, 2020

True. I was thinking more about how it drops some location level information.

I can't presume what Plausible does (have not read their docs in a while), but they have commented here to provide more specific clarification that addresses IP usage (TLDR: what they do is fine and compliant).

scoot_718 · on Oct 23, 2020

Actually with CGNAT IP (and arguably before then) IP addresses aren't personally identifiable information.

That said, the GDPR is deranged and might define things differently. Blocking the EU is safer.

Of course there are research exceptions that you could drive a truck through, and logging is still valid, so none of this matters.

ramboram · on Oct 23, 2020

I've been looking into GDPR and when cookie consent is needed. In fact, there's no such thing as "cookie consent". If you track a user, you have to get their consent before doing it, whether you use a cookie consent banner or not. Ever since I joined HN, there's been a lot of marketing going on here from the privacy-first Google Analytics alternative guys. I found this review showing Plausible and similar products using browser fingerprints and CNAME cloaking for user tracking, and they still promote those features.

I'd like to know your opinion on this. Do I still need to use a consent banner if I use these services?

Thanks.

franky47 · on Oct 23, 2020

> If you track a user, you have to get his consent before doing it

This would mean any server-side analytics (looking at access logs, which include IP address and user-agent) cannot be used for analytics or tracking, since there is no way for a user to give/deny consent to a page that already has logged information on them.

Nextgrid · on Oct 23, 2020

You obtain consent and then you log only if consent was provided. You can essentially use two logs, one for technical purposes (under legitimate interests you should be fine logging as long as those logs are only used for technical/debugging/abuse prevention purposes and the data is not kept for longer than necessary) and one for marketing/analytics purposes. You only log to the second one if consent has been given, and you only ever do your analytics on that second log and not the first one.
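A minimal sketch of the two-log idea, assuming an in-memory store and a hypothetical `RequestLogs` class with a `consented` flag that your consent mechanism would supply. It is only meant to show the gating, not a production logging setup or legal advice.

```python
from dataclasses import dataclass, field


@dataclass
class RequestLogs:
    """Two separate logs (hypothetical structure for illustration)."""
    technical: list = field(default_factory=list)   # debugging / abuse prevention only
    analytics: list = field(default_factory=list)   # written only with consent

    def record(self, ip: str, path: str, user_agent: str, consented: bool) -> None:
        # Technical log: always written, but should be kept only as long as
        # necessary and never used for analytics.
        self.technical.append({"ip": ip, "path": path})
        # Analytics log: written only if the visitor has given consent.
        if consented:
            self.analytics.append({"path": path, "user_agent": user_agent})


logs = RequestLogs()
logs.record("203.0.113.77", "/pricing", "Firefox", consented=False)
logs.record("198.51.100.23", "/pricing", "Chrome", consented=True)
print(len(logs.technical), len(logs.analytics))  # 2 1
```

Analytics queries would then read only `logs.analytics`, so a visitor who never consented never shows up in them.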

matthewmacleod · on Oct 23, 2020

It's also probably a legitimate interest to retain data for marketing and analytics purposes, so long as that retention meets the same sort of guidelines. Marketing is explicitly highlighted as one of the applicable uses for legitimate interest.

guillem_lefait · on Oct 23, 2020

Have you any specific document or decision in mind ?

matthewmacleod · on Oct 23, 2020

Recital 47 (https://gdpr-info.eu/recitals/no-47/) explicitly states:

"The processing of personal data for direct marketing purposes may be regarded as carried out for a legitimate interest."

It's also mentioned in Article 21 describing the right to object to processing using legitimate/public interest:

"Where personal data are processed for direct marketing purposes, the data subject shall have the right to object at any time… etc."

The ICO has some useful guidance on when it is an appropriate basis: https://ico.org.uk/for-organisations/guide-to-data-protectio...

guillem_lefait · on Oct 24, 2020

One could argue that an analytics purpose is not a direct marketing purpose. My understanding is that, since analytics can be considered a usual/expected business process, it may rely on legitimate interests as long as it fulfills the requirements (information about the processing, the right to opt out, ...). However, the problem is that analytics may be advanced analytics. Is the retrieval of AdWords parameters from a gclid allowed/expected? Is the injection of historical behaviour or a marketing segment allowed/expected?

mrweasel · on Oct 23, 2020

I would like to see more software having the option of just logging the users country and not the IP, and perhaps just as generic a user agent as possible (Just, is this Chrome, FireFox, Edge, whatever, but nothing else.)

for example for Nginx something like:

log_format logfmt '$remote_country - [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_generic_user_agent" "$gzip_ratio"';

That would assume access to a GeoIP database, but it would be helpful.
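The `$http_generic_user_agent` variable above is hypothetical. One rough way to get a similar value outside Nginx is to collapse the User-Agent string into a browser family before logging. The sketch below is a simple substring check, with a helper name and marker list that are assumptions for illustration; real-world parsing is messier, as noted elsewhere in the thread.

```python
def generic_user_agent(user_agent: str) -> str:
    """Reduce a User-Agent string to a coarse browser family (hypothetical helper)."""
    # Order matters: Edge UAs also contain "Chrome/" and "Safari/",
    # and Chrome UAs also contain "Safari/".
    markers = [
        ("Edg/", "Edge"),
        ("Edge/", "Edge"),
        ("Firefox/", "Firefox"),
        ("Chrome/", "Chrome"),
        ("Safari/", "Safari"),
    ]
    for marker, family in markers:
        if marker in user_agent:
            return family
    return "Other"


ua = "Mozilla/5.0 (X11; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
print(generic_user_agent(ua))  # Firefox
```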

cuu508 · on Oct 23, 2020

$remote_country is interesting idea, you classify visitor into per-country "buckets". Although the buckets would not be of equal size. If you have a single regular visitor from a tiny country, $remote_country could uniquely identify them.

A similar idea would be to have built-in $remote_addr_hash8, $remote_addr_hash16 variables which hash IPv4 and IPv6 addresses down to 8-bit or 16-bit numbers.

There are hacky ways you can do some forms of anonymization already:

https://www.supertechcrew.com/anonymizing-logs-nginx-apache/
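The `$remote_addr_hash8` / `$remote_addr_hash16` variables do not exist in Nginx; they are an idea. A minimal Python sketch of the same bucketing, assuming SHA-256 as the hash and a hypothetical `addr_bucket` helper:

```python
import hashlib
import ipaddress


def addr_bucket(addr: str, bits: int = 8) -> int:
    """Map an IPv4 or IPv6 address to one of 2**bits buckets (illustrative only)."""
    packed = ipaddress.ip_address(addr).packed          # 4 or 16 raw bytes
    digest = hashlib.sha256(packed).digest()
    return int.from_bytes(digest, "big") % (1 << bits)  # keep only `bits` bits


b8 = addr_bucket("203.0.113.77", bits=8)      # value in 0..255
b16 = addr_bucket("2001:db8::1", bits=16)     # value in 0..65535
print(b8, b16)
```

With 8 bits, roughly 16 million IPv4 addresses land in each bucket, so a bucket number alone says very little about which address produced it. The same caveat as with `$remote_country` applies: on a low-traffic site a single regular visitor may still be the only one in their bucket.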

franky47 · on Oct 23, 2020

FWIW, CloudFlare can inject a cf_ipcountry header that does that. User-agent parsing is unfortunately more complex, with lots of false readings (not counting bots & crawlers).

ralfn · on Oct 23, 2020

The reality is that GDPR is not strongly enforced at the moment. This is not uncommon for Europe and may be a cultural difference from other places.

Those who have the intent to comply and are at least complying in spirit are not at any legal risk. Attitude matters.

And the spirit is obvious: get consent if you enable a third party to uniquely identify a user in reality, i.e. if it's private data or if you enable correlation across websites.

It's correlating and sharing you need consent for. Don't worry about a server log.

It is not about what you make possible. It's about what you do. Technically any sysadmin can access some information they should not. It's unavoidable.

But that's quite a far way from commercially exploiting databases of people without their consent.

Honestly they should just ban the sale of personal information. Most internet marketing vendors are not actually in the business of selling personal data.

Now the good ones suffer because of the bad ones. And the bad ones either pretend they have consent or find a way to get it.

XCSme · on Oct 23, 2020

I think that overall the GDPR law was good for privacy but a disaster for usability.

It was good for privacy, not because it's enforced or not and not because sites are showing cookie consents, but because it made the public more aware of centralization/privacy issues on the internet and companies a bit more careful with data processing. This law also resulted in many "privacy-friendly" alternatives for various services, which in the end led to a healthier market and improved data decentralization.

threatofrain · on Oct 23, 2020

If you're tracking an amorphous profile, how do you match the right person to the right data? Do you have to match the data to a unique person?

mrweasel · on Oct 23, 2020

I don't have the answer, but the consent banners are interesting.

I have two browser plugins: "I don't care about cookies" and "Never Consent". I'm not sure what Never Consent does technically, but the other one just hides the DOM element with the cookie thingy.

That means that I never see the "consent" banners, so I can't click the "Okay" buttons. I should test to see how many sites just assume OK to cookies because I didn't click "No".

On a positive note, I do see more and more sites making it just as easy to say no to tracking as to say yes. Though sites are better at remembering a yes to tracking than a no.

luckylion · on Oct 23, 2020

Not sure whether you mixed up I Don't Care About Cookies and the other one, but IDCAC does not just hide the DOM elements - it always gives full consent.

From their website [1]: By using it, you explicitly allow websites to do whatever they want with cookies they set on your computer (which they mostly do anyway, whether you allow them or not).

Which is fine for me, I use it with Cookie Autodelete, but if you don't, you should be aware of that.

[1] https://www.i-dont-care-about-cookies.eu/

mrweasel · on Oct 23, 2020

Thanks, I used one at some point that just hides the element... Now I just use I Don't Care About Cookies and flush cookies when I close the browser.

But yes, something I need to be aware of.

Semaphor · on Oct 23, 2020

Just FYI, tracking is so much more advanced than just cookies. Using IDCAC means you consent to them using any method of tracking you.

lucideer · on Oct 23, 2020

I think a lot of the confusion around the consent banner stuff arises from the 2002 EU ePrivacy Directive (ePD)[0] which long predates GDPR.

ePD introduced the idea of the cookie consent banners we see today.

While it was enacted in 2002, ePD didn't really start to come into broad legal force in many member states until ~2010ish (EU Directives are not like federal laws; instead they're implemented & enforced by individual member states separately).

GDPR's focus on prior consent makes consent banners in their popular format largely useless, but when GDPR came along, the intent was that ePD should have been replaced by the accompanying EU ePrivacy Regulation (ePR)[1] to clarify this. ePR has been delayed, so we're in this ambiguous place.

[0] https://en.wikipedia.org/wiki/Privacy_and_Electronic_Communi...

[1] https://en.wikipedia.org/wiki/EPrivacy_Regulation

donohoe · on Oct 23, 2020

Not a lawyer, but you do not need a consent banner with their services.

This is as much about what information is available AND what you do with it. Browsers send information whether you ask/use it or not.

At a high-level (and not necessarily speaking about Plausible here cos I don't know the inner workings), it is ok for a service to use personal information (looking at the IP address here) if in a form that is not traceable back to a user, and not used for tracking individuals.

In this case the use of CNAME is fine; it's just to stop the blunt blocking of JS etc. that happens as a reaction. It's worth noting that GDPR does permit data collection for essential services and (there is some dispute/debate on this) basic site analytics can be considered essential services.

In regards to Plausible, they are commenting directly here and seem to be addressing all these concerns.

IMHO the blog post author sees a problem at the surface level but is not an expert - but for those of us more familiar with the legal framework behind this, the exceptions, and the distinctions of how information is used (and supporters of GDPR), what Plausible is doing is good and compliant.

(To be clear; I'm not affiliated with them - am just supportive of GDPR friendly alternatives like this one)

M2Ys4U · on Oct 23, 2020

Cookies aren't regulated by the GDPR[0] but instead by the ePrivacy Directive.[1]

Article 5(3) of that directive states that

"Member States shall ensure that the use of electronic communications networks to store information or to gain access to information stored in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned is provided with clear and comprehensive information in accordance with Directive 95/46/EC, inter alia about the purposes of the processing, and is offered the right to refuse such processing by the data controller. This shall not prevent any technical storage or access for the sole purpose of carrying out or facilitating the transmission of a communication over an electronic communications network, or as strictly necessary in order to provide an information society service explicitly requested by the subscriber or user."

In other words, unless the cookies are strictly necessary to providing you with the service then you must provide users information about what the cookies are used for, and you must offer an opt-out.

(It's also worth pointing out the generality of this Directive, too: It doesn't only apply to cookies, but also to things like localStorage).

The ePrivacy Directive is, as its name suggests, a Directive which is addressed to member states of the European Union which have all written it in to domestic law.

In the UK, for example, it was implemented as PECR[2].

[0] The ePrivacy Directive does reference the old legislation that the GDPR replaces, so you should consider the reference in the ePD to Directive 95/46/EC as a reference to the GDPR. This means the standard of "consent" is the GDPR's standard now.

[1] https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A...

[2] https://ico.org.uk/for-organisations/guide-to-pecr/what-are-...

KingOfCoders · on Oct 23, 2020

Cookie consent is (mainly) a different EU directive and not part of GDPR. It will be newly regulated by the - long delayed - ePrivacy Regulation.

"Cookies are an important tool that can give businesses a great deal of insight into their users’ online activity. Despite their importance, the regulations governing cookies are split between the GDPR and the ePrivacy Directive." https://gdpr.eu/cookies/

sarnowski · on Oct 23, 2020

The cookie banners come from the ePrivacy Directive and are supposed to inform you that the website is storing data on your device and that you can opt out (not in) of it.

Consent is required by GDPR but not for the technical circumstance that you store a cookie but that you use it for profiling. Some lawyers argue that basic web performance is legitimate interest especially in e-commerce, others don’t risk it and ask for consent (which is strictly opt in).

bmcn2020 · on Oct 23, 2020

If you're tracking a user in the EU, you need consent. The GDPR doesn't cover the 'how' -- just that it needs to be done. So, if there's tracking of any kind, you'll need consent.

Applies off site as well -- pretty much every cold email tracking software, like Yesware, is in violation of GDPR, since you didn't get the recipient's consent to track their opens and clicks.

mpitt · on Oct 23, 2020

Consent is one of the legal bases for processing personally identifiable information[1]. There are five more, among which "legitimate interest" can cover a variety of cases.

[1] https://ico.org.uk/for-organisations/guide-to-data-protectio...

guillem_lefait · on Oct 23, 2020

Yeah, but "legitimate interest" implies that the processing is necessary (because it overrides your consent). In which context, and what kind of analytics, is really necessary? Analysis of the incoming channels? Understanding whether there are technical problems? Comparing engagement from different marketing solutions?

I'm working in that market and find that interpretation gets quite difficult as soon as you have multiple actors around the table. Example: because recommendations from DPAs are not exactly the same, you may get different requirements for the same company from the legal departments of different countries within the EU.

dwheeler · on Oct 23, 2020

One interesting thing about consent under the GDPR is that users can later withdraw consent, and if consent is your only legal basis, then you have to get rid of all the related data. It's best if you can show that there are multiple legal bases.

_the_special_ · on Oct 23, 2020

Doesn't the GDPR protect against storing "personally identifiable information"? Plausible does use the visitor's IP address to create a unique visitor ID, but it does not store it, so I am not sure how you can use that information to link it to an individual.

M2Ys4U · on Oct 23, 2020

The GDPR regulates the use of "personal data", which is broader in scope than "personally identifiable information":

"‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person"

Nextgrid · on Oct 23, 2020

If the algorithm for turning an IP address into a visitor ID is reversible then that ID is equivalent to the IP address as far as the GDPR is concerned.

_the_special_ · on Oct 23, 2020

I could not easily find it on the website, but I remember reading about how they do it, basically the ID is generated by hashing the IP + user-agent + a salt key that is changing on a daily basis.

So, no, I do not think it is deterministic.

markosaric · on Oct 23, 2020

We generate a daily changing identifier using the visitor’s IP address and User Agent. To anonymize these datapoints, we run them through a hash function with a rotating salt.

hash(daily_salt + website_domain + ip_address + user_agent)

This generates a random string of letters and numbers that is used to calculate unique visitor numbers for the day. Old salts are deleted to avoid the possibility of linking visitor information from one day to the next.

Full details are here: https://plausible.io/data-policy
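The formula above can be sketched in a few lines of Python. This is an illustration only: SHA-256, the 16-byte random salt, the zero-byte separator between fields, and the function names are all assumptions made here, not a description of Plausible's actual implementation.

```python
import hashlib
import secrets


def new_daily_salt() -> bytes:
    """Generate a fresh random salt; the previous day's salt would be deleted."""
    return secrets.token_bytes(16)


def visitor_id(daily_salt: bytes, website_domain: str,
               ip_address: str, user_agent: str) -> str:
    """hash(daily_salt + website_domain + ip_address + user_agent), as a hex string."""
    h = hashlib.sha256()
    h.update(daily_salt)
    for part in (website_domain, ip_address, user_agent):
        h.update(b"\x00")  # separator (an addition here) so field boundaries stay unambiguous
        h.update(part.encode("utf-8"))
    return h.hexdigest()


today = new_daily_salt()
tomorrow = new_daily_salt()
a = visitor_id(today, "example.com", "203.0.113.77", "Firefox/82.0")
b = visitor_id(today, "example.com", "203.0.113.77", "Firefox/82.0")
c = visitor_id(tomorrow, "example.com", "203.0.113.77", "Firefox/82.0")
print(a == b, a == c)  # True False
```

The same visitor maps to the same ID within a day, which is what allows unique-visitor counts, while a new salt produces an unrelated ID. Because the domain is part of the input, IDs also differ across websites.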

jonahbenton · on Oct 23, 2020

It depends on whether they retain or can reproduce the salt for a given date.

The rule in effect is- a person knows the IP their ISP granted them on the dates they were granted. They ask- do you have any records of me from these IPs on these dates.

Assuming Plausible keeps the record of salt by date, the answer is yes, we have records of you, because they can retrieve the salt, recreate the ID, and locate the records.

If they do not retain the salt, in contrast, they cannot respond to individual requests for their records and that would also imply they are not able to do day over day returning visitor calculations.

markosaric · on Oct 23, 2020

Old salts are deleted to avoid the possibility of linking visitor information from one day to the next. So yes, there's no way for us to know whether the same person returns to a website on another day. See https://plausible.io/data-policy

jefftk · on Oct 23, 2020

That is deterministic, but the key thing is that it is not reversible

wizzwizz4 · on Oct 23, 2020

Technically, you could enumerate all four billion IP addresses (multiplied by all common user agents) to reverse it. This is, however, prohibitively expensive for tracking, so I think it does the job.

jaywalk · on Oct 23, 2020

Not without the salt, which they delete every day. Pretty much impossible.

Nextgrid · on Oct 23, 2020

Is the salt key stored, or is it discarded?

markosaric · on Oct 23, 2020

Old salts are deleted to avoid the possibility of linking visitor information from one day to the next. See https://plausible.io/data-policy

kevincox · on Oct 23, 2020

Note that anything deterministic on IPs is reversible. There are only 4 billion IPv4 addresses so brute forcing is trivial.

It is more complicated for IPv6 but enough of the internet is IPv4 that you can't ignore that case.
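To make the brute-force argument concrete, here is a small Python sketch. It assumes an unsalted SHA-256 of the dotted address as the "anonymized" ID and searches a single /24 network so that it runs instantly; the function names are hypothetical. Scanning the whole IPv4 space is the same loop over about 4.3 billion candidates.

```python
import hashlib
import ipaddress


def unsalted_id(ip: str) -> str:
    """A deterministic, unsalted hash of an IP address (the weak scheme)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def brute_force(target_id: str, network: str):
    """Try every address in `network` until one hashes to `target_id`."""
    for ip in ipaddress.ip_network(network):
        if unsalted_id(str(ip)) == target_id:
            return str(ip)
    return None


leaked = unsalted_id("198.51.100.23")
print(brute_force(leaked, "198.51.100.0/24"))  # 198.51.100.23

# With a secret salt mixed in, the same search finds nothing,
# because the attacker cannot recompute the hashes without the salt.
salted = hashlib.sha256(b"secret-salt" + b"198.51.100.23").hexdigest()
print(brute_force(salted, "198.51.100.0/24"))  # None
```

This is why the replies about the deleted daily salt matter: the enumeration only works for whoever still holds the salt.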

gspr · on Oct 23, 2020

Nitpick: if it's reversible, determinism doesn't matter.

Nextgrid · on Oct 23, 2020

Yep indeed, deterministic isn't really the right word here. Reversibility is all that matters, although am I correct in saying that it would imply determinism?

gspr · on Oct 23, 2020

> am I correct in saying that it would imply determinism?

I don't know, because neither "reversibility" nor "determinism" are precisely defined (this is not criticism of your comment in any way).

Here's one semi-reasonable interpretation of the two words for which reversibility would not imply determinism: Imagine a "process" (I, too, am being imprecise and calling this a "process" instead of a function) that takes as input an integer between 1 and 6 inclusive. Its output for the input n is a dice roll with a die that is biased in favor of n, but is otherwise fair. Now, this is not a deterministic process, but if you are allowed to feed it the same input multiple times, you can probabilistically reverse it.
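A quick simulation of that thought experiment, as a sketch. The way the bias is modelled (return n outright half of the time, otherwise roll a fair die) is an assumption chosen for simplicity, as are the function names.

```python
import random
from collections import Counter


def biased_roll(n: int, rng: random.Random, bias: float = 0.5) -> int:
    """Non-deterministic process: returns n with probability `bias`, else a fair 1..6 roll."""
    if rng.random() < bias:
        return n
    return rng.randint(1, 6)


def guess_input(n: int, samples: int = 2000, seed: int = 0) -> int:
    """Feed the same hidden input many times and guess it from the most common output."""
    rng = random.Random(seed)
    counts = Counter(biased_roll(n, rng) for _ in range(samples))
    return counts.most_common(1)[0][0]


hidden = 4
print(guess_input(hidden) == hidden)
```

A single output tells you little, but over many samples the favoured face dominates, so the input can be recovered with high probability even though no individual run is deterministic.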

Anyway, sorry for the tangent – your original point was the important one.

donohoe · on Oct 23, 2020

The point to note here is "if". Happily, they (Plausible) don't.

dbbk · on Oct 23, 2020

It's not reversible, it's hashed with a daily salt.

lez · on Oct 23, 2020

I have the feeling that GDPR and cookie consent laws themselves, ironically, make it harder for services to provide privacy.

cuu508 · on Oct 23, 2020

How so?

Storing a "user has opted out from tracking cookies" binary flag in a cookie is not the same as storing a unique identifier in a cookie.
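A small Python sketch of the difference, using hypothetical cookie names (`tracking_opt_out`, `visitor_id`). The opt-out cookie has the same value for everyone who sets it, while the tracking cookie is unique per browser.

```python
import uuid
from http.cookies import SimpleCookie


def opt_out_cookie() -> str:
    """Binary flag: identical for every visitor who opts out."""
    cookie = SimpleCookie()
    cookie["tracking_opt_out"] = "1"
    return cookie["tracking_opt_out"].OutputString()


def tracking_cookie() -> str:
    """Unique identifier: different for every visitor, so it can single one out."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = uuid.uuid4().hex
    return cookie["visitor_id"].OutputString()


print(opt_out_cookie())                        # tracking_opt_out=1
print(opt_out_cookie() == opt_out_cookie())    # True
print(tracking_cookie() == tracking_cookie())  # False
```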

nodex-alex · on Oct 23, 2020

Most websites are not GDPR compliant, if you don't like it then lodge a complaint with the relevant regulator.

KingOfCoders · on Oct 23, 2020

a.) The term "GDPR Compliant" does not exist. All software can be "GDPR Compliant" and still do fingerprinting if there is consent or necessity (hard to do). What they mean is that you do not need to get consent from your users to use Plausible.

b.) They don't store IP addresses. The information they gather is not stored in a way that allows building user profiles or fingerprinting.

It doesn't look like the article's author took a look at the Plausible documentation or source code.

KingOfCoders · on Oct 23, 2020

I was implementation lead for several GDPR implementations in Germany. Only on HN would a comment with facts that clarify a subject where a lot of misinformation exists get downvoted.

If you've downvoted that comment you have done the community a disservice.