Amen. And wouldn't it be more informative to be told how many browsers were found to be indistinguishable from the user's? Pretty sure my iPhone is completely generic. That should be a detectable category for them by now.
This could also be extended to include WebGL extensions/capabilities. Lots of data there: http://webglstats.com/ - various WebGL extensions, stats like max texture size, max varyings etc. dependent on hardware.
I wonder how it would fare if it included WebGL stats but excluded all plugin data, e.g. stuff from Java or Flash. Seems to be a direction browsers are moving in.
There is a big flaw in using browser information as an identification token in this way as browsers change.
I visit this site at least once a month and every time it picks me as 'unique'. That is because each time I've either installed a plugin, installed a font, my browser version changes every 5 days with automatic updates, installed a new browser, installed a different browser, am using multiple browsers, am accessing from my phone, iPad, work computer, etc.
This makes it useless for actually tracking people.
To make browser fingerprinting anything more than useless for tracking, you need an algorithm that fingerprints but doesn't depend on capturing everything, the way hardware-locked licensing and Windows Genuine Advantage work: you can change your sound card or any minor device and your overall ID/fingerprint stays the same (I'm struggling to remember what this technique is called).
When reading the NSA revelations I dug through everything with an eye on wanting to know if they used browser or machine fingerprints to track users. They don't. There is enough unique information in IP address, cookies and email accounts alone.
And I bet most people who are concerned about tracking online and by impressive demos like this are still using their real IP address online, still accepting cookies, etc.
> There is a big flaw in using browser information as an identification token in this way as browsers change.
> I visit this site at least once a month and every time it picks me as 'unique'. That is because each time I've either installed a plugin, installed a font, my browser version changes every 5 days with automatic updates, installed a new browser, installed a different browser, am using multiple browsers, am accessing from my phone, iPad, work computer, etc.
> This makes it useless for actually tracking people.
Not at all. This site isn't a demonstration of tracking you, it's simply a measure of how unique your browser is. Somebody who actually cared about tracking you could trivially track changes: a profile with a set of 191 particular fonts and a later profile that is identical but with 190 fonts is probably the same person.
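To make the 191-versus-190 fonts point concrete, here is a minimal sketch of how two font lists could be compared. Jaccard similarity and the 0.95 threshold are my own assumptions for illustration, not something the site or any particular tracker is known to use, and the font names are made up.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Fraction of items shared by the two collections (1.0 means identical)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def probably_same_browser(old_fonts, new_fonts, threshold=0.95):
    """Hypothetical rule: treat two profiles as one browser above the threshold."""
    return jaccard(old_fonts, new_fonts) >= threshold


# 191 made-up fonts on the first visit, one of them gone on the second visit.
first_visit = {f"Font {i}" for i in range(191)}
second_visit = first_visit - {"Font 0"}
similarity = jaccard(first_visit, second_visit)  # 190 shared out of 191 total
```

A real tracker would presumably combine this with other signals rather than rely on fonts alone.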
In the fraud detection and ad tracking spaces there are companies that use this technique. They are able to adapt to changes in the browser (using statistics, cookies as bridges, your favorite shopping site's userId, etc). In the tracking world there are no silver bullets, so the more ways to connect different sessions, the better.
This site uses the same code as http://browserspy.dk/ to identify your installed fonts. It's a Flash plugin, so disabling Flash will make it unable to list fonts.
Yet another reason not to use Flash. It's good to know that this functionality is not built into the browser itself.
HTML5 has been good enough to switch flash off for a while now. Probably longer than two years. My main browser hasn't had flash in that long or longer and I haven't missed anything.
Most video sites revert to html5 video. You are also telling developers who track their flash install base to update their site to be compatible by voting against flash with your browser.
More importantly, I've been disabling Flash on the computers of non-tech friends and they don't seem to notice it either.
If you need Java for banking, set up a second or third browser that you use only for that site, and reach it only by entering the URL directly.
With a VPN or some other kind of proxy, possibly? Traffic is completely encrypted between you and the VPN, and the websites you are visiting only see the VPN's IP.
Back then, the proponents of that plan insisted the privacy concern was largely academic. I somewhat wonder what would've happened post-PRISM leak had they ever implemented it.
What does something like PRISM have to do with an individual privately run site using information provided by the client software in order to prevent abuse?
Expecting privacy-minded people to opt-out of using telephones or emails for fear of government monitoring isn't reasonable. Opting out of Stack Overflow is nowhere in the same league.
> What does something like PRISM have to do with an individual privately run site using information provided by the client software in order to prevent abuse?
I feel like I explained the issue pretty clearly in the preceding sentence:
> Back then, the proponents of that plan insisted the privacy concern was largely academic.
We now know via the various leaks that have occurred recently, that government agencies will use any information available to them and that there is largely nothing the public can do to stop it or be made aware of it except through the occasional one-off leak like PRISM.
That is to say, the privacy concerns related to any private company tracking PII to this degree aren't academic or abstract. Information kept to prevent abuse can just as easily be repurposed for surveillance. I suspect (and would hope) many of the same people pushing for this feature back in 2011 would reconsider their support for a feature like this knowing what we know now.
> Expecting privacy-minded people to opt-out of using telephones or emails for fear of government monitoring isn't reasonable.
I'm not sure what part of my comment led you to believe I would think otherwise, but I don't.
> Opting out of Stack Overflow is nowhere in the same league.
In the comments of the post I linked, several people were insisting that there wouldn't need to be an additional disclosure because it's information that's either public, available to other third parties (like browser makers), or already available to Stack Overflow in a different form. Not knowing that a site like a hypothetical Stack Overflow—where a feature like this was implemented—is keeping records on this level and therefore not opting out isn't informed choice.
Indeed, a significant portion of the outcry with respect to the NSA leaks is that nobody knew that the government had such a direct connection to the information stored by private companies that people could've easily opted out of using (e.g., Facebook, Apple, Google, or Paltalk).
But all I was attempting to do here was mention a past experience with Panopticlick and muse about how it might've gone today. I guess I've been bitten by something pg once said[1], that "you can't be concise on forums because if you leave any possible room for misinterpretation, someone will reply with it".
The response to that concern at the time was that most of the people who would know how to get around it wouldn't need to sockpuppet[1]. I suspect, if it were ever implemented, there'd be some attempt to obfuscate or hide what was actually tracked like they do for their other fraud algorithms. But issues like that and the privacy issues were pretty indicative of its lack of value and it doesn't surprise me it was never implemented and didn't even warrant a response from an SE employee.
> one in 506 browsers have the same fingerprint as yours.
So is this good or bad?
Some time this year, I'm planning to write a browser add-on which will send random (legitimate, from real browser versions) header combinations of the user agent (+OS) and accept headers. It can be semi-random, e.g. send the same headers to the same host during one visit. Combine it with the NoScript add-on, use the RequestPolicy add-on, block third-party cookies, tell the browser to delete cookies and local storage on exit, use plugins only in "on-click" mode (or don't use plugins), don't send "referer"s (or send fake "referer"s), use Tor for HTTPS sites (and sites that don't need authorization), and this will make it hard to track you.
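The "same headers to the same host during one visit" part could be sketched roughly like this. It is only an illustration in Python, not add-on code: the profile strings are placeholders (a real add-on would copy header combinations from actual browser releases), and keying the choice on an HMAC of the host name with a per-session secret is my assumption about one way to get the semi-random behaviour.

```python
import hashlib
import hmac
import secrets

# Placeholder profiles, NOT real browser strings.
PROFILES = [
    {"User-Agent": "ExampleBrowser/1.0 (ExampleOS 10)", "Accept": "text/html,*/*;q=0.8"},
    {"User-Agent": "ExampleBrowser/2.0 (OtherOS 7)", "Accept": "text/html,application/xhtml+xml"},
    {"User-Agent": "AnotherBrowser/5.1 (ExampleOS 11)", "Accept": "*/*"},
]


def new_session_key() -> bytes:
    """Fresh random secret, generated once per browsing session."""
    return secrets.token_bytes(32)


def profile_for(host: str, session_key: bytes) -> dict:
    """Pick a profile that is stable for (host, session) but varies across sessions."""
    digest = hmac.new(session_key, host.lower().encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(PROFILES)
    return PROFILES[index]


key = new_session_key()
headers = profile_for("example.com", key)
```

Because the key is thrown away at the end of the session, the same host may see a different profile on the next visit, while never seeing it flip mid-visit.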
Thank you for your support. I'm sorry, but I use Firefox, and I have already created some small add-ons using Mozilla's Addon SDK, so I'm already familiar with it. If I manage to write it (and it will be under GPLv3), I will try to port it to Chrome too.
Yeah, I've thought about this too. I was going to try and use a system-wide proxy, rather than a browser plugin, to capture all outgoing HTTP traffic and sanitize it.
Making it more widely useful would require a lot of thought, because much of the functionality of the current web is predicated on using these same vectors. There are interface issues, and fundamentally, the web would be significantly less useful for a large number of people if the privacy situation were ameliorated.
When it's done at the proxy level, you can't change the JavaScript objects, so global objects like "navigator" (which can be used to extract your browser and OS version) will still be available to trackers (if JavaScript is enabled).
That's good: 1:506 means you're not very unique. Unfortunately, for me it says my browser appears to be unique among the 3,229,643 tested, probably because I'm on Chrome/Linux. That basically means I can be trivially tracked across the web.
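For anyone wondering how a "one in N" figure relates to the "bits of identifying information" the site reports, the conversion is just a base-2 logarithm. A small sketch (the function name is mine):

```python
import math


def bits_from_ratio(one_in_n: float) -> float:
    """Bits of identifying information carried by a one-in-N fingerprint."""
    return math.log2(one_in_n)


common = bits_from_ratio(506)               # shared with 1 in 506 browsers: about 9 bits
unique_floor = bits_from_ratio(3_229_643)   # unique in the whole sample: about 21.6 bits
```

Being unique in the sample only gives a lower bound: the fingerprint carries at least that many bits, possibly more.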
It sounds like it would absolutely work for images as well, as the article notes that your browser will permanently remember the redirected location, and use that in the future.
For the moment, there is no protection from fingerprinting. Browser extensions that change some of the parameters of your browser’s snapshot make you even more identifiable because there are often other ways to check the values of these parameters. However, some of your parameters change by themselves, for example, after your web browser updates, or simply when you travel or use external monitors. Panopticlick does not take this into account; however, effective fingerprinting libraries are able to identify you because they monitor your subsequent visits to the websites.
"Browser extensions that change some of the parameters of your browser’s snapshot make you even more identifiable because there are often other ways to check the values of these parameters."
For example, using User Agent Switcher in Firefox is effective as long as you keep JS off. With JS on, the site can interrogate the client and report back, so any discrepancy would be found, and would itself be distinctive.
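A rough server-side sketch of that discrepancy check, assuming the page's JavaScript has posted back `navigator.userAgent` and `navigator.platform`. The mapping from user-agent tokens to platform prefixes is deliberately simplified and is my own assumption; real checks would look at far more properties.

```python
# Simplified mapping: OS token in the User-Agent header -> expected navigator.platform prefix.
PLATFORM_HINTS = {"Windows": "Win", "Macintosh": "Mac", "Linux": "Linux"}


def ua_discrepancies(header_ua: str, navigator: dict) -> list:
    """List mismatches between the HTTP header and what JavaScript reported."""
    problems = []
    js_ua = navigator.get("userAgent")
    if js_ua is not None and js_ua != header_ua:
        problems.append("userAgent differs between header and JavaScript")
    platform = navigator.get("platform", "")
    for os_token, prefix in PLATFORM_HINTS.items():
        if os_token in header_ua and platform and not platform.startswith(prefix):
            problems.append(f"header claims {os_token} but navigator.platform is {platform!r}")
    return problems


# Hypothetical visitor: the switcher spoofs the UA string everywhere,
# but navigator.platform still gives away the real OS.
spoofed_ua = "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0"
found = ua_discrepancies(spoofed_ua, {"userAgent": spoofed_ua, "platform": "Linux x86_64"})
js_off = ua_discrepancies(spoofed_ua, {})  # nothing reported, nothing to compare
```

With JS off the server has only the header to go on, which is why the spoof holds up in that case.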
I've been looking at this from the avoiding-tracking and from the server side (identifying clients independently of cookies). The client is very limited in options with JS on.
Users need more control of this in the browser.
In an application you could in principle track users even across some changes with a sort of "preponderance of the signals" confidence value, weighting several things like IP, platform, etc.
I tested the browser I normally use for untrusted sites, which has adblock, ghostery, and lots of other custom components bolted on. Unique.
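A minimal sketch of what such a "preponderance of the signals" score might look like. The signal names and weights are invented for illustration; a real system would tune them against data.

```python
# Hypothetical integer weights per signal; higher means more trusted as an identifier.
WEIGHTS = {"ip": 3, "platform": 1, "user_agent": 2, "timezone": 1, "screen": 1, "fonts": 2}


def match_confidence(known: dict, seen: dict, weights: dict = WEIGHTS) -> float:
    """Weighted share of signals that are identical in both profiles (0.0 to 1.0)."""
    total = sum(weights.values())
    matched = sum(
        weight
        for signal, weight in weights.items()
        if signal in known and known[signal] == seen.get(signal)
    )
    return matched / total


stored = {"ip": "203.0.113.7", "platform": "Linux x86_64", "user_agent": "UA v1",
          "timezone": -480, "screen": "1920x1080x24", "fonts": "hash-abc"}
# Same machine later: new IP and an updated browser, everything else unchanged.
later = dict(stored, ip="198.51.100.9", user_agent="UA v2")
score = match_confidence(stored, later)  # 5 of 10 weight units still match
```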
Next, I tested the browser I never use: IE10. It was installed with the OS. I presume it's been kept up to date, but I haven't used it let alone modified it in any way. Also unique.
This is fishy.
----------------
Update:
Okay. It makes sense now.
Part of this test gathers a list of system fonts installed. I have some pretty weird ones installed which seem sufficient to uniquely identify me.
System fonts can betray your identity online... Who knew?
Just recently there was an article posted on hn about identifying computers by their render rates of various components in the browser. There is an entire sea of ways to identify people.
You should disable flash and java on the untrusted browser - those two are popular channels for exploits, and that is how your installed fonts are leaking.
However, even without the fonts, the combination of screen size, timezone, browser and plugin list seems to be pretty unique.
First of all, when you get a 'unique', turn off javascript, then go back and click the 'Test Me' button again. You'll see much of your 'uniqueness' go away, and many boxen say 'no javascript'... no plug-in or font details sent.
Get a user-agent switcher and try several different browsers. (Use the 'Test Me' button after switching.) In the past I found IDing as IE8 made my browser 1 in 3000 or so.
Another good site for testing your settings security is grc.org.
My designer friend was watching a livestream, where he clicked a link that infected his OSX Computer at work. He was worried and asked me to inspect the payload, which was funny, because I only got a blank page. Only modifying my UA-String gave me access to the java exploiting payload, but imagine how they could go undetected by sniffing for more than just the UA-String!
After some investigation I found out that one guy has infected many thousand OSX and Windows PCs and turned them into drones. Now I know how they make money, selling their bot-nets. Kinda disappointed, I thought there is more thought and work required. But you see the point, anybody with enough patience can do that today, the tools are available.
You might be surprised to hear that many tens of thousands of clueless kids, teenagers, and adults pay for all-in-one kits that host Java drivebys (malicious self-signed Java applets, and Java exploits), redirect visitors of compromised sites to one's Java driveby, infect them, and manage such botnets through fancy Web 2.0 interfaces.
And they don't even need (and often do not have) a shred of basic IT knowledge to do any of this, let alone programming knowledge.
Wow, I had no idea that my system fonts were trackable with Flash. That's like an instant fingerprint, as many developers and designers have at least a few custom fonts installed -- if not a few dozen.
You can probably do something similar with javascript, either by comparing a <span>'s width, or by using canvas etc sampling pixels, to see if a given font is installed or not. You'd need a pretty long list of fonts to check, though.
Flash makes it easier since you can simply enumerate all the fonts.
I'm unique again every time I've come here (probably 12 times over 3 years). I've never been not unique. I don't think that's very trackable if I appear to always be a new person every time.
These numbers don't make much sense to me. Only 2.5% of the site's users are running 1920x1080x32? Only 4% of people going to an EFF website are in the Pacific timezone?
Well, only 14% of the US's population is in the Pacific time zone [1], so that would imply that about 2/3 of the visitors are non-US, which seems reasonable.
And yes, 1080p is still a fairly uncommon resolution in this laptop era. StatCounter estimates 7-8%, and it's somewhat likely that StatCounter and EFF are going to have different skews [2]
At the moment, 15% (1:6.8) are in the UTC+2:00 timezone, which covers most of mainland Europe using Daylight Saving Time.
Sounds ballpark reasonable to me - Europe is twice the population of USA, and USA is split over more timezones. And Asia has more people than USA and Europe put together.
I think it mostly goes to show that it doesn't take much information to reduce the number of possibilities by 1-2 orders of magnitude.
In my case, Safari on iPad makes me 1:67,000. Chrome on the same iPad makes me 1:1.6 million.
It took me a moment to realize that means I am completely trackable.
A little surprising since I just built this computer from scratch a couple weeks ago. I'll take this more as a commentary on the popularity of Windows 8 more than anything.
This is what I think: this "service" is rather old, and I'm sure it is not being used regularly by a lot of people with recent browser versions, so the majority of those 3.2 million fingerprints were made in older times (when it was being Slashdotted/HNed, like today). That's why we get so many unique results.
That's not uncommon for (custom) dev builds, or if you have a plethora of addons that manipulate core browser behavior. I have a user and dev profile on firefox.
But that makes you more unique and a better target.
"My browser firefox fingerprint appears to be unique among the 3,230,950 tested so far and 21.62 bits of identifying information, mostly acceptable to be shared." But this is just a demo, a real attacker could get much more info out of it, I'm not protected against this.
Why do you think that you have gotten that result? Did you compile Chromium with some special flags?
Surely if you want to track users over a much longer period of time (obviously this works very well even across a large number of sessions) you wouldn't rely on plugin and web-font data staying the same.
I suppose you might have a confidence equation based on overlap, since changes would likely be small and gradual.
I have always wondered if there is a "master list" of things that can be used to uniquely fingerprint a browser. For example, I don't see system time offsets[1] being used here.
See the bugs under "Depends on" for examples of changes they've made, or Dave Garrett's comment #59.
I've seen this link several times before and it always says I'm unique. A new discovery this time: Ubuntu is patching Firefox to gratuitously add itself to the user agent string, giving it more visibility, but also making all Linux users more trackable. That's pretty shitty, and they seem to do the same thing to the packaged version of Chromium. I don't know what to do about it, so I'm afraid my complaint isn't actionable.
Is there anything preventing a website that has access to your email address or name for that matter (ie because you created an account on that site) to sell the information between email and browser fingerprint to advertising networks?
Nothing at all. Working with the data industry, I've been blown away by the links every advertising network is trying to make between the crumbs we leave all over the place. Expect your TV (not even a smart TV) to be displaying ads based on your browsing history in the near future, all linked by your IP address. The data industry will absolutely whore your data all over the place. Datalogix, BlueKai, and many others have raised millions to do exactly this sort of thing.
Is there already a fix for that? Just a plugin that blocks all unique identifiers and responds with a Windows out-of-the-box configuration should solve the problem.
What we really need is a modified ``privacy'' build of FF that reports spoofed js/css data (screen res etc), but I'm not skilled enough in web stuff to make that.
Upvoted you even though it doesn't bring a solution. They clearly state:
> For the moment, there is no protection from fingerprinting.
This seems just a wrong statement. If Firefox and Chrome are open source there must be a way to modify the fingerprint. Even if it comes in form of some "pre-loader" that blocks the sending of one fingerprint and overwrites it with another.
My suspicion is that sites coming across as super sneaky (e.g. LinkedIn) are actually using browser identification / fingerprinting to make their otherwise impossible suggestions.
As a side note, I've been impressed with French academia lately. Between INRIA and IRCAM, I see a lot of quite practical stuff that I like coming from there.
Glad to help. Yes so do I, I've seen a lot of INRIA projects that stunned me. I didn't know about CCRMA, would you mind sharing some stuff you found over there?
Hah sorry, I edited my comment before you replied, but I got a brain mix-up between the international computer-music centers' acronyms: CCRMA is an American one at Stanford, but I meant IRCAM, the French one at Centre Pompidou. The third of the "big three" is CNMAT at UC-Berkeley. All are 5-letter acronyms, so it is a bit easy to confuse...
It will not be very interesting if you don't care about computer music, but IRCAM has an ethos of producing many projects in that area. For example, the Max system that later became Max/MSP (and later the open-source version, Pd) was originally an IRCAM project. They also have a Common Lisp based visual-score system (http://repmus.ircam.fr/openmusic/home), a system for data-based resynthesis using musical corpora (http://imtr.ircam.fr/imtr/CataRT), and a number of other things, including many projects more on the music/composition side.
What? Wow, that sounds really dangerous. The same fingerprint is available to any site out there, so anyone could just capture it and try to replay it on a "SecureAuth" service, no?
Here's how it works. As a user you want to connect to a web resource, so you are redirected to your IdP (SecureAuth), where we ask your name, look up your account and ask you to enroll, then ask your password. Before we generate a claim/token or whatever, we take your fingerprint, so for the next day/30 days/90 days or whatever, your fingerprint is checked along with your username/password and the strong authentication is transparent to the user.
The variables can all be weighted differently and the fingerprints can easily be revoked but it is very cool technology.
I also used a similar concept to help us investigate charges of cheating in an asynchronous online strategy game where sock-puppet collusion would destroy the balance. Having an extremely high probability that two accounts are actually the same browser made it easier for us to warn|block|ban cheaters.
The user signs up, connects their phone to their account and you use the fingerprint as the ID for future validation?
How do you handle changes in fonts/plugins/browser versions/timezones while travelling?
All of that data seems to be mutable over time. Unless it's a 99% match kind of thing. Also what about companies that hand out laptops with the same base OS/browser installs (or is that the goal)?
Client side changes are handled in a weighted model. If the heuristics match between 85 and 95% we might update the stored fingerprint.
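As a reader's sketch of how that 85 to 95% band might be turned into a decision: only the "update the stored fingerprint in the middle band" part comes from the comment above. Accepting silently above 95% and asking for another factor below 85% are my assumptions, and none of the names here are SecureAuth's.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    ACCEPT_AND_UPDATE = "accept and refresh the stored fingerprint"
    STEP_UP = "ask for another factor"


def decide(score: float, lower: float = 0.85, upper: float = 0.95) -> Decision:
    """Map a 0.0-1.0 match score to an action (thresholds taken from the comment)."""
    if score >= upper:
        return Decision.ACCEPT             # assumption: close enough, nothing to update
    if score >= lower:
        return Decision.ACCEPT_AND_UPDATE  # from the comment: drifted a bit, so refresh
    return Decision.STEP_UP                # assumption: too different to trust alone


decisions = [decide(s) for s in (0.99, 0.90, 0.40)]
```

Refreshing the stored fingerprint in the middle band is what lets the profile follow gradual changes like browser updates without ever accepting a large jump.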
While we can use this for SSO it's usually tied to another factor of authentication. Something like a user/pass, social id or maybe saml assertion from another claim provider.
We are just scratching the surface of what this type of technology can be used for.
Sounds awesome, and at the same time it's shocking how unique and identifiable we are online. Did you consider adding _buzz_and_words_here_ 3-Factor Auth (4-Factor for Gov. clients)? You could solve it in a way that makes it very comfortable for the user, yet very hard to crack for crackers.
1. Your Tracking Technique + DRM-based-License Tracking
2. One Password / ID (No username required, it's optional)
3. Typing Speed Tracking 99.5 percent accuracy [1]
4. (Bio-metric Data like Fingerprint or HD Iris Scans for Gov. Clients)
DARPA is also working on it, definitely watch this (Google already uses this!):
Yeah we provide the solution to a lot of government clients. Strange enough, our international clients like the 3 or 4 factor authentication. It's very very easy for us to do though as there's no limit to chaining the work flows together.
I know a couple groups that use this technique to track iOS app downloads. Although iPhones aren't very unique, looking at the somewhat unique fingerprint + timeframe from browser to app d/l works out pretty well.
I came across this research when I was trying to fingerprint mobile devices. Apple has a seriously tight handle on making sure its iPhone users are not uniquely identified via mobile safari.
From what I gather going by this site that I consulted (http://www.m-w.com/dictionary/unique) it seems that anything less than 2 would qualify you as unique.
You may feel that as the sample set grows others may join your party but you have not told us why you believe this to be the case. I wouldn't be too sure, I would be more hesitant in reaching that conclusion.
As it stands you are a unique snowflake and so am I.
if we can't hide the plugin info, can we instead generate some random string to cause this fingerprint to be unique (and different) on every http request? (e.g. fake font, fake plugin name)
My browser is apparently unique among all those tested so far (around 3 million) and gives you, with all the odd setup options, a little bit under 22 bits of information.
Okay... so what does that actually mean to you?
Perhaps the best way to proceed here is by comparison:
To pick out one person uniquely among the 7 billion people on Earth requires ~33 bits of information.
Each bit of information divides the search space in two, (just like with computers =p )
Which is why you can just go:
log2(searchspace) = required bits
You can also work the idea backwards. (i.e. 2^(available bits of info)) to find out what size of search space you can be found in.
It so happens I give up about 22 bits so... I can be found in a population of about 4 million.
Put another way, in the best case, assuming that the distribution is random (which in practice it almost certainly isn't), there are around (search space) / (size of the pool I can be singled out from) people with a similar fingerprint to mine.
In this case we're dividing seven billion by 4 million which should give you around 1,750 similar people to myself.
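The arithmetic above is easy to check. A minimal sketch (function and variable names are mine), which also shows that without rounding 2^22 to "about 4 million" the figure comes out slightly below 1,750:

```python
import math

WORLD_POPULATION = 7_000_000_000


def bits_required(search_space: int) -> float:
    """Bits needed to single out one member of the search space."""
    return math.log2(search_space)


def anonymity_set(search_space: int, bits_revealed: float) -> float:
    """Expected number of people sharing a fingerprint, assuming a uniform spread."""
    return search_space / 2 ** bits_revealed


needed = bits_required(WORLD_POPULATION)       # a little under 33 bits
pool = 2 ** 22                                 # 4,194,304, i.e. "about 4 million"
similar = anonymity_set(WORLD_POPULATION, 22)  # roughly 1,670 people
```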
So... that's pretty darned accurate - but not that worrying yet perhaps. At least if you needed to uniquely identify me.
But I'd bet a heck of a lot they do have other info.
Every bit of information under that needed to identify you uniquely in the search space doubles the potential group size you have to hide within. Every bit of extra information they have, halves it.
Not everyone in the world is online. Only 39% were predicted to be so this year, I believe.
Suddenly you're looking at only having 683 people like you on earth.
And it gets worse.
The really relevant search pool is going to be how many people connect to the sites they know about - which are probably going to be multiple sites since they can store and trade your IP address which probably isn't going to change that often. I bet most of the sites I connect to don't get anywhere near 39% of the world's population connecting to them.
How unique am I for the sites I visit? The sites I visit, if they're niche like HN, probably give a HECK of a lot of info on me. To the point where I suspect I can be absolutely uniquely identified here by my browser fingerprint. You'll notice that of the 3 odd million people on the EFF site I provide more than enough info to be uniquely identified.
And even if they don't, the probabilities compound: assuming the signals are independent, the cumulative probability is the product of the individual probabilities. If someone goes on two niche sites, or three... The search space gets cut up again. Snipety snip.
So, yeah, in connection with other databases and the usual attack vectors people use when they start getting info on you (targeted ads, security profiling, etc.), that's pretty worrying. It's especially worrying for applications where people aren't going to pay a high cost for hitting the wrong target.
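To illustrate the "snipety snip": if the signals are independent (a big assumption), the one-in-N ratios multiply, which is the same as the bits adding up. The ratios and population below are invented purely for the example.

```python
import math


def combined_bits(ratios) -> float:
    """Total identifying bits from independent one-in-N signals (bits add)."""
    return sum(math.log2(r) for r in ratios)


def expected_matches(population: float, ratios) -> float:
    """People left in the search space after applying every signal (ratios multiply)."""
    return population / math.prod(ratios)


# Hypothetical: 1 in 1,000 people visit niche site A, 1 in 500 visit niche site B.
site_ratios = [1000, 500]
bits = combined_bits(site_ratios)                  # about 18.9 bits from two visits
left = expected_matches(1_000_000, site_ratios)    # 2 people out of a million
```

In practice the signals are correlated (people who read one niche site often read a related one), so the real reduction is smaller than the independent case suggests.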