> Canvas Fingerprinting. They draw an image in the background using vector graphic commands. Afterwards they save the image to a rasterized PNG. This data is quite unique among different devices depending on settings and hardware.
> They also use audio fingerprinting to identify visitors. This doesn’t mean they actually use your microphone or speaker. Instead they generate a sound internally and record the bitstream, which also differs from device to device.
This really blew my mind. Correct me if I'm wrong, but are almost all of them doing this? If so, the congressional hearing last year and all those privacy suits were in vain, weren't they? (PS: sorry for my bad English.)
99% of websites we visit do not need canvas or sound. And the few websites that do can explain why you should click "Allow" when they prompt you for access.
What's a charitable reason that stops even a supposedly privacy-concerned niche browser like Brave from implementing opt-ins for these things?
I suppose one reason is that you would immediately unleash opt-in spam on your users that don't know what these pop-ups mean since so many major websites use these hacks, so the average user is just going to be conditioned to mindlessly click "Allow All" every time they pop up like UAC on Windows Vista, punishing the average person while doing nothing to enhance their security.
Doesn't help that legislators are absolutely clueless here. For example, demonizing cookies when cookies are the most fair and transparent way to implement tracking, and leading to the first wave of pointless opt-in spam that plagues the internet. I'd rather they leave the internet alone and for browser vendors to step up for us.
Of course, the problem also includes native apps like in TFA. I'm just more optimistic about clients that run in web browsers.
In Firefox, setting privacy.resistFingerprinting = true in about:config fixes the canvas leak and possibly the audio leak. It's part of a push to bring privacy features from the Tor Project into Firefox.
I really want to do this, but last time I checked it also leads to FF sending UTC time back to sites (or something like that), resulting in showing non-local time for your own and other interactions on pretty much all sites like github/slack/... Fairly annoying. I hope this becomes a separate setting at one point.
I enabled it and Todoist would show me a dialog on every page reload to set my timezone to UTC. After I realized that resistFingerprinting was causing it, I had to disable it.
It would be nice if the settings inside resistFingerprinting were configurable. I understand that the point of it is to make every browser look the same to analytics engines, but having timezones and zoom levels reset is not the best thing.
- for web, stop using chrome, install firefox (or firefox mobile), and in about:config set privacy.resistFingerprinting to true, then add the following addons:
They will not only prevent fingerprints but also screw with the data (add random noise to audio/webgl sample, return random fonts,...).
- the most important rule: don't use applications like tiktok, fb, etc. unless your phone is rooted with xprivacylua (https://github.com/M66B/XPrivacyLua, for added kicks https://github.com/M66B/NetGuard) installed and you have a basic understanding of what you allow there (disallow everything for new apps and work through permissions one by one). The sole purpose of those apps and their business model is to steal your data. This is the most sane advice I can give, sorry :(
Voila. Solved.
Those methods of fingerprinting are a few years old and well known.
> What's a charitable reason that stops even a supposedly privacy-concerned niche browser like Brave from implementing opt-ins for these things?
In my experience, blocking these everywhere globally as a default will result in being banned from websites, trigger bullshit "fraud" invasive analytics, and all sorts of obnoxious fail-closed problems by invasive trackers.
You will also be banned from most Distil-hosted sites for refusing to enable WebGL or return a canvas fingerprint. Ticketmaster, etc.
If a large enough chunk of people do it, then it becomes unfeasible to block them all. If Safari, Firefox and Internet Explorer all implemented a block, it doesn't really matter that Chrome doesn't. That's ~30% of browsing traffic, and no sane company with a large web presence is going to throw away 30% of their traffic.
Agreed! There's a lot of moaning about how things should be better.
Be the change you want to see. Do it. I'm always amazed at how people will complain but then let themselves be pushed around. I wonder how many people reading this article will change their or their browser's behaviour afterwards. Very few I guess.
Sounds great - especially if chrome could roll out default blocking of it.
The EU (via GDPR) made a collective decision to disallow invasions of privacy. If TikTok's response to that is to ban everyone in the EU and entirely lose the market, that sounds good to me - teenagers will complain and people will shrug.
This is being done in the EU anyway. IIRC I've seen several third-party vendors say that this type of tracking is legal under GDPR to justify 'anti fraud activities' like tracking users and sharing access to their data with other paying customers.
I personally have EasyPrivacy enabled on uBlock (because these tracking scripts using canvas, webgl, etc spin up my GPU and absolutely shit all over my battery life) but they are getting more and more aggressive with first-party CNAMEs, shipping a list of dozens or hundreds of hostnames to try progressively if any are blocked (and probably honestly risk score you higher for "trying to evade them" even though it's a default uBlock list).
I think it's honestly rather stupid, and I am not a fan of hearing my fans spin up for no reason accessing an otherwise plain website while it tries to open media players, try video elements, draw to canvas, download several megabytes of obfuscated garbage to evaluate.
I've worked in anti-fraud software and it isn't necessary. Fingerprinting is extremely convenient and sales loves it, but I would hope that necessity could be broken down by reasoned arguments in the EU - I'm outside of the EU and not familiar enough with GDPR but there are a number of clear and accessible counters to that point - especially if payment is involved. If payment comes into play then fingerprinting becomes irrelevant as anyone making a payment can be trusted up to the level of that payment and that payment method tied to their actions.
> And the few websites that do can explain why you should click "Allow" when they prompt you for access.
You missed it, this doesn't require an Allow, because it's not actually accessing the microphone. It's using the audio APIs to generate a waveform, and depending on your computer/DSP, the waveform will be slightly different. It's not using any of your computer's sensors.
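To make that concrete, here is a rough Python sketch of the idea (not browser code, and certainly not what TikTok actually runs). A tone is rendered into a buffer, never played, and the exact sample bits are hashed. The `gain` tweak is a made-up stand-in for whatever rounding differences a real audio stack introduces, and `render_tone` / `fingerprint` are hypothetical names.

```python
import hashlib
import math
import struct


def render_tone(n_samples: int = 4096, freq_hz: float = 1000.0,
                sample_rate: int = 44100, gain: float = 1.0) -> list[float]:
    """Render a sine tone into a list of samples; nothing is played or recorded."""
    return [gain * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def fingerprint(samples: list[float]) -> str:
    """Hash the exact bit pattern of the rendered samples."""
    raw = struct.pack(f"<{len(samples)}d", *samples)
    return hashlib.sha256(raw).hexdigest()


device_a = fingerprint(render_tone())
# Assumption: a second device whose audio stack rounds slightly differently,
# modelled here as a gain that is off by one part in 10^12.
device_b = fingerprint(render_tone(gain=1.0 + 1e-12))

print(device_a == fingerprint(render_tone()))  # same device, same hash
print(device_a == device_b)                    # tiny numeric difference, new hash
```

The point is only that an inaudible numeric difference is enough to change the hash, with no microphone or speaker involved.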
> You missed it, this doesn't require an Allow, because it's not actually accessing the microphone.
Presumably, without an allow the API doesn't function at all, so it works perfectly well. When denied access, the API should either not exist, or (more appropriately) just return errors.
We're talking about a theoretical situation where we gate the canvas and audio api behind a permission prompt. My "presumably" is about how that would be theoretically implemented.
Except prompt spam is a real issue too. Very quickly you can end up like Windows' UAC that basically trains people to blindly click "yes" on any dialog they see. Putting a permission prompt for every tiny little thing a browser can do isn't a good experience for the user, and most people will have no clue what "allow access to canvas/audio api" means.
Except that with a prompt for a specific API, you can get specific on when it's appropriate. E.g.
"This website has asked for permission to use the Audio API. If they aren't actively doing audio processing, this may be an attempt to identify and track you. If you haven't been presented by the site with a good reason why to allow this access, and it's not immediately obvious (such as for a music service), consider whether you want them to have access."
This accomplishes a few things. First, it's very clear about the implications and when it's obvious it should be allowed. Second, it gently suggests the default behavior for people should be to deny unless needed. Third, it communicates to sites (through users) the appropriate way to ask for permissions if they have a legitimate reason, which is to let users know why they need access. Lastly, if sites lie to users about why they need access, they can be called out on that. People don't like being lied to.
Now imagine getting 10 of these, as a user who has no idea what those words mean, nor care, because you just want to see the latest meme on Facebook, and you just blindly press Allow on everything and quickly become desensitized to all prompts.
Gating the API behind a permissions dialog or permission setting. The difference being that an end site either uses it and gets valid data (because the user allowed it, the default without a dialog today), or gets errors letting them know the data can't be relied upon.
If you visit a random internet site and get a prompt for the audio API are you going to allow it? Probably not, especially if you think it's just so the site can be obnoxious with sounds or listen to you.
The canvas api might be a little more likely to be allowed by the average person, but enough might disallow or ask about it that sites that have a valid use for it will put up a banner noting why it's important, making sites using it for tracking stand out all the more.
The bottom line is that we're coming to an understanding (well, it's been known for years) that each additional browser feature has downsides, so making them enabled by default has repercussions.
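A toy Python model of that gating, purely as a sketch: the class and method names are hypothetical, and the three state strings just mirror the granted/denied/prompt states that browser permissions use. The important part is that anything short of an explicit grant produces an error rather than data.

```python
class GatedAudioAPI:
    """Toy model of an audio API hidden behind a permission state."""

    VALID_STATES = {"granted", "denied", "prompt"}

    def __init__(self, state: str = "prompt") -> None:
        if state not in self.VALID_STATES:
            raise ValueError(f"unknown permission state: {state}")
        self.state = state

    def render_samples(self, n: int = 4) -> list[float]:
        # Anything other than an explicit grant fails loudly, so a tracking
        # script gets an error instead of data it can hash.
        if self.state != "granted":
            raise PermissionError("audio API not permitted for this site")
        return [0.0] * n


music_site = GatedAudioAPI("granted")
print(music_site.render_samples())

random_site = GatedAudioAPI("denied")
try:
    random_site.render_samples()
except PermissionError as err:
    print("blocked:", err)
```

A site that really needs the API sees the error and can explain itself to the user; a site that only wanted a fingerprint gets nothing stable to hash.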
> Gating the API behind a permissions dialog or permission setting.
Those dialogs and settings are controlled by the Permissions API today.
> If you visit a random internet site and get a prompt for the audio API are you going to allow it? Probably not, especially if you think it's just so the site can be obnoxious with sounds or listen to you.
You and I might disable it. Overwhelmingly, the majority of the public click "enable" or "yes" on permission pop-ups. And tend to make sure "enable forever" is ticked.
> The bottom line is that we're coming to an understanding (well, it's been known for years) that each additional browser feature has downsides, so making them enabled by default has repercussions.
I don't disagree. The problem here is that the permission model itself has been broken, and won't be repaired because of backwards compatibility, and a lack of incentive.
This is a genuinely great idea. Other stuff like recording audio, looking at your location, and even annoyances like sending you notifications all require a permission request that appears in the browser chrome. Why doesn't Firefox put other features that impact privacy behind the same barrier?
> Doesn't help that legislators are absolutely clueless here.
You're a world-leading computerist and you had no clue either (nor did I), so how in the world are legislators supposed to do this? It's a fundamental problem. And once it's finally legislated, the SMEs are already on to the next thing.
Brave is just snake oil. They couldn't ever implement true canvas or audio protection or prevent font or window decoration fingerprinting even if they wanted.
If you want that stuff, you use Firefox and enable resistFingerprinting; it's the only possible way.
This kind of fingerprinting is used across the industry for anti-fraud purposes.
The problem is that it used to be good enough to block "known-bad" IPs but now with AWS and cloud services it's very easy for cybercriminals to get around IP blocks.
For normal users, tracking can be done with cookies, so fingerprinting isn't really needed for normal users anyways (not entirely true if you're a totally bad actor, which is why browsers have been trying to block some of the more common ways to fingerprint).
But for a script that spawns a thousand AWS instances to sign up 1000 bot accounts that can then be used to sell likes for example, it's pretty easy to tell that it's a script instance because all of them will behave in almost exactly the same way (processor performance, installed plugins, reported screen size, etc. etc.).
An "alternative" to using fingerprints would be to use captchas instead, but bots have gotten much much better at solving captchas. So in fact, ReCaptcha will also use a number of fingerprinting techniques, which is why in many cases you can just click the check box instead of solving a captcha.
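The bot-farm case above boils down to counting how many signups share a fingerprint. A minimal Python sketch, assuming signups arrive as (account, fingerprint) pairs; the function name, the sample data, and the threshold of 3 are all made up for illustration.

```python
from collections import Counter


def repeated_fingerprints(signups: list[tuple[str, str]],
                          threshold: int = 3) -> set[str]:
    """Return fingerprints shared by at least `threshold` signups."""
    counts = Counter(fp for _account, fp in signups)
    return {fp for fp, n in counts.items() if n >= threshold}


# Five accounts created from identical cloud images, plus two ordinary users.
signups = [(f"bot{i}", "fp-cloud-image") for i in range(5)]
signups += [("alice", "fp-a1"), ("bob", "fp-b2")]

flagged = repeated_fingerprints(signups)
print(flagged)
```

Real anti-fraud systems presumably weigh many more signals than an exact match, so treat this as the simplest possible version of the idea.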
> it's pretty easy to tell that it's a script instance because all of them will behave in almost exactly the same way (processor performance, installed plugins, reported screen size, etc. etc.).
Processor performance is variable based on the particular instance you are running on and how much load it is handling. At least at the level the remote side can see.
Screen size is easily configured and/or randomized to some degree, or shifted between 10-20 common values.
Plugins reported or actually allowed to run can be changed per instance.
All this stuff is trivial with headless chrome and puppeteer, and even abstracted away using the stealth plugin for puppeteer[1]. And headless firefox through puppeteer is experimental
All this fingerprinting is ridiculous and trivial for someone with any incentive to do so to defeat.
The fingerprinting signal does not necessarily need to be revealed to the fraudulent actor in order to be useful. It’s a cat and mouse game, but it’s worth it because of the cash at stake.
> Screen size is easily configured and/or randomized to some degree, or shifted between 10-20 common values.
I don't want my browser to keep changing its screen size.
> All this fingerprinting is ridiculous and trivial for someone with any incentive to do so to defeat.
I disagree. There are a number of efforts to defeat fingerprinting underway already and I think the fact that they haven't been able to eliminate it says volumes about how difficult of a problem this is.
> I don't want my browser to keep changing its screen size.
Neither do I. I'm talking about how useless it is to use these metrics to identify bots, since any bot can easily circumvent them, and with far less hassle than a user.
The fingerprinting is "for" fraud, but is not very useful in that context. The fallout is that all our privacy is worse though.
> I disagree ... the fact that they haven't been able to eliminate it
They haven't eliminated it because it still works well enough on end users, even though the narrative is that it's to prevent fraud.
Fingerprinting works on users, but against any halfway competent adversary it's close to useless. Just keep that in mind when it's brought up as a solution to fraud and as the reason it's worth allowing.
Tor makes the most usability-compromising security choices of any browser and even they don't have a fixed window size to avoid any fingerprinting here.
They do prevent manually resizing the window because the exact width could be used to track you across sites in the same window.
Ah yes, so trivial. That must explain why millions of people are doing it and we have perfect privacy online. Who knew perfect privacy was one hackathon away the whole time?
The context of this is that the tracking helps prevent fraud. It's trivial to circumvent for anyone who puts some effort into it, so it doesn't work well for any but the simplest instances of fraud detection.
We all pay for this as end users, because for an end user it's much more onerous to work around: the site-specific tweaks needed to make a site work take a lot of effort.
I thought the IP blocks that most cloud service providers & VPNs have are well known, and for services that don't need their customers to talk to cloud servers, they widescale ban the IP ranges. Like netflix and many others.
Not defending anyone, but I have to point out that fingerprinting sounds like finding a fingerprint that could identify an individual, and that's not true.
AFAIK browser fingerprinting (eg: Fingerprintjs2[0]) at least is nothing like fingerprints at all. It's not accurate and cannot identify a specific person. For websites like TikTok, the same fingerprint might point to 10k different people instead of an individual. The most common use case is something like a 'Remember me' option when logging in, as long as the user hasn't switched devices, but the site can actually find other devices that have the same fingerprint.
For example, the most widely used library, Fingerprintjs2, uses all the factors that can be grabbed from the browser (browser agent, CPU, OS, resolution, etc.) and generates a hash from those factors. Even their 'pro' version can only claim 99.5% accuracy, though I suspect the real number is even lower. For a userbase of 1M people, the remaining 0.5% is 5,000 people.
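A rough Python sketch of both halves of that: hashing a bag of browser attributes, and the 0.5% arithmetic. This is not Fingerprintjs2's actual code; the attribute names and both function names are made up for illustration.

```python
import hashlib
import json


def attribute_fingerprint(attrs: dict[str, str]) -> str:
    """Hash a canonical (key-sorted) serialization of the attributes."""
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def expected_misses(userbase: int, accuracy: float) -> int:
    """How many users fall outside the claimed accuracy."""
    return round(userbase * (1 - accuracy))


laptop = {"user_agent": "ExampleBrowser/1.0", "os": "Linux",
          "cpu_cores": "8", "resolution": "1920x1080"}
same_model = dict(laptop)  # an identical stock configuration elsewhere

print(attribute_fingerprint(laptop) == attribute_fingerprint(same_model))
print(expected_misses(1_000_000, 0.995))
```

Two devices with identical attributes collide by construction, which is why one hash can cover many people on stock hardware.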
I’m not too sure what “99.5% accuracy” is supposed to mean, but sites like https://amiunique.org usually tell me that my browser fingerprint is unique. Seems rather like a real fingerprint to me, except that I have more than one.
Combined with other data points like usage patterns (how the user behaves on the site, which features they use, etc.) and IP address, I'd be confident the resulting fingerprint can be unique.
Seems obvious once you read how it works. I suppose once you get your mind around how browsers leak and/or persist info, not much of it is surprising anymore.
I've tried using addons like uMatrix in the past and always gave up, but this just convinced me to deal with the (minor-ish) annoyance they add. Canvas fingerprinting is somewhat well-known and there are browser extensions to block it. But if TikTok is doing audio fingerprinting then you can bet FB, Google, and everyone else are doing it too. JS is too feature-rich to be safely used & allowed.
I haven't found a way to block canvas fingerprinting. For some reason, I haven't been able to default to block in firefox.
Even with CanvasBlocker the "check your fingerprint" sites still show a unique fingerprint (but canvasblocker has a ton of really obscure options that I don't understand)
> Even with CanvasBlocker the "check your fingerprint" sites still show a unique fingerprint (but canvasblocker has a ton of really obscure options that I don't understand)
It will always be unique, because it's randomly generated. But every visit will have a completely different unique value, which means that the fingerprint from the canvas is a useless value.
The setting in Firefox about:config privacy.resistFingerprinting includes mitigations for some canvas fingerprinting attacks. It's part of a push to bring privacy features from the Tor Project into Firefox.
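Roughly what that looks like, as a Python sketch. I'm assuming the noise is a random flip of the lowest bit of each pixel byte; I don't know CanvasBlocker's real algorithm, and the function names are hypothetical.

```python
import hashlib
import random


def canvas_hash(pixels: bytes) -> str:
    """What a tracking script would compute from the canvas readback."""
    return hashlib.sha256(pixels).hexdigest()


def add_noise(pixels: bytes, rng: random.Random) -> bytes:
    """Randomly flip the lowest bit of each byte; invisible to the eye."""
    return bytes(p ^ rng.getrandbits(1) for p in pixels)


# Stand-in for a 64x64 RGBA canvas readback.
pixels = bytes(i % 256 for i in range(64 * 64 * 4))

visit_1 = canvas_hash(add_noise(pixels, random.Random(1)))
visit_2 = canvas_hash(add_noise(pixels, random.Random(2)))

print(visit_1 == visit_2)                         # differs between visits
print(canvas_hash(pixels) == canvas_hash(pixels))  # stable without noise
```

Each value is "unique" on a test site, but it can't be matched against the previous visit, which is the part that matters.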
> They also use audio fingerprinting to identify visitors. This doesn’t mean they actually use your microphone or speaker. Instead they generate a sound internally and record the bitstream, which also differs from device to device.
I don't understand. Can anyone unpack this concept for me? How does one generate a sound without a speaker or record without a mic.
And I assume that different audio drivers and software will produce minutely different outputs. It's also possible that they're queueing a sound to be played then canceling the sound after reading the raw computed signal out of the buffer.
It was my understanding that these methods profile performance of the API which will execute at different speeds on different devices. The samples themselves shouldn't be different if they're using AudioBuffer and typed arrays.
True, but I think the technique points to the way that other timing attacks used as fingerprinting vectors can and will work. Profiling performance of network requests, image rendering time, etc will always be risks unless all Javascript features employ that kind of mitigation.
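One mitigation for timing-based fingerprinting is coarsening the clock that scripts can read. A minimal Python sketch of that idea; the 100 ms granularity is an assumption for illustration rather than a documented browser value, and `clamp_timestamp` is a hypothetical name.

```python
import math


def clamp_timestamp(t_ms: float, precision_ms: float = 100.0) -> float:
    """Round a timestamp down to the nearest multiple of precision_ms."""
    return math.floor(t_ms / precision_ms) * precision_ms


start = clamp_timestamp(1201.3)
end = clamp_timestamp(1299.9)

# A 98.6 ms operation becomes invisible once both ends are clamped.
print(end - start)
```

The cost is the one mentioned elsewhere in the thread: anything that legitimately needs fine-grained timing, like a game loop, sees the same coarse clock.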
One thing that I can tell you is that they are using this to sign the request to their private API. They might be using for tracking it too (I never looked at that), but it's used to prevent their platform from being scraped.
Those techniques described here were pioneered by Facebook and Google. Basically they are (automatic) CAPTCHA.
I still remember the news articles referring to a patent Facebook filed about audio fingerprinting as evidence of Facebook eavesdropping on people.
Also, technically, with some encryption algorithms a company can claim that it is not able to identify a user from the fingerprint. But in theory, given enough data, they still can.
This is interesting because a cookie with a uuid is probably a better tracker, specially with shared computers.
I know cookies can be deleted but expect the majority of their users wouldn’t be deleting cookies or using ad blockers
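For comparison, the cookie approach is about this much code. A Python sketch using only the standard library; the cookie name `uid` and the one-year lifetime are arbitrary choices for illustration.

```python
import uuid
from http.cookies import SimpleCookie


def make_tracking_cookie(name: str = "uid",
                         max_age_s: int = 31_536_000) -> tuple[str, str]:
    """Return a fresh random visitor ID and the Set-Cookie header carrying it."""
    visitor_id = str(uuid.uuid4())
    cookie = SimpleCookie()
    cookie[name] = visitor_id
    cookie[name]["max-age"] = max_age_s
    return visitor_id, cookie.output()


visitor_id, header = make_tracking_cookie()
print(header)
```

Unlike a fingerprint, this ID is visible in the browser's cookie store and disappears when the user clears it.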
Three of the main things he calls out are caused by embedding of the Facebook SDK, the Google Analytics SDK and AppsFlyer SDK. The most worrying one IMO is actually AppsFlyer - I doubt they have the resources to properly protect the data they're collecting.
It might be more effective to go after the companies providing the SDKs rather than individual apps, to have a real impact. But OK this was for a news story about TikTok and that's what readers can relate to.
Going back to Google's knowing and willing abetting of hardware ID abuse.
I heard it many times that Google knows that a lot of Chinese companies violate their play store policy against using hardware IDs for advertising purposes.
If they stand against it, why did they add APIs for accessing them in the first place?
I thought about that for a long time, and finally it struck me: Google very well knows that GDPR prohibits them IDing people if they refuse, so they just added those APIs to let other companies do it for them!
I would lightly recommend avoiding these addons and turning on fingerprinting protection in Firefox instead.
> about:config
> resistfingerprinting -> true
> webgl.disabled -> true
Firefox's fingerprinting protection will block canvas fingerprinting by default (put it behind a prompt). It will also spoof your installed fonts. The second setting I listed will take care of webGL, although it won't be behind a prompt, so it's annoying to re-enable for the (very few) sites that need it. I think resistfingerprinting also handles audio fingerprinting, but I'm not completely sure. I do know it reduces timer precision and a few other things.
Firefox's fingerprinting tools are being uplifted from Tor, which means people who care a lot about this and have a lot of experience in what actually helps are working on it. I am cautious of random extensions even if they're not malicious; it's very easy to get this stuff wrong and accidentally open up a new fingerprinting vector instead.
If you are going to install a new extension, install UMatrix and block Javascript by default. That won't help you with a site like Tiktok's since you'll need to turn Javascript back on for them. But a nontrivial portion of the web works without Javascript, and it really does reduce the number of attack vectors you have.
I keep my extension list very small: Ublock Origin, UMatrix, DecentralEyes, and HTTPS Everywhere. Extension sandboxing is very bad, so I don't like to install new extensions if I can help it. Firefox profiles are very cool and potentially very useful, but I don't currently use them. Maybe that will change in the future.
> I would lightly recommend avoiding these addons and turning on fingerprinting protection in Firefox instead.
A word of caution, this will tell all websites you're on UTC so it will break several things (fitness trackers, github reports, etc..). If you're banging your head against the wall (like I did) because you can't figure out why everyone thinks you're on UTC, it's because of this.
That's true, it can be really annoying, especially when you don't know the cause. On the plus side, it means I've now picked up a new skill that I didn't originally need or want.
I've also found, like, 3 date bugs in an application where I was assuming that the server and the client would be in the same time zone.
I may be wrong, but these add-ons seem shady to me. The website pointed to as the homepage is an ad-ridden site that is supposedly a community for open-source development, but all the links to 'fork me on GitHub' don't go anywhere. I can't find the source at all. This topic comes up often but these have never been posted and the add-ons have relatively few users (4k, 1.5k, 2k and 0.7k).
I don't know anything about browser add-ons. Perhaps someone who does could take a look and see what's up?
I am extremely cautious about installing any sort of browser extension; especially ones that request such intrusive permissions that could so easily be severely exploited with any given update.
Particularly an extension that wants access to all websites and their data. Since the browser has very open permissions from the firewall, and these run under those rules, it is open season for an extension dev to send out data.
The problem with that, though, is that you are one of the few people doing that, which makes you easier to target, which defeats the purpose.
I guess a real solution would have to come from a collaboration between all browser vendors, like always returning a fixed value when these tricks are detected.
These sorts of features are added to JS because they enhance user interaction. The audio API is an obvious example, and one of the big reasons canvas is used is that it has good 2d performance and lets you draw arbitrary stuff easily - good for games. You can't decouple this from the hardware; audio will need to go through the sound card and removing GPU acceleration from canvas kills it. The only thing that can be done is slightly fuzzing inputs the program collects (the saved PNG or the listened audio) - but this again interferes with real use cases, such as web audio editors and image editors, where this method forces lossy editing and importing.
Already I have to use another browser to use games or other 3D content because I have WebGL disabled in Firefox, and for 2d stuff the resistFingerprinting option reduces timer accuracy, which messes with game loop timing and makes the game totally unplayable. Making audio and video even harder to use is not good.
The only solution is to not allow this stuff to happen in the first place by using a permissions system. Adding more permissions popups is bad, so permission needs to be implicitly allowed by the user. Autoplay is a good example of this, calls to play() are blocked unless the call is being made in response to a click handler. Permissions for really recording audio/video through a webcam/mic also work well.
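The autoplay-style rule can be modelled in a few lines. This is a toy Python sketch, not how any browser implements it: `Page`, `click`, and `play` are hypothetical names, and `PermissionError` stands in for whatever error a real browser raises.

```python
from typing import Callable


class Page:
    """Toy page where play() only works while a click handler is running."""

    def __init__(self) -> None:
        self._in_user_gesture = False

    def click(self, handler: Callable[[], str]) -> str:
        # The permission is implicit: it exists only for the duration
        # of the handler that the user's click triggered.
        self._in_user_gesture = True
        try:
            return handler()
        finally:
            self._in_user_gesture = False

    def play(self) -> str:
        if not self._in_user_gesture:
            raise PermissionError("play() blocked: no user gesture")
        return "playing"


page = Page()
print(page.click(page.play))  # allowed, the user clicked

try:
    page.play()               # a background script calling on its own
except PermissionError as err:
    print(err)
```

No popup is involved: the click itself is the grant, which is the appeal of applying the same pattern to canvas readback or audio rendering.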
> You can't decouple this from the hardware; audio will need to go through the sound card and removing GPU acceleration from canvas kills it.
What you can do is require that the hardware produce bit-identical output for given input no matter how it's actually implemented, a bit like how HTML5 exactly defines rendering for any stream of input characters nowadays. Sure, this level of exactness might impose a performance cost, but it would improve privacy.
OTOH, you still have timing fingerprinting, so maybe you just can't win.
I'd like properly anonymous stuff as well, but I agree with and understand your argument. Given the performance hit we'd need to take to accomplish that, it'd be almost akin to paying a tax or protection fee in terms of cycles to avoid compromise - "It'd be a real shame if someone fingerprinted you..."
It's part of the reason why I get excited about toolkits like unity that remove the direct hardware interaction, but so far the limitations they impose always have come with serious costs as well.
It's an arms race, and the problem is that there are two groups of "bad guys."
One group of "bad guys" are the ones who want to track what you're doing even after you've cleared your cookies or track you across different domains or different apps. I would argue that this group is the smaller group, but browsers have been taking actions to make their jobs harder, including the "Facebook fence" that Firefox implemented.
There's a second group, which is the cybercriminals (or maybe just grey-market). Think Russian bot farms and purchase fraud bots, but also the guys who sell clicks and followers and rankings, and the ticketmaster bots and black friday bots and such. Those guys do their business by creating thousands or millions of fake accounts, and then funneling their transactions through them. They generally use scripts and cloud farms (sometimes even physical device farms). So app makers and websites need a way to detect when they're being attacked by one of those guys, and the way they do it, is more and more, through fingerprinting.
So every step browser and OS makers take to make the first group of "bad guys" jobs harder also makes the second group of "bad guys" jobs easier, because it makes it easier for them to pretend to be legitimate users.
Haha, true. If Sundar ever decided to screw somebody and was willing to use all of Google's data without any care of the legal ramifications, that person would be broke and homeless within the hour. For a lot of people, probably in jail within a couple of days.
That's why I consider Google, FB, and these other surveillance-capitalists data-Superfund sites.
There are massive amounts of jail sentences, international incidents (probably including potential wars), murders, divorces, and destroyed relationships in those silos just waiting to come out.
>Tiktok is breaking the law in multiple ways while exploiting mainly teenagers data. This should be regulated quick and rigorous. We have all necessary laws. Don’t let them break society like 10 years of FB. Journalists should find a better place for vertical video.
Well done. I think that after this research, journalists will have a better understanding of how TikTok actually breaks the law, and they can cover this story using the information from this article as a reference.
privacy.resistFingerprinting is even better since it covers more than just canvas fingerprinting, but anyone who uses it should be aware of all of the side effects.
The only side effect I really notice in everyday browsing with privacy.resistFingerprinting enabled is time. It spoofs the system time zone as UTC which makes it a bit confusing looking at things like sports and TV schedules.
I'm using a 1920x1080 monitor with default panel sizes. So are tens of millions of other people. I totally understand fingerprint resistance, but the maximization restriction is silly in a lot of cases and should be controllable separately.
Exposing the window size is needed for many features. CSS media queries change what's displayed based on the screen size, and pure CSS can cause effects that can be independently measured (set a CSS property and then read it with JS and log the result, or have the CSS load a background image with tracking data embedded in the URL). Webapps that manually position elements using JavaScript use the API as well.
It's also extremely hard to prevent every way of getting at it. For example, I could measure if a line of text overflows and by how much.
CSS media queries don't get handled by anything that has the capability to send that information back, so window size reporting isn't needed for that.
The only place that needs it, are those JS apps that manually position items.
Measuring text overflow is only possible by the APIs exposed by the CSSOM set [0], which also happens to include the window sizing elements. If we only allowed a subset of that group, all those problems might evaporate or become extremely difficult to successfully use.
You could add as many of these media queries as you like to increase the resolution of your tracking. Combine this with the min-height media query and you can get the absolute size of the view port.
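A small Python sketch of why more queries mean finer resolution. It assumes each `min-width` breakpoint leaks one bit (matched or not), for example through which background image gets requested; `infer_range` is a hypothetical name.

```python
from typing import Iterable, Optional


def infer_range(viewport_px: int,
                breakpoints: Iterable[int]) -> tuple[int, Optional[int]]:
    """Narrow the viewport width to [lower, upper) from matched min-width queries."""
    bps = sorted(breakpoints)
    matched = [b for b in bps if viewport_px >= b]   # queries that fired
    unmatched = [b for b in bps if viewport_px < b]  # queries that did not
    lower = matched[-1] if matched else 0
    upper = unmatched[0] if unmatched else None
    return lower, upper


print(infer_range(1366, range(0, 4000, 100)))  # coarse: 100px buckets
print(infer_range(1366, range(0, 4000, 1)))    # one query per pixel
```

The same trick with `min-height` gives the other dimension, at the cost of thousands of rules and requests.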
That's still more finite than what JS can report, and can be easily broken by browser prefetching. (And could have performance implications that could drive users away.)
Whilst it's better-than-nothing fingerprinting, it is still far less effective than JS having access to the width and height properties directly.
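To make the media-query approach from this sub-thread concrete, here is a minimal sketch that generates the kind of stylesheet being described: one rule per width bucket, each pointing at a different background image URL, so that a browser which fetches only the matching rule's image reveals the bucket to the server. The function name, the `tracker.example` URL and the bucket edges are all made up for illustration; this is not taken from any real tracker.

```python
def tracking_css(breakpoints, base_url="https://tracker.example/px"):
    """Build one CSS rule per width bucket; each loads a distinct image URL."""
    rules = []
    edges = sorted(breakpoints)
    # Pair each lower edge with the next one to form [lo, hi) buckets.
    for lo, hi in zip(edges, edges[1:]):
        rules.append(
            f"@media (min-width: {lo}px) and (max-width: {hi - 1}px) {{\n"
            f"  body {{ background-image: url('{base_url}?w={lo}-{hi - 1}'); }}\n"
            f"}}"
        )
    return "\n".join(rules)


# Hypothetical bucket edges: 320px to 2000px in 240px steps.
css = tracking_css(range(320, 2001, 240))
print(css)
```

Narrower buckets give finer resolution at the cost of a larger stylesheet, which is the trade-off raised above: the result is still a bucket, not the exact value that JS can read directly.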
Notice the file left in there by a previous explorer of the internet. This gentleman has even found RCE exploits with TikTok, and they simply do not and will not respond to their security line. Olivia Newton at NBC I believe even reached out, and she could not get them to get back to her. I forwarded this (really another bucket of equal content still lurking out there) several times to Brian Krebs but he never responded. I only mention his name because it has happened before and I find it somewhat damaging to the community and it needs to be called out (with full acknowledgement that journalists get hit up by PR ppl all the time and it’s a tough job).
Just because the bucket says "tiktok" in the name, doesn't mean it's in any way associated with them. This appears to be ~1000 videos, anything particularly interesting about it?
(My company gets many such reports; sadly researchers often strongly insist otherwise)
It is associated with name. A domain held by them had a link record that pointed to this bucket. Also previous acquisitions of TikTok have buckets, currently open, and those have metadata which shows ownership.
Either way, TikTok won’t even respond, which is very sad and absolutely deserves a response so the few researchers don’t have to waste time following up.
NOTE: Your point makes sense though and I’ve run into this before. 100% agree with you and I should have mentioned the anchor link found.
Do you want to email me? You can find it if you have a SecurityTrails subscription, but that is like $500 a month also. I assume you mean the CNAME record, right?
The other day I saw my young nephew on TikTok. There is this thing that happens once in a while where you see two pictures of fashion and the child has to point with their finger at the one they prefer. This thing goes on for a while. I found it weird and was wondering if others also noticed this?
Just don't use TikTok. The whole purpose of this app is to gather a huge amount of behavioral data (about various themes and topics and how you react to them), enabling China to do political message targeting on par with what Cambridge Analytica did.
I'm curious what device data is sent from the app. Are they using any private APIs to extract data from mobile users as well? That would be even more insidious, since it's hard to analyze that.
I tried decompiling the Android version of the app, but I'm not a mobile dev and don't know where to look to analyze its data collection behavior.
I also considered using mitmproxy (like the OP) to analyze transmitted data from the app on my phone, but I'm on a university network that blocks inbound connections to devices (so I can't connect to my laptop from my phone). Hope somebody else can publish an analysis.
If you’re interested in exploring this, the university network shouldn’t be a problem if you create a private network between your laptop and phone and connect the phone to the internet using NAT on the laptop. The Internet Sharing feature in macOS makes this pretty easy.
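Once the phone's traffic is routed through the laptop like this, a small mitmproxy addon could tally which hosts the app talks to. This is only a sketch under assumptions: it relies on mitmproxy's scripting interface (a module-level `addons` list plus `request` and `done` hooks), the class name `HostTally` and the file name below are made up, and reading HTTPS traffic additionally assumes the mitmproxy CA certificate is installed on the phone and that the app does not pin certificates.

```python
from collections import Counter


class HostTally:
    """Count requests per host as flows pass through the proxy."""

    def __init__(self):
        self.hosts = Counter()

    def request(self, flow):
        # mitmproxy calls this hook for each client request.
        host = flow.request.pretty_host
        self.hosts[host] += 1
        print(f"{flow.request.method} {host}{flow.request.path}")

    def done(self):
        # Called on shutdown: summarize the busiest hosts.
        for host, count in self.hosts.most_common(10):
            print(f"{count:5d}  {host}")


# mitmproxy looks for a module-level list named `addons`.
addons = [HostTally()]
```

Saved as, say, `host_tally.py`, it would be loaded with `mitmdump -s host_tally.py`. The per-request line shows what is being called, and the summary on exit gives a first idea of which third parties receive data.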
Yes, they don't give the data they harvest to an authoritarian government that already has proven they will use said data to target and ID people for arrest/detention/torture.
I was answering the question about following someone in the other silo (not possible), not implying that the siloing makes data exfiltration absolutely impossible.
Note however that for the Chinese silo, they can just open the front door, whereas doing the same for the international version would endanger their profits. Profit is an incentive the CCP responds to very well. For the surveillance agencies, hacking the database is probably easier than setting up an official channel that too many people would have to know about and agree to.
"They draw an image in the background using vector graphic commands. Afterwards they save the image to a rasterized PNG. This data is quite unique among different devices depending on settings and hardware."
Why the fuck does this still work? People are complaining about all those websites that use it, but ignore the fact that it can be mostly fixed by changing 2 applications (Chrome and Firefox).
Well, I don't know the details. But AFAIK canvas drawing is GPU accelerated.
So I would guess you are effectively fingerprinting the combination of browser and GPU. And that does not sound like it's easy to fix on the browser side.
How would you make it not work besides disabling canvas altogether? It is the staple component for interactive graphical stuff in the browsers. I work with it extensively for interactive content and I very frequently read pixel data from it. Let's say I'm going to blur an image, I need that pixel data. The fingerprinters do the same thing, but the pixel data gives something unique about your GPU.
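To illustrate why reading pixel data back is enough for fingerprinting, here is a minimal sketch that hashes two made-up pixel buffers. The byte values are hypothetical stand-ins for what the same drawing commands might produce on two machines; a real fingerprinter would read the bytes from the canvas, which this sketch does not do.

```python
import hashlib


def fingerprint(pixels: bytes) -> str:
    """Hash raw RGBA pixel bytes into a short identifier."""
    return hashlib.sha256(pixels).hexdigest()[:16]


# Hypothetical readback from the same drawing commands on two machines:
# a 4x1 RGBA strip where one antialiased edge pixel differs by a single level.
device_a = bytes([255, 0, 0, 255] * 3 + [128, 0, 0, 255])
device_b = bytes([255, 0, 0, 255] * 3 + [127, 0, 0, 255])

print(fingerprint(device_a))
print(fingerprint(device_b))
print(fingerprint(device_a) == fingerprint(device_b))  # False
```

A difference of one level in one channel, far too small to see, already yields an unrelated hash. The same readback that a blur filter needs is therefore all a fingerprinter needs too.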
If anyone wants to work on a project that works against this in a completely different way, shoot me an email.
We are growing a community of people who don't trust our fragile governments to figure this out, and instead want to democratize these tools and level the playing field rather than attempting prohibition yet again.
Does this violate any of Google's Play Store or Apple's App Store TOS? People are calling for the government to step in but a quicker move would be to remove them from the app stores; at least if there's a violation.
Furthermore, sending PII data to a non-EU country is also allowed under GDPR as long as the company in question obeys the GDPR rules. Like I said, those rules are complex, and there could very well be some technical violations by TikTok, but that's not demonstrated here.
Browser/device fingerprinting for anti-fraud is a well established industry practice. Browser makers don't like this practice and have taken steps to make it harder, but the truth is that it's used across the industry.
The open source license violations could be actual civil, but not criminal, violations. TikTok does maintain a list of open source licenses here: https://www.tiktok.com/legal/open-source?lang=en It looks like it's only for its app and not its website though. Violating the MIT/BSD license by using an npm package and forgetting to include it in the documentation is unfortunately pretty common across the industry. That doesn't make it right, and we should hold big companies to a higher standard of compliance, but if anybody wanted to make a complaint it would have to be the copyright holder.
TL/DR: I don't think the author demonstrated anything illegal here or out of line with normal industry practice. You can argue about the morality of certain industry practices (like fingerprinting) but TikTok is far from an outlier here.
As someone who worked in a field that necessitated some significant anti-fraud measures, nope to:
> Browser/device fingerprinting for anti-fraud is a well established industry practice. Browser makers don't like this practice and have taken steps to make it harder, but the truth is that it's used across the industry.
If it's that important to you, switch off of the web into an App, require sign-ons against an internal system for authentication and policy people actively. Falling back on fingerprinting is a BS excuse used by folks that want to minimize user barriers and maximize the profits they're extracting by pushing authentication and identification off onto public resources - it isn't ever necessary and it isn't okay.
I bet you've never worked on an e-commerce system then because none of your suggestions work against e-commerce fraud, and you'd literally lose all of your money:
switch off of the web into an App: Can't just shut down your website. Also, device farms and VM farms are super common so it won't even help.
require sign-ons against an internal system for authentication: Sure, you can require your users create an account. Accounts can be created by the thousands by bots. Even if you use captchas, captchas don't work, and even if they did I can find you 100 people who will sign up for accounts manually and sell them to you for 10 cents a piece.
policy people actively: I assume you mean police people effectively. Kind of hard to "police" your customers when a huge percentage of them sign up once, buy something, and maybe only come back 3 years later. In the meantime, their super simple passwords may have been hacked and leaked 10 times already. Maybe you should require your customers all use 2FA. Let's see how many customers you have remaining once you turn that on.
But here's the thing - it does cost money and customers to properly authenticate people. 2FA will lead to less sign ups but it will give you a more secure user base - in a world where DAU is the number to live and die by then security is compromised in order to help float that DAU stat. For sign-ons collect a per account activation fee or subscription fee - if your goal is to only have real users then enforce that with money, if your goal is to allow people to freely browse your site unless they're abusing your site then yea - that's where fingerprinting comes in, and it comes in because that isn't a solvable problem. If you want to know who your users are you need to be upfront about collecting that information securely and if you want any old joe who gets a link to immediately get sucked into browsing your site and looking at ads then just stop basing your business off of dark user patterns - deliver value, charge fairly for that value, realize that lots of potential business ideas would never be profitable because people simply can't be bothered to actually put out money for that service.
This whole fingerprinting debacle is part of the ad-support web assumption, and the assumption that websites can be entirely ad supported is false outside of exceptional circumstance and certainly highly limiting and concerning for free speech - expecting a business to be ad supported, that's pretty much an impossible dream, we're living in a bubble where advertisers and marketers continue to sell lies about the ratios of converting views to actual sales.
Like many things in life, the fight between good actors and bad actors online is a state of dynamic equilibrium.
Everybody loses money to fraud, but as long as they invest enough money and resources they can keep those fraud losses to an acceptable level. Because criminals are infinitely creative, the problem will never be "solved." There will always be new moves and new countermoves.
Total security is an illusion. Everyone who's worth hacking or worth defrauding will get hacked or defrauded sooner or later. The people who have made the necessary investments are able to contain the damage. The other ones, or the really unlucky ones, end up dying off.
Total security is possible but unrealistic. In our modern world we have terrible baseline security, we can do better with some trivial adjustments that the market is countering with a strong disincentive because we as a society haven't placed a clear value on security (outside EU where GDPR has flaws but is an attempt to reward good actors).
This is essentially equivalent to a tragedy of the commons mixed in with a race to the bottom - companies are currently penalized for practicing good security, they are voluntarily accepting lower profit margins in exchange for something nobody cares about, they're also losing access to some supplemental revenue through reselling customer data. If we add decent incentives and make it economical to follow a "good" path we can increase our baseline of security, hacks will always happen but we can minimize the costs of those hacks and their frequency with best practices.
Heck - my standard line with companies w.r.t. PII is that "Your proposal is essentially to collect everyone's alarm code into your safe, your safe has gone from something nobody is interested in to something that, if compromised, could lead to a bunch of people being burglarized." the issue is that over-collecting PII and then, shucks, losing it in that completely unavoidable security compromise, doesn't lead to appreciable punishment for the company - in the real world it sure does (if your locksmith copies your key an extra time then gets burglarized and the burglar uses that extra key to burgle your house the locksmith is absolutely liable and may be found to be a conspirator). It's anomalous that these two worlds are in contrast.
All that said, I absolutely agree that it's a balance and there aren't super simple answers here, but it's important to reject the thought that being as vulnerable as most businesses are is acceptable.
I'm the author, and even though I know the GDPR quite well as a professional journalist, I know I can't interpret a law on my own, so I additionally asked a legal expert in this field. This is what he explained:
1. Sending data to Appsflyer is OK in general, but you have to declare to which parties the data will be sent afterwards. As most of the partners will be joint controllers of the data you have to lay open the arrangements with ALL joint controllers: "The arrangement may designate a contact point for data subjects. The arrangement referred to in paragraph 1 shall duly reflect the respective roles and relationships of the joint controllers vis-à-vis the data subjects. The essence of the arrangement shall be made available to the data subject."
TikTok simply refused to show any arrangements.
2. Sending data to Facebook is ok in general, but here it's without consent, so it must be covered by legitimate interests. This has to be balanced with the interests of the user, and two crucial points are how invasive and how transparent the data processing is. Sending your search terms to a company you don't even know is involved, and de-anonymising you at the same time (in case you have an app of Facebook Inc. on your smartphone), can hardly be a legitimate interest.
3. Sending data to a non-EU country is OK, but only if the country is secure. This is indeed complex, but: ECJ ruled, that if public authorities have access on a generalised basis to the content of electronic communications, it's not ok, it's even a FUNDAMENTAL violation of the privacy of the users.
4. Browser/Device-Fingerprinting is legally ok. But I doubt that it is used for anti-fraud/security. If it's used for tracking, they probably need consent.
Even though these practices may be legal you'll agree that 99% of the people here are not ok with them. It's not needed to send personal data around the world for tracking. The core functionality of tiktok doesn't need that at all.
People on HN don't complain about TikTok sending data to Facebook primarily because TikTok is Chinese, but because Facebook is evil.
People on Facebook may hate on TikTok for being Chinese, but I wouldn't know because I don't have a Facebook account. (I don't have a TikTok account either, so it doesn't actually matter much to me whether they snitch to Facebook. But it's about the principle of the matter.)
After seeing all those sketchy or even fraudulent mobile ads by TikTok's company, I won't be surprised if that's a bait.
But pretty much everybody does it in China, Baidu etc, like "Your phone has 8GB of garbage, download us to clean it", "Download us to boost your signal by 4 times immediately", "This cutie just sent you a message, download us to respond", basically anything to make you download their apps, and only political problems get punished.
Interesting although I'm having trouble unpacking GDPR discussion from what the app actually does on a granular level and what that could mean for privacy. GDPR is not exactly how I think of those things.
As more time passes I begin to fundamentally believe the modern Internet isn't compatible with the GDPR or privacy as a whole.
Simply the act of enabling JS within the browser is enough to have your privacy violated in thousands of different ways and data sucked up by everyone who wants it.
Simply by installing an app on your smart phone you invite SDKs that are happy to report back all the information the OS freely allows access to, because why not? Data storage is cheap and collecting that data is free of charge.
But yes, data about children is collected in the millions, and the truth is there is no possible law that can prevent this from happening because it will happen anyway. One example: If FB detects a baby photo you upload, should it be deleted? I mean that baby cannot possibly consent, and you'd have thought the GDPR or some law meant uploading baby photos is impossible, but that's not the case, FB/Google WILL perform facial recognition on that baby.
Your data will be processed, used, sold and manipulated for as long as you generate it.
The GDPR helps, a little, in some ways, but it's really had very very little effect overall (apart from some damn annoying "we respect your privacy" pop-ups on websites).
If the GDPR was serious, it wouldn't be possible to collect this data at the OS level, like, at all, JS would return nothing, Android apps would return nothing (or fake data, at least).
But the GDPR is not serious, at least in some ways.
Hmmmm, I think it's a bit of an unreasonable expectation to think technology should prevent companies from breaking the law.
Most meatspace laws are expected to be followed despite there being little to no physical barrier to breaking them (e.g. there's no physical barrier preventing me from throwing a brick through a window, but it would still be vandalism).
Why should we expect laws in software to be different? Why should the burden of compliance be shifted from the individual/company to the infrastructure?
Crime isn't automated. Someone has to write the script that commits the crime (or I guess ultimately design the AI framework that generates criminal bots). At the end of the line there's still a human.
> The GDPR helps, a little, in some ways, but it's really had very very little effect overall (apart from some damn annoying "we respect your privacy" pop-ups on websites).
That's because—as far as I can tell—the EU has not become serious about enforcing the law. At least not yet.
It is absolutely possible to pass a law that says "you can't track people", and that's what the GDPR does. It has a semi-loophole for people who explicitly provide knowing consent to be tracked, but there are several big caveats—it must be opt in, you can't trick people to opting in, and you can't punish people who don't opt in. (And really, after all of those caveats, what percentage of your userbase will agree to be tracked?)
Unless there's some aspect of GDPR which I don't understand—and please educate me if there is!—95% of the "cookie notices" currently on the web are obvious GDPR violations.
(working in the field and away from my computer so using a throwaway)
Basically you have two different laws that apply here. Eprivacy and the GDPR:
* Eprivacy was updated in 2009 and says that you need the consent of the user for any read/write operation in the user terminal, unless it has been specifically requested by the user. The wording is strange, but it has been interpreted mostly as being about cookies and any other sorts of tracers. That includes fingerprinting. The thing is, it is a directive, so each European country transcribed it slightly differently! Sometimes, the national DPA (data protection agency) does not even have the authority to apply this law. This is the infamous "cookie banner" law. Note that all login cookies, cart cookies, consent cookies are generally regarded as exempted since they are needed to provide the service asked for by the user.
* The GDPR that everybody knows.
The thing is, since the GDPR, the consensus among DPAs has been that the consent used in Eprivacy is in fact the "GDPR consent" (freely given, no negative consequence, easy to withdraw, requires a positive act, and so on). This is a major change, because most cookie banners used now don't have those characteristics. "Keep on scrolling to consent" won't fly. "I consent"/"more options" buttons won't fly. "Use your browser settings to block cookies if you don't like it" won't fly. Cookie walls won't fly.
This is the current interpretation of the law among most DPAs. And yes, most websites are violating it, but in the sense of Eprivacy, not the GDPR. The lack of harmonization at the European level makes it harder to take enforcement action, but I have faith that this will soon change.
In the case described by the article, you would need a GDPR consent to carry out fingerprinting, so TikTok is in violation of Eprivacy. When DPAs start enforcing, they could be targeted for that.
Sorry about that. They are violating Eprivacy, not the GDPR, because they are dropping tracers without proper consent. But they are violating it because the consent they are collecting is not valid under the definition given in the GDPR. The previous understanding of Eprivacy (before the GDPR) admitted a "soft consent", i.e. "keep reading and you consent". That is no longer the case because of the GDPR.
What they do with the data they collect is subject to the GDPR, but the use of tracers without consent is subject to Eprivacy.
Are cookies violating if they don't leave the website? It doesn't seem to be a problem as long as the cookie is only used within the context of your site. It's when they're used on other websites that the tracking capabilities exceed what you can otherwise glean from the server logs.
Plus, they're kind of important for sites that provide logins, or have shopping carts, or a variety of other legitimate uses for cookies.
By the same logic, are cookies violating if they are stored _somewhere else_ but the service provider is not selling those cookies to other people? What if the service provider has another service provider that provides analytics services to help with serving the website in question?
What's the difference between, say, a dedicated cookie storage service company storing the cookie and an internal IT team that builds the wheel and stores the cookie with analytics services? Especially when the usage of the cookie in both cases is limited to the site in question?
This seems to negate the entire idea of programming: do one thing well, and have another thing do another thing well. Which applies to the business world at large.
Cookies aren't inherently against GDPR if they're used explicitly for necessary site functionality—you don't even need to tell users about them in that case.
So why display the cookie warning? I assume it’s an attempt to obtain “consent” to something that would otherwise be prohibited by the GDPR, and in relation to consent the GDPR says:
“Consent should not be regarded as freely given if the data subject has no genuine or free choice or is unable to refuse or withdraw consent without detriment ... Consent is presumed not to be freely given if it does not allow separate consent to be given to different personal data processing operations despite it being appropriate in the individual case, or if the performance of a contract, including the provision of a service, is dependent on the consent despite such consent not being necessary for such performance.”
I agree with the grandparent that many cookie warnings seem at odds with the GDPR in this respect.
Arguably any programming language allows companies to break the law. It's not a JS thing, and I feel like there is no going back from the world of web applications.
I think it can only be fixed with legislation. There is too much attack surface to the point that avoiding tracking is hard even if you take extraordinary measures.
The GDPR IS the legislation, at least, I thought so.
If you're serious about this, the network shouldn't be making these requests unless you've explicitly allowed it (meaning the request is blocked at the network/OS level).
I've always assumed I have zero privacy on all social media apps. I am more worried that if the next few major social media apps are all Chinese, then the Communist Party of China will control what the world sees and believes.
They seem to only be able to do this by copying existing platforms that were successful. TikTok is essentially a carbon copy of the discontinued app Vine. I cannot imagine that the pool of discontinued apps that otherwise would be runaway successes is large.
GDPR allows you to capture logs as long as there's some reasonable business case explanation. Security is the easiest because you can easily say all identity tracking is for catching fraud, bots or hackers. The explanation can always be hand-wavy and legislation is not specific on the details.
Even without any analysis it's quite clear what a company based in Beijing who censors everything on the party's whim is doing, though reading about the methods of fingerprinting was quite fun.
Aren't they censoring all political content, not only the China-based political content? At least that is what I was told/remembered, not sure which source it was.
Recently, I can't help but think this way every time I see a controversial comment/post on HN and reddit.
Maybe TikTok is not collecting data and operating ads in any way worse than Facebook or Google but we can expect companies from democratic countries to face some level of scrutiny from free press and government accountable to people. The thought of Chinese government having access to companies with reaches like Facebook/Google to global population worries me.
If I could prove I'm not associated with China in any way, and in fact don't like what they do, could it be proved that the wave of Tiktok-related posts has nothing to do with Facebook/Google?
I'm 99% sure that you are not affiliated with the Chinese Government. I just wanted to show that accusations like these are not really helpful. It's not the responsibility of Google/Facebook to prove that they aren't sponsoring bad faith anti-TikTok articles. It's your responsibility as a commentator to show at least some proof (besides a convincing motive). I think the bar for that proof should be rather low, but there needs to be something. Otherwise we are switching from a commenting system based on reason to a commenting system based on feelings/opinions.
"Personal Identifying Information (PII) is transfered to a server that is under control of a company in an unsecure noneuropean country. The server location doesn’t count, it is about where the company deciding about the data resides."
I find it problematic that such a restriction is in place. What if PII was sent to a European company, is that a problem? What if the ownership of the company changes?
These kinds of data localization requirements go against the concept of an open internet.
I posted the following to /r/ but it was removed... (hmmm)
There was a statement that was made that China has been seeking to build the largest face recognition db... (obv FB has that embedded not only in their name, but their userbase -- and what China wants to do is compete with FB on this front for their own means...)
---
TikTok is a face recognition harvesting platform WITH sentiment!
Hear me out.
So TikTok is literally focused (on multiple levels) of the users face being in a very contrived space and detail - its largely wide with younger ppl... however
:/