I found a demo[0] via this old forum thread from August[1].
Obviously there are privacy concerns. That being said, this looks like a boon for anyone interested in bot detection, as you can periodically challenge your users' humanity without getting too much in their way. Nice one, Google.
From the thread:
Implemented it successfully for a website. I have to say, it works great!
it also checks if html pages are changed at runtime and how many times you "reload" the page where the captcha is. When it thinks you are a bot a captcha popups, when entered, it got checked on googles servers if it's right and fills in a hidden input. When the user submits the form, the filled in captcha coded, again, will be verifed. [sic]
"Since it goes through Google's servers, they can verify a lot of things. Whether you are logged in currently to google, have you been logged in the past, verify your activity on your IP address, etc. Even if you signed in from the same ip or ip range like a year ago, they can still tell it's you based on your previous actions."
The normal captchas have been getting increasingly user-hostile over time. The only limit on them is what users are willing to put up with, and now that Google's most profitable users don't get them that's less of an issue. In fact, having nearly unsolvable captchas is actually an advantage because it encourages users to let Google track them.
No, this is likely done with machine learning trained on real vs. fraudulent user data, so they are going to be watching for much more subtle features than just being in a different region. Tons of people travel all over the world. Fewer people manually reset their MAC addresses or use datacenter ISPs.
Beware: that second link launched a popup in my browser to a "Super Mario Game" which, in turn, pushes you to install a spammy Chrome extension called ArcadeYum.
Why does Google bother with so many minor script-related security enhancements in Chrome that will barely affect anyone (such as extra HTTP headers allowing for bonus layers of XSS protection, just in case the site's developers weren't smart enough to cover all possible injection angles)? Meanwhile, they let random untrustworthy developers abuse their extension installation API to rack up over 750,000 installs of a mysterious, shady, useless browser extension that inexplicably asks for permission to read and write to the DOM on every page of every site the user ever visits, and which very obviously exists only to do the exact same kinds of terrible things that XSS prevention was conceived to stop.
I'd personally love them to do that. I guess the arguments are basically the double-edged sword of dictatorship. You have a paradise if the ruler is wise, just and benevolent, as you can escape pretty much all of the stupid coordination problems that pester democracies - but on the other hand you risk getting totally screwed up if the dictator goes evil (which can, and probably always will, happen over time, when a good dictator gets succeeded by a bad one).
Thanks for all the evidence, but Microsoft's primary revenue stream isn't advertising, and Facebook has had success suing spammers who commit fraud against Facebook.
This seems to follow the approach of Cloudflare (and Incapsula, and all the other competitors) to bot detection: basic, automatic, silent bot challenges (non-invasive Javascript and DOM tests) which, if failed, trigger a one-time captcha prompt.
The Tor browser doesn't block ads, just javascript and flash. His point is that the internet is becoming increasingly hostile to privacy. It's already extremely difficult, if not impossible, to create anonymous accounts with tools like Tor, which discourages things like whistleblowing, or participation by people in areas with oppressive governments.
The problem is that every website does the same thing, and now it's impossible to use the internet anonymously. But actual spammers can spend a few bucks on IP rotating services. IP discrimination causes far more harm than good.
Actually I don't think Tor disables javascript by default anymore, but even when I do disable it I still see ads.
I think people don't fully appreciate what you mean. I do. This site is literally information-free. It contains a picture of a yawning cat and two buttons that do nothing. This is on chrome/iOS.
I mean this with utter sincerity: try tapping harder. I too was confused by the landing page and its silly two button-like things that don't work. Then I tapped it again, harder, and the link followed.
It's a trick I learned from industrial SCADAs. Sometimes buttons simulate how a literal contactor works by watching for down-debounce-up events. No idea if that's the case here, but it seems to help to dwell slightly on the button to let the event handler really catch the event fer sure.
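As a hedged illustration of that down-debounce-up idea (not what this site actually does), a button handler can refuse to count a "click" unless the press lasted at least some debounce interval; the class name and threshold below are made up for the sketch:

```python
# Sketch of a down-debounce-up click handler: a click only registers if
# the button is held down for at least a debounce interval before release.
DEBOUNCE_SECONDS = 0.05  # assumed threshold; real systems vary

class DebouncedButton:
    def __init__(self, debounce=DEBOUNCE_SECONDS):
        self.debounce = debounce
        self._pressed_at = None
        self.clicks = 0

    def down(self, t):
        self._pressed_at = t

    def up(self, t):
        # Register a click only if the press lasted long enough.
        if self._pressed_at is not None and t - self._pressed_at >= self.debounce:
            self.clicks += 1
        self._pressed_at = None

btn = DebouncedButton()
btn.down(0.0); btn.up(0.01)   # too quick: ignored
btn.down(1.0); btn.up(1.10)   # dwelled on the button: counts
print(btn.clicks)  # 1
```

This would explain why tapping harder (really, longer) makes the button work.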
I had the same problem. Requiring users to modify their clicking habits seems excessively onerous -- especially to "prove you're not a bot." The fact that you had to explain to me why a bloody web button didn't work proves the point; it's like throwing out the baby w/ the bathwater.
Could this be a geolocation thing? I'm in the UK here and I see a picture of a cat, and two buttons, one of which is to redirect to the old recaptcha website, the other is a useless "sign up for more information", and there is no actual information on the page.
It sounds like others are seeing something different however?
The site thought you were a bot (probably because you were browsing privately), so rather than assuming you are a human, it decided to challenge you traditionally.
So google-users are non-bots and non-googlers are bots? That sounds extremely poor. Especially considering you solved a captcha to create that same google account.
In 99% of cases, whether or not you're a bot can be determined before you interact with the box. It's mostly just a vector to download the script that does the actual detection based on mouse and keyboard patterns while you're using the rest of the page.
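As a toy illustration (not Google's actual method) of the kind of signal a behavioral detector might use: human mouse paths are irregular, while a naive bot often moves the cursor in perfectly straight, evenly spaced steps. Everything here is an assumption for the sketch:

```python
# Crude behavioral tell: flag cursor paths whose steps are all identical,
# which a scripted mouse movement produces and a human hand almost never does.
def looks_scripted(points):
    """points: list of (x, y) cursor samples, in order.
    Returns True if every step between samples is identical."""
    steps = [(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(points, points[1:])]
    return len(set(steps)) <= 1

bot_path = [(i * 5, i * 5) for i in range(20)]               # perfectly uniform
human_path = [(0, 0), (4, 6), (9, 11), (15, 14), (22, 20)]   # jittery
print(looks_scripted(bot_path), looks_scripted(human_path))  # True False
```

A real detector would use far richer features (timing, curvature, acceleration), but the principle is the same.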
They absolutely can. This is what bot developers did in Runescape after Jagex implemented the same system. It blocked bots for a few months, but after that it was pretty much useless.
By now the helping-out-with-OCR part of ReCaptcha is entirely unrelated to the actual captcha. In some cases now you're just identifying street numbers for Google Maps.
If captchas got simpler, you could still do Mechanical Turk jobs if you wanted to.
The easy house number tests are for users with established sessions because the system already has a high degree of confidence you are a human. Delete your cookies or use an incognito window and I expect you'd see the traditional captcha with two words.
It used to be there were two variables: one that was uncertain to the machine and one that was known. The known variable was used to ensure you weren't making up your answer.
But for at least a couple of months now I haven't seen a two-variable captcha. I can only assume every captcha has been solved and verified to a reasonable degree of certainty. If Google, who is probably most able to benefit from captcha solutions, is willing to move past it, I can't really argue with them.
I think a lot of the single-challenge CAPTCHAs they're using now are house numbers on Street View. Even if Google doesn't know the exact answer, they can rule out a lot of wrong answers (e.g., the number entered is on the wrong side of the street, or is completely out of range for the current block).
Also, if you look really suspicious (particularly if you fail an easier CAPTCHA, or ask for lots of challenges), Google will still give you one of the old two-scrambled-words CAPTCHAs. Except both of the words are ones they know the answers for, and you'll have to enter them both correctly to pass.
Do we get anything back for that?
I mean, millions of humans contribute to solving captchas, and does Google give anything back (like the OCR tech they're developing) to the community, or does it end up as part of some proprietary product of theirs?
We've been running the beta of this captcha on https://account.oneplus.net/sign-up and while it's certainly a much better experience we also still do get some spam sign ups.
I'm not sure if these are manually solved from people hired to just solve captchas or if perhaps it's a bit too lenient. Ultimately I think the improved usability is more important than spending a bit more effort deleting spam.
It's likely manual signups by people paid to do captchas. It's a thing. A kind of large thing. When you run a website with a public forum with a couple million unique visitors a month, you get familiar with it.
There's an annual contest in the spam community, organized by Botmaster, the producer of the Xrumer software.
Basically, whoever answers the most of these questions correctly earns $15,000. People submit MILLIONS of answers.
This list is then incorporated into Xrumer itself.
I can tell you that these things are easily broken:
a) Answering questions and building global list of answers, as in this case.
b) Reading image captcha - spammers send it to Pakistani manual solvers for dirt cheap.
c) More complex puzzle captchas can also be broken in software if a lot of websites implement them.
What works best is using non-standard html form field names. Also, try not to use text labels for the fields (no "password", "captcha", etc.), because the software will try to match the best field by the text surrounding it. It is better to use an image for labels.
This solution will stop spambots because they simply match form field names. Unless someone specifically targets your website, in which case there's not much you can do.
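The field-renaming trick above can be sketched server-side: each rendered form uses throwaway random names, and the server keeps a per-session mapping back to the real fields. All names here are illustrative, and a real implementation would store the mapping in the user's session:

```python
import secrets

def make_field_map(real_fields):
    """Map each real field name to a random, meaningless one."""
    return {real: "f_" + secrets.token_hex(6) for real in real_fields}

def decode_submission(field_map, posted):
    """Translate a submitted form back to the real field names,
    dropping any keys the bot invented."""
    reverse = {rand: real for real, rand in field_map.items()}
    return {reverse[k]: v for k, v in posted.items() if k in reverse}

fmap = make_field_map(["email", "password", "captcha"])
# The rendered form would use fmap["email"] etc. as the <input> names.
posted = {fmap["email"]: "a@b.c", fmap["password"]: "hunter2"}
print(decode_submission(fmap, posted))  # {'email': 'a@b.c', 'password': 'hunter2'}
```

A bot that matches on literal field names like "password" submits nothing useful, while the server-side handler is unchanged.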
Of the most common captchas, Mollom seemed to be the biggest pain in the ass for spammers, mainly because it banned suspicious IPs (you could solve the picture correctly and it still wouldn't authorize it).
> What works best is using non-standard html form field names. Also, try not to use text labels for the fields (no "password", "captcha", etc.), because the software will try to match the best field by the text surrounding it. It is better to use an image for labels.
Except browser autofill breaks and anyone who needs a screen reader will go elsewhere if the screen reader can't parse the images.
That works on a small scale. But, when your site gets enough visitors, it is worth it to a spammer to spend five, or ten, or fifteen, minutes figuring out all of your questions. So, it's a potentially useful trick for sites only targeted by very dumb bots. But, the bigger your site, the less likely it is to work. My sites aren't even that big (anywhere from 40k to 250k visits per month), and this technique isn't tenable for most of them. I do use it on our wiki (mostly as an experiment, since CAPTCHA isn't as accessible as I'd like), which is the 40k visits/month site, and it needs new questions every couple of weeks, or we get hammered. And, all of our sites have other mechanisms for preventing spam, as well.
"Either one or probably infinity, depending on whether the universe is infinitely sized and whether unobservable parts of it are considered to 'exist'."
It varies greatly with the project you're working on. There are people out there trying to automate account creation on web-based email services for the sole purpose of email spamming (same with social networks). And with complex botnet proxies and the Tor project, false accounts are getting harder to detect.
They have created interesting methods to combat captcha images. They have even outsourced OCR services, where people in other countries are paid to solve one image at a time. [0]
It doesn't work. All you have to do is solve 1% of these questions manually, and voilà, you've got a bot that can spam; all it has to do is load the captcha 100 times on average.
That's the hard thing about captchas, bots don't have to solve them perfectly to completely break them.
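The arithmetic behind that "100 loads" claim: if a bot knows the answer to a fraction p of questions, the number of page loads until it hits a known one is geometrically distributed with expected value 1/p, so p = 1% gives 100 loads per successful spam. A quick simulation confirms it:

```python
import random

p = 0.01
print(1 / p)  # 100.0 expected loads per solved captcha

def loads_until_known(p, rng):
    """Count page loads until a known question appears (geometric trial)."""
    loads = 1
    while rng.random() >= p:
        loads += 1
    return loads

rng = random.Random(0)  # fixed seed for reproducibility
trials = [loads_until_known(p, rng) for _ in range(10_000)]
print(sum(trials) / len(trials))  # averages close to 100
```

Since page loads are nearly free for a bot, a 1% answer database is effectively a full break.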
OK, so what Google is pushing is something where they track lots of stuff about your web site in exchange for a CAPTCHA that looks like every other CAPTCHA. That's so Google. Everything comes with a privacy intrusion.
Amusingly, the examples they give are actually readable. Most of the time, when I see a CAPTCHA displayed, it's not a word, or anything close to a word. I've seen ink smudges, math symbols, and Cyrillic.
Besides, machine learning is good enough now it can beat most people at CAPTCHA solving. Look on Black Hat World for the software.
This is a bit worrying. If CAPTCHAs start becoming easier for "real" users (those who are logged in to a Google account, run the Analytics JS, etc.) and harder for "suspicious" users (who block ads, who use Tor, etc.), it may eventually become very hard and unpleasant to be a suspicious user, and non-suspicious users will not notice it.
Cloudflare also flags every Tor exit, which makes quite a bit of the web rather difficult to visit. I shouldn't need to fill in a captcha to load a gfycat animation.
That matches up with what they claim on their site:
CloudFlare does not actively block visitors who use the Tor network.
Due to the behavior of some individuals using the Tor network (spammers, distributors of malware, attackers, etc.), the IP addresses of Tor exit nodes generally earn a bad reputation. Our basic protection level issues captcha-based challenges to visitors whose IP address has a high threat score.
From personal experience: most activity you're likely to see from Tor exit nodes is fraudulent. Absolute bottom-of-the-barrel cesspool traffic trying to probe for vulnerabilities, commit fraud, scrape content, avoid IP blocks, and generally abuse your site in ways that the attacker wouldn't be comfortable doing with their own IP address. It's really tragic - given the potential of the network as a privacy tool - that it's mostly used for evil, not good.
> CloudFlare does not actively block visitors who use the Tor network...
...Yet all visitors who use the Tor network are blocked. There's a certain contradiction here. (Yes, I know what they mean, but the end result remains the same.)
It most likely is just automatically doing it to known Tor exit nodes. Spammers use Tor quite a bit for scraping and other bot activities. Companies like Cloudflare just take the easy route and throw all exit node IPs into the garbage bin under the assumption that legitimate Tor users are used to such treatment.
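The blanket approach described above amounts to a set-membership check plus a reputation threshold. This is a minimal sketch, assuming a published exit-node list has already been fetched; the IPs, scores, and threshold here are made up:

```python
# Placeholder addresses (TEST-NET ranges); real services consume a
# regularly updated list of Tor exit-node IPs.
KNOWN_EXIT_NODES = {"192.0.2.10", "198.51.100.7"}

def challenge_required(ip, threat_score, threshold=50):
    """Force a captcha for known exit nodes or high-reputation-risk IPs."""
    return ip in KNOWN_EXIT_NODES or threat_score >= threshold

print(challenge_required("192.0.2.10", 0))    # True: known exit node
print(challenge_required("203.0.113.5", 80))  # True: bad IP reputation
print(challenge_required("203.0.113.5", 3))   # False: passes silently
```

The cheapness of this check is exactly why "don't actively block Tor" and "every Tor user gets a captcha" can both be true.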
Google's reCAPTCHA is not that hard to beat, as it only checks one word (the computer-generated one) while the other is Google using you to do OCR for their Google Books service.
Just fill in one word and put some random crap for the other, and you've halved the annoyance.
Not really, botters just haven't had a reason to automate it yet. If they did have a good reason, it seems like it would be pretty easy to do so.
The trouble is that a well-trained AI can currently slightly beat humans on this kind of classification test, and it also requires a lot of pictures if you don't want it to be trivially broken just by enumerating them all.
You would spend more effort wording the questions than a spammer would spend paying people to solve them on Mechanical Turk and storing the answers in case they ever come up again.
You need a huge database of images classified as dog, car, man etc, otherwise a spammer can download all your images and classify them manually. If you grab images from Google Images, the spammer can do the same thing.
Previously, reCAPTCHA evolved to show easier (street address) CAPTCHAs to users who have already passed a few hard ones. I guess the next step is to skip it completely.
You're mistaken; the reason street addresses first showed up in reCAPTCHA is that Google needed address numbers filled in for their street map service.
Your site needs more detail about how your system functions and compares to existing captcha systems. All I could find on your site is that basically existing systems don't work and that yours does.
I have noticed this. You get words for a while, once you get a good enough score you go to house numbers, if you fail a house number you go back to the words. Playing with the demo of it if you fail the checkbox (letting it time out is one way) they throw you back to the house number again[1]. As other people have pointed out, what they're promoting here has been used on the Humble Bundle website for a while now, I guess it's the next logical step to show "good" enough users no captcha at all. Not sure about the privacy implications of that though.
I was implementing this yesterday and discovered there is absolutely no way to customise the style / layout of the captcha. You either use the light theme or the dark theme and that's it, and it's inside an iframe so you can't manually hack the css.
The old version used to be customisable so I really hope Google adds the ability to customise this soon.
Another trivial but important oversight: the captcha has a background color of #f9f9f9 but the fallback captcha has a background color of #ffffff. So even if you try to style around it, unless you manually detect which kind of captcha is showing and change the background color on the fly, one of them is going to look off.
Honeypots and timestamps would work in many cases. There are folks who want to captcha anything because they couldn't care less about users. But then when you take a step back and question if it's really necessary, it's frequently not.
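The honeypot-plus-timestamp idea can be sketched in a few lines: a field hidden via CSS that humans never fill in, plus a minimum time between rendering the form and submitting it. The field name and the 3-second floor are assumptions for the sketch:

```python
import time

MIN_FILL_SECONDS = 3  # assumed floor; humans rarely submit faster

def is_probably_bot(form, rendered_at, now=None):
    """form: dict of posted fields. rendered_at: timestamp the form
    was served (would normally be signed or stored server-side)."""
    now = time.time() if now is None else now
    if form.get("website"):                   # hidden honeypot field was filled
        return True
    if now - rendered_at < MIN_FILL_SECONDS:  # submitted inhumanly fast
        return True
    return False

print(is_probably_bot({"website": "spam.example"}, rendered_at=0, now=100))  # True
print(is_probably_bot({}, rendered_at=100.0, now=100.5))                     # True
print(is_probably_bot({}, rendered_at=100.0, now=130.0))                     # False
```

No user interaction required, which is the point: for low-traffic forms this often filters the dumb bots without any captcha at all.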
Here is a key difference from prior CAPTCHA services:
"reCAPTCHA offers more than just spam protection. Every time our CAPTCHAs are solved, that human effort helps digitize text, annotate images, and build machine learning datasets. This in turn helps preserve books, improve maps, and solve hard AI problems."
They're using captchas to solve text analysis and digitizing problems.
reCAPTCHA has been around in that form for a long time already. This post is about a new system they're planning on rolling out to replace that one. This one won't require any text input for (most) users.
Personally, I'm no longer solving reCAPTCHAs after I noticed that Google uses it for free labor. (Google sometimes knows very well that I'm no robot, yet it still shows a reCAPTCHA). So far it affects the Chromium issue tracker, which presents a reCAPTCHA to post more than one comment per day.
[0] http://www.google.com/recaptcha/api2/demo
[1] Edit: don't go to this url without adblock (see comment below). http://forum.ragezone com/f144/googles-captcha-recaptcha-1023607/