“No-CAPTCHA” reCAPTCHA (google.com)
171 points by abstractcoder on Nov 24, 2014 | 123 comments



I found a demo[0] via this old forum thread from August[1].

Obviously there are privacy concerns. That being said, this looks like a boon for anyone interested in bot detection, as you can periodically challenge your users' humanity without getting too much in their way. Nice one, Google.

From the thread:

Implemented it successfully for a website. I have to say, it works great!

it also checks if html pages are changed at runtime and how many times you "reload" the page where the captcha is. When it thinks you are a bot a captcha popups, when entered, it got checked on googles servers if it's right and fills in a hidden input. When the user submits the form, the filled in captcha coded, again, will be verifed. [sic]

[0] http://www.google.com/recaptcha/api2/demo

[1] Edit: don't go to this url without adblock (see comment below). http://forum.ragezone com/f144/googles-captcha-recaptcha-1023607/
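The last step in the quoted post (the value in the hidden input is verified again when the form is submitted) could be sketched roughly like this. It is a minimal sketch, not Google's reference code. It assumes the hidden field is named `g-recaptcha-response` and that verification is a form-encoded POST to `https://www.google.com/recaptcha/api/siteverify` which returns JSON with a `success` flag. `verify_token`, `http_post` and the injectable `post` parameter are hypothetical names used only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed verification endpoint; check the current docs before relying on it.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def http_post(url, fields):
    """Default transport: POST form-encoded fields, return the response body."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=10) as resp:
        return resp.read().decode("utf-8")


def verify_token(secret, token, remote_ip=None, post=http_post):
    """Return True only if the verification service reports success.

    `token` is the value of the hidden form field (assumed to be
    `g-recaptcha-response`); `post` can be swapped out for testing.
    """
    if not token:
        return False  # the hidden input was never filled in
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    try:
        body = json.loads(post(VERIFY_URL, fields))
    except (OSError, ValueError):
        return False  # fail closed on network or parse errors
    return isinstance(body, dict) and body.get("success") is True
```

The point of doing this server-side is that the browser only ever holds an opaque token. The site's secret never reaches the client, so a bot can't simply set the hidden field to "passed".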


The key post from that page:

"Since it goes through Google's servers, they can verify a lot of things. Whether you are logged in currently to google, have you been logged in the past, verify your activity on your IP address, etc. Even if you signed in from the same ip or ip range like a year ago, they can still tell it's you based on your previous actions."


So if you are in a remote location or do not fit a specific demographic you are basically a robot.


Assuming people who don't fit a demographic are robots is a step better than assuming everybody is a robot


In that case you get a normal captcha, which is no worse than the current situation.


The normal captchas have been getting increasingly user-hostile over time. The only limit on them is what users are willing to put up with, and now that Google's most profitable users don't get them that's less of an issue. In fact, having nearly unsolvable captchas is actually an advantage because it encourages users to let Google track them.


No, this is likely done with machine learning trained on real vs fraudulent user data. So they are going to be watching for much more subtle features than just being in a different region. Tons of people travel all over the world. Fewer people manually reset their MAC addresses or use datacenter ISPs.


I think the parent was using "demographic" to mean "people using computers currently tracked by Google", not a regional population.


That makes sense.

If I click from a normal tab I don't see a captcha, but if I click from a private tab I do.


Beware: that second link launched a popup in my browser to a "Super Mario Game" which, in turn, pushes you to install a spammy Chrome extension called ArcadeYum.


Why does Google bother with so many minor script-related security enhancements in Chrome that will barely affect anyone (such as extra HTTP headers allowing for bonus layers of XSS protection just in case the site's developers weren't smart enough to cover all possible injection angles) if they are going to also let random untrustworthy developers abuse their extension installation API to achieve over 750,000 installs of a mysterious/shady/useless browser extension that inexplicably asks for permission to read and write to the DOM on every single page of every single site the user ever visits in the future, and which very obviously only exists for the purpose of doing the exact same kinds of terrible things that XSS prevention was conceived of in the first place in order to stop?


I'd like to hear arguments for why it would be unfair competition for Google to put spammy ad agencies out of business.


I'd personally love them to do that. I guess the arguments are basically the double-edged sword of dictatorship. You have a paradise if the ruler is wise, just and benevolent, as you can escape pretty much all of the stupid coordination problems that pester democracies - but on the other hand you risk getting totally screwed up if the dictator goes evil (which can, and probably always will, happen over time, when a good dictator gets succeeded by a bad one).



Thanks for all the evidence, but Microsoft's primary revenue stream isn't advertising, and Facebook is having success suing spammers that commit fraud against Facebook.


Thanks for the heads up, I missed it due to adblock. I made the link non-clickable and added a warning.


This seems to be following Cloudflare's (and Incapsula's and all the other competitors') approach to bot detection: basic, automatic, silent bot challenges (non-invasive JavaScript and DOM tests) which, if failed, give a one-time captcha prompt.


Which has the side-effect of making the site inaccessible to TOR users with JS turned off.


Those people are five-nines likely to be blocking ads too, so who cares?

They can enjoy my content for free no problem, but I really don't care what they have to say in regards to how I run things or have things set up.

"Fuck you, pay me" comes to mind


The Tor browser doesn't block ads. Just javascript and flash. His point is the internet is becoming increasingly hostile to privacy. It's already extremely difficult if not impossible to create anonymous accounts with tools like Tor. Which discourages things like whistle blowing, or people from areas with oppressive governments.


That is a fair point, in my experience Tor is just used by bots to spam comments up with junk.

This is probably different for larger sites of course but on our scale there's no worries blocking Tor

Edit: Although thinking about it, I don't know of any ads that are served up without some form of JavaScript


The problem is that every website does the same thing, and now it's impossible to use the internet anonymously. But actual spammers can spend a few bucks on IP rotating services. IP discrimination causes far more harm than good.

Actually I don't think Tor disables javascript by default anymore, but even when I do disable it I still see ads.


If comments were blocked, that's one thing, but increasingly often access to the site as a whole gets blocked.


Isn't the example in [0] already used on various sites? I've at least used it on the Humble Bundle site and saw it on other sites too.


They seem to have done a small early beta over the summer, I guess they got in early.


This page has zero information. What am I looking at?


I think people don't fully appreciate what you mean. I do. This site is literally information-free. It contains a picture of a yawning cat and two buttons that do nothing. This is on chrome/iOS.


I mean this with utter sincerity: try tapping harder. I too was confused by the landing page and its silly two button-like things that don't work. Then I tapped it again, harder, and the link followed.

It's a trick I learned from industrial SCADAs. Sometimes buttons simulate how a literal contactor works by watching for down-debounce-up events. No idea if that's the case here, but it seems to help to dwell slightly on the button to let the event thing really catch the event fer sure.


I had the same problem. Requiring users to modify their clicking habits seems excessively onerous -- especially to "prove you're not a bot." The fact that you had to explain to me why a bloody web button didn't work proves the point; it's like throwing out the baby w/ the bathwater.


Looks like a button to click that you're not a robot: http://www.google.com/recaptcha/api2/demo


Where did you find that?

The problem is that the OP posted nothing at all.


Could this be a geolocation thing? I'm in the UK here and I see a picture of a cat, and two buttons, one of which is to redirect to the old recaptcha website, the other is a useless "sign up for more information", and there is no actual information on the page.

It sounds like others are seeing something different however?


No, they dug up the link elsewhere:

https://news.ycombinator.com/item?id=8655178


I clicked on that button, and got a popup with a regular captcha. Did I miss anything?

(As a note, I'm browsing using a private window)


The site thought you were a bot (probably because you were browsing privately), so rather than assuming you are a human, it decided to challenge you traditionally.


> The site thought you were a bot

Based on what? It assumes the same for me. There's zero information anywhere. When does it assume you're a human?


When I was logged in to Google (I usually am), it didn't ask me to solve a captcha. When I tried from a private browser, it did.


So google-users are non-bots and non-googlers are bots? That sounds extremely poor. Especially considering you solved a captcha to create that same google account.


That's better than "everyone is a bot" though.


What makes that work against robots versus other things?


In 99% of cases, whether or not you're a bot can be determined before you interact with the box. It's mostly just a vector to download the script that does the actual detection based on mouse and keyboard patterns while you're using the rest of the page.


Why can't robots simply record actual humans' mouse and keyboard patterns and replay them in various combinations in this arms race?


They absolutely can. This is what bot developers did in Runescape after Jagex implemented the same system. It blocked bots for a few months, but after that it was pretty much useless.


Interesting! But while I find captchas annoying, I like helping out with the text-decoding that re-captcha does.


By now the helping-out-with-OCR part of ReCaptcha is entirely unrelated to the actual captcha. In some cases now you're just identifying street numbers for Google Maps.

If captchas got simpler, you could still do Mechanical Turk jobs if you wanted to.


The easy house number tests are for users with established sessions because the system already has a high degree of confidence you are a human. Delete your cookies or use an incognito window and I expect you'd see the traditional captcha with two words.


It used to be that there were two variables: one that was uncertain to the machine and one that was known. The known variable was used to ensure you weren't making up your answer.

But for at least a couple of months now I haven't seen a two-variable captcha. I can only assume every captcha has been solved and verified to a reasonable degree of certainty. If Google, who is probably most able to benefit from captcha solutions, is willing to move past it, I can't really argue with them.
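The two-variable scheme described above can be sketched as follows. This is only an illustration of the idea, not reCAPTCHA's actual implementation. All names (`check_two_word`, `consensus`, `votes`) and the thresholds are made up for the example.

```python
from collections import Counter, defaultdict

# Answers collected for words the machine could not read:
# image id -> Counter of normalized transcriptions.
votes = defaultdict(Counter)


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def check_two_word(control_answer, unknown_image_id, typed_control, typed_unknown):
    """Pass or fail depends only on the known word; the other answer is a vote."""
    if normalize(typed_control) != normalize(control_answer):
        return False  # wrong on the known word: reject and discard the vote
    votes[unknown_image_id][normalize(typed_unknown)] += 1
    return True


def consensus(unknown_image_id, min_votes=3, min_share=0.75):
    """Return the agreed transcription once enough solvers agree, else None.

    The thresholds are illustrative assumptions, not known production values.
    """
    counter = votes[unknown_image_id]
    total = sum(counter.values())
    if total < min_votes:
        return None
    answer, count = counter.most_common(1)[0]
    return answer if count / total >= min_share else None
```

Once `consensus` returns a value for an image, that image can itself serve as a known word, which would be consistent with two-variable captchas becoming rarer over time.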


I think a lot of the single-challenge CAPTCHAs they're using now are house numbers on Street View. Even if Google doesn't know the exact answer, they can rule out a lot of wrong answers (e.g, the number entered is on the wrong side of the street, or is completely out of range for the current block).

Also, if you look really suspicious (particularly if you fail an easier CAPTCHA, or ask for lots of challenges), Google will still give you one of the old two-scrambled-words CAPTCHAs. Except both of the words are ones they know the answers for, and you'll have to enter them both correctly to pass.


I do not. Maybe they can make a site that is solely for the purpose of helping decode text for you and your ilk.


Do we get anything back for that? I mean, millions of humans contribute to solving captchas, and does Google give anything back (like the OCR tech they're developing) to the community, or does it end up as part of some proprietary product of theirs?


That sounds familiar. Apparently Google have been using a similar technique to detect botting in Ingress for a while now.


We've been running the beta of this captcha on https://account.oneplus.net/sign-up and while it's certainly a much better experience we also still do get some spam sign ups.

I'm not sure if these are manually solved by people hired to just solve captchas or if perhaps it's a bit too lenient. Ultimately I think the improved usability is more important than spending a bit more effort deleting spam.


It's likely manual signups by people paid to do captchas. It's a thing. A kind of large thing. When you run a website with a public forum with a couple million unique visitors a month, you get familiar with it.


This stuff kind of seems like overkill to me.

I think asking really simple questions that only a human could understand seems to get the job done most of the time.

Perhaps something like: "What is the opposite of bad?" or "How many planet Earths are there?"

I've used things like this for a handful of projects and have never had any problems: https://github.com/kiskolabs/humanizer


There's an annual contest for the spam community. It's organized by Botmaster, the producer of the Xrumer software.

Basically, whoever answers most of these questions correctly, earns $15,000. People submit MILLIONS of answers.

This list is then incorporated into the Xrumer itself.

I can tell you that these things are easily broken:

a) Answering questions and building a global list of answers, as in this case.

b) Reading image captchas - spammers send them to Pakistani manual solvers for dirt cheap.

c) More complex puzzle captchas can also be broken in software if a lot of websites implement them.

What works the best is using non standard html form field names. Also, try to not use text labels for the fields ( no "password", "captcha" etc. ) - because the software will try to match the best field by text surrounding it. It is better to use image for labels.

This solution will stop spambots because they simply match form field names. Unless someone specifically targets your website, in which case there's not much you can do.

Of the most common captchas, Mollom seemed to be the biggest pain in the ass for spammers. Mainly because it banned suspicious IPs ( you could solve the picture correctly and it wouldn't authorize it. )


> What works the best is using non standard html form field names. Also, try to not use text labels for the fields ( no "password", "captcha" etc. ) - because the software will try to match the best field by text surrounding it. It is better to use image for labels.

Except browser autofill breaks and anyone who needs a screen reader will go elsewhere if the screen reader can't parse the images.


There is a better solution: http://xkcd.com/810/


That works on a small scale. But, when your site gets enough visitors, it is worth it to a spammer to spend five, or ten, or fifteen, minutes figuring out all of your questions. So, it's a potentially useful trick for sites only targeted by very dumb bots. But, the bigger your site, the less likely it is to work. My sites aren't even that big (anywhere from 40k to 250k visits per month), and this technique isn't tenable for most of them. I do use it on our wiki (mostly as an experiment, since CAPTCHA isn't as accessible as I'd like), which is the 40k visits/month site, and it needs new questions every couple of weeks, or we get hammered. And, all of our sites have other mechanisms for preventing spam, as well.



> "How many planet Earths are there?"

"Either one or probably infinity, depending on whether the universe is infinitely sized and whether unobservable parts of it are considered to 'exist'."


Zero is also a potential option, if we're running in some alien teenager's version of The Sims.


That type of solution really breaks down once you have a big enough target on your back for the botters to go after you.

What is the most widely used project you have successfully used it on?


Yeah, we tried this on our product forums and it didn't really help at all. :(


It varies greatly depending on the project you're working on. There are people out there trying to automate account creation on web-based email services for the sole purpose of email spamming (same with social networks). And with complex botnet proxies and the Tor project, false accounts are getting harder to detect.

They have created interesting methods to combat CAPTCHA images. They even have outsourced OCR services, where people in other countries are paid to solve one image at a time. [0]

[0] http://deathbycaptcha.com


It doesn't work. All you have to do is solve 1% of these questions manually, and voila, you got a bot that can spam, all it has to do is load the captcha 100 times on average.

That's the hard thing about captchas, bots don't have to solve them perfectly to completely break them.
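The "100 times on average" figure above follows from treating each load as an independent try with success probability p, so the number of tries until the first known question is geometrically distributed with mean 1/p. A quick sketch of that arithmetic (the function names are just for illustration, and independence between loads is an assumption):

```python
def expected_attempts(p):
    """Mean number of independent tries until the first success (geometric distribution)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 / p


def chance_within(p, n):
    """Probability of at least one success in n independent tries."""
    return 1 - (1 - p) ** n


print(expected_attempts(0.01))               # 100.0
print(round(chance_within(0.01, 100), 3))    # 0.634
print(round(chance_within(0.01, 500), 3))    # 0.993
```

So a bot that knows only 1% of the answers still gets through about 63% of the time within 100 reloads, and almost always within 500, unless reloads are rate-limited.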


OK, so what Google is pushing is something where they track lots of stuff about your web site in exchange for a CAPTCHA that looks like every other CAPTCHA. That's so Google. Everything comes with a privacy intrusion.

Amusingly, the examples they give are actually readable. Most of the time, when I see a CAPTCHA displayed, it's not a word, or anything close to a word. I've seen ink smudges, math symbols, and Cyrillic.

Besides, machine learning is good enough now it can beat most people at CAPTCHA solving. Look on Black Hat World for the software.


This is a bit worrying. If CAPTCHAs start becoming easier for "real" users (those who are logged in to a Google account, run the Analytics JS, etc.) and harder for "suspicious" users (who block ads, who use Tor, etc.), it may eventually become very hard and unpleasant to be a suspicious user, and non-suspicious users will not notice it.


I've already seen it implemented on humblebundle.com, using it is a delight compared to filling a captcha every time.

I hope Cloudflare soon adopts it for their anti-bots protection – my home connection is often flagged as malicious on a few websites.


Cloudflare also flags every Tor exit, which makes quite a bit of the web rather difficult to visit. I shouldn't need to fill in a captcha to load a gfycat animation.


I could totally believe that is based purely on observed activity coming from those IPs (as opposed to some intentional act against Tor)


That matches up with what they claim on their site:

CloudFlare does not actively block visitors who use the Tor network.

Due to the behavior of some individuals using the Tor network (spammers, distributors of malware, attackers, etc.), the IP addresses of Tor exit nodes generally earn a bad reputation. Our basic protection level issues captcha-based challenges to visitors whose IP address has a high threat score.

https://support.cloudflare.com/hc/en-us/articles/203306930-D...


From personal experience: most activity you're likely to see from Tor exit nodes is fraudulent. Absolute bottom-of-the-barrel cesspool traffic trying to probe for vulnerabilities, commit fraud, scrape content, avoid IP blocks, and generally abuse your site in ways that the attacker wouldn't be comfortable doing with their own IP address. It's really tragic - given the potential of the network as a privacy tool - that it's mostly used for evil, not good.


Good people don't care about hiding their IP that much.


"I have nothing to hide" is just plain wrong. The less Google and others know about me, the better.


> CloudFlare does not actively block visitors who use the Tor network...

...Yet all visitors who use the Tor network are blocked. There's a certain contradiction here. (Yes, I know what they mean, but the end result remains the same.)

I've often had problems when using a VPN too.


It most likely is just automatically doing it to known Tor exit nodes. Spammers use Tor quite a bit for scraping and other bot activities. Companies like Cloudflare just take the easy route and throw all exit node IPs into the garbage bin under the assumption that legitimate Tor users are used to such treatment.


google's recaptcha is not that hard to beat as it only checks one word (the computer generated one) while the other is google using you to OCR for their google books service.

Just fill one word and put some random crap for the other and you just halved the annoyance.



Not really, botters just haven't had a reason to automate it yet. If they did have a good reason to automate it, it seems like it would be pretty easy to do so.



I don't think you appreciate the joke.


Hah! I guess I didn't look at it very closely =)


I had an idea for a better Captcha many years ago for my Uni final project, but went with a different idea in the end.

Mine was a list of boxes with similar pictures in them, and ask the users a question.

* Choose the only dog

* Choose the cars

* Choose everything but the men in these pictures

Microsoft created something similar a few years later but then killed it. I still think this was a better way than OCR-type stuff. http://research.microsoft.com/en-us/projects/asirra/


The trouble is that a well-trained AI can currently slightly beat humans on this kind of classification test, and it also requires a lot of pictures if you don't want it to be trivially broken just by enumerating them all.


The image classification yes, but if the questions are worded intricately, then it might still be a challenge for bots.


You would spend more effort wording the questions than a spammer would paying people to solve them on Mechanical Turk, then storing the answer in case it ever comes up again.


You need a huge database of images classified as dog, car, man etc, otherwise a spammer can download all your images and classify them manually. If you grab images from Google Images, the spammer can do the same thing.


Previously, reCAPTCHA evolved to show easier (street address) CAPTCHAs to users who have already passed a few hard ones. I guess the next step is to skip it completely.


You're mistaken, the reason street addresses first showed up in recaptcha is that google needed to have address numbers filled in for their street map service.


It's a combination of both, really.


Shameless plug - I am trying it from a different angle :) https://hashcash.io/


Your site needs more detail about how your system functions and compares to existing captcha systems. All I could find on your site is that basically existing systems don't work and that yours does.


It's not always a house number, I think. Once you have proved you are not a robot, it will not ask you to fill in a captcha next time on other websites.

Here is a demo http://www.google.com/recaptcha/api2/demo


I have noticed this. You get words for a while; once you get a good enough score you go to house numbers, and if you fail a house number you go back to the words. Playing with the demo, if you fail the checkbox (letting it time out is one way) they throw you back to the house number again[1]. As other people have pointed out, what they're promoting here has been used on the Humble Bundle website for a while now. I guess it's the next logical step to show "good" enough users no captcha at all. Not sure about the privacy implications of that though.

[1]: https://i.imgur.com/EnydTJs.png


I'm not sure why this link is trending. Hasn't reCAPTCHA been around for a while?


Read again.


Read what? All I get is a picture of a cat and a link to reCAPTCHA.


I did. Please enlighten me.


I was implementing this yesterday and discovered there is absolutely no way to customise the style / layout of the captcha. You either use the light theme or the dark theme and that's it, and it's inside an iframe so you can't manually hack the css.

The old version used to be customisable so I really hope Google adds the ability to customise this soon.

Another trivial but important oversight: the captcha has a background color of f9f9f9 but the fallback captcha has a background color of ffffff. So even if you try and style around it unless you manually detect what kind of captcha is showing and change the background color on the fly one of them is going to look off.


There's a very simple API and all you need is to insert something like img src=recaptcha?challenge=C. Isn't there?


You only insert a div and the script places an iframe inside it. I don't think there is any way to only load the image.


There's a way, of course: whatever can be done with JS can also be done on the server side. But it's not a legit way to use it, true.


Certainly an improvement. But captchas remain user-hostile and generally unnecessary.


Have you ever had to deal with spam on, eg, a forum or wiki? CAPTCHAs are absolutely necessary in many cases.


Yes. At PayPal we had one of the very first. And then at Eventbrite as well. They are most definitely overused.


Interesting. If I may ask, what alternate routes did you take to counter spam without resorting to CAPTCHAs?


Honeypots and timestamps would work in many cases. There are folks who want to captcha anything because they couldn't care less about users. But then when you take a step back and question if it's really necessary, it's frequently not.
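For anyone who hasn't seen the honeypot-plus-timestamp approach, here is a minimal sketch of what it might look like server-side. Everything specific is an assumption for illustration: the field names (`website`, `rendered_at`), the time limits, and the helper names `issue_timestamp` and `looks_human`. It is not a description of what PayPal or Eventbrite actually ran.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"   # hypothetical server-side secret
HONEYPOT_FIELD = "website"  # hypothetical field, hidden from humans with CSS
MIN_SECONDS = 3             # assumed: humans rarely submit faster than this
MAX_SECONDS = 3600          # assumed: reject forms rendered over an hour ago


def _sign(ts):
    return hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()


def issue_timestamp(now=None):
    """Value for a hidden field: render time plus an HMAC so it can't be forged."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}:{_sign(ts)}"


def looks_human(form, now=None):
    """Apply the honeypot and timing checks to a submitted form (a dict of strings)."""
    if form.get(HONEYPOT_FIELD):
        return False  # something filled in the field humans never see
    ts, _, sig = form.get("rendered_at", "").partition(":")
    if not ts.isdigit():
        return False  # missing or malformed timestamp
    if not hmac.compare_digest(sig.encode(), _sign(ts).encode()):
        return False  # timestamp was tampered with
    elapsed = (time.time() if now is None else now) - int(ts)
    return MIN_SECONDS <= elapsed <= MAX_SECONDS
```

The form would be rendered with `issue_timestamp()` in a hidden input and an empty, CSS-hidden `website` field. This only stops generic bots; anything written specifically against your site will skip the honeypot and wait out the timer.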


I've seen the "I'm not a Robot" checkbox in Humblebundle.com.


Why not use some kind of Bitcoin-like proof-of-work system for this? http://pixelspark.nl/public/pow.html has a working demo
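To make the proof-of-work idea concrete, here is a hashcash-style sketch: the server hands out a random challenge, the client searches for a nonce whose SHA-256 hash has a given number of leading zero bits, and the server checks it with a single hash. This is an assumption-laden illustration, not the implementation behind the linked demo. `make_challenge`, `solve`, `verify` and the choice of SHA-256 are all mine.

```python
import hashlib
import itertools
import os


def leading_zero_bits(digest):
    """Count the zero bits at the start of a byte string."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def make_challenge():
    """Server side: a random challenge to embed in the form."""
    return os.urandom(16)


def _hash(challenge, nonce):
    return hashlib.sha256(challenge + str(nonce).encode()).digest()


def solve(challenge, bits):
    """Client side: brute-force a nonce; takes about 2**bits hashes on average."""
    for nonce in itertools.count():
        if leading_zero_bits(_hash(challenge, nonce)) >= bits:
            return nonce


def verify(challenge, nonce, bits):
    """Server side: one hash is enough to check the client's work."""
    return leading_zero_bits(_hash(challenge, nonce)) >= bits


challenge = make_challenge()
nonce = solve(challenge, 12)          # roughly 4096 hashes on average
print(verify(challenge, nonce, 12))   # True
```

Each extra bit of difficulty doubles the expected work for the client while verification stays at one hash. The server also needs to remember which challenges it has already accepted, otherwise a solved challenge can be replayed.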


Because bots will commonly run on stolen computer time. The bot doesn't care how much of your computer processor it uses.



Here is a key difference from prior CAPTCHA services:

"reCAPTCHA offers more than just spam protection. Every time our CAPTCHAs are solved, that human effort helps digitize text, annotate images, and build machine learning datasets. This in turn helps preserve books, improve maps, and solve hard AI problems. "

They're using captchas to solve text analysis and digitizing problems.


reCAPTCHA has been around in that form for a long time already. This post is about a new system they're planning on rolling out to replace that one. This one won't require any text input for (most) users.


Most of the stuff is already digitized. They moved over to leveraging it against their maps (building numbers, etc).


I've seen this new captcha plenty of times already, I'm surprised it's not yet "official".

It is much more convenient and painless from a user's PoV but I'm a bit surprised it actually stops bots.


I don't see how it's different at all. Could you elaborate?


I honestly don't get it. How is this different from the same old reCAPTCHA? Click the checkbox, get a captcha, fill it in. What's the difference?


Personally, I'm no longer solving reCAPTCHAs after I noticed that Google uses it for free labor. (Google sometimes knows very well that I'm no robot, yet it still shows a reCAPTCHA). So far it affects the Chromium issue tracker, which presents a reCAPTCHA to post more than one comment per day.


I just get a picture of a cat.


So they're turning captchas into a mechanical turk for free?


That's what reCaptcha was from the very beginning. That's also the reason why a lot of people intentionally misspell the non-challenge word in it.


So they buy reCAPTCHA, then they self-deprecate it.


I couldn't help but notice the 2006 MacBook Pro with front-loading CD-ROM drive.



