Hacker News
Are You a Robot? Introducing “No CAPTCHA ReCAPTCHA” (googleonlinesecurity.blogspot.com)
694 points by r721 on Dec 3, 2014 | 420 comments



"If we can collect behavioural data from you, and it matches closely the behaviour of other humans, you are a human. Otherwise you're not a human."

Does anyone else get that feeling from the description of what Google is doing? I've tripped their "we think you are a bot" detection filter and been presented with a captcha countless times while using complex search queries and searching for relatively obscure things, and it's frankly very insulting and rather disturbing that they think someone who inputs "unusual" search queries, according to their measure, is not human. I have JS and cookies disabled so they definitely cannot track my mouse movements and I can't use this way of verifying "humanness", but what if they get rid of the regular captchas completely (based on the argument that eventually the only ones who will make use of and solve them are bots)? Then they'll basically be saying "you are a human only if you have a browser that supports these features, behave in this way, and act like every other human who does." The fact that Google is attempting to define and thus strongly normalise what is human behaviour is definitely a big red flag to me.

(Or maybe I'm really a bot, just an extremely intelligent one. :-)


> and it's frankly very insulting and rather disturbing that they think someone who inputs "unusual" search queries, according to their measure, is not human.

Insulting? How is that insulting? You are entitled. You are entitled: you block scripts and use Google's FREE service to perform any search query you like, all while blocking any program that attempts to identify you as not a bot.

But if, while you're using their free service, they cannot identify you as a human given the factors they measure (because you actively disabled the programs that measure those factors), then I see nothing wrong with them trying alternative ways (which were the standard before).

I think you are making a storm in a teacup. If you feel offended by the way their website works, just don't use it. I don't see any red flags at all.


> You are entitled.

Has calling someone entitled ever been useful? For the last few years it's felt like nothing more than a petty "you're wrong" remark with bonus condensation built right in.


Been useful at changing their behaviour/beliefs? No, for the same psychological reasons any direct contradiction isn't.

Been useful at communicating that they feel someone is confusing expectations with rights? Yes, that's why there is a word for it.

Also, I get that it's annoying to hear entitled, ad hominem, logical fallacy, privilege, and other currently trendy words being overused and often misused. But I'll take it over what there was before, which was, most of the time, no discussion at all at that level of abstraction. The places where these words are overused are places where the community is learning these concepts.


> Has calling someone entitled ever been useful? For the last few years it's felt like nothing more than a petty "you're wrong" remark with bonus condensation built right in.

In this case, calling someone entitled is actually a compliment, not an ad hominem insult or put-down, because it acknowledges the poster's humanity.

The proper response in this case would be, "Why, yes, I probably am a bit entitled, like most people. Thank you for recognizing that I am human."


Do you mean condescension? Not trying to be a smart ass, I just got a chuckle out of the word choice.


There you go, being condensing again.


You need to chill out!


Is this Reddit? What year is it?


>Has calling someone entitled ever been useful?

Not useful, and moreover, it's ad hominem.


It's only ad hominem if it's used as support for an argument not directly related to the other's personality. It's not ad hominem in this case, because "you're entitled" is being used to support the claim, or as a parallel claim, that it's unreasonable to be insulted by a computer program thinking you're another computer program.

Claim: Turing was wrong about the Church-Turing Thesis because he was a homosexual -> ad hominem

Claim: Turing was immoral because he was a homosexual -> not ad hominem (although still not a good argument)


Saying "just don't use Google/Bing/search engines" is like saying "don't use the airlines".

The security checks suck big time, but really, these services are must-have and there is no better alternative.

Complaining and hoping for some change is all that's left.



Sorry, as much as I tried to like it, the results just don't match what I need...


Because they don't make any money; the sorts of shitty practises people often complain about are the only things that enable making money while providing free services.

If enough people were actually willing to pay to use a search engine you could have an awesome search engine with none of that.


They insert their own referrals for various sites, they do make money.


https://startpage.com - they offer anonymized google results & don't log IP.


DDG is excellent most of the time, but way behind when doing video or map searches. I have it as my default in most places, but sometimes I just have to fall back to Google.


Agreed. Additionally, DDG is useless when searching for current news events. Google is great at flowing in headlines for recent events with searches, which often leads me to include the "!g" bang if searching for a recent issue/story.


I don't disagree, but it did trump the point of the comment it replied to.

DDG will get the job done.


And DDG makes it super easy to search google with a bang:

"!g hacker news"


Yup. The bang feature makes DDG amazing. I use it as my default because it's a great way to quickly do tailored searches. It's very easy to say, search Hacker News (!hn), python documentation (!py3), R sites (!rseek) or just pull up a wikipedia page (!w).


You can use Disconnect Search, which makes your Google searches anonymous: https://search.disconnect.me/


I wish that were true. I tried to switch to them after they did their 'next' a few months back but the results were most of the time not nearly as good as Google's for my queries and I had to switch back.


Part of the reason Google is so good is that they track you... if you disable tracking, they aren't quite as good.


Google's services are not free.

You trade your data, screen real estate, and attention for their service. This is worth a lot - Google is worth a lot. They didn't do it by giving out services for free.


Google's services are free. By trying to redefine what free actually means into 'has no cost whatsoever to anyone' you ruin the word.


If you trade a fish for a service it isn't free. If you trade a gold bar for a service it isn't free. If you trade a service for a service it isn't free.

If you trade data for a service it is not free.

To consider "free" a function only of fiat currency is naive, both of history and of economics.

Google search is not free.

If it is I have no idea how they made so much money...

Or maybe you can tell me what 'free actually means'?


By that broad definition, there is practically no free website on the web (analytics, logs, etc.). Actually, by posting this on HN, I just "traded data, screen real estate and attention". Would you argue HN isn't free as well? I get what you mean but I don't think this is how the word free is commonly used.

Besides, I believe the original point made still makes sense if "free" is assumed to mean "non paid".


A note that screen real estate and attention here pertain mostly to paid impressions - be they advertisements or politicizing messages. When it comes to content sought by the user, it's hard to say that the user is giving the service provider real estate and attention. It is only when the service provider is showing content not for the benefit of the consumer but for its own benefit that the attention and real estate can be thought of as 'rented' to them.

I would agree with the assertion that there are practically no free websites on the web. Since when did we convince ourselves we can get things for free?

There are major exceptions. Wikipedia is for the most part free. It does not advertise to you, nor does it siphon and sell your data. It does not track you around the web, it does not sell your Wikipedia viewing behavior to urchin for cents. It is driven by donations.

HN also appears legitimately free to me. As far as I know YCombinator does not mine or sell your data or collect data other than what is required for the forum to be a forum. YCombinator makes its money by other means. It certainly benefits by cultivating a technical online community, which is why I think it does it - though what influence YC can/does project on the community could be thought of as a social cost (I know very little to nil about whether or how much this is done).

Google, however, is not one of these cases. Nor is most of the web.

I'm not sure if the original point still makes sense with 'non paid' (nor am I sure 'non paid' is right). The original point uses 'free' (in caps) to emphasize a sense of charity that informs their 'entitled' argument. First, their argument is essentially 'What, you expect this to be free? You are entitled!' Second, I'm not sure that replacing the term will work, unless it also communicates charity.

The point here is that the exchange does not constitute charity. Google thinks the trade is a very good deal. Presumably internet surfers do too. But there is an exchange and that needs to be recognized.

Anyway this means that any term that communicates 'charity' will be ignorant of the conditions of how Google's service works - and I would have posted the same misgivings.


Google Search is free to search. It's not free to advertise on. Searchers are not Google's customer.


In this sense the searcher, her data, her screen real estate and her attention are the product Google offers to advertisers.

These are the things the searcher trades for the service.


If a fisherman gives you a fish in exchange for writing your name and the time of your visit down in his logbook, I consider the fish to be free for all intents and purposes.


I would agree with that.

I would also posit that Google looks and does nothing like that fisherman.


I know what you are saying, but actually this service is free. Blocking bots effectively is in the interests of both the website owner and Google, because bots disrupt the value propositions of both. And you could argue that it is in the website reader's interests too, by extension.

Given their scale and resources, Google are able to provide a far more effective bot detector than any of us could do on our own. I for one am delighted they are providing this very valuable service.


Not sure what blocking bots has to do with the freeness of the service. Perhaps you'd like to reply on one of the comments further down to get into why you believe the service is free?

You may argue that the trade is in the website reader's best interest. This is a different argument than whether it is free.


My real estate and attention are given to them because I came to their service asking to fill my screen according to my query.

I can agree Google is not providing a free pure-search-results service, but they do provide a free search-results + ads service. Whether getting relevant results + [relevant] ads is worth anything to you - even $0 - is a separate question, but it's a stretch to frame it as an exchange. It's like taking a free hot dog and complaining it's not free because you traded your time & taste buds eating the bun while you only wanted the sausage... [I'd buy it more for e.g. youtube pre-video ads, where you are forced to give attention and time to the ad first.]

Now my data is a better point. Very valid for the totality of google services; quite weak for logged-out search use. If you work answering questions, and recording the questions that get asked and where they came from, then yes I did hand you this data but it's almost inherent in asking the question.

[Disclaimer: I'm a xoogler. And all this is nit-picking.]


free as in beer: Free in the sense of costing no money; gratis.[1]

[1] http://en.wiktionary.org/wiki/free_as_in_beer


I don't understand how this clarifies the term free? Free as in beer is used to specify the freeness of a product or service, rather than the freeness of 'freedom', say from authoritarianism.

This conversation is about the meaning of 'money' modulo this understanding of free-as-in-beer - i.e. whether non-fiat scarce resources (user data/screen real estate) count as money.


free has 20+ definitions.[1] A discussion about what we mean when we say Google's services are (or are not) free is using free the same way we use it in free as in beer [thus its definition is relevant].

Colloquially we usually use free to mean not having a financial cost. Another word or phrase is usually used when referring to non-monetary costs. i.e. I would say "Google is free" but I would never say "Google costs nothing."

[1] http://en.wiktionary.org/wiki/free


The top sentence is granted - not sure it was ever in question.

In the bottom part you use personal anecdote to support the claim that a broader 'we' do something. I'm not sure, as my personal experience differs. But it does get to exactly what I was saying in the above comment - what the discussion centers on is what counts as 'money' (as you say, "referring to non-monetary costs").

I think the place we differ is whether non-fiat scarce resources count as money. I think they do. Historically they have. In economics literature and practice they do.

Or perhaps the reservation is that the scarce resources in this instance are 'soft' resources like attention, screen real estate and personal data? Much of what is traded by financial institutions (for example) today is very virtual - trades of risks, credits (promises), futures, bets. Even real estate is traded on the idea that it occupies space of human attention and investment - not necessarily because it can be used as a means to 'produce' something. I'm hesitant to draw firm lines between these soft assets - I'm not sure where I could sensibly draw them.

Either way, I'm glad we agree that Google costs something. I do think that the OP intended their use of free (in capitals and context) to mean "Google costs nothing."


Perhaps the downvoter would be kind enough to clarify why they think these comments do not contribute to the conversation.


I didn't downvote but you may want to read HN's Guidelines[1], particularly: Resist complaining about being downmodded. It never does any good, and it makes boring reading.

[1] https://news.ycombinator.com/newsguidelines.html


The challenge (no complaint here, though I do believe it was down(modded?) merely because of disagreement and not for relevance or quality) was meant to incite more on topic discussion.

It's interesting I've never read the guidelines before now. Was refreshing to have taken a look, although it's mostly common sense and etiquette.


They are if you block ads, scripts, and cookies.


You're generalising. This argument only makes sense if Google's entire ecosystem of services was just like any other random, independent selection of sites on the Internet.

There is no equivalent to Google. Nobody else is doing this, particularly not to this extent. Not using all that computing power and AI to do it.

Yes, if Google thinks I'm a robot, I don't think it's so strange to consider that some sort of value judgement, even if it's done by a legion of machines. Definitely more so than if some random small-time website decides to make that call based on a couple of if-then statements.

Imagine if using a web service is like visiting a shop, and you get directed to the slow-checkout+ID-check lane because maybe you stammered your order, or because you know the store that well, your shopping-cart route through the aisles is deemed "too fast" (read: efficient, also avoiding the "special offers", cookies/candy/soda/junk aisles).

Amusingly, how I feel about that "judgement", varies. Sometimes it's annoying sometimes it's cool because I feel "hey I'm doing something clever that humans usually don't". Similar to how being ID-checked in a liquor store can be both annoying and flattering (depending on your age and how often it happens).


You'd have a point except for the fact that recaptchas have become increasingly impossible to solve (for humans!). And recaptchas aren't just on google sites, they're everywhere.


Which is what this is trying to help solve. They know they're getting harder, so they're trying to identify you before even hitting the captcha part so that you don't have to do it.


No one should complain about anything, ever.


Actually it would be great if someone had some ideas for ways to identify humans that don't require stuff like javascript. From the perspective of a service provider (and I'm one) the bots are a scourge; they consume resources and they are clearly attempting to 'mine' the search engine for something, but they are unwilling to come forward and just ask the search provider if they would sell it to them. And since they are unwilling to pay, but willing to invest resources in appearing more human-like, it leaves service providers in a pretty crappy position.

So anyone have some CSS or otherwise innocuous ways of identifying humans I'm all for it.


On a small scale, it's not too difficult. Detecting form POSTs with a new session catches most comment spam bots, and if an empty input field hidden with CSS is submitted with content, that's also a giveaway.

And I wouldn't discount javascript - another hidden field populated by onSubmit() is simple and effective. A few vocal paranoiacs advocate browsing with javascript turned off, but they are few and far between - and I bet they get sick of adding sites they want to see to their whitelist. We have over three thousand fairly technically aware users, and none have been tripped up by the javascript test.
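A rough sketch of those two server-side checks (Python/Flask here purely for illustration; the route and the field names "website" and "js_token" are made up):

    # Minimal sketch of the honeypot + JS-populated-field checks described above.
    # Field names ("website", "js_token") and the expected token value are illustrative only.
    from flask import Flask, abort, request

    app = Flask(__name__)

    @app.route("/comment", methods=["POST"])
    def post_comment():
        # Honeypot: an input hidden with CSS that humans never see, so it should stay empty.
        if request.form.get("website", "").strip():
            abort(400)  # bots tend to auto-fill every field they find
        # Hidden field that onSubmit() fills in client-side; still empty means no JS ran.
        if request.form.get("js_token") != "value-set-by-onsubmit":
            abort(400)
        return "comment accepted"

Neither check survives an attacker who studies the form by hand, which is the point above - they only filter out the generic, mass-market spam bots.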

If your site is valuable enough for an attacker to manually figure out your defences, then you need to consider emailing a verification token - or even better, use SMS if you can afford the cost. Because this gives you a number to pass to law enforcement, it means an attacker has to buy a burner SIM card.

Back on topic, Google's initiative is a useful tool to add to your defences.


Isn't this just the cost of having a "free" product? Bots are not really a problem. It's just that their traffic cannot be monetized. If you could monetize bot traffic your problem would be solved. Or put another way, if you framed the issue as a business-model one, not a technical one, it might be a useful exercise.


   > if you framed the issue as a business model one, not 
   > a technical one, it might be a useful exercise.
That was kind of my point. Clearly most of the bots are trying to scrape my search engine for some specific data. I would (generally) be happy to just sell them that data rather than have them waste time trying to scrape us (that is the business model, which goes something like "Hey, we have a copy of a big chunk of the web on our servers, what do you want to know?"), but none of the bot writers seem willing to go there. They don't even send an email to ask us "Hey, could we get a list of every site you've crawled that uses the following Wordpress theme?" No, instead they send query after query for "/theme/xxx" p=1, p=2, ... p=300.

On a good day I just ban their IP for a while, when I'm feeling annoyed I send them results back that are bogus. But the weird thing is you can't even start a conversation with these folks, and I suppose that would be like looters saying "Well ok how about you help load this on a truck for me for 10 cents on the dollar and then your store won't be damaged." or something.
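For whatever it's worth, the "ban their IP for a while" step can be as crude as a sliding-window counter per IP - a sketch, with the window and threshold picked arbitrarily:

    # Sketch of a per-IP sliding-window rate limit; 100 requests/minute is an arbitrary cutoff.
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60
    MAX_REQUESTS = 100           # nobody pages through p=1..p=300 by hand this fast
    recent = defaultdict(deque)  # ip -> timestamps of requests inside the window

    def allow(ip: str) -> bool:
        now = time.time()
        q = recent[ip]
        q.append(now)
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()
        return len(q) <= MAX_REQUESTS  # False -> temp-ban, or serve the bogus results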


You may try to contact scrapers through the access-denied page.

Did you try to explicitly state that your data is available for sale when denying access to p=300?


If you wanted to buy data from Google, how would you email? What is Google's email address?


Google posts lots of contact information on their contact page. You would probably want to reach business development. I don't think they are willing to sell access to that index however, we (at Blekko) would. I suppose you could also try to pull it out of common crawl.


It need not be a commercial service. For example, Wikipedia is a donation-only service. A bot visit is generally no different from most users' visits (I'd assume most users don't donate anyway). Wikipedia doesn't really mind serving users that aren't donating, but bots, while generally no different from normal users, steal resources away from actual users.


That's why Google needs a proper API or a Pro edition where you could execute proper SQL queries, etc.

Instead, Google is making their search less functional. I don't get why.


They should, but the proper response would be a solution that solves what others can't, not complaining about someone not solving something you decided yourself to try out.


Including about other people's complaining.


It was sarcasm.


see no evil. hear no evil. speak no evil.


Don't be evil


TIL 'free' is an excuse for unethical behaviour.


(Disclaimer: I work at Google, but not on ReCaptcha.)

The point of this change is to make things easier on 90% of humans -- the ones who have JavaScript and third-party cookies enabled now get to tick a checkbox and be on their merry way, instead of doing a useless captcha when we knew they were already humans. Recall that when ReCaptcha initially came out, the argument was "humans are wasting all of this time, let's turn it into useful work to digitize books".

If book-based or street view-based captchas go away, I suspect it will be because bots/spammers got better at solving them than humans, not because Google thinks that the machine learning spam detection approach is fail-proof.

Recall that "reading" captchas already pose an insurmountable barrier to users with conditions such as illiteracy, low vision, no vision, and dyslexia. To accommodate these users, audio captchas are also provided, but a 2011 paper suggests that audio captchas are either easy to defeat programmatically or are difficult for users themselves to understand: https://cdn.elie.net/publications/decaptcha-breaking-75-perc...


I am visually impaired and can attest to both visual captchas being a pain and audio captchas being hard to understand. This change is nothing but an improvement as far as accessibility and usability go. This is only a plus for people who implement these, as I have actually left sites that had insurmountable captchas for me.

Thank you.


Check out webvisum.com - from their website:

"WebVisum is a unique browser add on which greatly enhances web accessibility and empowers the blind and visually impaired community by putting the control in your hands!"

"Automated and instant CAPTCHA image solving, sign up to web sites and make forum posts and blog comments without asking for help!"


I was curious about the CAPTCHA solving, too, so I tested WebVisum out on ~8 reCAPTCHAs.[1] It solved all except 2 of them, taking 20-60 seconds each time. In 2 cases it reported failing to solve the CAPTCHA, but it never gave an incorrect result. That is, whenever it gave a solution the solution was correct (in my brief test).

So, while it's some way off their claim of "instant" CAPTCHA solving, this is definitely a very useful addon, especially for those people who cannot solve CAPTCHAs at all. Thank you for pointing it out.

[1]https://www.webscript.io/examples/recaptcha


> Automated and instant CAPTCHA image solving

How do they do that? This sounds like whitehat use of blackhat tools. Are they using captcha-solving farms?


There are ways to solve captchas somewhat reliably programmatically. I suspect this plugin only works on certain computer-generated captchas, not the street-sign ones.

http://resources.infosecinstitute.com/introduction-to-automa...


They send the captcha to their servers and how they solve them is a secret.

http://www.webvisum.com/en/main/faq#q15


Is there a web service where one could purchase AI recognition of fuzzy text, e.g. a street sign or book cover in a photo?



Very helpful, thank you! I have a difficult OCR problem to solve, rather than identity. Interesting to see that the market price for "being human" is $0.00139.


For non-captcha OCR also consider Mechanical Turk. And there are a variety of services built on Turk too.


The fact that this works shows that distorted-text captchas are no longer effective.

From Google's blog post:

> our research recently showed that today’s Artificial Intelligence technology can solve even the most difficult variant of distorted text at 99.8% accuracy


> If book-based or street view-based captchas go away, I suspect it will be because bots/spammers got better at solving them than humans

But, wait. Isn't that what we want? It seems like bots and spammers impose a relatively small cost on a company like Google, while digitizing books and house numbers is relatively valuable. I don't have numbers for a detailed cost-benefit analysis, but if bots get good enough to do time-consuming work accurately, that's a win, right?


That's like flying because you like airline food. No one flies if they don't have a destination. No one will put a captcha on their site if it doesn't tell computers and humans apart; that's its primary job.


From your description, you do sound kinda like a bot. Disabled cookies. Disabled Javascript. Irregular searches. I understand the frustration with saying, "You have to have these features supported to use the product," but let's face it: providing an experience to people who deliberately disable huge chunks of browser functionality is a tremendous pain in the ass. I think I can understand both sides of the argument using different strawmen:

"Can I read this paper, please?"

"Yes, of course, just put on these reading glasses."

"Why do I have to put on the reading glasses?"

"Well the font is quite small. If you don't wear the glasses, you probably won't be able to make out anything on the page. Your experience will be seriously degraded."

"I don't want to wear the glasses. Why can't I just read the page?"

"Well, we can fit a lot more data and make the page more robust by printing the text smaller. Why don't you just wear the glasses?"

"I have concerns about the glasses. I'd rather strain my eyes."

"We're not going to make a special page for you when 99% of the people are totally okay with wearing the glasses or wear the glasses anyways."


"I have JS and cookies disabled"

So imagine what bots often don't have.

Adding JS interaction and cookies takes more effort on the part of the programmer writing a bot.

So yeah, you'd look a lot more like a robot. How else would you quickly differentiate between human vs non-human based on a single request, or even a collection of requests over time? It's a game of stats at scale.


Here's a snippet of Python using the splinter library, to visit Google, type in a search query, and click 'search' (which is very Javascript heavy these days with their annoying 'instant' search).

    from splinter import Browser

    b = Browser()
    b.visit('http://google.com')
    b.fill('q', 'browser automation')
    btn = b.find_by_name('btnG')
    btn.click()

Not exactly 'more effort'...


With Selenium you can open a full web browser such as Chrome or Firefox and have it run through. A Google search is six lines:

require "selenium-webdriver" driver = Selenium::WebDriver.for :firefox driver.navigate.to "http://google.com" element = driver.find_element(:name, 'q') element.send_keys "Hello WebDriver!" element.submit

https://code.google.com/p/selenium/wiki/RubyBindings

Writing a bot with js and cookies is trivial, but it definitely won't defeat these tools. They probably look for time between actions or track mouse movements, stuff that makes the bots super inefficient.


Yeah, but if you are trying to automate thousands of simultaneous requests, you'll have to use a lot of servers, which is costly even in the cloud.

Right now Google and Bing will run sites with JS enabled to see the DOM after any JS changes take hold. Usually these crawls aren't nearly as frequent as the general crawling, because there is quite a lot more CPU/memory overhead to such utilities. I can't speak for splinter, but similar tools in node or phantomjs have a lot of overhead to them.


Still less effort than typical captcha.



That wasn't the point


It's more effort, as fewer libraries support it, you need to execute unknown code on your computer, etc.


I think you're being a little hyperbolic. Google is classifying what is already normal human behavior. Having JavaScript disabled is definitely not "normal" human behavior. Of the stats I found, only 1-2% of users don't get JS, and the UK's Government Digital Service found[1] that only 0.2% of users disabled or can't support JS.

I don't think regular CAPTCHAs are going away anytime soon since any bot detection system is bound to have false positives.

[1] https://gds.blog.gov.uk/2013/10/21/how-many-people-are-missi...


Exactly. It's perfectly reasonable to present users who disable JS with a one-time CAPTCHA they have to solve to use the site. Many sites just (usually unintentionally) prevent users with Javascript disabled from navigating a site at all, so this is a huge step up from that.


> The fact that Google is attempting to define and thus strongly normalise what is human behaviour is definitely a big red flag to me.

...But this is their core search competency and exactly what makes their search so powerful. PageRank is basically distributed wisdom of crowds, aka an algorithm of how people behave (build their websites) based on a search term/embedded link.

This seems like a perfect extension of this. Remember the vision of google: "to organize the world's information and make it universally accessible and useful." Human behavior falls squarely into a large segment of the "world's information."


>Remember the vision of google: "to organize the world's information and make it universally accessible and useful."

I'm sure that's why they got rid of the ability to search what people are saying on forums and blogs. Google still indexes everything, they just got rid of the filter.

Their results now give preference to SEO'd pages & adverts.

The old discussion filter returns an illegal request error https://www.google.com/?tbm=dsc


Their search is only "powerful" for finding the more mundane and widely disseminated information; I've noticed that it's increasingly difficult to find very specific information with it as it basically misunderstands the query and returns completely useless results. Maybe that's why I look like a bot, as I try to tell it exactly what I want...


Well this is exactly the point. Obscure information has a very low social/viral index and as a result a lot of people don't interact with it, so it is hard to find with Google - which is why I don't think it is a particularly robust search engine on its own in the grand scale of knowledge development.

Google seems robust because humans generally think pretty similarly, and generally look for the things that the people around them are talking about or also looking for. That breaks down considerably though across cultures and time.


When trying to use Google to find something obscure, I'm not so much bothered by the difficulty of doing so as I am by the implication that "real humans" don't use complex search queries. They used to teach in computer literacy courses how to use search engines, complete with complex multi-term boolean queries, to find exactly what you're looking for. Now try the same with Google and you're a bot? WTF? They're basically saying "humans are too stupid to do that - humans are supposed to be stupid."


Or that a lot of their target demographic has never been taught that, and so they've optimised their delivery to be accessible to the majority?


Well to be fair most of their users probably are too "stupid" (aka were never taught) to do that.


Their search IS powerful, even for obscure things. But when you disable JS and cookies, as you have done, you are taking a huge amount of that power away from the system. Of course you are going to get bad results for anything which is specific to you -- you have disabled their ability to make a better judgement!


> "I have JS and cookies disabled..."

Disabling essential parts of web functionality breaks web functionality. I'm shocked.

Dropping the snark though. I'm surprised that this is still a complaint. At this point in the web's evolution cookies and Javascript are essential. Disabling those will make your experience worse and complaining about that is like removing the windshield from a car and complaining that bugs get on your face.


Tracking cookies are certainly not essential.


Yeah, tracking cookies might not be. But cookies in general? They're essential for a large amount of sites to handle something as simple as logins.


I would suggest you're over-thinking it. The essence of it is "we think you're a bot because you haven't given us enough private information about yourself".

Exploiting that information is Google's core business, and it doesn't like people evading its panopticon. So they're now making life harder for those who care about their privacy.

Not surrendering your data to Google? We'll treat you like you're not even human, and through reCaptcha we'll tell thousands of other websites to do the same. That will teach you to hide things from the all seeing eye of Mountain View.


Why shouldn't us bots be able to search or participate in forums?


As long as you abide by all the social norms, including moving that damn mouse the right way, I have no problems with you, dear bot.


We'll legislate inefficiency. If you can't be as slow as a human, then you will be restricted.


Bots are equal, but separate.


Anecdotally, I block cookies and tracking scripts from Google and even run some custom javascript to remove the link shim in search results. I have yet to encounter the "we think you're a bot" detection filter, except when performing numerous complex iterative queries or Googling from within Tor.

The above is to suggest that perhaps tracking bugs and cookies aren't a component in the bot-detection algorithm, though that remains to be seen.


Well, looking at an example, my behavior definitely trips the 'not not a bot' detection for NoCAPTCHAs for whatever reason. I'm not too shook up though - it's really no more inconvenient than before.


Can you suggest a way to tell the difference between you and a bot? Merely throwing flags around without offering anything better isn't very helpful.


There isn't a way. As AI improves bots become increasingly indistinguishable from humans. All this does is rely on the fact that bots tend to use different browsers and behave in different ways than humans. But that can be fixed.

But it doesn't matter. If a human user spams tons of links in the comments after creating 20+ accounts, who cares if they are a bot or are doing it manually? I believe that websites should instead use machine learning like this to detect the bad behavior itself, rather than try to determine who the user actually is.


"bot" means no profit from ads. There you have it.


We were actually just discussing the "what if I trip their filter" concern at our morning meeting. Full disclosure: my company (as the username implies) builds FunCaptcha, a CAPTCHA alternative. Your concern, to us, is a very valid one and has been a driving force behind our own design and mentality. Our lead designer is (understandably) passionate about this so he actually wrote a few words on the blog that dives pretty deeply into the topic, if you're inclined: https://www.funcaptcha.co/2014/12/04/killing-the-captcha-wit....


I've also tripped Google's bot filters. Frankly, I'm more offended that Google is discriminating against robots, seeing as they are one of the leading companies in automation and AI :-)


Though you were joking, it's worth noting they're certainly not discriminating against robots. They're discriminating against your robots.

Which is to say: they're perfectly willing to let your crawl-able content and internet use help train their robots, they just don't want their crawl-able content and internet use to train your robots.


Aren't we talking about spambots, mostly? While law-abiding bots should probably be allowed in most sites, nobody wants a spambot in their blog or forum.

Isn't it right to block spambots? And if so, how do you tell regular bots from spambots?


Use re-captcha to prevent the spam-bots from posting... the real bots will just crawl anyway.

A couple months ago, I implemented some regular expressions to try and block a lot of bad actors, and have that include smaller search engines... our analytics traffic dropped around 5% the next week... our actual load on the servers dropped almost 40% though. Unfortunately it was decided the 5% hit wasn't worth reducing the load 40%.

Which sucks; moving forward, a lot of output caching will be used more heavily, with JS enhancements for logged-in users on top of the nearly identical output rendering.

Server-side React with some user-agent sniffing will break out three renderings server-side: "xs" for devices that are "mobile" (phones), "sm" for other tablet/mobile devices ("android", "ios", etc), and otherwise "md"... "lg" will only bump up on the client side from "md". It corresponds to the Bootstrap size breaks.

In essence, I don't care. Bots get the same as everyone else.. if you don't have JS, you can't login or fill out forms. Recaptcha should go a step farther in helping deal with bots...


^ Probably the most underlying comment in this topic.


Is there an anti-trust angle to this?


In terms of advances in automation and AI, I welcome this development, because this is a new offensive in the bots vs. advertisers/scrapers arms race. Bots will of course adapt, it is only a question of time, and that adaptation is an advancement in automation and in the understanding of human behavior.


Oh, I thought the parent comment was going to mention how Google might be using this to additionally train an AI to learn what human behavior is like, just like they did with the 411 service to collect voice data some years back.


I'm sorry the web cookie hegemony is oppressing you. Come up with a better solution to filter bots and Google will hire you. Nobody is pushing this down your throat.


You sound like a robot.


A robot permanently stuck on the "outraged" setting.


Replicants are like any other machine. They're either a benefit or a hazard. If they're a benefit, it's not my problem.


I used to use selenium to crawl Google webmaster tools data. Despite randomizations and such, they still had me marked as a bot.


As google gets stronger in AI, this becomes less of a problem, no?


The slippery slope has gotten a bit steeper.


They've been doing this for a while. With reCAPTCHA, you got easier captchas (a single street number instead of two words) if the system thought you were human. There was an official post about this a year ago. [1] Now they have probably improved the system enough to be confident in not showing a captcha at all. It's nothing revolutionary. If it thinks you're a robot, you still get the old captcha. [2]

[1] http://googleonlinesecurity.blogspot.com/2013/10/recaptcha-j...

[2] http://i.imgur.com/pCKS8p5.png


I've noticed that I get the house numbers when I leave 3rd party cookies enabled, and a much harder, often impossible captcha when they're disabled. Since leaving them off doesn't break much else, I do, and just fill out the harder captchas when I come across one. By the way, you only have to fill out the "known" word, the one that's all twisted and usually impossible to read until you refresh the image 10 times. Even completely omitting the 2nd word, which is the unknown word that OCR couldn't figure out, it will still validate.


Am I being paranoid when I think that offering this "free" service is a great way to track people across even more sites, and usually on the most important conversion pages that don't have the usual Google display ads? I don't see a big technological innovation here, as it appears that mostly they are checking your cookies to see if they recognize you.


So they can track you better and provide better-targeted ads. That's where they get the money from. That's how you pay for visiting websites.


Every "free" service from Google has the end goal of serving you personalized ads. That's their business.


Certainly correct. I guess it is new for me that I'm requiring my users to give their data so they can be served personalized ads.


And there are a number of sites using Analytics, Adwords/Adsense, DFP and a number of other points of connection. They've already offered/bought recaptcha, all this does is make it easier for most people (who have cookies and JS enabled).


I think it also depends on the website? Some offer easy captchas all the time, some not.

One example: I have never seen a "hard" captcha here https://webchat.freenode.net/


It says that they do take mouse movement into account but the cookie part makes me feel a little uneasy:

> IP addresses and cookies provide evidence that the user is the same friendly human Google remembers from elsewhere on the Web.

If this becomes a trend then major commercial websites will become unusable for people who do not accept (third-party) cookies. "Because those damn bots" is a straw-man argument to make people trackable, by assuming that there are no other usability-improving methods that don't track the user (which I think is highly unlikely).


It's only used to generate a confidence level.

"In cases when the risk analysis engine can't confidently predict whether a user is a human or an abusive agent, it will prompt a CAPTCHA to elicit more cues"

So if you have cookies disabled, you'll probably just get a regular captcha


That is the bait part of the bait-and-switch strategy.

Just ask anyone old enough to have worked with Microsoft et al. in the past.

Yeah, the company is nice now, but nobody can say anything about tomorrow. So do use their sane offerings, but be aware that you may have to be ready to switch at a moment's notice, and try not to depend on them too much (i.e. always keep a 1% bucket with an alternative solution, lest you find yourself locked in when you 'thought' you had an alternative if you 'needed' one).


Sorry, can you expand on their long game here? They've just made it easier, is that the bait? Captcha was already everywhere, so it doesn't seem they needed bait to popularise it. It's also used by non-Google companies who will presumably stop including it if Google decides to make it an impassable lock - which would I guess be the switch? Making it harder would just be back to before, so an actual effective switch would just be a lock? Which doesn't make sense for the obvious reason that Google makes more money if their services are accessible. What are you getting at here?


If they drive all other are-you-a-human solutions out of the market (they already own reCAPTCHA, which most sites can't do without unless they want to be drowned in spam) and then start to charge for it or show intrusive ads on it, then you have no option other than to accept it.


The regular CAPTCHA is already approaching the point of being unsolvable for mere humans. If the only people who have to use it are cutting into Google's profit margins by blocking tracking, that gives them even more incentive to make life miserable for those users.


okay, that's reasonable.


But then a bot need only copy a human's mouse movements and disable cookies and we're back to the status quo.


Sure, for the bot. Captchas are pretty effective against bots, what's wrong with presenting the status quo to a bot? This is meant to improve things for the real people. I will appreciate not having to decipher some strange text.


I'm just not convinced from what's been shown that they'll have a long-term ability to distinguish between human mouse movements and those of a bot. So I'm curious as to really how one can keep this in place before it's overrun by spam and you need to make it more difficult anyway. More power to them if it works, but they're pretty light on details that inspire confidence, IMO.


i'm sure no one at google thought about that


There's no call for sarcasm. Do you have more insight into what data could allow them to distinguish between bots and people? If you do please share, because the Wired article and Google's own marketing video don't provide much information - it's obviously more focused on marketing than the technology.


why would google voluntarily give up that information?


That's not reasonable, it's yet another way Google is monitoring everything I do online.


well yes and no. In a way it is a win-win situation: It is reasonable in that it doesn't aggravate the situation for those who block cookies. (yet) And those that allow cookies get at least some convenience in return for that.

However, looking from a different perspective you can say that they're taking advantage of people blind with greed who want maximum convenience when using the web.


What makes you think they weren't already monitoring?


I'm using a tablet device, thus no mouse, and it gives me a captcha whenever I'm not logged in..

Really, is this the breakthrough of bot detection? They are just leveraging cookies - which is a nice UX improvement - but why do I need to click at all? To delay loading and relax the servers?


Well, it's probably that most sites you get a captcha on, give one to everyone... this will only reduce that for real users.

On the flip side, there are other events to hook into... onfocus/onblur, keydown, etc, etc... which can all go into bot detection... if you fill out a form and didn't focus on anything, click on anything, or press any keys.. you're probably a bot... If you have JS disabled, you deserve what you get.


For now, I assume you'll just get a captcha if you don't match the standard behaviour. But it's that little bit easier to just use Chrome and stay signed in to Google all the time now. That's good enough for now, surely.


It also makes your cookies valuable to spammers.


It's definitely not only relying on cursor movements. A simple $('iframe').contents().find('.recaptcha-checkbox-checkmark').click() proved that I'm not a robot, without me touching the mouse.


The cursor movement story seems to be a smokescreen to dilute the fact they're actually running an internet wide monitoring network, tracking users from site to site and building profiles on them.


I tested it a few times in their demo, trying hard to do funny things with the mouse pointer. Still I was asked to type in some text all the time.

Then I logged into my gmail account and yes it worked.

So you're probably right about that smoke screen and it has nothing to do at all with mouse movement.

My fx browser deletes cookies at exit and my IP changes frequently and I think that's the true explanation for the outcome of my little test.


I was confused about all of this until I read your comment. Makes perfect sense now!

Also makes me think that they could dump the whole charade and provide no security check at all, but that'd probably make the service providers uncomfortable, and they'd lose the user as a source of human intelligence for classifying things in Google image searches.


I'm not sure where cursor movement was mentioned, other than the comments.


I read it in the wired.com article about this also submitted to HN: "And [Vinay] Shet says even the tiny movements a user’s mouse makes as it hovers and approaches a checkbox can help reveal an automated bot."


Sounds like if I 'Tab' to the field & hit enter I must be a bot....


I guess it's a lot more complicated than it looks at a cursory glance.


It's almost as if Google is playing its cards close to its chest to avoid tipping off people intent on defeating its captchas!


Which is why Google Analytics is free for site operators.


To be fair, any service or site - Facebook, Twitter, Google Analytics, etc - that encourages your putting their code or widgets or whatever on your site is doing exactly that in addition to listing out which of your friends "like this".


Makes perfect sense. This is Google's response to the Facebook 'Like' button [tracker]


Then what is Google Analytics?


can be blocked without any impact on the browsing experience.


I tried this demo (https://www.google.com/recaptcha/api2/demo) in a new incognito window. Then in devtools:

  javascript:if(!window.jQuery||confirm('Overwrite\x20current\x20version?\x20v'+jQuery.fn.jquery))(function(d,s){s=d.createElement('script');s.src='https://ajax.googleapis.com/ajax/libs/jquery/1.8/jquery.js';(d.head||d.documentElement).appendChild(s)})(document);
  $('iframe').contents().find('.recaptcha-checkbox-checkmark').click()
I got an extra verification (enter two bits of text)


Probably the incognito window. It seems to rely heavily on cookies (likely linked to their analytics data.)


At best this would just trigger another arms race, with people developing bots to trigger a series of "human-like" mouse events, which I'd imagine is a far simpler problem than OCR.

Not to mention that it could only be used as a heuristic and not a test; so, eventually the weight of that heuristic will just be reduced to zero once someone publishes humanlike_mouse_driver.js with carefully-tuned-to-look-statistically-human mouse interactions available out of the box.


You know a peculiar thing that humans do? They do things slowly. You know what bot (programmers) hate doing? Doing things slowly. When you've got bots that are diving through thousands of registration forms per minute, and suddenly you need to slow them down to 15 seconds per form, well, that's already an enormous win for the site owners, even if it makes the new captcha fail (if the quoted statistic of 99.8% of captchas being solvable by bots is correct).
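One cheap way to impose that slowdown is to embed a signed timestamp in the form and reject anything that comes back suspiciously fast - a sketch, with the secret and the 15-second floor as placeholders:

    # Sketch: refuse form submissions that arrive faster than a human plausibly types.
    import hashlib, hmac, time

    SECRET = b"replace-with-a-real-secret"
    MIN_SECONDS = 15

    def issue_token() -> str:
        ts = str(int(time.time()))
        sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
        return f"{ts}:{sig}"  # put this in a hidden field when rendering the form

    def token_ok(token: str) -> bool:
        ts, _, sig = token.partition(":")
        expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return False  # timestamp was tampered with or token is malformed
        return time.time() - int(ts) >= MIN_SECONDS

It doesn't stop a patient bot, but it turns "thousands of forms per minute" into "four forms per minute per token", which is exactly the economics argument above.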


Are you logged into Google? If you have their cookies it's one indication you're not a robot.


I wonder if you used a Google account in good standing for a bot, at what point would they start to detect it was a bot? I imagine if you used it for more than a few CAPTCHAs daily you'd probably end up having to do additional validation.


Which is a bad idea.

I actually have some bots which scrape Google sites (for the purpose of integrating stuff like Google Keep into KRunner), and they just use the Useragent of a regular phone, send normal POST data, etc. Works perfectly fine, and — I just checked — this bot is recognized as normal user by this captcha system. No Captcha input.

I tried it even with a new Google profile and just using cURL to log into Google, then started a new browser session and imported the cookies from cURL. Worked just as well.

I guess this makes it easier for malicious bot-authors...


Interestingly I logged out of my Google account in Chrome and immediately got an old style CAPTCHA. But not when I logged out in Safari (I very rarely use Safari).

Edit- Nevermind, it looks like Safari left some google.com cookie lying around while Chrome deleted it. Deleting it gave me the old CAPTCHA.


I strongly believe it is the bot detection... They may also have added extra checks for already-logged-in users.


Wouldn't it now be easier to just tune bots instead of having to work with OCR?


Who wants to place bets on the "potential robots" more likely to be those without Google cookies or a Google Account?


I can't think of many better uses for the menagerie of different tracking methods that they have planted on me, to be honest.


Well if you ever run for political office it's going to save them a boatload on campaign contributions.


Eh, you never know. Attitudes toward certain improprieties have moderated in recent years...


So what? Other once acceptable behavior is no longer tolerated.

There is always a center, just because it moves doesn't mean that people can't be a socially unacceptable distance from it.


I suspect you have mistakenly stumbled into a lighthearted thread with an overly serious mindset.


I've learned that I'm probably a robot, and it's probably because someone on my subnet hits the rate limits on Google APIs sometimes.

There was one day when it was so convinced, it was giving me impossible captchas just to use Google Search.


It's a poor betting opportunity as it's pretty much obvious it's the case. It also makes perfect sense.


Certainly seems to be true in my case.


Seems to be true for me too. When I reject cookies, I get more difficult recaptchas.


I recently added recaptcha to a site and got this version.

From an implementation standpoint it is utterly painless. The client side is copy/paste from Google's site and the PHP/server side was this:

      // Build the siteverify request: shared secret + the user's g-recaptcha-response token + their IP.
      $recaptchaURL = 'https://www.google.com/recaptcha/api/siteverify?secret=600SZZ0ZZZZZIZi-ZZ0ZEHZW1000Z_0ZZZ00QZZ&response=' . request_var('g-recaptcha-response','') . '&remoteip=' . $request->server('REMOTE_ADDR');
      $recaptchaResponse = file_get_contents($recaptchaURL);
      if($recaptchaResponse === false)   // file_get_contents() returns false on failure, not null
      {
            print("Recaptcha failed. <more error msg>"); return;
      }
      $recaptchaResponseJSON = json_decode($recaptchaResponse);
      if(is_null($recaptchaResponseJSON) || $recaptchaResponseJSON->{'success'} !== true)   // 'success' is a JSON boolean
      {
            print("Recaptcha failed. <more error msg>"); return;
      }
Most of the time it just gives you that one checkbox, but if you use the form multiple times (e.g. testing) it starts to give you the classical text entry box. I have no idea how it works fully and this article only sheds little light on it.
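For comparison, roughly the same server-side check sketched in Python (assuming the requests library; the secret and IP handling are placeholders):

    # Sketch of the same siteverify round-trip as the PHP above.
    import requests

    def recaptcha_ok(response_token: str, remote_ip: str, secret: str) -> bool:
        r = requests.post(
            "https://www.google.com/recaptcha/api/siteverify",
            data={
                "secret": secret,            # your site's shared secret
                "response": response_token,  # the g-recaptcha-response form field
                "remoteip": remote_ip,       # optional
            },
            timeout=5,
        )
        return r.ok and r.json().get("success") is True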


Is that your actual secret? You might not want to reveal that / you should get a new one.


It is not. I kept the length and style the same to give a better example, but replaced most of the characters with 0s and Zs.


So now spammers will use botnets… Oh wait, they already do.

They already have the botnets. Now they need to use those end-user machines as proxies, using the credentials already on the machine. They just need to figure out the other parameters: maybe it's running JS code? Then you can use a browser engine/Selenium. Maybe it's the click pattern? Just generate the JSON data and send it. They can even apply the same machine learning techniques to figure out the best way to circumvent the captchas.

And the escalation continues.


Yeah, I feel like it won't be too long before spammers start finding ways to emulate users without having to solve any CAPTCHAs. Google is likely going to need to switch their 98%/2% to something more like 80%/20% (that is, 20% of users will still need to enter CAPTCHAs).


I am using a small tool that I wrote to integrate Google Keep and other Google stuff with KRunner and so on, and this tool (essentially being a dumb bot) also passes all the Captchas.

I’d say malicious authors would have it really easy now.


Computer Vision guy here. Okay so you've made some improvements for normal users.

The captchas are still the same old ones, just not shown every time. They can still be cracked with the latest neural-net techniques. The visual matching stuff can be guessed 6/10 times.

You still have audio captchas, which can be cracked.

If all else fails, you still have cheap labour from third-world countries. I don't see why this is revolutionary.

Google will now have their captchas present on every site and start logging user behavior in the name of identifying bots. Who says they won't use the data to drive their ad empire?


Even if the success rate of detecting robots stays the same, I would say this is still a win because the majority of humans won't have to mess with it any longer.

Even better: those with various disabilities won't have to mess with it. My parents' only disability that I know of is near-complete computer illiteracy and I can tell you from experience that every time they're presented with a normal CAPTCHA it's like somebody just handed them a Rubik's Cube and told them to solve it before they can create a profile. In every case I know of, they just turn the computer off and walk away. Now, these are what I would call normal humans (don't tell them I ever said that) so I can only imagine how aggravated those with visual and/or auditory problems get when presented with a crazy CAPTCHA. And when your revenue comes from getting people to submit these, I can see it still being a boon to the website, even if all they did was lower the barrier of entry for humans.


My guess is it's based on the tracking data they already collect on most people. I try to avoid it, so I get stuff like this:

http://i.imgur.com/6mGYsav.png

I have no idea what that second word is supposed to be, so if you use this, I probably won't use your site.


It isn't user friendly, but often with that type it doesn't care what you enter for the tough-to-read word. The easy-to-read word is your test. The other is a way to harness people to do difficult OCR tasks. Some web sites have had fun organizing campaigns to regularly enter dirty words for the tough-to-read ones, just to mess with the results :)


It's actually the other way round for the Captcha posted and most recaptchas I've seen. The easy to read word is the OCR and the hard one is the real captcha.


This is fascinating. So they're harnessing the collective OCR powers of the internet surfing public? Diabolically clever.


"unklist", although I have not idea what it means.


When a Captcha contains two words, usually there is one "real" word, and one "fake" word.


Here's the JavaScript behind it: https://www.gstatic.com/recaptcha/api2/r20141202135649/recap...

It's hard to see what's sent over the wire (it's obfuscated), but the source gives you a good idea of what they're collecting. The biggie is the GA cookie, which is running on over 10 million sites. Like any CAPTCHA, this is still breakable -- just load your actual cookies into Selenium or PhantomJS and replay your mouse movements. Of course, once you do that more than a couple of times, you'll likely have to write a crawler to generate fresh cookies. At that point, you may as well just break the visual CAPTCHA, which is trivial anyway. I.e. you should still never use a CAPTCHA (http://www.onlineaspect.com/2010/07/02/why-you-should-never-...).
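For what it's worth, a rough sketch of that replay idea with Selenium's Python bindings would look something like this. The cookie name/value, the recorded offsets, and the target page are placeholders -- it only shows the shape of the approach, not a working bypass:

    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains

    # Offsets captured from a real human session (made-up values here).
    recorded_path = [(15, 8), (22, 12), (9, 4)]

    driver = webdriver.Firefox()
    driver.get("https://www.google.com")  # cookies can only be added for a domain you've visited
    driver.add_cookie({"name": "NID", "value": "REPLACE_WITH_REAL_COOKIE_VALUE"})

    driver.get("https://www.google.com/recaptcha/api2/demo")  # demo page linked elsewhere in this thread
    actions = ActionChains(driver)
    for dx, dy in recorded_path:
        actions.move_by_offset(dx, dy)  # replay the captured movement
    actions.perform()
    driver.quit()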


Captchas can also be useful as a differentiator between free/paid plans, or to slow down users (see 4chan)


In the long run, I think it's unavoidable that AI-type systems continue to improve, while humans don't, so this will become a harder and harder problem.

One helpful approach would be to separate out "why CAPTCHA" into preventing abuse (through high volumes) and "guaranteed one (or small number) per person" from "am I interacting directly with a live human", and using different things for each.

The naive solution to a lot of this is identity -- if FB profiles are "expensive" to create, especially old ones with lots of social proof, you can use something like FB connect. However, there are a lot of downsides to this (chief being centralization/commercial control by one entity, which might be a direct competitor; secondarily, loss of anonymity overall.)

One interesting approach might be some kind of bond -- ideally with a ZK proof of ownership/control, and where the bond amount is at risk in the case of abuse, but it's not linked to identity.


The tiniest mouse movements I make while tabbing to the checkbox and hitting my spacebar to check it? Or tap it on my touch screen? And why wouldn't this be vulnerable to replaying a real user's input--collected on, say, a "free" pornographic website? Their answer seems to be "security through obscurity".


"Security through obscurity" is a weak concept, however the goal here is not security but fraud detection.

Obscurity is a legitimate component of a fraud detection system, for the same reason that hiding your cards is an important part (but only a part!) of being a good poker player.


It's always security through obscurity.

f(obscurity, time, analysis) = clarity

The details of implementation are left to the reader as an exercise


I wish they would explain more about how the user interacts with the whole reCAPTCHA leads them to know it's a person and not a robot, but maybe they're worried about people writing bots to get around their protections.


> However, CAPTCHAs aren't going away just yet. In cases when the risk analysis engine can't confidently predict whether a user is a human or an abusive agent, it will prompt a CAPTCHA to elicit more cues, increasing the number of security checkpoints to confirm the user is valid.

Probably using a combination of G+ and GA to check your 'history' to see the activity is like a normal human. Visits a couple news sites each day, checks their gmail, searches for random crap randomly, GA registered a 'conversion' for some company = probably a human


I was thinking they may be looking at how long it takes for a user to click the "I'm not a robot" link. A robot would probably load the page and quickly, without delay, send the HTTP POST, but I have to imagine they thought of this already and bot writers would quickly add a sleep() call in there at some point... Yeah, I wonder about their internal logic too.


Or more likely they realize that explaining that your robot quotient is based on a statistical analysis of the last 6 hours of your browser traffic would probably freak everybody the fuck out.

They're almost certainly using the adwords cookies that get hit from 90% of the sites out there to figure out if you're a bot or not.


True enough, although it doesn't stop Google Now from coming up with helpful suggestions like "Hey, we noticed you were looking at this movie; would you like to see it on Amazon?"


Giving that away will give the bots a head start to figure out how to overcome this new roadblock.


In regards to the video, I feel like it has become a cliche to have upbeat, light ukulele music in the background for product demonstration videos. I instantly felt myself become annoyed when the music started.


And some unison whistling at the end to complete the cliche. Weird they would even have a video for something like this, leading me to be more suspicious of the mechanisms and data they are using behind the scenes to make this work.


God, I never put my finger on it until you mentioned it. So true.


What a misleading headline. Google will now look at your mouse movement but will really be checking whether they've been tracking you across the web. Anybody who is concerned enough about privacy to block/clean cookies will be assumed to be a non-human.


And then you'll do a captcha like they would have had you do previously. So either you're at the status quo, or you get an improved experience if you're in the vast majority that stick with the defaults. What's wrong with that?


It's interesting that the adversarial nature of internet security is "breeding" an adversarial AI. Inevitably, people will start working on AI to beat this new captcha. I think in terms of parallels to biological evolution, security/fraud AI has the greatest evolutionary force behind it. Fun and scary to think where this particular breed of AI will lead.


I always assumed Google's use of reCAPTCHA was to augment the OCR used to digitize Google Books, particularly in results the software couldn't confidently match to a word. Is this true? It's interesting that it's still the fallback for the new method.


That was the original goal of the project.

http://en.wikipedia.org/wiki/ReCAPTCHA

"By presenting two words it both protects websites from bots attempting to access restricted areas[2] and helps digitize the text of books."

For some time, you could pass a reCAPTCHA test by just entering the more distorted word correctly.


This should be the top thread. I find the whole topic of crowdsourcing to compensate for the inadequacies of computer vision (and other inadequacies) fascinating. OCR was the first problem. We've been helping Google Maps identify house addresses for a while now with reCaptcha, and with this announcement it looks like Google is finally tackling the problem of image association. Computers suck at determining which pictures contain birds. By making users tag all of the images on the web, they're making image search much more powerful and will hopefully improve the entire field of computer vision.

When I tell my future robot to go get my coffee mug, I don't want it coming back with the PS5 controller.


I only ever enter the distorted one, works every time.


That was the original idea behind reCAPTCHA (which originated outside of Google, acquired in 2009), but my understanding is that they long ago ran out of actual text that needed human OCR'ing, and/or found other reasons that approach no longer was helpful.

The "help OCR while also spam protecting" thing isn't currently mentioned on Google's recaptcha product page.


It is:

> Creation of Value

> Stop a bot. Save a book.

> reCAPTCHA digitizes books by turning words that cannot be read by computers into CAPTCHAs for people to solve. Word by word, a book is digitized and preserved online for people to find and read.

https://www.google.com/recaptcha/intro/index.html#creation-o...


Good catch.

I wonder where I heard/got the impression that it wasn't really being used for this much anymore. Maybe from when most of the recaptchas most of us saw switched from scanned books to Google Street View photo crops. And I was also surprised by the implication that Google's algorithms really needed human help for visual recognition of almost exclusively strings of 0-9. I would have thought that would be a pretty well solved problem.

Anyway, somehow I got the idea that recaptcha wasn't actually providing much OCR help anymore, but maybe I just made that up.


For the past few years the recaptchas I've seen were illegible text next to easy-to-read text. I think it's obvious that they've run out of the low-hanging fruit and now just have the worst of the worst as placeholders. The move to house numbers just proves that they're kinda running out of badly OCR'd text.

This move isn't too surprising. OCR based captchas have always been a hack and the "best" captchas are like having the best collection of duct tape and WD40. At a certain point you need to stop doing half-assed repairs and remodel.


they also used it to decode street number addresses, for street view


Those computer-vision challenges mentioned in the blog post aren't 100% clear. For the first one, my eyes went directly to the cranberry sauce, and I thought to myself "Wait... is that one supposed to be clicked, too?"


I thought it was unclear too for a different reason - the text says "that match this one" which I read as being actual identical matches. Sure it's obvious when you see the images but that wording feels really awkward.


Same, my first thought was to look for the identical cat photo, or maybe the same cat from a different angle, not just other cats.


don't know if this works only for me but here is a live example - https://www.google.com/cbk?cb_client=maps_sv.tactile&output=...


It gives me the new version as well, but it seems google is convinced that I am a bot. Getting a regular captcha after clicking the button and I have to say that this is a lot worse of an experience than regular old captchas. Now I have to wait for a few seconds after clicking a button, then still solve a captcha.

Hopefully it gets better with time.


Do you happen to browse incognito or with 3rd-party cookies blocked?

Looks like the new version needs an active and valid google cookie in order to tell if you're a robot or not.


Try this one:

http://nomorecaptchas.com/

It's very similar. I might go as far as saying that Google copied them.


I attempted tabbing to the checkbox and pressing spacebar (not moving the mouse at all) and it worked just fine. Impressive. But I guess the tell-all is how secure it is against bots, not how easy it is for humans to get through. For all we know it could just be letting everyone through :P


It worked for me, though Google thought I was a robot at first.


Probably because lots of users are currently visiting the link.


Got recaptcha'd, probably because of Ghostery.


I've been blocking third-party cookies for a while, and I noticed that I only get the old, hard to read, captchas, instead of the easier version with numbers.

Too bad this new version won't work for me either.


Obligatory XKCD: http://xkcd.com/810/


I just hope it also works for pen-tablets, where the "pointer" can suddenly jump from one location to the next when the pen comes near the surface of the tablet.


Or the much more common case of touch screens. I'm assuming it's fine - the fine pointer movements are just one aspect of it, and tapping/clicking with a pen are likely to produce small movements anyway (whether or not those movements are suppressed by the driver of whatever device you're using is another matter).


That, and what about those who fill in forms with tab navigation? No mouse involved there. It is just showing off...


I just tried the one ins0 linked above, and tabbing through, using space to select the checkbox, worked fine.


Using my iPhone, it thought I was a bot.


More appropriately stated: couldn't determine a priori that you were a human.


It's definitely not mouse based. I tried it in an incognito tab and it showed me the old form when I clicked the checkbox.


I tried the demo in two different browsers that I use regularly. On the one that stays logged into my google account, I was not challenged with a captcha. On the other browser, which I use quite a lot but not with my google account, the captcha appears.

I'd think that having a long-standing google account with a normal history of activity would be a good indication that one might be a human. If google is weighting that heavily for this test, that may create a new incentive for spammers and scammers to hijack people's google accounts.


On what page did you test the new system?



1. Tested on my normal chrome where I didn't delete any cookies and logged in to my google accounts, and no adblocker running. So plenty of evidence I'm human, with all those cookies from big G.

2. Tested in incognito mode: BAAM: I'm a bot, had to fill out the old captcha!


Wow, that's odd. I checked it out on mobile and they confirmed me human without giving me the images set similarity question.

I don't think they can predict this from the way I touch the screen >.<


So, I tried it. With a Google Account that had ZERO activity for two years. Using a Java program to activate that site.

Passed the CAPTCHA. Without any further verification.

I mean, spammers are going to love it xD


This is not even close to a Turing test.

I thought Wired was supposed to be a tech site.


Wired has been slowly moving from being a tech site to a clickbait/sensationalism site.


I suspect there is no theoretically perfect solution to CAPTCHA. Bots must get better at emulating users, and software must get better at identifying users. None of our interactions with computers are impossible to emulate, so the war will go on indefinitely.

But what we can do is make it expensive for a bot to emulate a user. One way to do this is to create an ID system which requires some form of payment, thus making an ID an expensive proposition. For example, Amazon could make their user accounts OpenID logins and provide the target system a flag like IsVerifiedPurchaser. Payments don't have to be strictly monetary, either. For example, Facebook could estimate the ad revenue generated by a user so far and provide some flag as to whether the user is active and can be trusted not to be a bot.


It's based on Google account. There might be other vectors too. I tried browsing lots of pages in incognito, making Google searches, clicking ads etc. but that didn't help (might work in long term). Signing into my Google account (with history) was enough to pass the test.


How long before you will have to answer a series of annoying and difficult questions if you don't allow tracking cookies and google to collect personal information (which I assume the far majority of users allow on a regular basis)? Not sure how I feel about this.


If they create services that you want to use so badly, they can charge you whatever they want for them. If you don't want to pay, I'm sure Bing will be happy to have you.


I suddenly became extremely conscious of how I was moving my mouse for the duration of that article.


Tangential: From a philosophical perspective, I wonder if the notion of asking human beings if they are robots will soon escape the space we consider virtual. In sci-fi (which more or less informs the masses not involved in such fields, and has more sway over public opinion than, say, HN or LW), the premise focused on is that people seemed more concerned with asking robots/automata if they are human. I'm starting to wonder if such questions will become moot.

Though, I wonder if you can start to defeat such systems by slurping up headers sent on public networks (like coffee shops, public wi-fi in large cities, airports, etc.) and, with techniques like SSL stripping, obtaining the local-storage info being sent in the body.


> in the last week, more than 60% of WordPress’ traffic and more than 80% of Humble Bundle’s traffic on reCAPTCHA encountered the No CAPTCHA experience—users got to these sites faster.

Does this mean WordPress saw a 60% decrease in traffic from bots?


It still shows a CAPTCHA if it's not sure that you're a human or a bot.


The old recaptcha takes good sites and gives them terrible user experience. I tried to order tickets one time on ticketmaster and actually gave up because I couldn't get past the captcha. I hate current captchas with a passion and I hope they finally die. I understand fighting spam but when it completely ruins a user experience it's not worth it.

edit: Bury me with no explanation why? Please don't tell me you think the UX of using recaptcha is great. I'm a 28-year-old dev with near-perfect eyesight and it takes me several tries to get these right. They are horrible. I welcome this new change and hope it isn't easily cracked.


"Bury me with no explanation why?" -- because your comment has only the most tangential relation to the linked article.

Downvotes are supposed to penalize uninteresting posts, not just wrong posts. No one likes captchas, and everyone has had shitty experiences with them. Your comment adds nothing to the discussion, doubly so since you're complaining about a type of captcha that has just been replaced!


I'm guessing your third sentence triggered a penalty. Replace "current captchas" with a blank and you'll see why.

I agree that the reCAPTCHA experience is terrible and assume many others agree with you, in part spurring the development of this new approach. I don't believe that every reCAPTCHA has a solution, or at least a consistent one, so I always feel like a percentage of time wasting is built-in. To work around it, I usually regenerate it until I get one that looks easy, but it's still frustrating. Improving the odds of getting it right the first time will help improve the experience a bit. But my biggest gripe is that they can make direct downloads impossible for resources that don't require extra protection.


Hmmmm. Nice move, Google. But I also remember that a machine passed the Turing test. And I don't like the sound of these two pieces of news together. Not because I'm worried about you, but because I'm worried about the websites I run as a dev.

Moreover, I'm not sure if most people know this, but reCaptcha was supposed to be converting ancient text to digital text [I read that once on Quora; I'm not really sure if it's entirely true, but I guess it is]. So now I can be less proud, since I'll no longer be contributing to the conversion of ancient texts into digital books.

And I'd love you if you made the thing open-source, Google...


ha yeah that was definitely the original purpose/project of reCAPTCHA.... but I think shortly after the google purchase, the AI got so good at reading garbled ancient texts that G allowed it to move on to other things..like house numbers.


House numbers are actually the easier version, you got them if reCAPTCHA already knew you're probably human. After screwing it up a few times, it punts you back into the general category and you get two words again.


.. or human tracking?


I feel like there is a lot of discussion on whether it's okay for Google to have this information or not, but truthfully if you don't want them having this info then don't use their services. It's that simple.


It raises privacy concerns for me.

If real name and email validation is mandatory, I will wonder what Google will use that data for. Is this some kind of monitoring tool that wants to know your REAL NAME and EMAIL ADDR whenever you want to use a website?

Don't forget Google always begs for your telephone number for "security reasons", Google+ wants your real name for "policy reasons", and when I upgraded one of my Nexus devices to Android L, everything I do has to be signed in: some games (Botanicula, for instance) worked fine without internet on Android 4.4.4 but require an internet sign-in on Android L.


It's an example form illustrating where CAPTCHAs (or the new reCAPTCHA) would be used.


Where are you seeing mandatory real name and email validation?


The video demonstrates filling in name and email, although it doesn't say anything about it.


In the video the user has to type in their first and last name, and an email address.

I have to assume there is a confirmation email, otherwise what's the point in asking for an email address?


I believe that the video was just showing an example of a form on which a CAPTCHA would appear.


I think it's just a demo for a typical form.


Why would you care whether a user is a robot? Surely whether the actions of that user are desirable or not is determined solely by those actions and not by whether the thing performing them is a human or a robot. It seems like a better idea to disallow bad actions than to disallow robots. There are also human captcha farms (e.g. porn sites that ask you to solve a captcha which they then input to another site, or operations that pay people $1 to solve X thousand captchas).


Because your dual business model is (1) showing ads to humans and (2) gathering intel on humans. Bots can be used to bypass both.


Robots don't buy stuff. Their actions by definition are undesirable from the service provider's standpoint.


Then limit the access of everyone who does not buy stuff to X requests per minute?


Unless it's an arbitrage robot!


Do they even need the checkbox anymore? They could track mouse movement, keypresses, history or whatever else they are tracking without showing any kind of GUI at all.


How do you track mouse movement on a tablet?


You could track other things like window scrolling or the focusing of other form inputs.


If the user is on a tablet, you can use other methods like the front-facing camera.


Or add the mouse tracking checks to a button that is naturally part of the website, such as "Post Comment" on a WordPress blog.


This has the bonus of being an advertisement for itself.


Sorry if the article mentioned it (I only read the first few paragraphs), but looking at the cat-clicking example, that looks like a great way to generate training data for AI.


This is exactly what the previous implementation was as well. The "words" you had to recognize were either scanned from Google Books, or house numbers from StreetView, effectively enhancing their OCR training set.


I always liked the captchas that asked simple questions or told you what to type. Sites that used "what is 1+3?" or "type the number four into the box below".

Is there really a system, once employed, that coders won't overcome in days? Do we need a trusted-user system, where the machine registers at a central site that can be queried by the commerce site? Recognition of the consumer machine through a combination of MAC / IP range / provider?


This will greatly help them with bettering their CV classification algorithms. The kitty picture matching question is a great example.


Maybe I am getting ahead of myself, but this seems to have huge potential and is actually quite interesting from a whole other perspective.

If a machine can determine you are human what if it learns your unique patterns? Couldn't it then be used to determine you are you?

And couldn't this solve the problem of identity?

In the bitcoin world, what if you could use this to log into your bitcoin wallet?


Think of Captcha as the first battle in humanity's fight against AI. Over the long term this problem has no solution.


Captcha isn't the first battle in humanity's fight against AI, it's a recent battle in humanity's fight against abusive humans.


Except by answering captchas, you're helping the machines (first words for books, then numbers for street view, now pictures for classification).


That's an interesting thought: every time we introduce a novel classification scheme to tell humans from computers, we feed them a large training set.

I never understood how that works, though. If I get a captcha street address wrong then that means they already had the answer, so how am I contributing?


For the street address ones you generally have to answer two questions, one that they know and one that they don't know. You're only 'tested' on one of them, but you don't know which. Once enough people have gotten the known one right and given the same answer to the unknown image they move the unknown image to the known pile.


They already know the correct answer if they are only showing you one image to solve.

When there are two images to solve, they know the correct answer to one. The other is shown to thousands of people, and eventually it is solved with high confidence.
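A toy sketch of that consensus process (the data structure and the promotion threshold here are made up, purely to illustrate the idea):

    from collections import Counter

    unknown_answers = Counter()  # tallies for the image whose answer isn't known yet

    def record_answer(answer, passed_known_word, promote_after=10):
        """Count an answer for the unknown image; promote it once agreement is high enough."""
        if not passed_known_word:
            return None  # only trust users who got the control word right
        unknown_answers[answer.strip().lower()] += 1
        best, votes = unknown_answers.most_common(1)[0]
        return best if votes >= promote_after else None  # promoted to the "known" pile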


I wonder if they've taken into account people with movement disabilities such as multiple sclerosis or parkinsonism. Then again, the previous recaptcha techniques were somewhat discriminatory against those with visual or auditory difficulties as well, so they've had to think of this already.


There's practically no documentation for this. They have "examples" for many languages, but they're so sparse as to be useless. I don't have time to crawl through their code. I'm going to wait a few months before implementing this to see what questions pop up on the blogs.


It seems like they're using cursor tracking to validate human-ness. Assuming that's the case: since the cursor is outside of JavaScript's control, it would force the attacker one level higher (to the browser/OS, instead of the DOM). Not impossible, but still a significant barrier.


Not really, you would just need to reverse the js code, look at what data they actually send to google and randomly generate appropriate data like mouse movements.


Ah, good point


> Google also will use other variables that it is keeping secret—revealing them, he says, would help botmasters improve their software and undermine Google’s filters

I'm pretty sure that looking at the JavaScript calls will tell you what "variables" they use, such as browser user agent, IP, and cookies.


I had my own simple 'captcha' on my site (since recaptcha could be annoying), but now I added recaptcha since it looks like it will be easier on users.

You just need to query recaptcha's siteverify service and check whether the JSON they return contains "success": true.
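If anyone wants a slightly more robust version of that check, here's a sketch in Python that parses the JSON instead of searching the raw string (the secret key is obviously a placeholder):

    import requests

    def verify_recaptcha(response_token, remote_ip, secret="YOUR_SECRET_KEY"):
        r = requests.post(
            "https://www.google.com/recaptcha/api/siteverify",
            data={"secret": secret, "response": response_token, "remoteip": remote_ip},
            timeout=5,
        )
        # Treat anything other than an explicit JSON true as a failure.
        return r.ok and r.json().get("success") is True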


So how does it work?


Google collects all your data and applies machine learning to predict a probability value that you are human. If it is below a certain threshold you have to enter a CAPTCHA.


The question was more about how the new captcha works, I guess.


That's the question he answered. It's machine learning based on browser variables and mouse movements.


The old system presented you a challenge-response test no matter what.


Sorry, I just thought "all your data" was a bit unspecific. I have no idea what Google collects in order to derive that. I have a lot of guesses, but "all your data" doesn't tell me how it actually works, and I think that's what the parent poster asked for.


The image captcha looks nearly identical to Confident Captcha: http://confidenttechnologies.com/products/confident-captcha/ (formerly Vidoop).


Robot testing if user is not a robot.

Now that it knows how to detect humans, one day we'll all laugh when we read the news that Google can't log in to their own administration systems, because an AI security algorithm evolved the decision to lock out human beings.


I'm concerned this will just ban people using obscure browsers, blocking javascript and cookies, and just behaving in non-typical ways.

Fortunately I rarely encounter CAPTCHAs outside of creating an account, and I can just do that through a different browser.


Challenge Accepted!


I feel like if anything this would just add some time to bots plowing through these so their web scrapers can move in gentle curves and whatnot to simulate a human. These don't appear to be that far off from being broken.


Forcing bots to slow down their interaction with online services to human speeds would still reduce spam by a lot.



This is great news! I really really really hate the normal Captchas.

It has a nice property: To combat spam, Google can tweak this as often as they like without bothering the users of a website or the devs running it.


Anyone know where the documentation for the "old" recaptcha has gone?

How long can we continue to use the "old" way?

Seems like Google wants everyone to use this "new" method, which I am not so sure about yet.



Does this remind anyone of SweetCaptcha? [1] The image match games aren't new. Is one better than the other?

[1] http://sweetcaptcha.com/


Here is a video introducing the no CAPTCHA reCAPTCHA https://www.youtube.com/watch?v=jwslDn3ImM0


If I had only seen that video, I would think this was an April Fool's joke. I then read the article and was relieved to hear there is a little more than this. Perhaps I'm just being pessimistic, but I feel like this only raises the bar a little. I would expect checks it does of mouse movements or headers to be spoofable quite quickly after introduction. I think the best measure mentioned is tracking which IPs are bots, but that's still going to have serious shortcomings.


Very cute video. I like the way the mouse pointers scares away the 'nasties'.


I don't understand how this can't be spoofed. I can only see how it'll slow a bot down, and maybe reduce the number of accounts it can create/commandeer.


As long as it behaves like a human, what difference does it make if it's actually a really sophisticated bot?


That brings me back to "why are we trying to prevent bots?" And I'm under the impression that we want to prevent automated spamming of $web_community_resource. So to answer your question, it makes a difference because once the really sophisticated bot is in, it starts spamming up the place.


Am I Robot by Goodnight Electric is awesome http://m.youtube.com/watch?v=78nGkD3-0kU


Did anybody notice that they killed the email obfuscation service? It made it possible to share your email publicly, with the address only being revealed after solving a reCAPTCHA.


I wonder how long it will take for black-hat hackers to crack this. I imagine that it will be very difficult to crack, but it seems inevitable to me.


One day we are going to have thinking, feeling robots and all this prejudice against them is going to haunt us.

Seriously though, this is a great improvement.


Is there a page where we can test out these new captchas? I'm curious about how they work in the wild, and this post only shows gifs.


Isn't that like a reverse Turing test? The machine has to guess if I am human, instead of me having to guess if I am talking to a machine.


Is there any published research on this? I am interested to learn about these risk analysis techniques they are talking about.


If a bot just makes more random mouse movements of varied speeds, and takes a bit longer, won't it appear human?


The bot probably has too much or too little mail in its Gmail inbox, or maybe it is not in many Google+ circles of confirmed authentic humans. Maybe the bot has not been using Google Chrome for very long, so its browsing patterns may still seem incoherent. This socially disconnected bot is abnormal; it doesn't fit into society. It deserves punishment; we'll make it click pictures of cats. Once it gets enough friends on Google+, we'll cut it some slack.


I posted this multiple times in this thread already, but I tried it with a Google Account that has a fake name, hasn't been used for over 2 years, has neither received nor sent any emails since then, is in only one Google+ circle, and was only ever used from Firefox.

And it passed the test. While trying it from within a Java client toolkit.

Like, I took the worst setup any spammer would have, and it passed.

How is this going to protect my sites from spammers? And on the other hand, am I even allowed to embed this into my site, if I am in the EU (Data protection, etc)?


Isn't that like a reversed Turing test? The machine has to guess if I am human, rather than me guessing if I am talking to a machine.


I cannot find any documentation about how to switch to the new reCAPTCHA if you're already using their older version...


I'm fairly certain that it doesn't require you to change anything, it will just start working. You must be a bot.


What am I missing? Breaking this looks much easier than a regular captcha...

This is the sort of problem that genetic algorithms are well suited for (a small, well defined input domain with a binary oracle). You'd simply generate a random path, run a smoothing function over it, see if that works, then iterate.
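Something like this, say -- a toy sketch of the generate-smooth-test loop, with the oracle (whether reCAPTCHA actually accepts the path) stubbed out, so it only shows the shape of the search:

    import random

    def random_path(n=40):
        x = y = 0.0
        pts = []
        for _ in range(n):
            x += random.uniform(-5, 15)  # drift roughly toward the target with some noise
            y += random.uniform(-5, 15)
            pts.append((x, y))
        return pts

    def smooth(pts, window=3):
        """Moving-average smoothing to take the robotic jitter out of the path."""
        out = []
        for i in range(len(pts)):
            chunk = pts[max(0, i - window):i + 1]
            out.append((sum(p[0] for p in chunk) / len(chunk),
                        sum(p[1] for p in chunk) / len(chunk)))
        return out

    def oracle(path):
        return False  # stand-in for "did the widget accept this path?"

    for attempt in range(100):
        if oracle(smooth(random_path())):
            break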

edit: does anyone know a site that is actually using this new widget? I only seem to be finding the older version... :/


You're missing the fact that they're relying on many cues, not just mouse movement. I don't think they're even disclosing everything they use.


You're missing the fact that they went from something that is quite difficult to get a program to do (essentially an AI complete task) to one that's not AI complete.

It doesn't really matter how many non-AI complete components they are measuring... without at least one AI complete task, they removed the thing that makes CAPTCHAs work.


Except distorted text reading is not hard for AI anymore, and all "AI-complete" tasks, whatever that means, are pretty much broken nowadays.

So we're left with a very hard problem, and their best solution so far seems to be security through obscurity with a bunch of non-disclosed "cues". Not great, but I guess it's hard to come up with anything better.


I really wish they hadn’t botched this through use of the first-person pronoun.

“I’m not a robot” no, computer, you sort of are


[Irony] Isn't "bot detection" a small problem related to "human tracking"??


I noticed this a while ago when the Humble Bundle switched. I was able to break it with PhantomJS.


What do you mean by "break"?


Passing the captcha through an entirely automated procedure.


In bulk? It's not supposed to stop real users from using a bit of scripting.


It kind of is, actually. Cloudflare, for example, uses a single CAPTCHA to prevent ongoing DDoS attacks. If they switched to this new reCAPTCHA and if a DDoSer can use Selenium to get past the challenge, then the CAPTCHA process has failed.

There are always tradeoffs with this. I strongly suspect Google is going to have to restrict it within a year or so, with the number of users who still have to solve CAPTCHAs ending up closer to 10-20%.


'a bit' being key. It's not like a DDoSer can't already solve captchas for three minutes if that's the only protection.


The work still has to be done manually in those cases, though, whether they type it themselves or rent use of a captcha farm.


In most scenarios you only have to solve one captcha. Those are not going to be significantly affected, since the manual work is minimal. It will provide a multiplier on traffic in the case that a captcha is needed for every single action.


Does your automated process still send your real Google cookies though?


Yeah, it uses real user's cookies to accomplish it, but I was also able to break it with traditional captcha-breaking mechanisms after Google presents the fallback. The point I'm making here is that this change doesn't really help much with proving that you are a human.


So if the risk analysis fails, I'll get to see a captcha PLUS the new cool checkbox! Nice.


Let's make Captcha mobile friendly by having the client download MBs of useless pictures...


I could imagine that this system could still be tricked with a Markov chain and some dedication.


This looks like a very clever way to train machine learning algorithms on image recognition.


Perhaps Google will be using this to train a broader-purpose image recognition AI.


Yes, exactly... the whole concept of their original captcha came from fixing OCR where the text was distorted, i.e. putting armies of unsuspecting individuals across the globe to work helping digitize text content into something more searchable.

So, no surprise, the same thing is going on here. It's much less about security than it is about deriving value from the solved image matches.


It seems plausible that as more sites adopt this kind of technology, automated access to the web (e.g. scraping) will become harder -- for whatever purpose, good or ill. This has long been an "arms race" between hiding and detection. I can hope that reasonable uses of automation still remain feasible.


You're welcome as long as you're respectful.

Just use a bot with a clear User Agent, not "Mozilla/5.0 (iPad; CPU OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) CriOS/36.0.1985.57 Mobile/11D257 Safari/9537.53". And, don't forget to start by reading my /robots.txt. If you behave yourself and abide by the rules, why should I ban your bot?

If for whatever reason I don't want to allow your bot in, you might still try and contact me to ask, and perhaps I could arrange for your bot to scrape my site.

Automated web access to my site must obey my rules because you're using my bandwidth and resources.
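A quick sketch of that polite-scraper behaviour in Python, with the bot name and URLs as placeholders:

    import urllib.robotparser
    import urllib.request

    # Announce who you are and where to find more info about the bot.
    BOT_UA = "ExampleResearchBot/1.0 (+https://example.com/bot-info)"

    # Check robots.txt before fetching anything.
    rp = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
    rp.read()

    url = "https://example.com/some/page"
    if rp.can_fetch(BOT_UA, url):
        req = urllib.request.Request(url, headers={"User-Agent": BOT_UA})
        with urllib.request.urlopen(req) as resp:
            html = resp.read()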


This comment seems like a non-sequitur. My comment had nothing to do with a particular site, much less "your" site.

I was making a general comment about automation and detection. If the detection gets better than the automation, it could change the dynamic. There is no fixed rule that says that content providers will or will not allow scraping based on robots.txt or other guidelines. Some could elect to disallow any/all robot behavior, if they have the capability.


That strongly depends on the data you have. I have written bots to scrape sites with government data so that I could do searches that weren't possible using their online forms. I did not look at, nor attempt to obey, their robots.txt, nor would I have given a rat's ass about breaking whatever they put up to stop me.

There is also the argument that when you make something available on the web you make it available to everybody.


[deleted]


It was good enough for 4chan, a site that used to receive countless spam posts from probably dozens of botters before they got recaptcha. And the botters probably kept trying to spam afterwards without much success.


Now bots will have to up their game and really become human level agents. #evolution


Given that I use keyboard to navigate, I'd qualify as bot easily.


Where are you supposed to click if you're a robot?


And what about people with motor-neural conditions?


I think worst case scenario is that they get what they have today (solve a captcha style problem).

Con: They will have an additional click. Pro: They might get a more solvable puzzle than some of the "read this distorted text" images they would see today.


did you mean cyborgs :)


Hi all, if you want to give it a try, I've just added it to the MaterialUp submission form.

http://www.materialup.com/submit


Will GUI automation tools be able to get past this?


It probably uses a LOT more info than only the mouse move/mouse click. Remember, google tracking is embedded in probably 99% of websites you visit so from your ip, cookies, tracking, logged in Google profile, etc... they're able to know if you're a human or not.


So if I don't have a Google account or allow their cookies in other sites then I'm not a human anymore?


Nothing as dramatic: if you don't have a strong Google footprint, it is less likely that the system will recognize you automatically as human, and you'll have to answer the picture question.


If you don't have advertising cookies set then you're either a robot or a potentially unprofitable human user from Google's point of view. Both would be best avoided.

As others have noted here, that's not the goal here and the captcha will degrade to the current ones in this case, but it highlights an interesting way for internet properties to maximize their revenue per user by only allowing users whose existing advertising footprint suggests they will contribute meaningful value to use the service in the first place.

Detecting bots is the first step to this, but detecting potentially unprofitable humans would be a natural extension.


Think of it as "remember my login on this computer". If you check the box, you sacrifice security and privacy for convenience. If you don't want to be tracked, you need to perform extra steps.


Correct, no Google account or cookies = not human :-)


Not from Google's point of view, probably...


So we're actually training users to behave like what Google's behaviour model assumes to be the behaviour of humans. Because then they won't get annoying captchas.


I'm still more impressed by sites like http://www.funcaptcha.co/try-it/

Why have they not caught on?


This CAPTCHA is "fun" the way that fun-size candy bars are fun.


I would hate that one. Too time-consuming.


Hey! Thanks for the shout-out, glad you're digging us.


I guess I know why they haven't caught on now...

Just curious - is it the game? or just the whole concept that you guys don't like?


Oh, so just program bots to provide a mouse movement toward a form element along a distorted path, and always trigger them through the UI rather than as events. Got it!


The spammers thought the same when Bayesian spam filtering started working: "I'll just program a Markov chain to insert some non-spammy words." It did not work out quite so well.

I've also been thinking about how to defeat this, but even the mouse movement seems hard. You are trying to beat the big data statistics that Google has on these mouse movements. Programming a distorted mouse path does not take into account micro-movements (compare with saccades) or speed. To simulate mouse movements convincingly, I figure you'd need a model based on actual mouse movement data on these Captchas. Also, you may succeed once or twice, but the third time the system will detect your bot, making all your work void, since your code is now a new signature for bot-detection.

Bots want to quickly leave a message and move on to the next one. Real users first read an article, before they comment. If "Time on site before filling in Captcha" is a feature, then there may be no other way around detection, than for your bot to wait 5-10 minutes before filling in the Captcha.

I think this new system really makes it easier for non-bots to quickly fill in a captcha, and makes it harder for bots. Also because reputation (of IP, of cookies and behavior on Google domains) now seems to play a larger role.

The result of this reputation system for Captcha's is that Google admits to tracking their users outside their own domains. This also places the entire Google+ eco-system, analytics code, Chrome, fonts, charts and Google Hosted Libraries in a different light. Google tracks you everywhere, and the actions you take build or break your online reputation. I am not sure as a legit user that I want to trade this privacy concern, just so I can show my "humanity" and that I am not an evil bot. An evil bot would crawl the pages where these captchas are hosted, and join this data with the captcha user data. Then you can get personalized advertisements when you comment on 4chan in political threads.


Exactly what I was thinking. Even if Google got smart enough to detect that the distorted path speed was too mechanical, you could record 100's of macros of yourself moving the mouse towards a target. When it's time to submit a captcha, select one of the macros at random and play it back with some slight randomness added. Voilà!
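Roughly like this, I'd imagine -- the recorded paths below are made up, where a real attempt would use captured human sessions:

    import random

    recorded_macros = [
        [(0, 0), (14, 9), (33, 21), (61, 30), (80, 34)],
        [(0, 0), (8, 12), (25, 28), (47, 41), (79, 45)],
    ]

    def jittered_replay(macros, max_jitter=1.5):
        """Pick one recorded path at random and perturb each point slightly."""
        path = random.choice(macros)
        return [(x + random.uniform(-max_jitter, max_jitter),
                 y + random.uniform(-max_jitter, max_jitter)) for x, y in path]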


Mouse movement is not the only factor Google is using.


I don't have a mouse, as I'm using a tablet device, and it recognized me as a human as long as I have cookies enabled


Yeah, good luck with that.


Yeah, he'll need a little luck, but not that much...

I have seen video-game bots that do that unbelievably well. Some powerbot.org scripts are sincerely more human than myself.


http://phantomjs.org/ makes this easy enough to do.


You seem to be assuming Google doesn't know of PhantomJS and many other automation frameworks. In fact, I personally wrote some of the code to detect them - not in ReCaptcha, but in a closely related project (which ReCaptcha may be using, even...)


I'm pretty sure you can just harvest mouse movement information and train a computer to mimic that.


The death of Lord Inglip.


Is it keyboard friendly?


I want to know how well it works with accessibility concerns of various and sundry natures.


You can try it out here: https://www.google.com/recaptcha/api2/demo

If you are recognized as human, you could perhaps try it via Tor.


I wish these google pages wouldn't automatically assume you speak some language based on your IP address - I get mine in German without any option to switch. Aren't there standards for setting language in web browsers?


Your browser sends an "Accept-Language" header with the list of languages and order of preference you have configured.
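For example, a header like "Accept-Language: en-US,en;q=0.9,de;q=0.7", where the q-values express the order of preference. Shown here being sent explicitly with Python's requests, purely for illustration:

    import requests

    headers = {"Accept-Language": "en-US,en;q=0.9,de;q=0.7"}
    r = requests.get("https://www.google.com/recaptcha/api2/demo", headers=headers)
    print(r.status_code)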


Sure, but many sites including Google like to ignore these (or at least have done so in the past).

The rationalisation is that users don't change settings like these, so their location is a better indicator than what their browser thinks.


> Aren't there standards for setting language in web browsers?

Yes, the Accept-Language HTTP header.


I'd assume that where cursor tracking is unavailable, this experience just degrades to a standard CAPTCHA as depicted in the post.


Equally as annoying.


How does this work?




Because ... Google.


Just give us your full name. How is this better?

They might already have my name but not in an incognito window. But they want it, obviously.


We can run robots to act on our behalf, but you can't. We can build and leverage AI, but you can't. Seems like that is where this is going.



