Google ReCaptcha targeting non-Google browsers

dual_basis · on Nov 12, 2018

It used to be that I never had to fill out reCAPTCHAs, but now I get prompted almost 100% of the time, even when I am logged in to my Google account. I'm not doing anything unusual, and the only other person on our home internet is my wife. (Besides, I still get prompted to fill them out on other internet connections anyway.) Some of them take up a minute or two, especially the image segmentation ones (apparently I disagree with the general population on where the dividing lines should be).

Is there anything I can do about this? I'm being held hostage for about 20 minutes a day to help train Google's AIs. I used to think CAPTCHA was excellent - we were fixing up OCR from books and allowing the Library of Congress to be digitized and available to all. This gradually morphed, and now (as I understand it) the benefits from this human training are kept locked behind Google's walls. If they at least released the resulting dataset it wouldn't bother me as much.

beatgammit · on Nov 12, 2018

I started putting some BS answers, and they seem to be accepted. It seems you need to select at least 2 tiles, but not necessarily the right ones.

It doesn't get rid of them, but I can usually get through them a bit faster since I don't need to actually look at the image or text.

Relevant XKCD https://xkcd.com/1897/

dual_basis · on Nov 12, 2018

I mean, the reason this works is generally because there is some information which Google does not have, that's why they are asking you to provide it. So, for instance, in the case of words, they used to present two words which were scanned in, one which they knew was correct and one which they didn't know. As long as you enter the one they know correctly, they assume you've also answered the one they don't know correctly. (Maybe they do some variant of this, but that was the general idea.)

I assume they're doing something similar with the image segmentation tasks, so if you randomly select two tiles and click "next" there are some images where it actually doesn't know if you are correct or not, so it ends up storing your response and lets you through.
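A rough sketch of the two-word idea described above, in Python. This is only an illustration of the general scheme, not Google's actual implementation; the names `check_response`, `consensus`, `unknown_votes`, and the `min_votes` threshold are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical store: guesses collected so far for each unknown word,
# keyed by an arbitrary image id.
unknown_votes = defaultdict(Counter)


def normalize(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def check_response(control_answer, control_guess, unknown_id, unknown_guess):
    """Pass the user if the control word matches.

    The guess for the unknown word cannot be verified, so it is only
    recorded as a vote (and only when the control word was right).
    """
    if normalize(control_guess) != normalize(control_answer):
        return False
    unknown_votes[unknown_id][normalize(unknown_guess)] += 1
    return True


def consensus(unknown_id, min_votes=3):
    """Return the most common guess once enough users agree, else None."""
    votes = unknown_votes[unknown_id]
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None
```

Under this scheme a junk answer for the unknown word still lets the user through, which would match the behavior reported above; it just adds a stray vote that agreement between other users would have to outweigh.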

adetrest · on Nov 13, 2018

I thought I was going crazy; glad to see I'm not the only one. I very much resent training Google's AI for free as well, just so that I can log in or send a contact form.

whyagaindavid · on Nov 12, 2018

Maybe switch off your devices and restart the router to get a new IP, or get a new phone and try with that. Perhaps one of your devices has malware.

dual_basis · on Nov 12, 2018

I manage IT for a number of small businesses, so I routinely reformat my devices and also monitor traffic on my network regularly so I am fairly certain there is no malware. (Well, to the extent that one can be confident, at all, nowadays.)

That said, the idea that Google can hold my time hostage in this way is concerning. The suggestions you presented are not feasible for most users. Who has money to just go buy a new phone because Google's reCAPTCHAs are taking up their day? Why should that be acceptable?

breakingcups · on Nov 12, 2018

There is something incredibly worrying about the fact that choosing to not have a Google account means a large portion of the web is walled off from you.

Choosing to not have a Google account means you are co-opted into doing labor for free (helping their image classifiers recognize cars, shop signs, etc.) for Google. Have a browser with tracker blocking built in? Aah, too bad. Please "solve" these 5-10 pages of work for our algorithm. Don't want to? Too bad, you can't use this site.

And yet, if clients come to me looking for a good solution to determine who is a bot and who isn't, I don't know what else to recommend. Other Captcha solutions seem completely inferior.

dual_basis · on Nov 12, 2018

I agree completely. The worse this gets, however, the less reCAPTCHA makes sense to me. The main reason to use it was the fact that it worked invisibly for users it deemed not to be bots. Based on the responses here (as well as my personal experience) this seems to work less and less now - maybe partially because Google wants more users to train its algorithms. Therefore there is not much to distinguish reCAPTCHA from some other CAPTCHA system, including simple ones you could make yourself. The image recognition stuff requires more work but, arguably, we are even worse at basic NLP tasks, which are even easier to make CAPTCHAs for. All you need is some private dataset, which you can create manually fairly quickly, and a corresponding task. For example:

Select which of the following are fruits:

1. banana 2. firetruck 3. baseball 4. apple

On top of that, if there were independent implementations of CAPTCHA systems the exact mechanics of interacting with the CAPTCHA would be sufficiently different that the very act of submitting responses would, itself, be a bot test.
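The fruit example above could be sketched roughly like this in Python. It is a toy under stated assumptions: the `WORDS` dataset, the category names, and the functions `make_challenge` and `check_answer` are all hypothetical, and a real private dataset would need to be far larger (and actually kept private) to resist simple enumeration.

```python
import random

# Hypothetical private dataset: word -> category.
WORDS = {
    "banana": "fruit", "apple": "fruit", "cherry": "fruit", "mango": "fruit",
    "firetruck": "vehicle", "bicycle": "vehicle",
    "baseball": "sport", "tennis": "sport",
}


def make_challenge(category, n_options=4, rng=random):
    """Build a prompt with a shuffled mix of matching and non-matching words."""
    matching = [w for w, c in WORDS.items() if c == category]
    others = [w for w, c in WORDS.items() if c != category]
    # Always include at least one right and at least one wrong option.
    n_match = rng.randint(1, n_options - 1)
    options = rng.sample(matching, n_match) + rng.sample(others, n_options - n_match)
    rng.shuffle(options)
    return {
        "prompt": f"Select which of the following are {category}s:",
        "options": options,
    }


def check_answer(challenge, category, selected):
    """Accept only if the selection is exactly the set of matching options."""
    expected = {w for w in challenge["options"] if WORDS[w] == category}
    return set(selected) == expected
```

Calling `make_challenge("fruit")` yields a prompt plus four shuffled options, and `check_answer` accepts only the exact set of fruits among them. In practice the server would also have to remember which challenge it issued instead of trusting the client to send it back.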

Tsubasachan · on Nov 12, 2018

I care quite a bit about privacy on the internet. If you do things like use a VPN, delete cookies and not let your browser leak information Google will make your user experience hell.

heroprotagonist · on Nov 12, 2018

I haven't found a perfect solution to avoiding tracking, but I frequently look around for new ways to improve my privacy.

One interesting thing I have found was to use Firefox's container tabs with the Temporary Containers extension and a per-domain isolation configuration which will keep Google search in a temporary isolated container but still let me use mail.google.com in a persistent container.

Combine this with 'Google search link fix' so that the search results page no longer uses local stubs to track links followed. Configure Temporary Containers for Google to always open links from there in a new temporary container.

Then use uBlock Origin to not load Google's analytics when you're on the remote page.

Hopefully this prevents Google from tracking search history, though it's possible they could use some IP and browser fingerprinting to establish a weaker correlation on a shadow profile.

I'd love to also have per-container proxy configuration which would use a random proxy per temporary container, but I guess that's not possible yet.

I haven't looked much into ways to randomize the browser data sent to the server, but it's also something I'd like to be able to do if it were automatic and random enough (e.g., a per-request randomization of user-agent and reporting of things like fonts, plugins, screen resolution, etc., which are used as correlating factors in fingerprinting tools).

jocoda · on Nov 12, 2018

On principle I don't use Chrome/chromium, unless a project requires it. It's not a bad browser but I wonder if running a google executable on your computer is like inviting a thief into your home. I still remember when chrome first launched and they would check for updates multiple times a day, every hour or something like that. And what's this virus scanner they offer?

Anyway, I run ff most of the time, and there's a site that I use daily that requires a login. Used to be I would receive a recaptcha challenge maybe once or twice a week, even though logged in. No problem, probably using recaptcha to throttle bots, I think.

For the last week plus there's been a fundamental change and I'm receiving multiple recaptcha challenges in the same session, even though I'm signed in. Max so far has been 4 in the same session, covering maybe 30 minutes or so.

Not sure if it is the site itself which is to blame for a poor implementation, or if this is google.

But coincidentally this happens just as google announces recaptcha 3. Conspiracy nuts might say this is google leveraging their effective captcha monopoly to drive users to chrome.

dual_basis · on Nov 12, 2018

I use Chrome, and I also am getting an annoyingly large number of reCAPTCHA requests.

That said, I don't think it's that much of a conspiracy. It is easy to defend, from Google's point of view - if you use Chrome they have much more information about your web traffic and can therefore better discern whether you are a bot or not. I could absolutely see them implementing rules which, while they do not explicitly prioritize Chrome traffic, in practice end up doing exactly that.

adtac · on Nov 12, 2018

A showerthought I recently had: as long as CAPTCHAs exist in their currently inefficient form, it could be argued that we'll never have strong artificial intelligence because CAPTCHAs are basically reverse Turing tests. I haven't developed this thought thoroughly, but I thought it'd be interesting to share.

jobigoud · on Nov 12, 2018

The T in CAPTCHA stands for Turing Test.

The moment we can't make them efficiently distinguish bots from humans would be the moment the test is passed.

jammygit · on Nov 12, 2018

For the last 6 months it's been taking 10-30 minutes to log in to certain sites due to the captchas. It's like an entire afternoon after dinner to log in. I use a VPN, which is the cause, but it's gotten insane. How much bot spam are sites getting now that this became necessary?

leibwiht · on Nov 12, 2018

It's even worse for Tor users: logging into Discord makes me solve about ten or twelve rounds of reCAPTCHA. It's maddening.

gcb0 · on Nov 12, 2018

and they are not even careful with their data sources.

most of my IPs run tor relays without any exit traffic, and google still flags the whole company for 30 minutes of their image classifier training every time a captcha shows up.

I wonder if I can start to write that time off as a donation to google image classifier program for the IRS.

DeathArrow · on Nov 12, 2018

Are there any decent alternatives to invisible reCAPTCHA?