Wow, that is very surprising. Is the web development industry hurting that much for good programmers, or are the wrong people just being hired?
There is also a skills shortage among programmers. If people like this can get work, then imagine what's possible if you actually know programming. Remember that next time you're thinking about your unsatisfying job, or at pay review time.
For what it's worth, I worked on a website that started receiving massive amounts of spam on its feedback page very shortly after it went live. We (as in the three programmers on the project) hated captchas with a passion. Instead we put in a field with the text "What is 1 + 1?" (In case they missed it, we actually put in red next to it "Hint: the answer is '2'".) Granted, we checked the value server side.
The end result, spam disappeared and we didn't add much pain to our customers.
Most spammers likely don't go check every website to see how they can break the captcha, they just set up a script to go fill out forms and submit them.
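For anyone wondering what "checked the value server side" amounts to, here is a minimal sketch. The field name `human_check` and both function names are made up for illustration; this is not the code from that site, just one way the check could look.

```python
# Hypothetical names throughout; not the original site's code.
EXPECTED_ANSWER = "2"  # answer to the visible question "What is 1 + 1?"


def is_probably_human(form: dict) -> bool:
    """Return True only if the submitted form answers the question correctly.

    This runs on the server, so a bot that ignores the page's JavaScript
    or posts the form directly still has to supply the right value.
    """
    answer = form.get("human_check", "")
    return answer.strip() == EXPECTED_ANSWER


def handle_feedback(form: dict) -> str:
    """Accept or reject a feedback submission based on the check above."""
    if not is_probably_human(form):
        return "rejected"
    return "accepted"
```

A generic form-filling script that leaves the unknown field blank, or stuffs it with junk, gets rejected without any image processing involved.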
Their solution, while not the "awesome, technologically advanced solution", was a working solution if it prevented spam, without the complexity of actual captchas.
Furthermore, as captchas have been known to be broken, who's to say that the spammers' tool doesn't recognize valid, commonly used captchas and break them automatically? As opposed to a field that says "Type the following word", which the spammers don't (can't easily?) check for.
"Scammed" is probably not the right word here - at least to me, it conveys a malicious intent, while mistakes like this are merely ignorance. I'm sure most of us have made mistakes just as stupid as this, despite working hard to earn our pay.
'Scammed' is not the right word. Ever heard the aphorism "never attribute to malice what can be attributed to incompetence"? Manager types (and unfortunately probably quite a few programmers) have no idea what CAPTCHAs do, and I would bet money that somewhere, somebody has vetoed a server CAPTCHA in favor of a client CAPTCHA because it sounded easier or something. I'm not saying that's what happened here, but don't say it was obviously malice when you just don't know.
Usually scammers treat their victims as customers and wish them well.
In this particular example it was a combination of technical incompetence [not being able to deliver a proper CAPTCHA] and a scam [getting paid for a project that did not deliver on its promise].
How can this be a "mistake"? They created something that looks like a captcha to fool the client into believing it's an actual captcha. If they didn't know how to make a proper captcha, it would have been better to tell the client so someone else could do it.
What if they don't know they don't know how to make a captcha? It's an obfuscated image, those are easy to make and check! If you don't know basic things like "never trust the client", and you don't know that they exist to know, then you may not know to tell the client to have someone else do it.
That doesn't excuse the programmer. As a web programmer, it is, to some extent, their job to know when they're out of their league. But second-order knowledge can be a rare skill.
Yes, this exactly. Donald Rumsfeld got no end of flak for his comment (distilled here) "there are known knowns, known unknowns, and unknown unknowns", but it's actually a great statement - in this case, there are some people who know they know how to make captchas, some people who know they don't, and some people who don't know that they don't know.
Many programmers have no idea how a CAPTCHA is supposed to work. It never occurs to them to think through how someone would break it. Someone tells them the client wants a CAPTCHA, they go "oh yeah, that's those weird letters on the screen", and are probably pretty proud of how they did it.
Don't believe me?
Think about how often you see obvious SQL injection problems - the same (lack of!) thought process is responsible for both.
You are assuming that the client knows what a CAPTCHA is. Probably the manager at the client-side said "Oh yeah, before I forget to mention it, add that funny image you see on websites - you know, the CAPTCHA thing, a guy at my gym said it improves security. We definitely want good security in this project!".
Completely OT: I find it interesting that this post and several other HN posts this week are hosted on Google Plus. I definitely would not have predicted that G+ would encroach on the LiveJournal/Tumblr space.
On a similar tangent to your OT post: we're getting to the point where seeing (plus.google.com) would be useful, since it conveys quite a different meaning to me from (google.com).
Actually, yes, that did the trick for me; thanks! (Though I still think it's a feature that everyone would appreciate, so if it could go into news.yc, that would be even better.)
Isn't that just a side effect of using Blogger? I know why people demanded that feature for Blogger (imaginary effects on PageRank), but I imagine Google has a better algorithm than looking at text in the URL for content that they host. For example, they can find the post's title right in the database.
If anyone ever wondered what the phrase "cargo cult science" referred to, this is a prime example. They're going through all the motions, but sadly their understanding of the universe is gratuitously flawed.
+1 for cargo cults: http://en.wikipedia.org/wiki/Cargo_cult. It's a great idea to keep in mind for a creator/designer/programmer. People/users/everyone all too often intuit through imitation.
On a forum I run (phpbb3) I eliminated 99% of the spam by adding 1 field that says "enter 42 here to prove you are human". No image, no hidden field, nothing.
We still get the occasional spammer but the real problem was our phpbb3 board showing up in the automated spam programs. As soon as we were slightly different than the default install, nearly all the spam stopped.
The interesting thing was that even the built-in captcha didn't stop the spam--it was worth cracking since everyone uses it.
Yeah, even recaptcha is broken. A new board I helped set up at my company got some spam before even being publicly announced!
On my blog I generate two random sequences of characters and tell the user to join them together without a space. This seems to have worked really well. (Though in the past I've also had static strings like "join 'bow' and 'ser' together" or "join 'doc' and 'tor' together".) I used to have the addition challenge like the GP but it was broken. My comment form was slammed with hits, so I rate-limited attempts, but a few still got through (since it's actually not a big set of responses to go through and you can defeat rate limits). That's when I implemented my string scheme and changed the comment form submission URL (which only lives in JavaScript now). I haven't had a spammer get through yet.
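A rough sketch of the "join two random strings" idea, in case it helps anyone. The function names and the three-letter length are assumptions for illustration, not the blog's actual code, and the server would need to remember the two halves (in a session, say) between showing the form and checking the answer.

```python
import hmac
import secrets
import string

ALPHABET = string.ascii_lowercase


def new_challenge(length: int = 3) -> tuple[str, str]:
    """Generate two random lowercase strings for the user to join."""
    left = "".join(secrets.choice(ALPHABET) for _ in range(length))
    right = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return left, right


def prompt_for(left: str, right: str) -> str:
    """Text shown next to the form field."""
    return f"Join '{left}' and '{right}' together without a space"


def check_answer(left: str, right: str, submitted: str) -> bool:
    """Server-side check; tolerant of stray whitespace and capitals."""
    expected = (left + right).encode()
    given = submitted.strip().lower().encode()
    return hmac.compare_digest(given, expected)
```

Nothing here is hard for a purpose-built bot to parse. As with the other schemes in this thread, the protection comes from being different from what mass-market spam tools expect.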
On another forum I used to moderate (I think it was an Invision Powerboards one) I fixed it with a second field asking something like "What makes things fall down? gravity or noodles?" And if they entered gravity it would let them register. It lasted a few years, then a few randomly got in but by that time the forum had died.
What I loved was when I signed up some time ago they had given me a partial derivative with a single variable, telling me what the variable was. Meaning that the answer was 0. Some of them look REALLY complex but they're actually far simpler than they appear, except for the fact that it'd be incredibly difficult to break them in practice given the variation they produce.
When you design a solution, you have to decide whether you're protecting against targeted or non-targeted attacks. It's not all just "spam".
If your only concern is dumb, fully automated bots not targeting your site specifically (which is true for the bottom 99.5% of the web) then you don't need a CAPTCHA.
2 and 3 are great for non-targeted attacks. 1 is a very weak protection against a targeted attack, and it's likely overkill that unnecessarily burdens users.
2 and 3 are decent, as long as you don't have commenters trying to discuss something spammy (depends on the site community). #1 only works because your site isn't big enough for anybody to specifically target, though. I'm not saying it's bad (so long as it works, it's by definition at least "good enough"), just don't expect it to scale.
I should note that registered users get to skip the captcha. Right now the site gets around 1,500 visitors and 3,500 pages a day, and growth has been steady and incremental for some years.
We wanted to do something similar on a site I was involved with.
Unfortunately it wasn't allowed because the site owner pointed out that the market the site was aimed at had a reasonable number of people with cognitive difficulties - i.e., they struggled to follow multi-step instructions.
(Yes, this does mean that computers are able to solve a problem that is supposed to identify a human much better than some humans.)
I've seen similar systems, such as "Which one of these four images is a puppy?". I think the problem is that the set has to be small, so it ends up being a multiple choice quiz. With one correct answer out of four or five choices, it is very easy to brute force.
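To put a number on "very easy to brute force", here is the arithmetic as a small sketch. It assumes the bot guesses uniformly at random and that each attempt is independent; the function name is made up.

```python
def chance_of_getting_through(choices: int, attempts: int) -> float:
    """Probability that at least one of `attempts` random guesses is right
    when each challenge has `choices` equally likely answers."""
    miss_every_time = (1.0 - 1.0 / choices) ** attempts
    return 1.0 - miss_every_time
```

With four choices a single blind guess succeeds 25% of the time, and ten blind guesses get at least one submission through roughly 94% of the time.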
I have a few sites only getting about 1k visitors a month and #1 does reduce the spam a bit, but I still get 2-3 submissions a day, and I would not say these are targeted at all, just mass spam bots.
As always, one of the most interesting parts of truly great CAPTCHA systems is that they are advancing the state of the art in image recognition. But on the other hand we still have scams like this, and no real solutions.
A few years ago, or so I think, people went all crazy talking about a replacement for captchas: show a range of images, and make the user pick the image described by a block of text.
Because the math doesn't work. Most "next-gen" captchas fundamentally fail (by orders of magnitude) one of the many pillars that make captchas scale:
1. Is it trivial for a human to answer correctly? This affects growth.
2. Can humans do it quickly? This affects growth.
3. How is the random guess-rate? This better be abysmal.
4. How good is the “opposing” technology?
5. How is the guess rate of a sophisticated attacker, using said technology?
6. How much human input is required to create your captcha? You better be asymptotically better than human-solving the captcha.
7. What are the cultural and accessibility issues?
I remember suggestions of using computing power to slow down guess-rates. Probably related to bitcoins. However, it doesn't work since some users don't seek better computer performance.
Any CAPTCHA scheme that can be solved by enumeration of all possible answers is a failure, because there are cost-effective ways to hit a CAPTCHA over and over again, with cheap humans, and build the enumeration table. This is where the "pick the image with a cute thing in it" scheme falls down. In this case, once the enumeration of description -> image(s) is determined, you lose.
Any scheme that involves humans somehow creating tags or labeling images or writing text will generally be enumerable as well, because they can trivially out-manpower you.
Also, many CAPTCHA schemes use a model of spammer in which the spammer isn't permitted to be clever. If there is a pattern, in the real world the spammer is "allowed" to exploit it. There are 2^64 different ways to add two 32-bit numbers to each other, but that doesn't mean that you can beat a spammer just by asking the user to do a simple addition, because when I say "enumerate" I mean it more in the computer science sense, not the literal sense. They can and will create something that parses the problem and does it, so for instance for my stupid "add two random 32-bit numbers" example the CAPTCHA is actually easier for a computer than a human.
CAPTCHAs are hard and getting steadily harder... at least, if you require them to work. Security theater is easy.
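To make the "add two random 32-bit numbers" point concrete, this is about all a bot needs. It is only a sketch, and it assumes the question is shown as plain text in a form like "What is A + B?".

```python
import re
from typing import Optional

# Matches the first "<digits> + <digits>" found anywhere in the prompt.
ADDITION = re.compile(r"(\d+)\s*\+\s*(\d+)")


def solve_addition(prompt: str) -> Optional[int]:
    """Return the sum asked for in the prompt, or None if none is found."""
    match = ADDITION.search(prompt)
    if match is None:
        return None
    return int(match.group(1)) + int(match.group(2))
```

The size of the numbers is irrelevant to the parser, which is why the 2^64 possible questions buy nothing.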
If you only have a limited collection of images to pick from, then bots could get decent scores by picking at random. A better approach might be to ask users to pick all the matching images (i.e. 2^N possible choices).
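The difference between the two designs, as a quick sketch. It assumes every subset of the N images is equally likely to be the right answer, which is the most favorable case for the defender; the function names are made up.

```python
def guess_rate_pick_one(image_count: int) -> float:
    """Chance a random guess is right when exactly one image is correct."""
    return 1.0 / image_count


def guess_rate_pick_subset(image_count: int) -> float:
    """Chance a random guess is right when any subset may be the answer,
    giving 2**image_count equally likely answers."""
    return 1.0 / (2 ** image_count)
```

With eight images, picking one at random works 1 time in 8, while picking the exact matching subset at random works 1 time in 256.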
What would the system use for its corpus of images and text descriptions? The corpus would have to be significantly larger than what any given attacker could manually identify. Once an attacker has manually identified an image+text combination, they could store the combination and use it to solve any future CAPTCHAs with the same image+text.
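A minimal sketch of the attacker's lookup table described above. The class and method names are invented, and it assumes the site serves each image byte-for-byte unchanged so that a plain hash identifies it.

```python
import hashlib
from typing import Dict, List, Optional


def fingerprint(image_bytes: bytes) -> str:
    """Identify an image by hashing its raw bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


class SolvedTable:
    """Remembers image -> description pairs that a human already labeled."""

    def __init__(self) -> None:
        self._labels: Dict[str, str] = {}

    def record(self, image_bytes: bytes, description: str) -> None:
        """Store one manually identified image+text combination."""
        self._labels[fingerprint(image_bytes)] = description

    def pick(self, description: str, candidates: List[bytes]) -> Optional[int]:
        """Return the index of a known matching image, or None if unseen."""
        for index, image in enumerate(candidates):
            if self._labels.get(fingerprint(image)) == description:
                return index
        return None
```

Each image only has to be paid for once, so the corpus has to grow faster than the table does.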
On the subject of terrible captcha systems, I found the following gem while looking for OSS games for Linux:
"You are born into WHAT? (answer is one english word)*" [1]
It is not entirely clear to me what the expected answer is. A Google search for "you are born into" does not return any answer that is clearly correct. If I had to guess I would go with "sin" but I am hoping that nobody would be so ignorant as to design a captcha system that assumes a certain cultural/religious background.
A slightly less clueless (but still clueless) approach to CAPTCHA design is to 1) make the CAPTCHA case-sensitive, 2) use letters for which the lower-case representation is very similar to upper-case, and/or use both zero and the letter O, 1 and the letter l, and so on, 3) use an image munging algorithm that makes it next to impossible to disambiguate the cases in 2).
The problem with captchas is they have to be readable to humans.
Sure, a captcha of "lI0Ol1o" would probably be unreadable to a computer ... but it would be to a human too.
We're quickly approaching the point where image recognition is as good at solving image captchas as humans are, and when we get there, we'll need to find some other way to do it.
Which is exactly why they added those. Premium accounts do not require the user to enter a captcha for every download. So every user who was annoyed by the "cats captcha" was a potential customer.
What I think is cool are the captchas that make fake words that actually look like they could be real words (as opposed to a random string of text). Makes it easier for a human to read and figure out, but no easier for a bot. I don't know how they do that.
Imagine picking letters with the right frequencies. Now, instead of doing that, pick pairs of letters, with the right frequency, so that each pair "chains" with the previous. If you have good pair frequency data, you can do longer than pairs and get even closer to English.
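Here is a small sketch of the letter-pair chaining described above. It is not how any particular captcha service does it; the function names and the tiny word list in the usage note are assumptions, and real pair frequencies would come from a large dictionary.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Iterable, List

START, END = "^", "$"  # markers for the beginning and end of a word


def build_pair_table(words: Iterable[str]) -> DefaultDict[str, List[str]]:
    """Map each letter to the letters that follow it in the sample words.

    Repeats are kept, so common pairs are proportionally more likely
    to be chosen later.
    """
    table: DefaultDict[str, List[str]] = defaultdict(list)
    for word in words:
        padded = START + word.lower() + END
        for current, following in zip(padded, padded[1:]):
            table[current].append(following)
    return table


def fake_word(table, rng: random.Random, max_length: int = 8) -> str:
    """Chain letter pairs until an end marker or max_length is reached."""
    letters: List[str] = []
    current = START
    while len(letters) < max_length:
        current = rng.choice(table[current])
        if current == END:
            break
        letters.append(current)
    return "".join(letters)
```

Usage would look like `fake_word(build_pair_table(["doctor", "bowser", "captcha"]), random.Random())`. Extending the key from one letter to the previous two or three is the "longer than pairs" variant.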
I dislike long, nonsensical captchas that confuse people; they're totally annoying. A few years ago I used a 5-digit captcha, but in the background I added faded small letters at various angles.
Unfortunately the faded small letters probably did not make your captcha more difficult to crack. It's relatively simple to remove all grey pixels from the image before OCRing it.
However having your own custom captcha probably helped quite a bit. I'm guessing spammers aren't going to bother writing custom software to decode your captcha unless you have a major site.
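The grey-pixel removal is roughly this much work. This is a sketch over a plain grid of RGB tuples rather than a real image library, and the two thresholds are arbitrary guesses that would need tuning per captcha.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
WHITE: Pixel = (255, 255, 255)


def is_faded_grey(pixel: Pixel, tolerance: int = 20, darkest: int = 100) -> bool:
    """True for light, nearly colorless pixels (the faded background text).

    `tolerance` bounds how far the channels may differ from each other;
    `darkest` keeps dark foreground digits from being erased.
    """
    roughly_neutral = max(pixel) - min(pixel) <= tolerance
    return roughly_neutral and min(pixel) >= darkest


def strip_grey(image: List[List[Pixel]]) -> List[List[Pixel]]:
    """Return a copy of the image with faded grey pixels turned white."""
    return [
        [WHITE if is_faded_grey(pixel) else pixel for pixel in row]
        for row in image
    ]
```

Dark digits and strongly colored pixels pass through untouched, so what reaches the OCR step is mostly the foreground.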
Yes, of course. But you can choose certain colors, not gray, that look faded to the eye but are not gray in their RGB values. In any case, captchas are always mediocre solutions.
I can't believe Google is criticizing how Sony does CAPTCHAs when I've been complaining for years about how difficult Google's are to read. But as to their point, based on Sony's recent security issues, it doesn't sound like Sony has a very good IT department.
An example would be https://sso.state.mi.us/som/dch/enroll/reg_page1.jsp (You can enter any fake name/email, this is only step one of the registration script. The next page has the captcha in question.)
The captcha is plaintext, right on the page. The data from the captcha isn't even sent to the server, it is processed locally via JavaScript.
So, the bots don't even have to do anything, but humans have to input a meaningless number...
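For contrast, here is a bare-bones sketch of keeping the answer on the server. The class and method names are invented, the in-memory dict stands in for whatever session storage a real site would use, and rendering the answer as an image is left out.

```python
import secrets
from typing import Dict, Tuple


class CaptchaStore:
    """Keeps expected answers on the server, keyed by an opaque token."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}

    def issue(self) -> Tuple[str, str]:
        """Create a challenge.

        Only the token goes into the page as text. The answer would be
        rendered as an image by code not shown here.
        """
        token = secrets.token_urlsafe(16)
        answer = "".join(secrets.choice("0123456789") for _ in range(5))
        self._pending[token] = answer
        return token, answer

    def verify(self, token: str, submitted: str) -> bool:
        """Check a submission; each token is single-use to block replays."""
        expected = self._pending.pop(token, None)
        if expected is None:
            return False
        return secrets.compare_digest(submitted.encode(), expected.encode())
```

The page never contains the answer in readable form and the comparison never runs in the browser, which are the two things the example above gets wrong.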