Helping people with computers... one answer at a time.

"Retype the word" or more properly CAPTCHA tests are designed to prove you're not a computer - and that difficult to read text is part of the test.

I fully understand the theory behind using "gotcha" symbols for many online processes. But if the gotcha is a picture of numbers and letters, then WHY must they make them so difficult to read? I have to regenerate these things over and over to get one that is readable. If it's a photo, then why make the symbols all twisty, blurred, and faded?

While it might feel like a "gotcha", it's actually called a CAPTCHA, which is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart".

Yep, it's a "prove you're human" test.

And all that twisty, blurry, faded stuff you're complaining about? That's actually kinda the point.

That's the test.

The problem that CAPTCHAs address is actually an important one: computers doing things in an automated fashion - like creating millions of fake email accounts or posting comment spam to websites. Because the test is one that computers cannot (yet) pass, the activity that is being protected can be performed only by a real, live person.
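
To make that gate a little more concrete, here's a minimal sketch in Python of how a site might hand out a test and check the answer before allowing the protected activity. Everything in it is an assumption for illustration: the names issue_challenge and verify, the in-memory store, and the six-character answer are mine, not any particular CAPTCHA service's API. A real site would show the answer only as a distorted image.

```python
import hmac
import secrets

# Hypothetical in-memory store: challenge id -> expected answer.
_pending = {}

# Assumed alphabet that skips look-alikes such as O/0 and I/1.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

def issue_challenge(length=6):
    """Create a test; a real site would render `answer` as a distorted image."""
    challenge_id = secrets.token_hex(8)
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    _pending[challenge_id] = answer
    return challenge_id, answer

def verify(challenge_id, response):
    """Return True only for a correct answer; each test can be used once."""
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False  # unknown or already-used challenge
    typed = response.strip().upper()
    # Constant-time comparison of the expected and typed answers.
    return hmac.compare_digest(expected.encode(), typed.encode())
```

Because each test is removed from the store as soon as it's checked, one correct answer can't be replayed to create a million accounts.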

The limitation that these tests take advantage of is that computers can't read.

Now, technically that's incorrect - optical character recognition has come a long way. Computer OCR software can, with a very high degree of reliability, take a photograph or scan of text printed on a page and "read" it - turn it into the computer representation of the text that the page contains, as opposed to a picture of that text.

That's actually pretty cool, and very handy for many applications.

However, there are limits. Even with clear copies of the text a computer has a difficult time with some characters (the letter 'l' versus the number '1' in many typefaces, for example), and thus can still get things wrong.

When things get blurry, twisted, or faded, current computer algorithms try to figure out what those characters are and fail miserably.
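
Here's a toy illustration of why distortion hurts. It isn't real OCR software (which is far more sophisticated); it's a sketch that assumes a made-up 5x5 font and "reads" a character by picking the stored shape with the fewest differing pixels. The glyphs and function names are invented for this example.

```python
# Toy 5x5 "font": '#' is ink, '.' is blank. These glyphs are made up for illustration.
TEMPLATES = {
    "L": [".#...", ".#...", ".#...", ".#...", ".###."],
    "1": ["..#..", ".##..", "..#..", "..#..", ".###."],
    "I": [".###.", "..#..", "..#..", "..#..", ".###."],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}

def distance(a, b):
    """Count the pixels where two glyphs differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def read_glyph(glyph):
    """Return the template character whose shape is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], glyph))

def shift_right(glyph):
    """Distort a glyph by sliding every row one pixel to the right."""
    return ["." + row[:-1] for row in glyph]

clean = TEMPLATES["L"]
nudged = shift_right(clean)
print(read_glyph(clean))   # prints L
print(read_glyph(nudged))  # prints 1
```

Sliding the "L" a single pixel to the right is enough for this naive reader to call it a "1". Real CAPTCHAs twist, blur, and fade the text far more than that.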

You and I, on the other hand, can.

Usually.

So when we get the answer correct where a computer couldn't possibly, it "proves" we're human.

For now.

As computer technology advances, I'm sure techniques will be developed that allow computers to correctly interpret today's CAPTCHAs. What happens then, I don't know.

A couple of random notes on CAPTCHAs:

  • One way that they're often defeated is to hire real live humans - often cheaply, overseas.

  • Another way that some are bypassed is by exploiting weaknesses in a particular implementation. For example, if one type of CAPTCHA always selects from the same set of 100 scrambled words, then one need only have a real human interpret each one once, and then simply let the computer compare pictures - something it is good at.

  • My favorite CAPTCHA, when I use one, is reCAPTCHA, which presents two words in random order: one is the real test, and the other is a word that is part of a book digitization project. (Their about page has not only a good overview of CAPTCHA, but also an explanation of how they're using it in reCAPTCHA.)

  • CAPTCHAs can have problems - specifically for people with poor or no eyesight. In most cases, an audible CAPTCHA equivalent is made available where you type in what you hear spoken.

  • Even in normal cases, as you're seeing, sometimes CAPTCHAs are too hard, too blurry, or too unreadable even for humans. Fortunately, most also include some kind of "show me something else" alternative.
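
The implementation weakness in the second note above (a CAPTCHA that always picks from the same set of scrambled words) can be sketched in a few lines of Python. This is a hedged illustration, not a description of any real attack tool: the image bytes, the answers "harbor" and "violin", and the function names are all made up, and it assumes the site serves byte-for-byte identical image files each time.

```python
import hashlib

def fingerprint(image_bytes):
    """Identical image files always produce the same digest."""
    return hashlib.sha256(image_bytes).hexdigest()

# Stand-ins for image files; a real CAPTCHA would serve PNG or JPEG data.
image_a = b"<bytes of scrambled-word image A>"
image_b = b"<bytes of scrambled-word image B>"

# Step 1: a real human reads each distinct image once.
solved = {
    fingerprint(image_a): "harbor",
    fingerprint(image_b): "violin",
}

def answer_for(image_bytes):
    """Step 2: the computer only compares pictures; None means 'never seen'."""
    return solved.get(fingerprint(image_bytes))

print(answer_for(image_a))               # prints harbor
print(answer_for(b"a brand-new image"))  # prints None
```

The lookup only works when the same picture comes back again. A CAPTCHA that generates a fresh, randomly distorted image for every test gives this trick nothing to match against.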

But unfortunately, the bottom line is that the blurriness and the difficulty are indeed the point.

And CAPTCHAs or something much like them will be around for quite some time - probably as long as there are spammers and those who would do other malicious things en masse, given the opportunity to automate the process.

Article C4253 - March 31, 2010

Leo A. Notenboom has been playing with computers since he was required to take a programming class in 1976. An 18-year career as a programmer at Microsoft soon followed. After "retiring" in 2001, Leo started Ask Leo! in 2003 as a place for answers to common computer and technical questions.

4 Comments
Frances
March 31, 2010 5:58 PM

"In most cases an audible CAPTCHA equivalent is made available where you type in what you hear spoken."

I've tried this option a few times but the result is usually even less comprehensible than the visual.

Sometimes I just give up.

Ken B
April 2, 2010 3:11 PM

I've heard that "the bad guys" can use even cheaper human labor to bypass CAPTCHA tests -- free use of spam victims.

They grab the CAPTCHA image, and display it to a human who clicks on their spam link. (As if the CAPTCHA image were theirs.) The victim then decodes the image, and "the bad guy"'s scripts then pass that on to the target computer.

Voila! Bad guy's scripts now bypass the CAPTCHA test.

Patrick Coppae
April 6, 2010 11:49 AM

I like those twisty letters and numbers.
Maybe they should include a few upside down alphanumericals.
Figure that out if you're not human.

Pedro
April 7, 2010 10:38 AM

Just a correction on how Recaptcha works: both words are from scanned books. One of them has already been verified. So, "If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct."

And that's how you are contributing to OCR antique books and old editions of the New York Times ;)
