CAPTCHAs are an annoying technology. According to statistics, users around the world complete about 100 million CAPTCHA tests every day. reCAPTCHA is a project launched by Carnegie Mellon University that uses CAPTCHA technology to digitize old books. It is estimated that the technology can digitize 160 books a day.
reCAPTCHA has been deployed on 40,000 websites.
The underlying principle is that optical character recognition (OCR) software has limited recognition ability, especially on old or damaged books whose print is unclear, whereas humans can easily identify characters that OCR cannot. On such text, human recognition accuracy can reach 99%, while OCR software reaches only 80%.
reCAPTCHA combines traditional OCR with a system similar to Amazon's Mechanical Turk. Each word is first processed by two different OCR programs. If the two results disagree, the word is marked as "unrecognized", sent to the reCAPTCHA system, and turned into CAPTCHA text for users to identify.
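The disagreement check described above can be sketched in a few lines of Python. This is only an illustration: the function name `flag_unrecognized` and the sample word lists are hypothetical, and the sketch assumes both OCR programs return one string per word position, which the article does not state.

```python
def flag_unrecognized(ocr_a, ocr_b):
    """Return the positions where two OCR outputs disagree.

    Assumption: both lists hold one recognized string per word position.
    """
    if len(ocr_a) != len(ocr_b):
        raise ValueError("OCR outputs must cover the same word positions")
    return [i for i, (a, b) in enumerate(zip(ocr_a, ocr_b)) if a != b]


# Hypothetical outputs of two OCR programs for the same scanned line.
engine_a = ["the", "quick", "brovvn", "fox"]
engine_b = ["the", "quick", "brown", "fox"]

# Positions listed here would be handed to users as CAPTCHA text.
unrecognized = flag_unrecognized(engine_a, engine_b)
```

Only the words at the flagged positions need human attention; words on which both programs agree are accepted as they are.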
Note: The original article does not clearly explain how a user's CAPTCHA answer is verified. A CAPTCHA system must know the correct answer, yet the problem here is precisely that the system cannot identify the word. My guess is that the mechanism works as follows. At first, any answer a user submits is accepted, whether or not it is right, but the system records each user's answer. Once enough answers have accumulated, the system takes the answer given by most people as the control word and uses it to verify future tests. The original article mentions that the system initially presents a known control word, but it does not describe where that known control word comes from. First, the known control word cannot already be accurate, otherwise reCAPTCHA would be unnecessary. Second, since the control word is not accurate, there is only one way to decide whether a user passes the test: at the beginning, any answer the user submits passes.
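The mechanism guessed at in the note can be sketched as follows. This illustrates the note's conjecture only, not reCAPTCHA's actual implementation: the class `WordChallenge`, the `min_votes` threshold, and the lowercasing of answers are all assumptions made for the example.

```python
from collections import Counter


class WordChallenge:
    """One unrecognized word, verified by majority vote (hypothetical sketch)."""

    def __init__(self, min_votes=3):
        self.min_votes = min_votes   # assumed threshold; the article gives none
        self.answers = Counter()     # tally of the answers submitted so far
        self.control_word = None     # set once enough answers have accumulated

    def submit(self, answer):
        """Record an answer and report whether the user passes the test."""
        answer = answer.strip().lower()
        if self.control_word is not None:
            # A control word exists, so the answer is checked against it.
            return answer == self.control_word
        # No control word yet: every answer passes, but it is recorded.
        self.answers[answer] += 1
        if sum(self.answers.values()) >= self.min_votes:
            # The most frequently submitted answer becomes the control word.
            self.control_word = self.answers.most_common(1)[0][0]
        return True


challenge = WordChallenge(min_votes=3)
results = [challenge.submit(a) for a in ["brown", "brown", "brovvn", "brown", "brovvn"]]
```

In this run the first three answers all pass while the votes accumulate, "brown" then becomes the control word, and the final "brovvn" is rejected.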
Overall, reCAPTCHA achieves 99.1% accuracy, which is close to the accuracy obtained when one person types the text and a second person checks it. reCAPTCHA is still at the concept stage, but its developers believe it will be able to digitize about 160 books a day.
What is remarkable about this project is that it puts otherwise wasted human mental effort to use. Other projects are based on the same idea: Fold.it turns protein-folding computation into a game, and Google's Image Labeler uses the brainpower of a large user base to label images on the Internet.
Original source: http://www.readwriteweb.com/archives/recaptcha_stopping_spam.php
Source: comsharp CMS official website