Using Tesseract-OCR to crack website verification codes

Source: Internet
Author: User

First, I have to admit that my interest in Tesseract-OCR came from the gimmick in this article, "Breaking weak CAPTCHA in slightly more than 26 lines of Groovy code":
http://www.kellyrob99.com/blog/2010/03/14/breaking-weak-captcha-in-slightly-more-than-26-lines-of-groovy-code/

Of course, after reading it I realized that it simply calls the third-party library Tesseract-OCR...
http://code.google.com/p/tesseract-ocr/

Nevertheless, in the spirit of Deng Xiaoping's principle "it doesn't matter whether a cat is white or black, as long as it catches mice," I used the holiday to start some basic research into text recognition.

Tesseract was originally developed by HP and is now supported by Google. It recognizes English letters and digits, and its recognition accuracy is reportedly ranked third in the world. Better still, it offers downloadable language packs (including Chinese, although the accuracy there is questionable...) and comes with its own training tools.

After installing it and getting the examples to run, the first application that came to mind was, naturally, CAPTCHA analysis.



According to the documentation, the quality of the images fed to Tesseract directly affects recognition accuracy, so some simple preprocessing is essential.

1. First, convert the image to grayscale, using gray = 0.3R + 0.59G + 0.11B:
Java code
    import java.awt.Color;

    // srcImg is the BufferedImage holding the CAPTCHA
    int minX = srcImg.getMinX(), minY = srcImg.getMinY();
    int width = srcImg.getWidth(), height = srcImg.getHeight();
    for (int y = minY; y < height; y++) {
        for (int x = minX; x < width; x++) {
            int rgb = srcImg.getRGB(x, y);
            Color color = new Color(rgb); // unpack R, G, B from the int RGB value
            int gray = (int) (0.3 * color.getRed() + 0.59 * color.getGreen()
                    + 0.11 * color.getBlue());
            Color newColor = new Color(gray, gray, gray);
            srcImg.setRGB(x, y, newColor.getRGB()); // overwrite the pixel in place
        }
    }

Results
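If you want to run this step on its own, here is a minimal self-contained sketch of the same weighted grayscale conversion. It is not the author's code. The class `GrayscaleSketch` and method `toGray` are hypothetical names. The sketch also differs from the original in three ways: it unpacks the channels with bit shifts instead of `java.awt.Color`, it returns a new `TYPE_INT_RGB` image rather than modifying the source in place, and it rounds the weighted sum instead of truncating it.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: weighted grayscale conversion (a sketch, not the original code). */
final class GrayscaleSketch {
    private GrayscaleSketch() {}

    /** Returns a new image in which R = G = B = round(0.3R + 0.59G + 0.11B). */
    static BufferedImage toGray(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);      // packed as 0xAARRGGBB
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int gray = (int) Math.round(0.3 * r + 0.59 * g + 0.11 * b);
                gray = Math.min(255, Math.max(0, gray)); // keep within 0..255
                out.setRGB(x, y, (gray << 16) | (gray << 8) | gray);
            }
        }
        return out;
    }
}
```

Green carries the largest weight (0.59) and blue the smallest (0.11). As a result, pure green maps to a lighter gray than pure red, and pure red maps to a lighter gray than pure blue.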


2. Next, invert the gray values:
Java code
    // buffImg is the BufferedImage being preprocessed
    for (int y = minY; y < height; y++) {
        for (int x = minX; x < width; x++) {
            int rgb = buffImg.getRGB(x, y);
            Color color = new Color(rgb); // unpack R, G, B from the int RGB value
            // invert each channel: c -> 255 - c
            Color newColor = new Color(255 - color.getRed(), 255 - color.getGreen(),
                    255 - color.getBlue());
            buffImg.setRGB(x, y, newColor.getRGB());
        }
    }

Results
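Inversion can also be done without creating a `Color` object for every pixel. The minimal sketch below uses a hypothetical class, `InvertSketch`. It flips the low 24 bits of each packed pixel. For 8-bit channels this is the same as computing 255 - c for R, G and B, and the alpha byte is left unchanged.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: in-place color inversion (a sketch, not the original code). */
final class InvertSketch {
    private InvertSketch() {}

    /** Inverts R, G and B of every pixel in place (c -> 255 - c); alpha is preserved. */
    static void invertInPlace(BufferedImage img) {
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int argb = img.getRGB(x, y);
                int alpha = argb & 0xFF000000;
                // flipping all 8 bits of a channel value c yields 255 - c
                int inverted = ~argb & 0x00FFFFFF;
                img.setRGB(x, y, alpha | inverted);
            }
        }
    }
}
```

Inverting twice restores the original image, which makes a convenient sanity check.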


3. Then binarize the image, using its average gray level as the threshold. Everything below the threshold becomes 0, and everything above it becomes 255:
Java code
    // average: the image's mean gray level, computed beforehand (not shown here)
    for (int y = minY; y < height; y++) {
        for (int x = minX; x < width; x++) {
            int rgb = buffImg.getRGB(x, y);
            Color color = new Color(rgb); // unpack R, G, B from the int RGB value
            // R = G = B after grayscale conversion, so one channel is enough;
            // 255 - blue undoes the inversion from step 2
            int value = 255 - color.getBlue();
            if (value > average) {
                buffImg.setRGB(x, y, Color.BLACK.getRGB());
            } else {
                buffImg.setRGB(x, y, Color.WHITE.getRGB());
            }
        }
    }

Results
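The snippet above relies on an `average` variable whose computation is not shown. The minimal sketch below assumes the threshold is the plain arithmetic mean of all gray levels. It computes that mean and applies the rule from the text in one place: pixels below the mean become 0, and all others become 255. `MeanThresholdSketch`, `meanGray` and `binarize` are hypothetical names. The sketch reads the blue channel because R = G = B after grayscale conversion, and it thresholds the image as it is, without undoing the inversion first.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: global mean-threshold binarization (a sketch, not the original code). */
final class MeanThresholdSketch {
    private MeanThresholdSketch() {}

    /** Mean of the blue channel; on a grayscale image (R = G = B) this is the mean gray level. */
    static double meanGray(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        long sum = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                sum += img.getRGB(x, y) & 0xFF;
            }
        }
        return (double) sum / ((long) w * h);
    }

    /** In place: pixels below the mean become black (0), all others white (255). */
    static void binarize(BufferedImage img) {
        double threshold = meanGray(img);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int gray = img.getRGB(x, y) & 0xFF;
                img.setRGB(x, y, gray < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
    }
}
```

A single global mean works when the background is fairly uniform. CAPTCHAs with gradient backgrounds may need a local (adaptive) threshold instead.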


With these three steps alone the result is already good, so the usual resizing, median filtering and noise-removal steps were skipped.
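The author's CAPTCHA did not need median filtering. For noisier images, a 3x3 median filter is a common way to remove isolated specks: each pixel is replaced by the median of its nine-pixel neighbourhood. Below is a minimal sketch for an image that is already grayscale. The class name is hypothetical. Border pixels are simply copied unchanged, which is only one of several possible border policies.

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;

/** Hypothetical helper: 3x3 median filter for grayscale images (a sketch). */
final class MedianFilterSketch {
    private MedianFilterSketch() {}

    /** Returns a new image; each interior pixel gets the median gray of its 3x3 neighbourhood. */
    static BufferedImage median3x3(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        int[] window = new int[9];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int gray;
                if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                    gray = src.getRGB(x, y) & 0xFF; // border pixel: copy as-is
                } else {
                    int k = 0;
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            window[k++] = src.getRGB(x + dx, y + dy) & 0xFF;
                        }
                    }
                    Arrays.sort(window);
                    gray = window[4]; // middle of 9 sorted values
                }
                out.setRGB(x, y, (gray << 16) | (gray << 8) | gray);
            }
        }
        return out;
    }
}
```

A single dark speck surrounded by white pixels is replaced by white, because eight of the nine values in its window are white.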

That completes the image preprocessing. Tesseract does not offer an open API, so it is invoked purely from the command line:
Java code
    import java.util.ArrayList;
    import java.util.List;

    // tessPath, imageFile, tempImage, outputFile and LANG_OPTION are defined elsewhere
    List<String> cmd = new ArrayList<String>(); // holds the command-line arguments
    cmd.add(tessPath + "\\tesseract");          // path to the tesseract executable (Windows)
    cmd.add(tempImage.getName());               // first argument: the preprocessed image file
    cmd.add(outputFile.getName());              // output base name; tesseract appends .txt
    cmd.add(LANG_OPTION);                       // language flag
    cmd.add("eng");                             // English; the matching data file is looked up in tessdata
    ProcessBuilder pb = new ProcessBuilder();
    pb.directory(imageFile.getParentFile());    // run in the image's directory so relative names resolve
    pb.command(cmd);
    pb.redirectErrorStream(true);               // merge standard error into standard output
    Process process = pb.start();               // start tesseract
    int w = process.waitFor();                  // block until tesseract exits; returns its exit code


The output confirms that everything works.
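The snippet above starts tesseract but does not read the result back. Below is a minimal sketch of a complete round trip. It assumes the standard `tesseract imagename outputbase -l lang` command-line form, so the `LANG_OPTION` constant above is presumably `-l`. It also assumes that tesseract writes its result to `outputbase.txt`. The class and method names are hypothetical. The sketch drains the child process's merged output before calling `waitFor()`, so a chatty process cannot block on a full pipe, and it treats a non-zero exit code as a failure.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper around the tesseract command line (a sketch). */
final class TesseractCliSketch {
    private TesseractCliSketch() {}

    /** Builds: tesseract <image> <outputBase> -l <lang> */
    static List<String> buildCommand(String tesseractExe, String imageName, String outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add(tesseractExe);
        cmd.add(imageName);
        cmd.add(outputBase); // tesseract appends ".txt" itself
        cmd.add("-l");
        cmd.add(lang);       // e.g. "eng"; needs the matching language data in tessdata
        return cmd;
    }

    /** Runs tesseract in workDir and returns the recognized text. */
    static String recognize(String tesseractExe, File workDir, String imageName, String outputBase, String lang)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(tesseractExe, imageName, outputBase, lang));
        pb.directory(workDir);
        pb.redirectErrorStream(true); // stderr and stdout arrive on one stream
        Process process = pb.start();
        // Drain the merged output so the child never blocks on a full pipe
        try (InputStream in = process.getInputStream()) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard progress messages
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        File txt = new File(workDir, outputBase + ".txt");
        return new String(Files.readAllBytes(txt.toPath()), StandardCharsets.UTF_8).trim();
    }
}
```

For example, `TesseractCliSketch.recognize("tesseract", new File("/tmp/captcha"), "captcha.tif", "out", "eng")` would run tesseract in `/tmp/captcha` and return the contents of `out.txt`, provided tesseract is on the `PATH`.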

Of course, to really get the most out of Tesseract-OCR you also need its powerful training tools, but that is a story for another day...

Finally, a question for readers: apart from using it to crack verification codes, what other applications of text recognition have you found?


