Using Tesseract-OCR from Perl to recognize verification-code images: a tutorial

Source: Internet
Author: User

First, what Tesseract-OCR is

Tesseract is an OCR engine that was developed at HP Labs between 1985 and 1995 and is now maintained at Google.
It is an open-source recognition engine built on the Leptonica (http://leptonica.com/) image processing library.
It supports the Linux, Windows, and Mac platforms.
Wrappers are available for .NET, C++, Python, Java, and other development languages: https://code.google.com/p/tesseract-ocr/wiki/AddOns
Project address: https://code.google.com/p/tesseract-ocr/

Second, how to use it

Download and install: https://tesseract-ocr.googlecode.com/files/tesseract-ocr-setup-3.02.02.exe
During installation, pay attention to the installation path, the math symbols option, and the language options, and select what you need.
Run: tesseract yourpic.png res
The contents of the image yourpic.png are recognized and stored in res.txt.
For more accurate recognition, download the tessdata language pack for the relevant language from the project address.
Example:
Simplified Chinese https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.chi_sim.tar.gz
Traditional Chinese
After decompression, copy chi_sim.traineddata to Tesseract-ocr\tessdata.
Usage:
tesseract yourpic.png eng (uses the default eng language pack; the output goes to eng.txt)
tesseract yourpic.png sim -l chi_sim (uses the chi_sim language pack)
tesseract yourpic.png tra -l chi_tra (uses the chi_tra language pack)
Choose the language pack whose output is closest to the real data, which makes later correction easier.
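
The command lines above can also be driven from Perl. The following is a minimal sketch, not a definitive implementation: the helper names tesseract_command and ocr_image are hypothetical, and it assumes the tesseract executable is on the PATH and that the requested language pack is installed in tessdata.

```perl
use strict;
use warnings;

# Build the argument list for one Tesseract run (hypothetical helper).
# $image: input image; $base: output base name (Tesseract appends ".txt");
# $lang:  optional language pack name such as "chi_sim", passed with -l.
sub tesseract_command {
    my ($image, $base, $lang) = @_;
    my @cmd = ('tesseract', $image, $base);
    push @cmd, '-l', $lang if defined $lang && length $lang;
    return @cmd;
}

# Run Tesseract and return the recognized text, or undef on failure.
sub ocr_image {
    my ($image, $base, $lang) = @_;
    my @cmd = tesseract_command($image, $base, $lang);
    system(@cmd) == 0 or return;        # list form avoids shell quoting problems
    open my $fh, '<', "$base.txt" or return;
    local $/;                           # slurp the whole result file
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Only run OCR when an image is named on the command line.
if (@ARGV) {
    my ($image, $lang) = @ARGV;
    my $text = ocr_image($image, 'res', $lang);
    print defined $text ? $text : "OCR failed for $image\n";
}
```

Saved under a name of your choice, the script takes the image as its first argument and an optional language pack as its second, for example yourpic.png chi_sim.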

Third, advanced use: training

One of the few Chinese-language documents on training Tesseract-OCR:
http://yy-programer.blogspot.tw/2012/08/training-tesseract-ocr-301.html
Training is worth studying only if you need high accuracy. For everyday use, the default recognition plus post-correction is enough.
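
Post-correction of this kind can be kept in a small ordered table of substitutions. The sketch below is illustrative only: the function name correct_ocr is hypothetical, and apart from the s-to-6 rule, which this article uses later, the table entries are assumed examples that should be replaced with the misreadings you actually observe.

```perl
use strict;
use warnings;

# Ordered list of [pattern, replacement] pairs, applied top to bottom.
# The entries are illustrative assumptions, not a complete rule set.
my @fixes = (
    [ qr/\s+/,  ''  ],   # drop whitespace inserted by the OCR engine
    [ qr/O/,    '0' ],   # capital letter O read instead of zero
    [ qr/[lI]/, '1' ],   # lowercase L or capital I read instead of one
    [ qr/s/,    '6' ],   # lowercase s read instead of six
);

# Apply every substitution in order and return the corrected text.
sub correct_ocr {
    my ($text) = @_;
    for my $fix (@fixes) {
        my ($pattern, $replacement) = @$fix;
        $text =~ s/$pattern/$replacement/g;
    }
    return $text;
}

print correct_ocr("2O3. 0.l13 .s\n"), "\n";
```

Keeping the rules in one table makes it easy to add a rule each time a new misreading shows up, without touching the correction loop.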

Fourth, an application example: harvesting proxies

This example harvests proxies from several proxy list pages at http://www.proxyfire.net/

Without further ado, here is the code.
pf.bat

The code is as follows:

rem Fetch each proxy list into its own text file.
pf.pl http://www.proxyfire.net/index.php?pageid=eliteproxylist elite.txt
pf.pl http://www.proxyfire.net/index.php?pageid=anonymousproxylist anony.txt
pf.pl http://www.proxyfire.net/index.php?pageid=transparentproxylist trans.txt
pf.pl http://www.proxyfire.net/index.php?pageid=socks4proxylist s4.txt
pf.pl http://www.proxyfire.net/index.php?pageid=socks5proxylist s5.txt
rem Merge the results into all.txt and delete the per-list files.
type *.txt > all.tmp
del *.txt /s /q
ren all.tmp all.txt
@pause

pf.pl
The code is as follows:

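use strict;
use warnings;

# Usage: pf.pl <proxy-list-url> <output-file>
# Save this script as UTF-8: Tesseract writes UTF-8 text, so the Chinese
# characters in the substitutions below match byte for byte.
my $url  = $ARGV[0];
my $file = $ARGV[1];

my @tmp;
my @pxy;

# Download the proxy list page into a temporary file and read it in.
`wget "$url" -q -O ___html`;
open my $fh, '<', '___html' or die "Cannot open ___html: $!";
@tmp = <$fh>;
close $fh;
my $res = join('', @tmp);
`del ___html /s /q`;

# NOTE: the original pattern was lost when this page was extracted. The
# pattern below is a reconstruction that ASSUMES each row holds an <img>
# whose src is the path of the IP image, followed by a cell with the port.
@tmp = ($res =~ /<img[^>]+src="([^"]+)"[^>]*><\/td>\s*<td[^>]*>(\d+)</g);
for (my $i = 0; $i + 1 < @tmp; $i += 2) {
    push @pxy, { ip => 'http://www.proxyfire.net' . $tmp[$i], port => $tmp[$i + 1] };
}

for (my $i = 0; $i < @pxy; $i++) {
    next unless length($pxy[$i]{ip}) > 0;

    # Download the IP image and run OCR on it; Tesseract writes ___.txt.
    `wget "$pxy[$i]{ip}" -q -O ___png`;
    `tesseract ___png ___ -l chi_tra`;

    open my $in, '<', '___.txt' or next;
    my $txt = <$in>;
    close $in;

    if (defined $txt && length($txt) > 11) {
        # Correct the misreadings seen with the chi_tra pack:
        # 日 stands for 8, 昍 for 88, and s for 6.
        $txt =~ s/\s+//g;
        $txt =~ s/日/8/g;
        $txt =~ s/昍/88/g;
        $txt =~ s/s0/60/g;
        $txt =~ s/s1/61/g;
        $txt =~ s/s2/62/g;
        $txt =~ s/s3/69/g;    # a 3 next to a misread 6 is treated as a 9
        $txt =~ s/s4/64/g;
        $txt =~ s/s5/65/g;
        $txt =~ s/s7/67/g;
        $txt =~ s/s8/68/g;
        $txt =~ s/s9/69/g;
        $txt =~ s/0s/06/g;
        $txt =~ s/1s/16/g;
        $txt =~ s/2s/26/g;
        $txt =~ s/3s/96/g;
        $txt =~ s/4s/46/g;
        $txt =~ s/5s/56/g;
        $txt =~ s/6s/66/g;
        $txt =~ s/7s/76/g;
        $txt =~ s/8s/86/g;
        $txt =~ s/9s/96/g;
        $txt =~ s/ss/66/g;
        $txt =~ s/\.s/\.6/g;
        $pxy[$i]{ip} = $txt;

        # 3 and 9 are easily confused, so also emit two alternates:
        # one that turns some 3s into 9s and one that does the reverse.
        my $bak1 = $txt;
        my $bak2 = $txt;
        $bak1 =~ s/13/19/g;
        $bak1 =~ s/\.32\./\.92\./g;
        $bak1 =~ s/\.33\./\.99\./g;

        $bak2 =~ s/19/13/g;
        $bak2 =~ s/\.243/\.249/g;
        $bak2 =~ s/203\./209\./g;

        # Append the corrected address and both alternates to the output file.
        open my $out, '>>', $file or die "Cannot open $file: $!";
        print {$out} $pxy[$i]{ip} . ":" . $pxy[$i]{port} . "\n";
        print {$out} $bak1 . ":" . $pxy[$i]{port} . "\n";
        print {$out} $bak2 . ":" . $pxy[$i]{port} . "\n";
        close $out;
    }
}

# Remove the temporary files.
`del ___* /s /q`;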
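
Because the script writes the corrected address plus two alternates for every image, the output can contain duplicates and strings that are not addresses at all. One possible follow-up step, sketched here under the assumption that only syntactically valid IPv4 addresses are worth keeping, is to filter the candidates before writing them. The helper names is_ipv4 and valid_candidates are hypothetical and are not part of pf.pl, and the sample addresses come from a documentation range.

```perl
use strict;
use warnings;

# Return 1 if $text is a dotted-quad IPv4 address with octets 0..255, else 0.
sub is_ipv4 {
    my ($text) = @_;
    return 0 unless defined $text;
    my @octets = $text =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return 0;
    return (grep { $_ > 255 } @octets) ? 0 : 1;
}

# Keep the distinct candidates that are valid IPv4 addresses, in input order.
sub valid_candidates {
    my %seen;
    return grep { is_ipv4($_) && !$seen{$_}++ } @_;
}

# Sample candidates: a duplicate, an uncorrected letter O, and an octet above 255.
my @keep = valid_candidates(
    '203.0.113.13', '203.0.113.13', '203.0.113.19',
    '2O3.0.113.19', '203.0.113.256',
);
print "$_\n" for @keep;
```

A syntax check like this cannot tell which of the 3/9 alternates is the real address; that still requires testing each proxy.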
