For the past two days, work has required me to collect information from a website. That meant two things: first, simulating a login, and second, cracking the CAPTCHA. I originally planned to use a paid third-party CAPTCHA-solving service, but figured there was plenty of free code online, so I went looking for some. As it turned out, a lot of what people posted didn't work, and some of the code I couldn't make sense of. Eventually, on some Taiwanese blogger's site (I no longer remember which), I found a foreign open-source OCR engine. It is said to be powerful, maintained by Google, written in C++, and there is a .NET wrapper library for it. Very nice!
Project address: https://github.com/tesseract-ocr/tesseract
Language data (training sources): https://github.com/tesseract-ocr/langdata
Trained language data (tessdata): https://github.com/tesseract-ocr/tessdata
Let's start with an example:
Create a new C# console application and select .NET 4.5 as the target version.
Tesseract ocr = new Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789");
ocr.Init(@"d:\test\ocr\tessdata", "eng", true);
The first statement needs no explanation. The second sets the characters to be recognized; for example, if the CAPTCHA you want to read uses a-z and 0-9, put those characters in the whitelist.
The third statement initializes the OCR engine with the trained language data. The tessdata folder contains many files; for the language argument, just pass the part of the file name before the dot (here, "eng").
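One detail worth spelling out: as far as I know, tessedit_char_whitelist takes a literal list of characters rather than a range expression, so "a-z0-9" has to be written out in full. A minimal sketch of a helper that does this (CaptchaWhitelist and LowercaseAndDigits are names made up for illustration, not part of the library):

```csharp
using System.Text;

static class CaptchaWhitelist
{
    // Hypothetical helper: expands a-z and 0-9 into the literal
    // character list passed to tessedit_char_whitelist.
    public static string LowercaseAndDigits()
    {
        var sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) sb.Append(c);
        for (char c = '0'; c <= '9'; c++) sb.Append(c);
        return sb.ToString();
    }
}
```

It would be used as ocr.SetVariable("tessedit_char_whitelist", CaptchaWhitelist.LowercaseAndDigits());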
List<Word> result = ocr.DoOCR(bmp, Rectangle.Empty);
if (result.Count <= 0) return;
string code = result[0].Text;
This last snippet performs the actual recognition. The CAPTCHA image must first be converted to a Bitmap object, and remember to dispose of the Bitmap when you are done!
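To make the "remember to release the Bitmap" advice concrete, here is a rough sketch that wraps the Bitmap in a using block. It assumes the .NET wrapper is tessnet2 (the calls in this post match its API); CaptchaReader, imagePath and tessdataPath are names invented for the example.

```csharp
using System.Collections.Generic;
using System.Drawing;
using tessnet2; // assumption: the .NET wrapper used in this post is tessnet2

static class CaptchaReader
{
    // Returns the first recognized word, or null if nothing was recognized.
    // imagePath and tessdataPath are hypothetical parameters for illustration.
    public static string Read(string imagePath, string tessdataPath)
    {
        var ocr = new Tesseract();
        ocr.SetVariable("tessedit_char_whitelist", "0123456789");
        ocr.Init(tessdataPath, "eng", true);

        // "using" disposes the Bitmap even if DoOCR throws.
        using (var bmp = new Bitmap(imagePath))
        {
            List<Word> result = ocr.DoOCR(bmp, Rectangle.Empty);
            return result.Count > 0 ? result[0].Text : null;
        }
    }
}
```

Returning null instead of bailing out with a bare return lets the caller decide what to do when nothing is recognized.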
Here is my test:
Above is the CAPTCHA; the file name below it is the recognition result, which was used to name the saved file. Of course, that CAPTCHA has already been processed. The original CAPTCHA image looks like this:
The original CAPTCHA image is too small for the OCR to recognize, and by default it only recognizes black text on a white background, so a CAPTCHA taken from the web must first be binarized and have its background color removed before recognition.
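A minimal sketch of such a binarization step, assuming dark text on a lighter background and a fixed brightness threshold (the threshold of 128 and the class name CaptchaBinarizer are my own placeholders, not values from my actual processing; a CAPTCHA with a dark background would need a different rule):

```csharp
using System.Drawing;

static class CaptchaBinarizer
{
    // Returns a new bitmap with pure black text on a pure white background.
    // threshold (0-255) is an assumed tuning value; adjust it per CAPTCHA.
    public static Bitmap Binarize(Bitmap source, int threshold = 128)
    {
        var result = new Bitmap(source.Width, source.Height);
        for (int y = 0; y < source.Height; y++)
        {
            for (int x = 0; x < source.Width; x++)
            {
                Color c = source.GetPixel(x, y);
                // Integer approximation of perceived brightness (luma).
                int luma = (c.R * 299 + c.G * 587 + c.B * 114) / 1000;
                result.SetPixel(x, y, luma < threshold ? Color.Black : Color.White);
            }
        }
        return result;
    }
}
```

GetPixel and SetPixel are slow, but a CAPTCHA is only a few thousand pixels, so it hardly matters here.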
My processing here is binarization plus enlarging the image to three times its size. Don't worry about the jagged edges; the OCR can still recognize it.
At first I enlarged the image to only twice its size and found that 8 was sometimes recognized as 3, so I simply enlarged it one step further, and the problem went away. The jaggedness is obvious, but the OCR doesn't care whether the image is pretty or ugly.
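The enlargement can be sketched as a plain nearest-neighbor copy, which is also where the jagged edges come from. This is an illustration only (CaptchaScaler and Enlarge are made-up names, and it is not necessarily the exact scaling method I used):

```csharp
using System.Drawing;

static class CaptchaScaler
{
    // Enlarges the image by an integer factor by repeating each source pixel.
    // No smoothing is applied, so a black-and-white image stays black-and-white.
    public static Bitmap Enlarge(Bitmap source, int factor = 3)
    {
        var result = new Bitmap(source.Width * factor, source.Height * factor);
        for (int y = 0; y < result.Height; y++)
        {
            for (int x = 0; x < result.Width; x++)
            {
                // Integer division maps each output pixel back to its source pixel.
                result.SetPixel(x, y, source.GetPixel(x / factor, y / factor));
            }
        }
        return result;
    }
}
```

Repeating pixels rather than interpolating keeps the binarized image strictly two-colored, so no gray edge pixels are introduced after binarization.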
Did you follow the example code and run it? And did the program then fail with an error?
If so, add the useLegacyV2RuntimeActivationPolicy="true" attribute to the startup element in the program's App.config, as shown below:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <startup useLegacyV2RuntimeActivationPolicy="true">
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" />
  </startup>
</configuration>
Now run it again. Does it recognize the CAPTCHA? It's that simple!
C# CAPTCHA recognition based on OCR in 6 lines of code