CAPTCHA recognition touches on many different topics. It is hard to get started, but once you are past the first steps the field is broad, there is a lot you can do with it, and it is very rewarding. If you are interested, follow along and learn with me.
Dependencies
sudo apt-get install python-imaging
sudo apt-get install tesseract-ocr
pip install pytesseract
Using Google's OCR engine (Tesseract) to recognize the CAPTCHA
from PIL import Image
import pytesseract
# Load the CAPTCHA image and let Tesseract read the text in it.
image = Image.open('v1.jpg')
vcode = pytesseract.image_to_string(image)
print(vcode)
However, pytesseract's recognition rate is not high on its own, and the CAPTCHAs on most sites contain a lot of interfering elements. ( ̄▽ ̄)
So we first need to denoise the CAPTCHA.
For single-pixel interference lines and interference points, we can scan the whole image and examine the colors of the eight pixels surrounding each pixel. If the number of neighbors whose color differs from the pixel exceeds a certain value, the pixel is an isolated point and should be removed.
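The neighbor scan described above can be sketched as follows. This is only a minimal illustration, not the exact routine used for this article: it assumes the image has already been reduced to a grid of 0 (background) and 1 (ink), and the function name remove_isolated_pixels and the min_neighbors parameter are made up for the example. Counting too few ink neighbors is the same test as counting too many differing neighbors.

```python
def remove_isolated_pixels(grid, min_neighbors=2):
    """Return a copy of grid with isolated ink pixels cleared.

    grid is a list of rows; 1 = ink, 0 = background (an assumption
    of this sketch). An ink pixel is kept only if at least
    min_neighbors of its eight surrounding pixels are also ink.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    cleaned = [row[:] for row in grid]  # do not modify the input
    for y in range(height):
        for x in range(width):
            if not grid[y][x]:
                continue  # background pixels are left alone
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the pixel itself
                    ny, nx = y + dy, x + dx
                    # Pixels outside the image count as background.
                    if 0 <= ny < height and 0 <= nx < width and grid[ny][nx]:
                        neighbors += 1
            if neighbors < min_neighbors:
                cleaned[y][x] = 0  # too few ink neighbors: treat as noise
    return cleaned
```

The right value of min_neighbors depends on the font and the noise. Set it too high and thin strokes of the real text start to disappear.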
You can also try setting a threshold to convert the CAPTCHA directly into a binary (black-and-white) image.
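Thresholding might look like the sketch below. The default threshold of 128 and both function names are assumptions for illustration, and a real CAPTCHA will need its own cutoff. The first function works on a plain grid of gray levels. The second applies the same rule to a Pillow image by converting it to grayscale and mapping every pixel through Image.point.

```python
def binarize(gray, threshold=128):
    """Map a grid of 0-255 gray levels to 0 (dark) or 255 (light)."""
    return [[255 if value > threshold else 0 for value in row] for row in gray]


def binarize_image(im, threshold=128):
    """Apply the same rule to a Pillow image.

    im is expected to be a PIL.Image.Image; it is converted to
    grayscale ('L') and every pixel is mapped to 0 or 255.
    """
    return im.convert('L').point(lambda value: 255 if value > threshold else 0)
```

For example, binarize_image(Image.open('v1.jpg'), 150) would return a black-and-white copy of the image that can be passed to pytesseract.image_to_string.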
Here are two CAPTCHAs from the school website.
We can see that the CAPTCHA contains single-pixel interference points, so we need to get rid of them. But after refreshing the CAPTCHA repeatedly, I found that:
1. It only uses addition
2. The numbers being added have at most two digits
3. The text is always pure red (255, 0, 0)
With this information, we can conclude that the algorithm that generates this CAPTCHA is flawed: the text can be separated from the noise by color alone.
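from PIL import Image
import numpy as np
import pytesseract

im = Image.open('1.png')
im = im.convert('RGB')
# Stretch the image to make it easier to recognize.
# NEAREST keeps the original pixel colors, so the exact test for 255 below still works.
im = im.resize((200, 80), Image.NEAREST)
a = np.array(im)
for i in range(len(a)):
    for j in range(len(a[i])):
        if a[i][j][0] == 255:
            # Red channel at full intensity: treat the pixel as text and paint it black.
            # This assumes the background and noise never have a red value of 255.
            a[i][j] = [0, 0, 0]
        else:
            # Everything else becomes white background.
            a[i][j] = [255, 255, 255]
im = Image.fromarray(a)
im.show()
vcode = pytesseract.image_to_string(im)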
With the script above we binarize the image and hand it to Google's OCR engine for recognition. The recognized expression can then be evaluated with eval() to get the answer.
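Calling eval() on OCR output runs whatever text the engine happened to produce, so a more cautious option is to parse the addition by hand. The sketch below swaps eval() for that manual parsing. It assumes the recognized text looks something like '12+34=' (the exact format of this site's CAPTCHA is not shown here), and solve_addition is a hypothetical helper name.

```python
import re


def solve_addition(ocr_text):
    """Return the sum of an addition such as '12+34=?', or None.

    Only the part before '=' is used. Whitespace and any stray
    non-digit characters that OCR may have inserted are dropped.
    """
    expression = re.sub(r'\s+', '', ocr_text.split('=')[0])
    parts = expression.split('+')
    if len(parts) < 2:
        return None  # no '+' found, so this is not an addition
    operands = [re.sub(r'\D', '', part) for part in parts]
    if any(operand == '' for operand in operands):
        return None  # one side of the '+' contained no digits
    return sum(int(operand) for operand in operands)
```

With the vcode from the script above, solve_addition(vcode) would give the number to submit, or None when the OCR result cannot be read as an addition.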
Summary
That covers the basics of CAPTCHA recognition in Python. I hope this article helps with your study or work. If you have any questions, leave a message and we can discuss them.