Examples of Python verification code recognition _python

Source: Internet
Author: User

In fact, the identification of authentication code involves a lot of aspects of the content, start difficult, but after the start, can be extended and very wide, can play very strong, the sense of achievement is very sufficient, to this interested friends follow the small knitting together to learn.

Depend on

sudo apt-get install python-imaging
sudo apt-get install TESSERACT-OCR
pip install pytesseract

Using Google OCR to identify the verification code

From PIL import image
Import pytesseract
image = Image.open (' v1.jpg ')
Vcode = pytesseract.image_to_string ( Image)
Print Vcode

But pytesseract its recognition rate is not high, and the general site's verification code with a large number of interference elements. ( ̄▽ ̄) "

So we first need to do the verification code denoising.

For single pixel jamming line and jamming point, we can scan the whole image and examine the color of eight pixels near each pixel, if the number is more than a certain value, then the point is discrete point and need to be removed.

You can also try setting thresholds to direct the validation code to binary values.

Here are two verification codes on the school web site.

We can see that the CAPTCHA has a single pixel jamming point, so we need to try to get rid of it. But after repeatedly refreshing the verification code, found that the verification code

1. Only the addition operation

2. Addition of up to two digits

3. The text part must be red (255,0,0)

With the above information, we can judge that the generation algorithm of this verification code is flawed.

Import Image from 
numpy import * 
import pytesseract 
im = Image.open (' 1.png ') 
im = Im.convert (' RGB ') 
#拉长图像, easy to identify.
im = Im.resize ((200,80)) 
a = array (IM) for 
i-xrange (Len (a)): for 
J in Xrange (Len (a[i)): 
  if A[i] [j] [0] = = 255: 
    a[i][j]=[0,0,0] 
  else: 
    a[i][j]=[255,255,255] 
im = Image.fromarray (a) 
im.show () 
Vcode = Pytesseract.image_to_string (IM) 

Using the above script we can binary image, using Google OCR to identify. eval()the expression is evaluated again.

Summarize

Python Verification code recognition of the content to this basic introduction, I hope this article for everyone's study or work can help, if there is doubt you can message exchange.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.