Some ideas on verification code recognition with Python

Source: Internet
Author: User
Tags: tesseract, ocr

If you search Baidu for Python plus the keyword "verification code", you will find many articles on verification code recognition. Skimming them, I found three main approaches: processing the image and then matching font features; processing the image and then building a dictionary that maps character images to characters; and recognizing the image directly with an OCR module. Whichever method is used, the image has to be processed first, so let's start by analyzing this verification code.
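One possible reading of the second approach (a character-correspondence dictionary) is sketched below. It assumes each character has already been segmented and binarized into a small bitmap of the same size as the stored templates; the TEMPLATES dictionary, the 3x5 bitmaps, and the pixel-difference (Hamming distance) comparison are illustrative assumptions, not taken from any of the articles mentioned.

```python
# Hypothetical "character dictionary": each known character is stored as a
# small binary bitmap (tuple of strings, '1' = black pixel, '0' = white pixel).
TEMPLATES = {
    "1": ("010",
          "110",
          "010",
          "010",
          "111"),
    "7": ("111",
          "001",
          "010",
          "010",
          "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def match_char(glyph, templates=TEMPLATES):
    """Return the dictionary key whose bitmap differs least from the glyph."""
    return min(templates, key=lambda ch: hamming(glyph, templates[ch]))


# A "7" with one flipped pixel is still closest to the "7" template.
noisy_seven = ("111",
               "001",
               "011",
               "010",
               "010")
print(match_char(noisy_seven))  # 7
```

A real dictionary would need many samples per character, since the characters in this kind of verification code vary in shape and position.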

I. Image Processing

The main obstacle in this verification code is the curve running through the middle, so the first step is to remove it. I considered two algorithms:
The first traces the curve. Find where the curve starts, that is, the black pixel at x = 0, then move x to the right one column at a time, look at the black pixels in each column, and measure the distance between black pixels in adjacent columns. If that distance falls within a certain range, the pixel can be assumed to lie on the curve; finally, all the points on the curve are painted white. In practice this worked poorly: the curve could not be removed completely, and strokes of the characters were often erased along with it.
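A minimal sketch of this curve-tracing idea is shown below. It works on a binarized image stored as a list of rows (1 = black, 0 = white) rather than a PIL image, and it assumes the curve is one pixel thick, starts at the first black pixel in column 0, and never moves more than max_step rows between adjacent columns. The function name remove_curve and these assumptions are illustrative, not the author's code.

```python
def remove_curve(img, max_step=1):
    """Trace a thin curve from the left edge and paint it white.

    img is a list of rows of 0/1 values (1 = black) and is modified in place.
    Returns the traced path as a list of (x, y) points.
    """
    h, w = len(img), len(img[0])
    # The curve head: the first black pixel in column x = 0.
    starts = [y for y in range(h) if img[y][0] == 1]
    if not starts:
        return []
    y = starts[0]
    path = [(0, y)]
    for x in range(1, w):
        # Black pixels in this column within max_step rows of the last point.
        lo, hi = max(0, y - max_step), min(h, y + max_step + 1)
        candidates = [yy for yy in range(lo, hi) if img[yy][x] == 1]
        if not candidates:
            break  # the curve was lost, e.g. it has a gap here
        # Follow the candidate closest to the previous curve point.
        y = min(candidates, key=lambda yy: abs(yy - y))
        path.append((x, y))
    for x, yy in path:
        img[yy][x] = 0  # paint the traced curve white
    return path


img = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
]
print(remove_curve(img))  # [(0, 0), (1, 1), (2, 2), (3, 2), (4, 2)]
```

The sketch also shows why the method struggles: where the curve touches or crosses a character, the traced pixels belong to both, so erasing the curve cuts into the character's strokes, and any gap in the curve stops the trace early.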
The second algorithm uses the density of black pixels. The image is divided into small blocks, the number of black pixels in each block is counted, and every block containing fewer black pixels than a set threshold is cleared to white. What remains is essentially the verification code characters. For ease of implementation I used 5×5 blocks and tuned the threshold to 11 black pixels per block. The result after processing:
[Image: the verification code after density filtering]
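The density rule can be sketched on a plain list-of-rows bitmap (1 = black) so the logic is visible without PIL. The 5×5 block and the threshold of 11 come from the text; the function name density_filter and the 0/1 representation are assumptions for illustration. Blocks at the right and bottom edges are clipped to the image size.

```python
def density_filter(img, block=5, min_black=11):
    """Whiten every block x block tile that has fewer than min_black black pixels.

    img is a list of rows of 0/1 values (1 = black) and is modified in place.
    """
    h, w = len(img), len(img[0])
    for top in range(0, h, block):
        for left in range(0, w, block):
            # Clip the tile at the image border.
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            count = sum(img[y][x] for y in rows for x in cols)
            if count < min_black:
                for y in rows:
                    for x in cols:
                        img[y][x] = 0
    return img


# Left tile: a solid 5x5 "stroke" (25 black pixels, kept).
# Right tile: three stray noise pixels (fewer than 11, cleared).
img = [[1] * 5 + [0] * 5 for _ in range(5)]
img[0][7] = img[2][9] = img[4][5] = 1
density_filter(img)
print(img[0])  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Because the tiles are fixed and non-overlapping, a thin character stroke that happens to fall mostly into a sparse tile can be erased too, so the threshold is a trade-off between removing the curve and keeping the characters.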

II. Character Recognition

For recognition I used pytesser, a Python wrapper around the Tesseract OCR engine. However, because the characters in this kind of verification code are irregular, the recognition results are not very accurate.

III. Preparation and Code Example

1. PIL, pytesser, and Tesseract

(1) Install PIL: http://www.pythonware.com/products/pil/
(2) Download pytesser: http://code.google.com/p/pytesser/. After unzipping it, place it in the same folder as your script and it can be imported directly.
(3) Download the Tesseract OCR engine.

2. Full Code

```python
# -*- coding: utf-8 -*-
from PIL import Image, ImageEnhance, ImageFilter
from pytesser import image_file_to_string  # pytesser sits next to this script


def numpoint(im):
    """Count the non-white (black) pixels in the image."""
    w, h = im.size
    data = list(im.getdata())
    count = 0
    for x in range(w):
        for y in range(h):
            if data[y * w + x] != 255:  # 255 is white
                count += 1
    return count


def pointmidu(im):
    """Whiten every 5*5 block that contains fewer than 11 black pixels."""
    w, h = im.size
    for y in range(0, h, 5):
        for x in range(0, w, 5):
            # Clip the block at the image border so putpixel stays in range.
            box = (x, y, min(x + 5, w), min(y + 5, h))
            if numpoint(im.crop(box)) < 11:
                for i in range(box[0], box[2]):
                    for j in range(box[1], box[3]):
                        im.putpixel((i, j), 255)
    im.save('img.png')  # lossless PNG; JPEG would add compression noise


def ocrend():
    """Clean up the filtered image once more and run OCR on it."""
    im = Image.open('img.png').convert('L')
    im = im.filter(ImageFilter.MedianFilter())
    im = ImageEnhance.Contrast(im).enhance(2)
    im = im.convert('1')
    im.save('1.tif')
    print(image_file_to_string('1.tif'))


if __name__ == '__main__':
    # Convert to grayscale first: filters cannot process palette ("P") images.
    im = Image.open('1.png').convert('L')
    im = im.filter(ImageFilter.DETAIL)
    im = im.filter(ImageFilter.MedianFilter())
    im = ImageEnhance.Contrast(im).enhance(2)
    im = im.convert('1')
    pointmidu(im)
    ocrend()
```
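One detail in the code above may contribute to the low recognition rate: in Pillow, converting a grayscale ("L") image to mode "1" with convert('1') applies Floyd-Steinberg dithering by default, which can scatter isolated dots over gray areas and feed noise into the density filter. Image.convert accepts a dither argument to turn this off, or a fixed threshold can be applied with Image.point before converting. The sketch below shows fixed-threshold binarization on a list-of-rows image; the function name binarize and the threshold of 128 are assumptions that would need tuning for real verification codes.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (list of rows, values 0-255) to 0/1, with 1 = black.

    Unlike dithering, a fixed threshold never turns a flat gray area
    into a scatter of isolated black dots.
    """
    return [[1 if p < threshold else 0 for p in row] for row in gray]


print(binarize([[0, 127, 128, 255]]))  # [[1, 1, 0, 0]]
```

The 0/1 output matches the representation used by the curve-tracing and density sketches earlier in this post.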

The final recognition rate of this method is still quite low. If anyone has a better idea or has tried something that works, I would appreciate your advice!
