Python + Selenium + PIL + Tesseract: automatically recognizing the verification code for one-click login

Last Update: 2017-09-27 Source: Internet

Author: User

Tags: sleep function



This article shows how to combine Python, Selenium, PIL, and Tesseract to recognize a login verification code automatically and log in with one click. The environment I used:

Python: 2.7
IDE: PyCharm 5.0.3
Firefox: 47.0.1
Selenium
PIL
Pytesser
Tesseract

Nonsense

I believe every script has its own story. Mine comes from my GRD educational administration system. Every time I log in, even when everything I type is correct, the first login attempt always fails! I have no idea what the designers were thinking. Is it an anti-crawling mechanism? Do they really think I can't crawl it even once? If they annoy me, believe it or not, I'll hammer the site for over 1,000 seconds so nobody else can get on ~ Ahem, I'm drifting off topic.

Talk is cheap. Show me the code.

The script recognizes the verification code automatically and simulates the login. Note that this is automatic, one-click login, not a flow where you look at the verification code and then type it in by hand! Code first:

```python
# -*- coding: utf-8 -*-
# Author: husky said Meow
# Environment: Python 2.7, Selenium + Firefox 47, PIL, pytesser + Tesseract
import os

import pytesser
from PIL import Image, ImageEnhance
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

PostUrl = "http://yjsymis.hrbeu.edu.cn/gsmis/indexAction.do"


def image_file_to_string(file):
    # pytesser runs tesseract.exe from its own folder, so switch there first
    # and always switch back afterwards
    cwd = os.getcwd()
    try:
        os.chdir(r"C:\Users\MrLevo\Anaconda2\Lib")
        return pytesser.image_file_to_string(file)
    finally:
        os.chdir(cwd)


driver = webdriver.Firefox()
driver.get(PostUrl)

i = 0
while True:  # this system rejects the first login even with correct input, so retry
    i = i + 1
    try:
        elem_user = driver.find_element_by_name('id')
        elem_psw = driver.find_element_by_name('password')
        elem_code = driver.find_element_by_name('checkcode')
    except NoSuchElementException:
        break  # the login fields are gone, so we are logged in

    # ---- Crop the verification code out of a full-page screenshot (a bit crude) ----
    driver.get_screenshot_as_file(r'C:\Users\MrLevo\image1.jpg')
    im = Image.open(r'C:\Users\MrLevo\image1.jpg')
    box = (516, 417, 564, 437)  # (left, upper, right, lower) of the code image
    region = im.crop(box)  # region is a new image object
    # region.show()  # keep commented out: showing the image locks the file (see problem 5)
    region.save(r'E:\image_code.jpg')

    # ---- Alternative: ImageGrab.grab() can grab a region directly,
    # but on my machine it did not capture the whole screen ----
    # bbox = (780, 0, 1020, 800)
    # img = ImageGrab.grab()
    # img.save(r'E:\image_code.jpg')
    # img.show()

    # ---- Alternative: download the image and type the code by hand
    # (works more widely, but is not one-click) ----
    # response = opener.open(CaptchaUrl)
    # with open(r'E:\image.jpg', 'wb') as local:
    #     local.write(response.read())
    # SecretCode = raw_input('Please enter the code: ')

    # ---- Image enhancement + automatic recognition of a simple verification code ----
    # time.sleep(3)  # wait so recognition does not start before the image is saved
    im = Image.open(r'E:\image_code.jpg')
    imgry = im.convert('L')  # convert to grayscale
    sharpness = ImageEnhance.Contrast(imgry)  # contrast enhancement
    sharp_img = sharpness.enhance(2.0)
    sharp_img.save(r'E:\image_code.jpg')
    # http://www.cnblogs.com/txw1958/archive/2012/02/21/2361330.html
    code = image_file_to_string(r'E:\image_code.jpg')  # recognized text (str)
    print(code)  # check whether the recognition is correct

    # On this site the user name and password stay filled in after the second
    # failed attempt, so only type them on the first two attempts (see problem 7)
    if i <= 2:
        elem_user.send_keys('s315080092')
        elem_psw.send_keys('xxxxxxxxxx')
    elem_code.send_keys(code)
    click_login = driver.find_element_by_xpath("//img[@src='main_images/loginbutton.gif']")
    click_login.click()

# time.sleep(5)  # wait a moment on the result page
# driver.save_screenshot(r'C:\Users\MrLevo\image.jpg')
# driver.close()
# driver.quit()
```

[Animated GIF: the script logging in with one click]

This is the first time I've posted an animation, so I'm pretty excited ~

Problems and Solutions

1: Getting the verification code. The verification code is regenerated every time the page is refreshed, so without cookies (which I'm not very good with yet) I couldn't get hold of the image the page is actually showing: requesting the image again just produces a new code. In the next article I log in with cookies instead of driving a browser, but that is a long story.

1: Solution: use driver.get_screenshot_as_file to capture the whole page, then use PIL's crop to cut out the verification code for the later steps. Some may ask why I don't use ImageGrab.grab(). Well, because on Windows 10 this function couldn't capture the whole screen for me!! I only found out by trying. By the way, my resolution is 1920x1080; could it be related to the resolution? I was stuck on this step for a long time. Eventually I grabbed the full screen to check, and the image contained only half of it, so of course I couldn't find the part I wanted!
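PIL's crop takes a (left, upper, right, lower) box in pixels with the origin at the top-left corner, and the right and lower edges are exclusive, so the result is (right - left) by (lower - upper) pixels. The stdlib-only sketch below mimics that convention on a plain list of pixel rows, so the box arithmetic can be checked without a browser or PIL. The function names are hypothetical illustrations, not part of PIL or Selenium.

```python
def crop(pixels, box):
    """Cut a PIL-style (left, upper, right, lower) box out of a list of pixel rows.

    right and lower are exclusive, so the result is
    (right - left) pixels wide and (lower - upper) pixels tall.
    """
    left, upper, right, lower = box
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    # Fail loudly on a box that does not fit inside the image, instead of
    # letting list slicing silently return a smaller crop.
    if not (0 <= left < right <= width and 0 <= upper < lower <= height):
        raise ValueError(f"box {box} does not fit in a {width}x{height} image")
    return [row[left:right] for row in pixels[upper:lower]]


def box_size(box):
    """Width and height of a crop box."""
    left, upper, right, lower = box
    return right - left, lower - upper


# The box used in the script above selects a 48 x 20 pixel region.
print(box_size((516, 417, 564, 437)))  # (48, 20)
```

With a real screenshot you would keep calling PIL's own Image.crop(box), as the script does; this sketch only makes the coordinate convention visible.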

2: High recognition error rate

2: Solution: preprocess the image before recognition (binarize/grayscale it and sharpen it); this improves recognition accuracy by quite a lot. Compare the pictures: the left one is the original image captured with cookies, the right one is cropped from the full-page screenshot. I originally wanted to use MATLAB for the image recognition, but passing it the image after binarization and sharpening was a bit troublesome...
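To make the preprocessing concrete, here is a pure-Python sketch of the three steps usually involved: grayscale conversion, contrast enhancement, and binarization. The grayscale step uses the ITU-R 601-2 luma weights that PIL documents for convert('L'). The contrast step follows the idea behind ImageEnhance.Contrast (move each pixel away from the mean gray level by a factor), but it is an illustration, not PIL's exact implementation. The threshold of 128 is an arbitrary assumption you would tune for each site. The script itself only does the first two steps.

```python
def to_gray(rgb_rows):
    """RGB tuples -> 0..255 luminance, using L = R*299/1000 + G*587/1000 + B*114/1000."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in rgb_rows]


def enhance_contrast(gray_rows, factor):
    """Scale each pixel's distance from the mean gray level by `factor`.

    factor 1.0 leaves the image unchanged, 2.0 doubles the contrast
    (the value the script passes to enhance()); results are clamped to 0..255.
    """
    pixels = [p for row in gray_rows for p in row]
    mean = int(sum(pixels) / len(pixels) + 0.5)

    def clamp(value):
        return max(0, min(255, int(round(value))))

    return [[clamp(mean + factor * (p - mean)) for p in row] for row in gray_rows]


def binarize(gray_rows, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray_rows]


# Dark-gray text (90) on a light-gray background (170):
image = [[(170, 170, 170), (90, 90, 90), (170, 170, 170)]]
gray = to_gray(image)                 # [[170, 90, 170]]
contrasted = enhance_contrast(gray, 2.0)
print(contrasted)                     # [[197, 37, 197]]
print(binarize(contrasted))           # [[255, 0, 255]]
```

Stretching the contrast first pushes the text and the background further apart, which makes a single fixed threshold less sensitive to the exact gray levels a given site uses.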

3: Calling tesseract.exe

3: Solution: the recognition is performed by running tesseract.exe, so you must switch the working directory to the folder that contains this exe. At first I thought it only depended on importing the package, and nothing was recognized! I spent more than an hour on the verification-code recognition part. Testing each piece separately really matters; remember that!
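The wrapper in the script switches into the Tesseract folder and switches back in a finally block. The same pattern can be packaged as a reusable context manager, sketched below; working_directory is a hypothetical helper name. On Python 3.11 or newer, the standard library already provides contextlib.chdir for exactly this purpose.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily make `path` the current directory.

    The previous directory is restored when the with-block ends,
    even if the code inside it raises (for example, a failed OCR call).
    """
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)
```

Usage with the folder from the script would look like: with working_directory(r"C:\Users\MrLevo\Anaconda2\Lib"): code = pytesser.image_file_to_string(r"E:\image_code.jpg"). Because the directory is restored automatically, the rest of the script keeps resolving relative paths the same way it did before.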

4: Login failure: the school's ridiculous educational administration system always needs a second attempt

4: Solution: write a while loop and put most of the main program inside it. The goal is simple: if the first login fails, log in again. Inside a try block, check whether the login elements still exist; if they don't, the except branch breaks out of the loop, because after a successful login elements such as driver.find_element_by_name('id') no longer exist. So when these elements can't be found, the login has succeeded: jump out of the loop and move on to the next step.
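Stripped of Selenium, the loop's logic can be sketched as below. The two callables are hypothetical stand-ins: login_form_present would wrap driver.find_element_by_name('id') and return False when Selenium raises NoSuchElementException, and submit_login would fill in and submit the form. Unlike the original endless while loop, this sketch adds an attempt limit (an assumption, 5 by default) so a site that never accepts the login cannot keep the script spinning forever.

```python
def login_until_form_disappears(login_form_present, submit_login, max_attempts=5):
    """Keep submitting the login form until it is no longer on the page.

    login_form_present(): True while the login inputs can still be found.
    submit_login(attempt): fills in and submits the form; attempt counts from 1.
    Returns the number of submissions it took.
    """
    attempts = 0
    while login_form_present():
        if attempts == max_attempts:
            raise RuntimeError(f"still on the login page after {attempts} attempts")
        attempts += 1
        submit_login(attempts)
    return attempts


# Simulate a site that always rejects the first attempt.
state = {"logged_in": False}

def fake_submit(attempt):
    state["logged_in"] = attempt >= 2

print(login_until_form_disappears(lambda: not state["logged_in"], fake_submit))  # 2
```

Checking for the form before each attempt, rather than trusting the click, mirrors the article's idea: the disappearance of the login inputs is the success signal.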

5: Why can't it recognize an image that was clearly captured?

5: Solution: I really didn't expect this one. At first I thought the image hadn't finished saving, so the file didn't exist yet and couldn't be recognized; but making the program pause with time.sleep didn't help either, and I was speechless. After thinking for a long time, I concluded the image was probably locked: I had an img.show() call to check whether the right image had been captured, and after show() the image was in use, just as you can't delete a Word document while it is open for editing! After I commented out show(), everything worked. What a silly mistake on my part!!

6: The element has been located. Why doesn't anything happen?

6: Solution: this one is a bit brainless, but it really happened to me, so I'm writing it down and calling myself an idiot once more: without click(), how do you expect it to do anything!!! The same goes for the login entry when logging in with cookies!

7: After two failed attempts, the user name is typed in repeatedly and piles up in the field.

7: Solution: I added a variable that counts the loop iterations. I observed that once the login has failed more than twice, the user name and password accumulate in their fields, so I wrote an if statement that fills them in only on the first two attempts. Done!
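An alternative to counting attempts (not the approach used above) is to empty each field before typing. Selenium's WebElement provides clear() alongside send_keys(), so every attempt can overwrite the field instead of appending to it. The FakeInput class below is a hypothetical stand-in for a text box so the behavior can be shown without a browser; whether clear() behaves well on this particular site is untested.

```python
class FakeInput:
    """Hypothetical stand-in for a Selenium text-box WebElement."""

    def __init__(self, value=""):
        self.value = value

    def send_keys(self, text):
        self.value += text  # typing appends to whatever is already there

    def clear(self):
        self.value = ""


def fill(element, text):
    """Replace the field's contents instead of appending to them."""
    element.clear()
    element.send_keys(text)


user_box = FakeInput("s315080092")   # the site kept the name from the last attempt
user_box.send_keys("s315080092")
print(user_box.value)                # s315080092s315080092  (the accumulation bug)
fill(user_box, "s315080092")
print(user_box.value)                # s315080092
```

In the real script the same helper would be called as fill(elem_user, 's315080092') and fill(elem_psw, ...) on every attempt, which removes the need for the attempt counter.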

8: Choosing the crop region for im.crop(box) is hard

8: Solution: trial and error. I just kept trying... Of course, if you right-click the image and inspect the element, you can see its size, which tells you the difference between the left and right (and upper and lower) coordinates, but you still have to search a fairly wide area for the position. If you have a better method, please let me know. For the record, it took me more than 30 attempts.
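One way to avoid trial and error is to ask Selenium for the image element's geometry: WebElement.location is a dict with 'x' and 'y', and WebElement.size is a dict with 'width' and 'height', both in CSS pixels. The sketch below turns those into a crop box. It assumes the page is not scrolled and that one CSS pixel equals `scale` screenshot pixels (1.0 on an unscaled display; on a scaled HiDPI display you would pass something like window.devicePixelRatio). crop_box_from_element is a hypothetical helper name. Newer Selenium releases can also screenshot a single element directly with element.screenshot(filename), which sidesteps the box entirely.

```python
def crop_box_from_element(location, size, scale=1.0, padding=0):
    """Build a PIL-style (left, upper, right, lower) box from element geometry.

    location: {"x": ..., "y": ...} as returned by WebElement.location
    size: {"width": ..., "height": ...} as returned by WebElement.size
    scale: screenshot pixels per CSS pixel (an assumption about the display)
    padding: extra pixels to keep around the element
    """
    left = round(location["x"] * scale) - padding
    upper = round(location["y"] * scale) - padding
    right = round((location["x"] + size["width"]) * scale) + padding
    lower = round((location["y"] + size["height"]) * scale) + padding
    return (max(0, left), max(0, upper), right, lower)


# Geometry that reproduces the hand-tuned box (516, 417, 564, 437) from the script:
print(crop_box_from_element({"x": 516, "y": 417}, {"width": 48, "height": 20}))
# (516, 417, 564, 437)
```

With a real page you would pass captcha_img.location and captcha_img.size, where captcha_img is the verification-code image element you located (a hypothetical name), and feed the result to im.crop().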

9: Image and ImageEnhance cannot be imported

9: Solution: PIL is a third-party library (today it is usually installed as its maintained fork, Pillow), so its modules must be imported from the PIL package. The official documentation says:
Use "from PIL import Image" instead of "import Image".

10: The input element can't be found.

10: Solution: right-click the blank input field, inspect the element, and locate it with one of the find_element_by_* methods. But what if the input is hidden and can't be found on the current page? For example, on the library site I have to click "My Library" before the account and password inputs appear, so I first find the "My Library" element and click it, and then find the inputs. In short, think of yourself as the browser; no wait, think of Python as the browser.....

Here is the code for the library login as well. It is similar, and simpler because there is no verification code, but it needs one extra click.

```python
# -*- coding: utf-8 -*-
# Author: husky said Meow
import time

from selenium import webdriver

PostUrl = "http://lib.hrbeu.edu.cn/#"
driver = webdriver.Firefox()
driver.get(PostUrl)
elem_user = driver.find_element_by_name('number')
elem_psw = driver.find_element_by_name('passwd')
# Click "My Library" first; the account and password inputs only become visible after that
click_first = driver.find_element_by_xpath("//ul[@id='imgmenu']/li[4]")
click_first.click()
elem_user.send_keys('s315080092')
elem_psw.send_keys('xxxxxxxxxx')  # your password
# Click the login button
click_second = driver.find_element_by_name('submit')
click_second.click()
time.sleep(5)
# After logging in, click the target link
click_third = driver.find_element_by_xpath("//*[@id='mainbox']/div/ul/li/a")
click_third.click()
time.sleep(5)  # wait a moment on the result page
# driver.save_screenshot(r'C:\Users\MrLevo\image.jpg')
driver.close()
driver.quit()
```
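The library script waits with fixed time.sleep(5) calls, which is either longer than needed or, on a slow network, too short. A common alternative is to poll until the page is ready, sketched below with the standard library only; wait_until is a hypothetical helper. Selenium ships its own version of this idea, WebDriverWait combined with selenium.webdriver.support.expected_conditions, which would be the natural choice in the real script.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page that becomes ready on the third check.
checks = {"count": 0}

def page_ready():
    checks["count"] += 1
    return "ready" if checks["count"] >= 3 else None

print(wait_until(page_ready, timeout=5, poll_interval=0.01))  # ready
```

In the script, the condition could be lambda: driver.find_elements_by_xpath("//*[@id='mainbox']/div/ul/li/a"), since find_elements returns an empty (falsy) list until the link exists and a non-empty list once it does.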

Finally

(Though I know I'll definitely add more later.) This took me about two days, on and off. It may not be hard for others, but it was real progress for me: I now understand Selenium's basic concepts and operations, I've used PIL, and I've called OCR. Driving Firefox to do everything looks cool, but it costs a lot in execution speed and memory; still, as a visible, simulated browser login, it's quite satisfying.

That's all for this article. I hope it helps with your learning.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
