Python crawler: simulated login to a website with a verification code

Source: Internet
Author: User
Crawling a website often requires signing in first, which is where simulated login comes in. Python's urllib family of libraries makes this straightforward. Below is a simple example of logging in to a school's educational administration system.

The first thing to understand is the role of cookies: they are small pieces of data that some websites store on the user's machine in order to identify the user and track the session. We therefore use the cookielib module to hold on to the cookies the site gives us.
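In modern Python 3 the same machinery lives in http.cookiejar and urllib.request. A minimal sketch of an opener that keeps cookies across requests (no network call is made here):

```python
import http.cookiejar
import urllib.request

# A CookieJar stores any Set-Cookie headers the server sends back;
# HTTPCookieProcessor attaches them to every later request made
# through this opener, so the session survives across calls.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar)
)

# Every request made via opener.open(...) now shares the same cookies.
print(type(opener).__name__)  # OpenerDirector
```

Once the opener fetches any page, cookies set by the server land in `jar` automatically and ride along on subsequent requests.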

The login address is http://202.115.80.153/ and the verification-code address is http://202.115.80.153/CheckCode.aspx.

You will find that the verification code is generated dynamically and differs every time the page is opened; it is generally tied to the session cookie. Recognizing the code automatically would be a thankless task, so the idea is this: first visit the verification-code page, save the image locally, and obtain a session cookie; the user then reads the code from the saved image, and we POST the login data directly to the login address using that same cookie.

First, analyze the POST request and its headers with a packet-capture tool or the developer tools in Firefox or Chrome. Here we take Google Chrome as an example.

It can be seen that the URL to POST to is not the page you visit, but http://202.115.80.153/default2.aspx.

In the form data to submit, the txtUserName and TextBox2 fields carry the username and password respectively.
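The form body the browser sends is just URL-encoded key/value pairs. A Python 3 sketch using urllib.parse and the field names from the capture (the values here are illustrative, not real credentials):

```python
from urllib.parse import urlencode

# Field names taken from the captured POST; values are placeholders.
postdata = {
    'txtUserName': 'username',        # the account name field
    'TextBox2': 'password123',        # the password field
    'txtSecretCode': '1234',          # whatever the CAPTCHA image shows
    'RadioButtonList1': 'Student',    # role selector on the login form
}

body = urlencode(postdata)  # e.g. 'txtUserName=username&TextBox2=...'
print(body)
```

This is exactly the string that goes into the request body when the Content-Type is application/x-www-form-urlencoded.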

Now on to the key part: the code.

import urllib
import urllib2
import cookielib
import sys

"""Simulated login"""
reload(sys)
sys.setdefaultencoding("utf-8")  # prevent Chinese encoding errors

captchaUrl = "http://202.115.80.153/CheckCode.aspx"
postUrl = "http://202.115.80.153/default2.aspx"  # CAPTCHA address and POST address

cookie = cookielib.CookieJar()
handler = urllib2.HTTPCookieProcessor(cookie)
opener = urllib2.build_opener(handler)  # bind the cookie jar to an opener; cookielib manages cookies automatically

username = 'username'
password = 'password123'  # user name and password

picture = opener.open(captchaUrl).read()  # fetch the CAPTCHA through the opener to obtain the cookie
local = open('e:/image.jpg', 'wb')
local.write(picture)
local.close()  # save the CAPTCHA locally

secretCode = raw_input('Enter verification code: ')  # open the saved CAPTCHA image and type in the code

postdata = {
    '__VIEWSTATE': 'ddwyode2ntm0otg7oz6ph0twzk5t0lupp/tla1l+rml83g==',  # value captured from the login page
    'txtUserName': username,
    'TextBox2': password,
    'txtSecretCode': secretCode,
    'RadioButtonList1': 'Student',
    'Button1': '',
    'lbLanguage': '',
    'hidPdrs': '',
    'hidsc': '',
}  # build the form according to the captured request

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36',
}  # build the headers according to the captured request

data = urllib.urlencode(postdata)  # generate the key1=value1&key2=value2 POST body
request = urllib2.Request(postUrl, data, headers)  # build the request

try:
    response = opener.open(request)  # the opener carries the cookie stored earlier
    result = response.read().decode('gb2312')  # the page is gb2312-encoded, so decode it
    print result  # print the post-login page
except urllib2.HTTPError, e:
    print e.code

After a successful login, you can use the same opener to access other pages that require being logged in.
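For reference, building the same authenticated POST in Python 3 looks like this. A hedged sketch: the URL and field names are the ones from this article, the credentials are placeholders, and the actual network call is left commented out so nothing is sent:

```python
import urllib.parse
import urllib.request

post_url = "http://202.115.80.153/default2.aspx"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
}

# Python 3 requires the POST body as bytes, hence the .encode()
data = urllib.parse.urlencode({
    'txtUserName': 'username',
    'TextBox2': 'password123',
}).encode('ascii')

req = urllib.request.Request(post_url, data=data, headers=headers)

# With a cookie-carrying opener (built as shown earlier in the article):
#   response = opener.open(req)
#   html = response.read().decode('gb2312')  # the page is gb2312-encoded
print(req.get_method())  # POST
```

Note that in Python 3, urllib2 and cookielib were merged into urllib.request and renamed http.cookiejar respectively; the overall flow is otherwise unchanged.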
