Python code for automatically logging in to websites that require a verification code

Source: Internet
Author: User


I had heard that Python is very convenient for writing web crawlers. A few days ago my organization needed to log in to the XX website to download some documents, so I tried it myself, and it worked well.

The website in this example requires a user name, a password, and a verification code to log in. The example uses Python's urllib2 to log in to the site directly and to handle the site's cookies.

How cookies work:
A cookie is generated by the server and sent to the browser, which saves it locally (traditionally as a text file in a directory). The next time the browser requests the same website, it sends the cookie back, so the server can tell whether the user is already authenticated or needs to log in again.
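The exchange just described can be illustrated with Python 3's standard http.cookies module. This is a minimal sketch, not part of the original script: the cookie name session_id and its value are hypothetical, and no network traffic is involved. The two printed strings stand in for the Set-Cookie response header and the Cookie request header.

```python
from http.cookies import SimpleCookie

# Server side: create a cookie (hypothetical name/value) and render it
# as the value of a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["path"] = "/"
set_cookie_header = server_cookie["session_id"].OutputString()
print("Set-Cookie:", set_cookie_header)

# Browser side: parse the Set-Cookie header and keep the cookie.
browser_cookie = SimpleCookie()
browser_cookie.load(set_cookie_header)

# Next request to the same site: send the stored name=value pairs back
# in a Cookie request header.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in browser_cookie.items()
)
print("Cookie:", cookie_header)
```

Only the name=value pair travels back to the server; attributes such as Path are used by the browser to decide when to send the cookie.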

Python provides the basic cookielib library. When an opener built with a cookie processor visits a page for the first time, the cookies the server sets are saved automatically; they are then sent with every later request, so other pages are served as they would be after a normal login.
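In Python 3, cookielib became http.cookiejar and urllib2's opener functions moved to urllib.request. The sketch below assumes Python 3 and uses a throwaway local HTTP server with hypothetical /login and /page paths to show the behavior described above: the cookie set by the first response is stored in the jar and sent back automatically on the next request.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /login sets a cookie, other paths echo it."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "session_id=abc123; Path=/")
            body = b"logged in"
        else:
            # Echo back whatever Cookie header the client sent (empty if none).
            body = (self.headers.get("Cookie") or "").encode()
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        # Python 3 counterpart of cookielib.CookieJar + urllib2.HTTPCookieProcessor.
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # ignore proxy settings for the local server
            urllib.request.HTTPCookieProcessor(jar),
        )
        opener.open(base + "/login").read()  # first visit: cookie saved in the jar
        echoed = opener.open(base + "/page").read().decode()  # later visit: cookie sent
        return [cookie.name for cookie in jar], echoed
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```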

Steps:

(1) Enable cookie support.
(2) Anti-leeching: disguise the requests as browser access.
(3) Request the verification code URL and download the image to the local machine.
(4) Recognize the verification code. There are many recognition solutions online, and Python also has its own image processing libraries; this example calls the OCR interface of the Locomotive Collector (LocoySpider) scraping tool. In the code below, the verification code is typed in by hand and the OCR call is left commented out.
(5) Form handling: use a packet capture tool such as Fiddler to find the parameters that must be submitted.
(6) Build the data to be submitted, then build and send the HTTP request.
(7) Check the returned page to determine whether the login succeeded.
(8) After a successful login, download the other pages.
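Steps (2), (5), (6) and (7) can be sketched in Python 3 as follows. The header values and the form field names login_id, opl, login_passwd and login_check are taken from the article's code; another site will use different field names, which is what the packet capture in step (5) is for. The function names are hypothetical, and nothing is sent over the network: the first function only builds the request.

```python
import urllib.parse
import urllib.request

# Step (2): browser-like headers (values from the article's code).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "Accept-Language": "zh-CN",
}


def build_login_request(url, referer, username, password, authcode):
    """Steps (5)-(6): urlencode the captured form fields into a POST request."""
    form = {
        "login_id": username,
        "opl": "op_login",
        "login_passwd": password,
        "login_check": authcode,
    }
    data = urllib.parse.urlencode(form).encode("ascii")
    headers = dict(BROWSER_HEADERS, Referer=referer)
    # Passing data makes urllib.request send the request as a POST.
    return urllib.request.Request(url, data=data, headers=headers)


def login_succeeded(page_text):
    """Step (7): the article treats a page that still links to login.jsp as a failure."""
    return "login.jsp" not in page_text
```

To actually send the request, pass it to an opener created with urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar)) so the session cookie is kept, then give the decoded response body to login_succeeded.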

This example logs in with several accounts in round-robin fashion, and each account downloads three pages per login.
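A minimal sketch of that round-robin scheduling, assuming Python 3; round_robin_schedule is a hypothetical helper and the accounts and ids in the example are illustrative. It hands out consecutive record ids, three per login, cycling through the accounts and stopping once end_id is reached.

```python
from itertools import cycle


def round_robin_schedule(accounts, start_id, end_id, pages_per_login=3):
    """Return [(account, [record ids]), ...] covering start_id up to end_id (exclusive)."""
    schedule = []
    if not accounts:
        return schedule  # nothing to cycle through
    now_id = start_id
    for account in cycle(accounts):
        if now_id >= end_id:
            break
        batch_end = min(now_id + pages_per_login, end_id)
        schedule.append((account, list(range(now_id, batch_end))))
        now_id = batch_end
    return schedule
```

For example, round_robin_schedule(["alice", "bob"], 100, 107) gives alice ids 100 to 102, bob ids 103 to 105, and alice the remaining id 106.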

The download URL is not disclosed here.

Part of the code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 code: urllib2 and cookielib were reorganized into
# urllib.request and http.cookiejar in Python 3.
import os
import urllib
import urllib2
import cookielib
import xml.etree.ElementTree as ET

#-----------------------------------------------------------------------------
# Log in to www.***.com.cn
def ChinaBiddingLogin(url, username, password):
    # Enable cookie support for urllib2
    cookiejar = cookielib.CookieJar()
    urlopener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
    # Install the opener so later urllib2.urlopen() calls send the same cookies
    urllib2.install_opener(urlopener)

    # Disguise the requests as browser access. Replace the default header list
    # instead of appending to it: urllib2 uses the first matching entry, so an
    # appended User-Agent would lose to the default 'Python-urllib/...' one.
    # urllib2 sets the Host and Connection headers by itself.
    urlopener.addheaders = [
        ('Referer', 'http://www.chinabidding.com.cn/zbw/login/login.jsp'),
        ('Accept-Language', 'zh-CN'),
        ('User-Agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'),
    ]
    print 'XXX Login......'

    # Download the verification code image, then have the user read it
    imgurl = r'http://www.*****.com.cn/zbw/login/image.jsp'
    DownloadFile(imgurl, urlopener)
    authcode = raw_input('Please enter the authcode:')
    #authcode = VerifyingCodeRecognization(r"http://192.168.0.106/images/code.jpg")

    # Send login/password/authcode to the site and get the session cookie
    values = {'login_id': username, 'opl': 'op_login',
              'login_passwd': password, 'login_check': authcode}
    urlcontent = urlopener.open(urllib2.Request(url, urllib.urlencode(values)))
    page = urlcontent.read(500000)

    # Make sure we are logged in: a page that still links to login.jsp means failure
    if page.find('login.jsp') != -1:
        print 'Login failed with username=%s, password=%s and authcode=%s' \
              % (username, password, authcode)
        return False
    else:
        print 'Login succeeded!'
        return True

#-----------------------------------------------------------------------------
# Download fileUrl and save it as the local verification code image
# Note: fileUrl must be a valid URL
def DownloadFile(fileUrl, urlopener):
    isDownOK = False
    try:
        if fileUrl:
            # Binary mode: the verification code is a JPEG image
            with open(r'/var/www/images/code.jpg', 'wb') as outfile:
                outfile.write(urlopener.open(urllib2.Request(fileUrl)).read())
            isDownOK = True
        else:
            print 'ERROR: fileUrl is NULL!'
    except IOError:  # urllib2.URLError is a subclass of IOError
        isDownOK = False
    return isDownOK

#------------------------------------------------------------------------------
# Verification code recognition through the OCR service's HTTP API
def VerifyingCodeRecognization(imgurl):
    url = r'http://192.168.0.119:800/api?'
    user = 'admin'
    pwd = 'admin'
    model = 'ocr'
    ocrfile = 'cbi'
    values = {'user': user, 'pwd': pwd, 'model': model, 'ocrfile': ocrfile, 'imgurl': imgurl}
    url += urllib.urlencode(values)
    try:
        urlcontent = urllib2.urlopen(url)
    except IOError:
        print '***ERROR: invalid URL (%s)' % url
        return None
    page = urlcontent.read(500000)
    # Parse the XML reply and get the verification code
    root = ET.fromstring(page)
    node_find = root.find('AddField')
    authcode = node_find.attrib['data']
    return authcode

#------------------------------------------------------------------------------
# Read users from the configuration file: one "username password" pair per line
def ReadUsersFromFile(filename):
    users = {}
    with open(filename, 'r') as f:
        for eachLine in f:
            info = eachLine.split()
            if len(info) == 2:
                users[info[0]] = info[1]
    return users

#------------------------------------------------------------------------------
def main():
    login_page = r'http://www.***.com.cnlogin/login.jsp'
    download_page = r'http://www.***.com.cn***/***?record_id='
    start_id = 8593330
    end_id = 8595000
    now_id = start_id
    Users = ReadUsersFromFile('users.conf')
    if not Users:
        print 'ERROR: no users found in users.conf'
        return
    if not os.path.isdir('./download'):
        os.makedirs('./download')
    # Log in with each account in turn until every record up to end_id is saved
    while now_id < end_id:
        for key in Users:
            if now_id >= end_id:
                break
            if not ChinaBiddingLogin(login_page, key, Users[key]):
                continue
            # Each successful login downloads three pages
            for i in range(3):
                if now_id >= end_id:
                    break
                pageUrl = download_page + '%d' % now_id
                urlcontent = urllib2.urlopen(pageUrl)
                filepath = './download/%s.html' % now_id
                with open(filepath, 'wb') as f:
                    f.write(urlcontent.read(500000))
                now_id += 1

#------------------------------------------------------------------------------
if __name__ == '__main__':
    main()
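For readers porting the script to Python 3, the two parsing steps can be separated from the network code so they are easy to test. This is a sketch under stated assumptions, and the helper names are hypothetical: parse_users mirrors ReadUsersFromFile (one "username password" pair per line), and parse_ocr_reply mirrors the XML handling in VerifyingCodeRecognization. The original code only tells us that the OCR reply contains an AddField element with a data attribute; the root element name Result used in the example is an assumption.

```python
import xml.etree.ElementTree as ET


def parse_users(lines):
    """Return {username: password}; lines without exactly two fields are skipped."""
    users = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            users[fields[0]] = fields[1]
    return users


def parse_ocr_reply(xml_text):
    """Return the data attribute of the root's AddField child, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("AddField")
    if node is None:
        return None
    return node.get("data")
```

Typical use would be "with open('users.conf') as f: users = parse_users(f)", and parse_ocr_reply applied to the decoded body returned by the OCR API.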
