Automatic login to websites with verification codes in Python
I had heard that Python is very convenient for writing web crawlers. Recently my organization needed to log on to the XX website and download some documents, so I tried it myself and the results were good.
The target website requires a user name, password, and verification code to log in. This example uses Python's urllib2 to log on to the website directly and to handle the site's cookies.
How cookies work:
A cookie is generated by the server and sent to the browser, which saves it in a text file in a local directory. The next time the same website is requested, the cookie is sent back to the server, so the server can tell whether the user is already authenticated and whether a new login is needed.
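For illustration, a simplified exchange could look like the following; the cookie name and value here are made up:

Server response after a successful login:
    HTTP/1.1 200 OK
    Set-Cookie: JSESSIONID=abc123; Path=/

Next request from the browser to the same site:
    GET /zbw/login/login.jsp HTTP/1.1
    Host: www.chinabidding.com.cn
    Cookie: JSESSIONID=abc123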
Python provides the cookielib library for this. When you access the login page, the cookie returned by the server is saved automatically; when you then access other pages, the saved cookie is sent along with each request, so you stay logged in.
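A minimal sketch of enabling cookie support (Python 2, the same mechanism the full listing below relies on):

import urllib2
import cookielib

# An HTTPCookieProcessor backed by a CookieJar stores cookies from responses
# and automatically attaches them to every later request made via urllib2.
cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
urllib2.install_opener(opener)

# From here on, urllib2.urlopen() carries the cookies collected so far.
response = urllib2.urlopen('http://www.example.com/')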
Overall approach:
(1) Enable cookie support
(2) Disguise the request as normal browser access (Referer, User-Agent and other headers) to get past anti-leech checks
(3) Request the verification code link and download the captcha image to the local machine
(4) Recognize the verification code. There are many recognition solutions online, and Python has its own image-processing libraries; this example calls the OCR recognition interface of the LocoySpider ("locomotive") collector
(5) Process the login form. A packet-capture tool such as Fiddler can be used to find the parameters that have to be submitted (see the short sketch after this list)
(6) URL-encode the data to be submitted, build an HTTP request, and send it
(7) Determine whether the login succeeded from the content of the returned page
(8) After a successful login, download the other pages
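A short sketch of steps (5) to (7), assuming the form field names captured for this particular site (login_id, opl, login_passwd, login_check, as in the full listing below); another site would need whatever fields Fiddler reveals. The helper name submit_login is just for illustration:

import urllib
import urllib2

def submit_login(login_url, username, password, authcode):
    # Form fields captured with Fiddler for this site; other sites will differ
    values = {'login_id': username,
              'opl': 'op_login',
              'login_passwd': password,
              'login_check': authcode}
    # Passing url-encoded data to urllib2.Request turns it into a POST request
    request = urllib2.Request(login_url, urllib.urlencode(values))
    page = urllib2.urlopen(request).read(500000)
    # If the response still contains the login page, the login did not succeed
    return page.find('login.jsp') == -1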
In this example, multiple accounts are used in rotation: each successful login downloads three pages before switching to the next account.
For certain reasons, the actual download URL is not disclosed here.
Part of the code (Python 2) is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import urllib2
import urllib
import cookielib
import xml.etree.ElementTree as ET

#-----------------------------------------------------------------------------
# Login in www.***.com.cn
def ChinaBiddingLogin(url, username, password):
    # Enable cookie support for urllib2
    cookiejar=cookielib.CookieJar()
    urlopener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
    urllib2.install_opener(urlopener)
    urlopener.addheaders.append(('Referer', 'http://www.chinabidding.com.cn/zbw/login/login.jsp'))
    urlopener.addheaders.append(('Accept-Language', 'zh-CN'))
    urlopener.addheaders.append(('Host', 'www.chinabidding.com.cn'))
    urlopener.addheaders.append(('User-Agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1); Trident/5.0'))
    urlopener.addheaders.append(('Connection', 'Keep-Alive'))

    print 'XXX Login......'

    # Download the verification code image and ask the user to type it in
    imgurl=r'http://www.*****.com.cn/zbw/login/image.jsp'
    DownloadFile(imgurl, urlopener)
    authcode=raw_input('Please enter the authcode:')
    #authcode=VerifyingCodeRecognization(r"http://192.168.0.106/images/code.jpg")

    # Send login/password to the site and get the session cookie
    values={'login_id':username, 'opl':'op_login', 'login_passwd':password, 'login_check':authcode}
    urlcontent=urlopener.open(urllib2.Request(url, urllib.urlencode(values)))
    page=urlcontent.read(500000)

    # Make sure we are logged in, check the returned page content
    if page.find('login.jsp')!=-1:
        print 'Login failed with username=%s, password=%s and authcode=%s' \
            % (username, password, authcode)
        return False
    else:
        print 'Login succeeded!'
        return True

#-----------------------------------------------------------------------------
# Download fileUrl and save it to a local file (hard-coded path)
# Note: the fileUrl must be a valid file
def DownloadFile(fileUrl, urlopener):
    isDownOK=False
    try:
        if fileUrl:
            # Write in binary mode since the downloaded content is an image
            outfile=open(r'/var/www/images/code.jpg', 'wb')
            outfile.write(urlopener.open(urllib2.Request(fileUrl)).read())
            outfile.close()
            isDownOK=True
        else:
            print 'ERROR: fileUrl is NULL!'
    except:
        isDownOK=False
    return isDownOK

#------------------------------------------------------------------------------
# Verifying code recognition
def VerifyingCodeRecognization(imgurl):
    url=r'http://192.168.0.119:800/api?'
    user='admin'
    pwd='admin'
    model='ocr'
    ocrfile='cbi'

    values={'user':user, 'pwd':pwd, 'model':model, 'ocrfile':ocrfile, 'imgurl':imgurl}
    data=urllib.urlencode(values)

    try:
        url+=data
        urlcontent=urllib2.urlopen(url)
    except IOError:
        print '***ERROR: invalid URL (%s)' % url
        return None

    page=urlcontent.read(500000)

    # Parse the xml data and get the verifying code
    root=ET.fromstring(page)
    node_find=root.find('AddField')
    authcode=node_find.attrib['data']
    return authcode

#------------------------------------------------------------------------------
# Read users from configure file
def ReadUsersFromFile(filename):
    users={}
    for eachLine in open(filename, 'r'):
        info=[w for w in eachLine.strip().split()]
        if len(info)==2:
            users[info[0]]=info[1]
    return users

#------------------------------------------------------------------------------
def main():
    login_page=r'http://www.***.com.cn/login/login.jsp'
    download_page=r'http://www.***.com.cn***/***?record_id='

    start_id=8593330
    end_id=8595000
    now_id=start_id

    Users=ReadUsersFromFile('users.conf')

    # Rotate through the accounts; each successful login downloads three pages
    while now_id<=end_id:
        for key in Users:
            if ChinaBiddingLogin(login_page, key, Users[key]):
                for i in range(3):
                    pageUrl=download_page+'%d' % now_id
                    urlcontent=urllib2.urlopen(pageUrl)

                    filepath='./download/%s.html' % now_id
                    f=open(filepath, 'w')
                    f.write(urlcontent.read(500000))
                    f.close()

                    now_id+=1
            else:
                continue

#------------------------------------------------------------------------------
if __name__=='__main__':
    main()
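For reference, ReadUsersFromFile expects users.conf to contain one account per line, with the user name and password separated by whitespace; the credentials below are made up:

account_01 password_01
account_02 password_02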