Automatic login to websites with verification codes in Python
I had heard that Python is very convenient for writing web crawlers. Recently my organization needed to log on to the XX website and download some documents, so I tried it myself and the results were good.
The target website requires a user name, password, and verification code to log in. This example uses Python's urllib2 to log on to the website directly and to handle the site's cookies.
How cookies work:
A cookie is generated by the server and sent to the browser, which saves it in a text file in a local directory. The next time the same website is requested, the cookie is sent back to the server, so the server can tell whether the user is already authenticated and whether a new login is needed.
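For illustration, a simplified exchange could look like the following; the cookie name and value here are made up:

Server response after a successful login:
    HTTP/1.1 200 OK
    Set-Cookie: JSESSIONID=abc123; Path=/

Next request from the browser to the same site:
    GET /zbw/login/login.jsp HTTP/1.1
    Host: www.chinabidding.com.cn
    Cookie: JSESSIONID=abc123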
Python provides the cookielib library for this. When you access the login page, the cookie returned by the server is saved automatically; when you then access other pages, the saved cookie is sent along with each request, so you stay logged in.
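A minimal sketch of enabling cookie support (Python 2, the same mechanism the full listing below relies on):

import urllib2
import cookielib

# An HTTPCookieProcessor backed by a CookieJar stores cookies from responses
# and automatically attaches them to every later request made via urllib2.
cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
urllib2.install_opener(opener)

# From here on, urllib2.urlopen() carries the cookies collected so far.
response = urllib2.urlopen('http://www.example.com/')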
Overall approach:
(1) Enable cookie support
(2) Disguise the request as normal browser access (Referer, User-Agent and other headers) to get past anti-leech checks
(3) Request the verification code link and download the captcha image to the local machine
(4) Recognize the verification code. There are many recognition solutions online, and Python has its own image-processing libraries; this example calls the OCR recognition interface of the LocoySpider ("locomotive") collector
(5) Process the login form. A packet-capture tool such as Fiddler can be used to find the parameters that have to be submitted (see the short sketch after this list)
(6) URL-encode the data to be submitted, build an HTTP request, and send it
(7) Determine whether the login succeeded from the content of the returned page
(8) After a successful login, download the other pages
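A short sketch of steps (5) to (7), assuming the form field names captured for this particular site (login_id, opl, login_passwd, login_check, as in the full listing below); another site would need whatever fields Fiddler reveals. The helper name submit_login is just for illustration:

import urllib
import urllib2

def submit_login(login_url, username, password, authcode):
    # Form fields captured with Fiddler for this site; other sites will differ
    values = {'login_id': username,
              'opl': 'op_login',
              'login_passwd': password,
              'login_check': authcode}
    # Passing url-encoded data to urllib2.Request turns it into a POST request
    request = urllib2.Request(login_url, urllib.urlencode(values))
    page = urllib2.urlopen(request).read(500000)
    # If the response still contains the login page, the login did not succeed
    return page.find('login.jsp') == -1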
In this example, multiple accounts are used in rotation: each successful login downloads three pages before switching to the next account.
For certain reasons, the actual download URL is not disclosed here.
Part of the code (Python 2) is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import urllib2
import urllib
import cookielib
import xml.etree.ElementTree as ET

#-----------------------------------------------------------------------------
# Login in www.***.com.cn
def ChinaBiddingLogin(url, username, password):
    # Enable cookie support for urllib2
    cookiejar=cookielib.CookieJar()
    urlopener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
    urllib2.install_opener(urlopener)
    urlopener.addheaders.append(('Referer', 'http://www.chinabidding.com.cn/zbw/login/login.jsp'))
    urlopener.addheaders.append(('Accept-Language', 'zh-CN'))
    urlopener.addheaders.append(('Host', 'www.chinabidding.com.cn'))
    urlopener.addheaders.append(('User-Agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1); Trident/5.0'))
    urlopener.addheaders.append(('Connection', 'Keep-Alive'))

    print 'XXX Login......'

    # Download the verification code image and ask the user to type it in
    imgurl=r'http://www.*****.com.cn/zbw/login/image.jsp'
    DownloadFile(imgurl, urlopener)
    authcode=raw_input('Please enter the authcode:')
    #authcode=VerifyingCodeRecognization(r"http://192.168.0.106/images/code.jpg")

    # Send login/password to the site and get the session cookie
    values={'login_id':username, 'opl':'op_login', 'login_passwd':password, 'login_check':authcode}
    urlcontent=urlopener.open(urllib2.Request(url, urllib.urlencode(values)))
    page=urlcontent.read(500000)

    # Make sure we are logged in, check the returned page content
    if page.find('login.jsp')!=-1:
        print 'Login failed with username=%s, password=%s and authcode=%s' \
            % (username, password, authcode)
        return False
    else:
        print 'Login succeeded!'
        return True

#-----------------------------------------------------------------------------
# Download fileUrl and save it to a local file (hard-coded path)
# Note: the fileUrl must be a valid file
def DownloadFile(fileUrl, urlopener):
    isDownOK=False
    try:
        if fileUrl:
            # Write in binary mode since the downloaded content is an image
            outfile=open(r'/var/www/images/code.jpg', 'wb')
            outfile.write(urlopener.open(urllib2.Request(fileUrl)).read())
            outfile.close()
            isDownOK=True
        else:
            print 'ERROR: fileUrl is NULL!'
    except:
        isDownOK=False
    return isDownOK

#------------------------------------------------------------------------------
# Verifying code recognition
def VerifyingCodeRecognization(imgurl):
    url=r'http://192.168.0.119:800/api?'
    user='admin'
    pwd='admin'
    model='ocr'
    ocrfile='cbi'

    values={'user':user, 'pwd':pwd, 'model':model, 'ocrfile':ocrfile, 'imgurl':imgurl}
    data=urllib.urlencode(values)

    try:
        url+=data
        urlcontent=urllib2.urlopen(url)
    except IOError:
        print '***ERROR: invalid URL (%s)' % url
        return None

    page=urlcontent.read(500000)

    # Parse the xml data and get the verifying code
    root=ET.fromstring(page)
    node_find=root.find('AddField')
    authcode=node_find.attrib['data']
    return authcode

#------------------------------------------------------------------------------
# Read users from configure file
def ReadUsersFromFile(filename):
    users={}
    for eachLine in open(filename, 'r'):
        info=[w for w in eachLine.strip().split()]
        if len(info)==2:
            users[info[0]]=info[1]
    return users

#------------------------------------------------------------------------------
def main():
    login_page=r'http://www.***.com.cn/login/login.jsp'
    download_page=r'http://www.***.com.cn***/***?record_id='

    start_id=8593330
    end_id=8595000
    now_id=start_id

    Users=ReadUsersFromFile('users.conf')

    # Rotate through the accounts; each successful login downloads three pages
    while now_id<=end_id:
        for key in Users:
            if ChinaBiddingLogin(login_page, key, Users[key]):
                for i in range(3):
                    pageUrl=download_page+'%d' % now_id
                    urlcontent=urllib2.urlopen(pageUrl)

                    filepath='./download/%s.html' % now_id
                    f=open(filepath, 'w')
                    f.write(urlcontent.read(500000))
                    f.close()

                    now_id+=1
            else:
                continue

#------------------------------------------------------------------------------
if __name__=='__main__':
    main()
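For reference, ReadUsersFromFile expects users.conf to contain one account per line, with the user name and password separated by whitespace; the credentials below are made up:

account_01 password_01
account_02 password_02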