Python code for automatically logging in to a website that requires a verification code


I had long heard that Python makes writing web crawlers very convenient. These days my unit happened to have such a need: logging on to the XX website to download some documents. So I tried it out myself, and the results were quite good.

In this example, the website being logged into requires a username, a password, and a verification code. Python's urllib2 is used to log in to the site directly and to handle the cookies returned by the site.

How cookies work:
Cookies are generated by the server and sent to the browser, which saves them in a text file in a local directory. The next time the same website is requested, the browser sends the cookie back to the server, so the server knows whether the user is already logged in or needs to log in again.
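
As a small illustration of that round trip (the URL below is only a placeholder, not the site used in this article), the Set-Cookie header in the server's response is what the client stores and later sends back as a Cookie request header:

import urllib2

# Tiny illustration of the cookie round trip (Python 2, placeholder URL):
# the Set-Cookie response header is what gets stored by the client and
# returned to the server on later requests.
response = urllib2.urlopen('http://example.com/')
print response.info().getheader('Set-Cookie')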

Python provides the basic cookielib library. The cookie is saved automatically when the first page is visited, and subsequent requests then carry the cookie of a properly logged-in session.
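
A minimal sketch of enabling that behaviour in urllib2 (the URL is a placeholder, not the site discussed in this article):

import urllib2
import cookielib

# Minimal sketch: enable automatic cookie handling in urllib2 (Python 2).
cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
urllib2.install_opener(opener)

# The first request stores any cookies the server sets in cookiejar;
# later requests made through the same opener send them back automatically.
opener.open('http://example.com/login.jsp').read()
for cookie in cookiejar:
    print cookie.name, cookie.value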

Principle:

(1) Enable cookie support.
(2) Get around anti-hotlinking checks by disguising the request as an ordinary browser visit.
(3) Fetch the verification-code link and download the code image to the local machine.
(4) There are many verification-code recognition schemes available online, and Python has its own image-processing libraries; this example calls the locomotive collector's OCR recognition interface.
(5) For form processing, use a capture tool such as Fiddler to obtain the parameters that need to be submitted.
(6) Build the data to submit, construct the HTTP request, and send it (see the sketch after this list).
(7) Judge whether the login succeeded from the content of the returned page.
(8) After a successful login, download the other pages.
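
For steps (5) to (7), here is a rough sketch of submitting the captured form fields and checking the result. The field names mirror those in the full listing further down, but the URL and values are placeholders:

import urllib
import urllib2

# Sketch of steps (5)-(7): POST the form parameters captured with Fiddler and
# check the returned page. URL and values are placeholders; the field names
# are the ones used in the full listing below.
login_url = 'http://example.com/login/login.jsp'
values = {'login_id': 'myuser',
          'opl': 'op_login',
          'login_passwd': 'mypassword',
          'login_check': '1234'}    # the recognized verification code

request = urllib2.Request(login_url, urllib.urlencode(values))
page = urllib2.urlopen(request).read(500000)

# Crude success check: if the response still contains the login page,
# the login did not go through.
if page.find('login.jsp') != -1:
    print 'Login failed'
else:
    print 'Login succeeded'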

This example polls several accounts, logging in with each in turn; each account downloads 3 pages.
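
A sketch of that polling scheme, assuming (as the full listing below does) that users.conf holds one whitespace-separated "username password" pair per line; the real login routine is replaced here by a stand-in function:

def read_users(filename):
    # Return a {username: password} dict read from a whitespace-separated file.
    users = {}
    for line in open(filename, 'r'):
        parts = line.strip().split()
        if len(parts) == 2:
            users[parts[0]] = parts[1]
    return users

def fake_login(username, password):
    # Stand-in for the real login routine shown in the full listing below.
    print 'logging in as %s' % username
    return True

if __name__ == '__main__':
    users = read_users('users.conf')
    for username, password in users.items():
        if fake_login(username, password):
            for i in range(3):      # each account downloads 3 pages
                print 'downloading page %d for %s' % (i, username)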

For certain reasons, the download URL is not disclosed here.

Here is part of the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import urllib
import urllib2
import cookielib
import xml.etree.ElementTree as ET


# -----------------------------------------------------------------------------
# Log in to www.***.com.cn
def chinabiddinglogin(url, username, password):
    # Enable cookie support for urllib2
    cookiejar = cookielib.CookieJar()
    urlopener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
    urllib2.install_opener(urlopener)

    # Disguise the request as an ordinary browser visit
    urlopener.addheaders.append(('Referer', 'http://www.chinabidding.com.cn/zbw/login/login.jsp'))
    urlopener.addheaders.append(('Accept-Language', 'zh-CN'))
    urlopener.addheaders.append(('Host', 'www.chinabidding.com.cn'))
    urlopener.addheaders.append(('User-Agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'))
    urlopener.addheaders.append(('Connection', 'keep-alive'))

    print 'XXX Login...'

    # Download the verification-code image and ask the user to type it in
    imgurl = r'http://www.*****.com.cn/zbw/login/image.jsp'
    DownloadFile(imgurl, urlopener)
    authcode = raw_input('Please enter the authcode: ')
    # authcode = verifyingcoderecognization(r"Yun_qi_Img/code.jpg")

    # Send login/password to the site and get the session cookie
    values = {'login_id': username, 'opl': 'op_login',
              'login_passwd': password, 'login_check': authcode}
    urlcontent = urlopener.open(urllib2.Request(url, urllib.urlencode(values)))
    page = urlcontent.read(500000)

    # Make sure we are logged in by checking the returned page content
    if page.find('login.jsp') != -1:
        print 'Login failed with username=%s, password=%s and authcode=%s' \
              % (username, password, authcode)
        return False
    else:
        print 'Login succeeded!'
        return True


# -----------------------------------------------------------------------------
# Download the file at fileurl and save it locally
# Note: fileurl must be a valid file URL
def DownloadFile(fileurl, urlopener):
    isdownok = False
    try:
        if fileurl:
            # Binary mode so the image file is not corrupted
            outfile = open(r'/var/www/images/code.jpg', 'wb')
            outfile.write(urlopener.open(urllib2.Request(fileurl)).read())
            outfile.close()
            isdownok = True
        else:
            print 'ERROR: fileurl is null!'
    except:
        isdownok = False
    return isdownok


# -----------------------------------------------------------------------------
# Verifying-code recognition via the OCR interface
def verifyingcoderecognization(imgurl):
    url = r'http://192.168.0.119:800/api?'
    user = 'admin'
    pwd = 'admin'
    model = 'ocr'
    ocrfile = 'cbi'

    values = {'user': user, 'pwd': pwd, 'model': model,
              'ocrfile': ocrfile, 'imgurl': imgurl}
    data = urllib.urlencode(values)

    try:
        url += data
        urlcontent = urllib2.urlopen(url)
    except IOError:
        print '***ERROR: invalid URL (%s)' % url

    page = urlcontent.read(500000)

    # Parse the XML data and get the verifying code
    root = ET.fromstring(page)
    node_find = root.find('AddField')
    authcode = node_find.attrib['data']

    return authcode


# -----------------------------------------------------------------------------
# Read users from the configuration file
def readusersfromfile(filename):
    users = {}
    for eachline in open(filename, 'r'):
        info = [w for w in eachline.strip().split()]
        if len(info) == 2:
            users[info[0]] = info[1]
    return users


# -----------------------------------------------------------------------------
def main():
    login_page = r'http://www.***.com.cn/login/login.jsp'
    download_page = r'http://www.***.com.cn/***/***?record_id='

    start_id = 8593330
    end_id = 8595000
    now_id = start_id

    users = readusersfromfile('users.conf')
    while True:
        for key in users:
            if chinabiddinglogin(login_page, key, users[key]):
                # Each account downloads 3 pages before moving to the next
                for i in range(3):
                    pageurl = download_page + '%d' % now_id
                    urlcontent = urllib2.urlopen(pageurl)

                    filepath = './download/%s.html' % now_id
                    f = open(filepath, 'w')
                    f.write(urlcontent.read(500000))
                    f.close()

                    now_id += 1
            else:
                continue


# -----------------------------------------------------------------------------
if __name__ == '__main__':
    main()
