Python code for automatically logging in to a website that requires a verification code


I had long heard that Python makes writing web crawlers very convenient. These days my unit happened to have such a need: logging on to the XX website to download some documents. So I tried it out myself, and the results were quite good.

In this example, the website being logged into requires a username, a password, and a verification code. Python's urllib2 is used to log in to the site directly and to handle the cookies returned by the site.

How cookies work:
Cookies are generated by the server and sent to the browser, which saves them in a text file in a local directory. The next time the same website is requested, the browser sends the cookie back to the server, so the server knows whether the user is already logged in or needs to log in again.
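
As a small illustration of that round trip (the URL below is only a placeholder, not the site used in this article), the Set-Cookie header in the server's response is what the client stores and later sends back as a Cookie request header:

import urllib2

# Tiny illustration of the cookie round trip (Python 2, placeholder URL):
# the Set-Cookie response header is what gets stored by the client and
# returned to the server on later requests.
response = urllib2.urlopen('http://example.com/')
print response.info().getheader('Set-Cookie')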

Python provides the basic cookielib library. The cookie is saved automatically when the first page is visited, and subsequent requests then carry the cookie of a properly logged-in session.
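
A minimal sketch of enabling that behaviour in urllib2 (the URL is a placeholder, not the site discussed in this article):

import urllib2
import cookielib

# Minimal sketch: enable automatic cookie handling in urllib2 (Python 2).
cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
urllib2.install_opener(opener)

# The first request stores any cookies the server sets in cookiejar;
# later requests made through the same opener send them back automatically.
opener.open('http://example.com/login.jsp').read()
for cookie in cookiejar:
    print cookie.name, cookie.value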

Principle:

(1) Enable cookie support.
(2) Get around anti-hotlinking checks by disguising the request as an ordinary browser visit.
(3) Fetch the verification-code link and download the code image to the local machine.
(4) There are many verification-code recognition schemes available online, and Python has its own image-processing libraries; this example calls the locomotive collector's OCR recognition interface.
(5) For form processing, use a capture tool such as Fiddler to obtain the parameters that need to be submitted.
(6) Build the data to submit, construct the HTTP request, and send it (see the sketch after this list).
(7) Judge whether the login succeeded from the content of the returned page.
(8) After a successful login, download the other pages.
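
For steps (5) to (7), here is a rough sketch of submitting the captured form fields and checking the result. The field names mirror those in the full listing further down, but the URL and values are placeholders:

import urllib
import urllib2

# Sketch of steps (5)-(7): POST the form parameters captured with Fiddler and
# check the returned page. URL and values are placeholders; the field names
# are the ones used in the full listing below.
login_url = 'http://example.com/login/login.jsp'
values = {'login_id': 'myuser',
          'opl': 'op_login',
          'login_passwd': 'mypassword',
          'login_check': '1234'}    # the recognized verification code

request = urllib2.Request(login_url, urllib.urlencode(values))
page = urllib2.urlopen(request).read(500000)

# Crude success check: if the response still contains the login page,
# the login did not go through.
if page.find('login.jsp') != -1:
    print 'Login failed'
else:
    print 'Login succeeded'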

This example polls several accounts, logging in with each in turn; each account downloads 3 pages.
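
A sketch of that polling scheme, assuming (as the full listing below does) that users.conf holds one whitespace-separated "username password" pair per line; the real login routine is replaced here by a stand-in function:

def read_users(filename):
    # Return a {username: password} dict read from a whitespace-separated file.
    users = {}
    for line in open(filename, 'r'):
        parts = line.strip().split()
        if len(parts) == 2:
            users[parts[0]] = parts[1]
    return users

def fake_login(username, password):
    # Stand-in for the real login routine shown in the full listing below.
    print 'logging in as %s' % username
    return True

if __name__ == '__main__':
    users = read_users('users.conf')
    for username, password in users.items():
        if fake_login(username, password):
            for i in range(3):      # each account downloads 3 pages
                print 'downloading page %d for %s' % (i, username)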

For certain reasons, the download URL is not disclosed here.

Here is part of the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import urllib
import urllib2
import cookielib
import xml.etree.ElementTree as ET


# -----------------------------------------------------------------------------
# Log in to www.***.com.cn
def chinabiddinglogin(url, username, password):
    # Enable cookie support for urllib2
    cookiejar = cookielib.CookieJar()
    urlopener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
    urllib2.install_opener(urlopener)

    # Disguise the request as an ordinary browser visit
    urlopener.addheaders.append(('Referer', 'http://www.chinabidding.com.cn/zbw/login/login.jsp'))
    urlopener.addheaders.append(('Accept-Language', 'zh-CN'))
    urlopener.addheaders.append(('Host', 'www.chinabidding.com.cn'))
    urlopener.addheaders.append(('User-Agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'))
    urlopener.addheaders.append(('Connection', 'keep-alive'))

    print 'XXX Login...'

    # Download the verification-code image and ask the user to type it in
    imgurl = r'http://www.*****.com.cn/zbw/login/image.jsp'
    DownloadFile(imgurl, urlopener)
    authcode = raw_input('Please enter the authcode: ')
    # authcode = verifyingcoderecognization(r"Yun_qi_Img/code.jpg")

    # Send login/password to the site and get the session cookie
    values = {'login_id': username, 'opl': 'op_login',
              'login_passwd': password, 'login_check': authcode}
    urlcontent = urlopener.open(urllib2.Request(url, urllib.urlencode(values)))
    page = urlcontent.read(500000)

    # Make sure we are logged in by checking the returned page content
    if page.find('login.jsp') != -1:
        print 'Login failed with username=%s, password=%s and authcode=%s' \
              % (username, password, authcode)
        return False
    else:
        print 'Login succeeded!'
        return True


# -----------------------------------------------------------------------------
# Download the file at fileurl and save it locally
# Note: fileurl must be a valid file URL
def DownloadFile(fileurl, urlopener):
    isdownok = False
    try:
        if fileurl:
            # Binary mode so the image file is not corrupted
            outfile = open(r'/var/www/images/code.jpg', 'wb')
            outfile.write(urlopener.open(urllib2.Request(fileurl)).read())
            outfile.close()
            isdownok = True
        else:
            print 'ERROR: fileurl is null!'
    except:
        isdownok = False
    return isdownok


# -----------------------------------------------------------------------------
# Verifying-code recognition via the OCR interface
def verifyingcoderecognization(imgurl):
    url = r'http://192.168.0.119:800/api?'
    user = 'admin'
    pwd = 'admin'
    model = 'ocr'
    ocrfile = 'cbi'

    values = {'user': user, 'pwd': pwd, 'model': model,
              'ocrfile': ocrfile, 'imgurl': imgurl}
    data = urllib.urlencode(values)

    try:
        url += data
        urlcontent = urllib2.urlopen(url)
    except IOError:
        print '***ERROR: invalid URL (%s)' % url

    page = urlcontent.read(500000)

    # Parse the XML data and get the verifying code
    root = ET.fromstring(page)
    node_find = root.find('AddField')
    authcode = node_find.attrib['data']

    return authcode


# -----------------------------------------------------------------------------
# Read users from the configuration file
def readusersfromfile(filename):
    users = {}
    for eachline in open(filename, 'r'):
        info = [w for w in eachline.strip().split()]
        if len(info) == 2:
            users[info[0]] = info[1]
    return users


# -----------------------------------------------------------------------------
def main():
    login_page = r'http://www.***.com.cn/login/login.jsp'
    download_page = r'http://www.***.com.cn/***/***?record_id='

    start_id = 8593330
    end_id = 8595000
    now_id = start_id

    users = readusersfromfile('users.conf')
    while True:
        for key in users:
            if chinabiddinglogin(login_page, key, users[key]):
                # Each account downloads 3 pages before moving to the next
                for i in range(3):
                    pageurl = download_page + '%d' % now_id
                    urlcontent = urllib2.urlopen(pageurl)

                    filepath = './download/%s.html' % now_id
                    f = open(filepath, 'w')
                    f.write(urlcontent.read(500000))
                    f.close()

                    now_id += 1
            else:
                continue


# -----------------------------------------------------------------------------
if __name__ == '__main__':
    main()
