Python Crawler Example (IV): Website Simulated Login

Source: Internet
Author: User
Tags: http, post, urlencode

First, simulated login using a captured cookie

Take Renren as an example: first log in with your own account and password in the browser, then capture the packets to obtain the cookie, and finally put that cookie into the request you send. The specific code is as follows:

# -*- coding:utf-8 -*-
import urllib2

# Build the header information of a user who has already logged in
headers = {
    "Host": "www.renren.com",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,en;q=0.6",
    # The captured cookie of a user whose password was saved, so there is no need to
    # log in repeatedly; it records login information such as the user name and
    # password (only a portion is shown here)
    "Cookie": "anonymid=j5xitrrwqgbk8; _r01_=1; loginfrom=syshome; wp_fold=0; _de=Bf09ee3fed92e6b65f6a4705d973f1383380866d39ff5",
}

# Build the Request object with the header information (primarily the cookie) in headers
request = urllib2.Request("http://www.renren.com/", headers=headers)

# When the Renren home page is accessed directly, the server determines from the
# headers (mainly the cookie) that this is a logged-in user and returns the
# corresponding page
response = urllib2.urlopen(request)

# Print the response content
print response.read()

This allows you to access pages that are only rendered after you log in.
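On Python 3, urllib2 no longer exists; the same cookie-in-headers idea looks like the following minimal sketch. The cookie string here is a made-up placeholder, not a real captured session:

```python
# Python 3 sketch of the cookie-in-headers approach; urllib2 was split into
# urllib.request. The cookie value below is a placeholder, not a real session.
import urllib.request

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    # Cookie string captured from a logged-in browser session (placeholder here)
    "Cookie": "anonymid=xxxx; _de=xxxx",
}

# Build the Request with the headers attached
request = urllib.request.Request("http://www.renren.com/", headers=headers)

# urlopen(request) would send the cookie along; here we just confirm it is set
print(request.get_header("Cookie"))
```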

Second, using the cookielib library and the HTTPCookieProcessor handler

Although the above method works, it is too troublesome: we have to log in to the account in a browser, set the password to be saved, and capture packets to obtain the cookie. The code below simplifies this:

# -*- coding:utf-8 -*-
import urllib
import urllib2
import cookielib

# Build a CookieJar instance to hold cookies
cookie = cookielib.CookieJar()

# Use HTTPCookieProcessor() to create a cookie-handler object, with the CookieJar
# instance as its parameter
cookie_handler = urllib2.HTTPCookieProcessor(cookie)

# Build an opener with build_opener()
opener = urllib2.build_opener(cookie_handler)

# addheaders takes a list in which each element is a tuple of header information;
# the opener attaches these headers to every request it sends
opener.addheaders = [("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36")]

# Account and password required for login
data = {"Email": "Account", "Password": "Password"}

# Transcode via urlencode()
postdata = urllib.urlencode(data)

# Build the Request object with the user name and password to be sent
request = urllib2.Request("http://www.renren.com/PLogin.do", data=postdata)

# Send this request via the opener and get the cookie value after login
opener.open(request)

# The opener now contains the logged-in user's cookie and can directly access
# pages that require login
response = opener.open("http://www.renren.com/410049765/profile")

# Print the response content
print response.read()
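On Python 3 the same opener is built with http.cookiejar (the renamed cookielib) and urllib.parse.urlencode. A minimal sketch, with the actual network call left as a comment:

```python
# Python 3 sketch of the same opener: cookielib became http.cookiejar,
# urllib2 became urllib.request, and urlencode moved to urllib.parse.
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# CookieJar + HTTPCookieProcessor give the opener persistent cookies
cookie = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie))
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# The urlencode() transcoding step: the form dict becomes a bytes body
data = {"Email": "Account", "Password": "Password"}
postdata = urllib.parse.urlencode(data).encode("utf-8")
print(postdata)  # b'Email=Account&Password=Password'

# opener.open("http://www.renren.com/PLogin.do", data=postdata) would then
# log in, and the returned cookies would stay in the jar for later requests
```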
You can use requests to simplify the code further, as follows:
import requests

# Create a session object to save the cookie value
session = requests.session()

# Handle the headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# User name and password required for login
data = {"Email": "Account", "Password": "Password"}

# Send a request with the user name and password; the cookie returned after login
# is saved in the session
session.post("http://www.renren.com/PLogin.do", data=data, headers=headers)

# The session now contains the logged-in user's cookie and can directly access
# pages that require login
response = session.get("http://www.renren.com/410049765/profile")

# Print the response content
print response.text
 
Note: The old Renren login interface, http://www.renren.com/PLogin.do, is used here. Logging in through the www.renren.com/ interface will fail, because the new interface requires more than just the account and password; it also needs some additional parameters.



1. Login is usually preceded by an HTTP GET, used to pull some information and obtain a cookie, followed by an HTTP POST to log in.

3. The password is sometimes sent in clear text and sometimes encrypted before sending. Some websites even use dynamic encryption along with a lot of other encrypted data; the encryption algorithm can only be recovered by reading the JS source code, and reproducing it is very difficult.
4. Most websites follow a broadly similar overall process, but details may differ, so there is no guarantee that login will succeed on other sites.
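Points 1 and 3 can be sketched together: the initial GET returns a page containing a hidden form field that must be posted back, and the password may need to be hashed before sending. The HTML snippet, the field name "lt", and the MD5 scheme below are all assumptions for illustration; inspect the real site to see what it actually expects.

```python
# Sketch of points 1 and 3. The HTML, the hidden-field name "lt", and the
# MD5 password scheme are made-up examples; real sites differ.
import hashlib
import re

# Step 1: pretend this HTML came back from the initial GET
html = '<form><input type="hidden" name="lt" value="abc123"></form>'
token = re.search(r'name="lt" value="([^"]+)"', html).group(1)

# Step 3: some sites send the password hashed instead of in clear text
hashed = hashlib.md5("Password".encode("utf-8")).hexdigest()

# The login POST would then carry the token alongside the credentials
postdata = {"Email": "Account", "Password": hashed, "lt": token}
print(token)  # abc123
```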


Third, using Selenium and PhantomJS to simulate login
# -*- coding:utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.PhantomJS()
driver.get("http://www.renren.com/")

# Enter the account and password
driver.find_element_by_name("email").send_keys("Account")
driver.find_element_by_name("password").send_keys("Password")

# Simulate clicking the login button
driver.find_element_by_xpath("//input[@class='input-submit login-btn']").click()

# Wait 3 seconds
time.sleep(3)

# Take a snapshot of the page after login
driver.save_screenshot("renren.png")

After running the program you get a snapshot of the logged-in page; you can also use driver.page_source to get the page's source code.

