Python Crawler Example (IV): Website Simulated Login

Source: Internet
Author: User
Tags: http, post, urlencode

First, simulated login using a captured cookie

Take Renren as an example: first log in with your own account and password in the browser, then capture the packets to obtain the cookie, and finally put that cookie into the request you send. The specific code is as follows:

# -*- coding:utf-8 -*-
import urllib2

# Build the header information of a user who has already logged in
headers = {
    "Host": "www.renren.com",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,en;q=0.6",
    # The captured cookie of a user whose password was saved, so there is no need to
    # log in repeatedly; it records login information such as the user name and
    # password (only a portion is shown here)
    "Cookie": "anonymid=j5xitrrwqgbk8; _r01_=1; loginfrom=syshome; wp_fold=0; _de=Bf09ee3fed92e6b65f6a4705d973f1383380866d39ff5",
}

# Build the Request object with the header information (primarily the cookie) in headers
request = urllib2.Request("http://www.renren.com/", headers=headers)

# When the Renren home page is accessed directly, the server determines from the
# headers (mainly the cookie) that this is a logged-in user and returns the
# corresponding page
response = urllib2.urlopen(request)

# Print the response content
print response.read()

This allows you to access pages that are only rendered after you log in.
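On Python 3, urllib2 no longer exists; the same cookie-in-headers idea looks like the following minimal sketch. The cookie string here is a made-up placeholder, not a real captured session:

```python
# Python 3 sketch of the cookie-in-headers approach; urllib2 was split into
# urllib.request. The cookie value below is a placeholder, not a real session.
import urllib.request

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    # Cookie string captured from a logged-in browser session (placeholder here)
    "Cookie": "anonymid=xxxx; _de=xxxx",
}

# Build the Request with the headers attached
request = urllib.request.Request("http://www.renren.com/", headers=headers)

# urlopen(request) would send the cookie along; here we just confirm it is set
print(request.get_header("Cookie"))
```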

Second, using the cookielib library and the HTTPCookieProcessor handler

Although the above method works, it is too troublesome: we have to log in to the account in a browser, set the password to be saved, and capture packets to obtain the cookie. The code below simplifies this:

# -*- coding:utf-8 -*-
import urllib
import urllib2
import cookielib

# Build a CookieJar instance to hold cookies
cookie = cookielib.CookieJar()

# Use HTTPCookieProcessor() to create a cookie-handler object, with the CookieJar
# instance as its parameter
cookie_handler = urllib2.HTTPCookieProcessor(cookie)

# Build an opener with build_opener()
opener = urllib2.build_opener(cookie_handler)

# addheaders takes a list in which each element is a tuple of header information;
# the opener attaches these headers to every request it sends
opener.addheaders = [("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36")]

# Account and password required for login
data = {"Email": "Account", "Password": "Password"}

# Transcode via urlencode()
postdata = urllib.urlencode(data)

# Build the Request object with the user name and password to be sent
request = urllib2.Request("http://www.renren.com/PLogin.do", data=postdata)

# Send this request via the opener and get the cookie value after login
opener.open(request)

# The opener now contains the logged-in user's cookie and can directly access
# pages that require login
response = opener.open("http://www.renren.com/410049765/profile")

# Print the response content
print response.read()
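On Python 3 the same opener is built with http.cookiejar (the renamed cookielib) and urllib.parse.urlencode. A minimal sketch, with the actual network call left as a comment:

```python
# Python 3 sketch of the same opener: cookielib became http.cookiejar,
# urllib2 became urllib.request, and urlencode moved to urllib.parse.
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# CookieJar + HTTPCookieProcessor give the opener persistent cookies
cookie = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie))
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# The urlencode() transcoding step: the form dict becomes a bytes body
data = {"Email": "Account", "Password": "Password"}
postdata = urllib.parse.urlencode(data).encode("utf-8")
print(postdata)  # b'Email=Account&Password=Password'

# opener.open("http://www.renren.com/PLogin.do", data=postdata) would then
# log in, and the returned cookies would stay in the jar for later requests
```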
You can use requests to simplify the code further, as follows:
import requests

# Create a session object to save the cookie value
session = requests.session()

# Handle the headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# User name and password required for login
data = {"Email": "Account", "Password": "Password"}

# Send a request with the user name and password; the cookie returned after login
# is saved in the session
session.post("http://www.renren.com/PLogin.do", data=data, headers=headers)

# The session now contains the logged-in user's cookie and can directly access
# pages that require login
response = session.get("http://www.renren.com/410049765/profile")

# Print the response content
print response.text
 
Note: The old Renren login interface, http://www.renren.com/PLogin.do, is used here. Logging in through the www.renren.com/ interface will fail, because the new interface requires more than just the account and password; it also needs some additional parameters.



1. Login is usually preceded by an HTTP GET, used to pull some information and obtain a cookie, followed by an HTTP POST to log in.

3. The password is sometimes sent in clear text and sometimes encrypted before sending. Some websites even use dynamic encryption along with a lot of other encrypted data; the encryption algorithm can only be recovered by reading the JS source code, and reproducing it is very difficult.
4. Most websites follow a broadly similar overall process, but details may differ, so there is no guarantee that login will succeed on other sites.
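Points 1 and 3 can be sketched together: the initial GET returns a page containing a hidden form field that must be posted back, and the password may need to be hashed before sending. The HTML snippet, the field name "lt", and the MD5 scheme below are all assumptions for illustration; inspect the real site to see what it actually expects.

```python
# Sketch of points 1 and 3. The HTML, the hidden-field name "lt", and the
# MD5 password scheme are made-up examples; real sites differ.
import hashlib
import re

# Step 1: pretend this HTML came back from the initial GET
html = '<form><input type="hidden" name="lt" value="abc123"></form>'
token = re.search(r'name="lt" value="([^"]+)"', html).group(1)

# Step 3: some sites send the password hashed instead of in clear text
hashed = hashlib.md5("Password".encode("utf-8")).hexdigest()

# The login POST would then carry the token alongside the credentials
postdata = {"Email": "Account", "Password": hashed, "lt": token}
print(token)  # abc123
```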


Third, using Selenium and PhantomJS to simulate login
# -*- coding:utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.PhantomJS()
driver.get("http://www.renren.com/")

# Enter the account and password
driver.find_element_by_name("email").send_keys("Account")
driver.find_element_by_name("password").send_keys("Password")

# Simulate clicking the login button
driver.find_element_by_xpath("//input[@class='input-submit login-btn']").click()

# Wait 3 seconds
time.sleep(3)

# Take a snapshot of the page after login
driver.save_screenshot("renren.png")

After running the program you get a snapshot of the logged-in page; you can also use driver.page_source to get the page's source code.

