Use of cookies
Use Python to log in to a website, use cookies to keep the login information, and then fetch the content that is only visible after you log in.
What is a cookie?
Cookies are data (usually encrypted) stored on the user's local terminal by certain websites in order to identify the user and track the session.
For example, some sites require you to log in before you can access a page; until you log in, crawling that page's content is not allowed. We can use the urllib library to keep the cookies from our login and then crawl the other pages with them.
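To make the idea concrete, here is a minimal sketch of what a cookie looks like when a server sends it, parsed with the standard library's http.cookies module. The header value is invented for illustration; real sites choose their own names and attributes.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value, made up for this example.
RAW_HEADER = "sessionid=abc123; Path=/; HttpOnly"


def parse_cookie(raw):
    """Parse a cookie header string into {name: (value, path)}."""
    parsed = SimpleCookie()
    parsed.load(raw)
    return {name: (morsel.value, morsel["path"]) for name, morsel in parsed.items()}


print(parse_cookie(RAW_HEADER))
```

The name/value pair is what identifies the session; attributes such as Path only tell the client when to send it back.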
The concept of opener
When you fetch a URL you use an opener (an instance of urllib.request.OpenerDirector; urllib2.OpenerDirector in Python 2). So far we have always used the default opener, through urlopen.
urlopen can be understood as a special, pre-built opener; the main parameters it accepts are url, data, and timeout.
The default opener does not keep cookies, so to use them we need to build a more general opener that handles cookies.
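As a rough sketch of what "a more general opener" means in Python 3, the snippet below builds one with a cookie handler and inspects it without touching the network. The helper name build_cookie_opener is hypothetical; the library calls are from urllib.request and http.cookiejar.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return an opener that stores cookies in the returned jar."""
    jar = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(handler)
    return opener, jar


opener, jar = build_cookie_opener()

# The result is an OpenerDirector, the same kind of object urlopen uses internally.
print(isinstance(opener, urllib.request.OpenerDirector))

# build_opener adds the default handlers in addition to the one we passed in.
print([type(h).__name__ for h in opener.handlers])
```

Every request made through opener.open() then passes through the cookie handler, which plain urlopen does not do unless such an opener has been installed.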
cookielib
The main role of the cookielib module (renamed http.cookiejar in Python 3) is to provide objects that store cookies, for use together with urllib2 (urllib.request in Python 3) when accessing Internet resources. We can use an object of this module's CookieJar class to capture cookies and resend them on subsequent requests, which is what makes a simulated login possible. The main classes in the module are CookieJar, FileCookieJar, MozillaCookieJar, and LWPCookieJar.
Their relationship: CookieJar --derived--> FileCookieJar --derived--> MozillaCookieJar and LWPCookieJar
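The inheritance chain described above can be checked directly in Python 3. This is only a small illustration using http.cookiejar; the helper name class_chain is made up.

```python
import http.cookiejar as cj


def class_chain(cls):
    """Return the names of cls and its base classes, most derived first."""
    return [c.__name__ for c in cls.__mro__]


print(class_chain(cj.LWPCookieJar))
print(class_chain(cj.MozillaCookieJar))
print(issubclass(cj.FileCookieJar, cj.CookieJar))
```

Because both file-based jars derive from CookieJar, either one can be passed to HTTPCookieProcessor wherever a plain CookieJar is accepted.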
To log in using a cookie
1) Get the cookie and save it to a variable
import urllib.request
import http.cookiejar

url_root = r'http://d.weibo.com/'
cookie = http.cookiejar.CookieJar()  # Declare a CookieJar instance to hold the cookies
handler = urllib.request.HTTPCookieProcessor(cookie)  # Create a cookie processor with HTTPCookieProcessor
opener = urllib.request.build_opener(handler)  # Build an opener from the handler
response = opener.open(url_root)  # open() works like urlopen(); you can also pass in a Request
for item in cookie:
    print('Name = ' + item.name)
    print('Value = ' + item.value)
The code above saves the cookies in the cookie variable and then prints the name and value of each one. The result is as follows:
Name = yf-page-g0
Value = dc8d8d4964cd93a7c3bfa7640c1bd10c
Note: in Python 3 the opener can also be used like this:
request = urllib.request.Request(url_root, postdata, headers)
response = opener.open(request)
Or:
urllib.request.install_opener(opener)
request = urllib.request.Request(url_root, postdata, headers)
response = urllib.request.urlopen(request)
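To see the capture-and-resend behaviour without depending on a real site, here is a sketch that runs a tiny local server. The server, its paths (/login, /profile), and the cookie value are all invented for the demonstration; only the opener setup mirrors the code above.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class CookieEchoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: sets a cookie on /login, echoes received cookies."""

    def do_GET(self):
        received = self.headers.get("Cookie", "")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        if self.path == "/login":
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.end_headers()
        self.wfile.write(received.encode())

    def log_message(self, *args):
        pass  # keep the demo output quiet


def run_demo():
    server = http.server.HTTPServer(("127.0.0.1", 0), CookieEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]

    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any proxy settings for localhost
        urllib.request.HTTPCookieProcessor(jar),
    )
    try:
        opener.open(base + "/login").read()  # the response sets the cookie
        echoed = opener.open(base + "/profile").read().decode()  # the cookie is sent back
    finally:
        server.shutdown()
        server.server_close()
    return [c.name for c in jar], echoed


print(run_demo())
```

The second request never mentions the cookie explicitly; the HTTPCookieProcessor attached to the opener adds the Cookie header from the jar.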
2) Save cookies to a file
We saved the cookie in the cookie variable, so what do we do if we want to save the cookie to a file?
For this we need a FileCookieJar object. Here we use its subclass LWPCookieJar to save the cookie; MozillaCookieJar is used in the same way.
import urllib.request, urllib.parse, urllib.error
import http.cookiejar

url_root = 'http://www.jobbole.com/login/'
values = {'name': '******', 'password': '******'}
postdata = urllib.parse.urlencode(values).encode()  # POST data must be bytes
user_agent = r'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
headers = {'User-Agent': user_agent}

cookie_filename = 'cookie.txt'
cookie = http.cookiejar.LWPCookieJar(cookie_filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)

request = urllib.request.Request(url_root, postdata, headers)
try:
    response = opener.open(request)
except urllib.error.URLError as e:
    print(e.reason)

cookie.save(ignore_discard=True, ignore_expires=True)  # Save the cookies to cookie.txt
for item in cookie:
    print('Name = ' + item.name)
    print('Value = ' + item.value)
Note:
1. The different classes for writing cookies to a file:
FileCookieJar(filename): creates a FileCookieJar instance, which retrieves cookie information and stores it in the file named filename.
MozillaCookieJar(filename): creates a FileCookieJar instance that is compatible with Mozilla cookies.txt files.
LWPCookieJar(filename): creates a FileCookieJar instance that is compatible with libwww-perl Set-Cookie3 files.
2. The official explanation of the two parameters of the save method:
ignore_discard: save even cookies set to be discarded. That is, a cookie marked to be discarded is still written to the file.
ignore_expires: save even cookies that have expired. The file is overwritten if it already exists; that is, the previous contents of the file are replaced.
3. In Python 3, using http.cookiejar.CookieJar(filename) directly leads to an error: self._policy._now = self._now = int(time.time()) AttributeError: 'str' object has no attribute '_now'. The first argument of CookieJar is a cookie policy, not a filename, so the string ends up being used as the policy. Change CookieJar to LWPCookieJar.
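The notes on ignore_discard and on the CookieJar(filename) error can be checked offline. The sketch below builds a session cookie by hand (name, value, and domain are made up), saves it with and without ignore_discard, and then reproduces the AttributeError; the helper names are hypothetical.

```python
import http.cookiejar
import os
import tempfile
import urllib.request


def make_session_cookie(name, value):
    """Build a session cookie (no expiry, discard=True) for example.com."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def saved_cookie_count(ignore_discard):
    """Save one session cookie, reload the file, and count what was written."""
    path = os.path.join(tempfile.mkdtemp(), "cookie.txt")
    jar = http.cookiejar.LWPCookieJar(path)
    jar.set_cookie(make_session_cookie("sid", "abc123"))
    jar.save(ignore_discard=ignore_discard, ignore_expires=True)

    reloaded = http.cookiejar.LWPCookieJar(path)
    reloaded.load(ignore_discard=True, ignore_expires=True)
    return len(reloaded)


def plain_jar_rejects_filename():
    """CookieJar('cookie.txt') treats the string as its policy and fails on use."""
    jar = http.cookiejar.CookieJar("cookie.txt")
    request = urllib.request.Request("http://example.com/")
    try:
        jar.add_cookie_header(request)
    except AttributeError:
        return True
    return False


print(saved_cookie_count(ignore_discard=False))
print(saved_cookie_count(ignore_discard=True))
print(plain_jar_rejects_filename())
```

Session cookies are exactly the kind that are marked to be discarded, which is why login cookies often vanish from the file unless ignore_discard=True is passed.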
3) Load the cookie from the file and access the site
Now that the cookie is saved to a file, you can read it back later with the following method and use it to visit the website.
import urllib.request
import urllib.parse
import urllib.error
import http.cookiejar

cookie_filename = 'cookie.txt'  # the file written in step 2
cookie = http.cookiejar.LWPCookieJar(cookie_filename)
cookie.load(cookie_filename, ignore_discard=True, ignore_expires=True)  # read the saved cookies back
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)

get_url = 'http://www.jobbole.com/'  # Use the cookie to request another URL
get_request = urllib.request.Request(get_url)
get_response = opener.open(get_request)
print(get_response.read().decode())
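As an offline illustration of what loading achieves, the sketch below writes a small file in the LWP Set-Cookie3 format by hand, loads it, and asks the jar which Cookie header it would attach to a request. The cookie, the domain example.com, and the helper name are assumptions made for the example.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

# Hand-written file in the LWP "Set-Cookie3" format; the cookie itself is made up.
LWP_FILE = (
    "#LWP-Cookies-2.0\n"
    'Set-Cookie3: sid=abc123; path="/"; domain="example.com"; '
    "path_spec; discard; version=0\n"
)


def cookie_header_from_file():
    """Load cookies from a file and return the Cookie header for a request."""
    path = os.path.join(tempfile.mkdtemp(), "cookie.txt")
    with open(path, "w") as f:
        f.write(LWP_FILE)

    jar = http.cookiejar.LWPCookieJar(path)
    # The cookie is a session cookie, so it must be loaded with ignore_discard=True.
    jar.load(ignore_discard=True, ignore_expires=True)

    request = urllib.request.Request("http://example.com/profile")
    jar.add_cookie_header(request)  # the step HTTPCookieProcessor performs before sending
    return request.get_header("Cookie")


print(cookie_header_from_file())
```

If load() is called without ignore_discard=True, a session cookie like this one is skipped and the request goes out without any Cookie header.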
http://blog.csdn.net/pipisorry/article/details/47905781
Implementation of Urllib---cookie processing