Using urllib: cookie handling

Source: Internet
Author: User

Use of cookies

Use Python to log in to a website, use cookies to keep the login state, and then fetch the pages that are only visible after logging in.

What is a cookie?

Cookies are small pieces of data (often an opaque or encrypted session identifier) that a website stores on the user's machine in order to identify the user and track the session.
For example, some sites only serve certain pages to logged-in users, so those pages cannot be crawled before logging in. We can use the urllib library to keep the cookies set at login and send them back when crawling the other pages.
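To make the definition concrete, here is a minimal sketch of what a cookie looks like on the wire. The header value is made up for illustration, and http.cookies.SimpleCookie is used only as a convenient parser; it is not part of the login flow described in this article.

```python
from http.cookies import SimpleCookie

# Hypothetical value of a Set-Cookie response header (not from a real site).
header_value = "sessionid=abc123; Path=/; HttpOnly"

parsed = SimpleCookie()
parsed.load(header_value)

morsel = parsed["sessionid"]
print("name :", morsel.key)
print("value:", morsel.value)
print("path :", morsel["path"])
```

The name/value pair is what the client sends back on later requests; the remaining attributes tell the client when it may do so.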
The concept of an opener
Whenever you fetch a URL you use an opener (an instance of urllib.request.OpenerDirector; urllib2.OpenerDirector in Python 2). So far we have always used the default opener, through urlopen.

urlopen can be thought of as a ready-made, special-purpose opener; the arguments it is normally given are just url, data, and timeout.
The default opener does not keep cookies between requests, so to use cookies we need to build a more general opener with cookie handling added.
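As a rough sketch of what "building a more general opener" means, the snippet below creates an opener with a cookie handler attached and inspects it. No request is sent; the variable names are only illustrative.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
cookie_handler = urllib.request.HTTPCookieProcessor(jar)

# build_opener returns an OpenerDirector that contains the default handlers
# plus any handlers passed in.
opener = urllib.request.build_opener(cookie_handler)

has_cookie_handler = any(
    isinstance(h, urllib.request.HTTPCookieProcessor) for h in opener.handlers
)
print(type(opener).__name__)
print(has_cookie_handler)
```

Anything fetched with opener.open(...) then goes through the cookie handler, while plain urlopen keeps using the default opener unless install_opener is called.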
http.cookiejar (cookielib in Python 2)
The http.cookiejar module (named cookielib in Python 2) provides objects that store cookies, for use together with urllib.request (urllib2 in Python 2) when accessing Internet resources. A CookieJar object from this module captures the cookies a server sets and resends them on subsequent requests, which is what makes a simulated login possible. The main classes of the module are CookieJar, FileCookieJar, MozillaCookieJar, and LWPCookieJar.
Their relationship: CookieJar --derived--> FileCookieJar --derived--> MozillaCookieJar and LWPCookieJar
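The derivation chain can be checked directly in the interpreter. This is only a small sanity check of the class hierarchy, not part of the login code:

```python
import http.cookiejar as cookiejar

file_is_jar = issubclass(cookiejar.FileCookieJar, cookiejar.CookieJar)
mozilla_is_file = issubclass(cookiejar.MozillaCookieJar, cookiejar.FileCookieJar)
lwp_is_file = issubclass(cookiejar.LWPCookieJar, cookiejar.FileCookieJar)

print(file_is_jar, mozilla_is_file, lwp_is_file)
```

Because both file-backed classes are CookieJar subclasses, either can be passed to HTTPCookieProcessor wherever a plain CookieJar is accepted.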

Logging in with cookies
1) Save the cookies in a variable

    import urllib.request
    import http.cookiejar

    url_root = r'http://d.weibo.com/'

    cookie = http.cookiejar.CookieJar()  # CookieJar instance that will hold the cookies
    handler = urllib.request.HTTPCookieProcessor(cookie)  # cookie processor from urllib.request (urllib2 in Python 2)
    opener = urllib.request.build_opener(handler)  # build the opener from the handler
    response = opener.open(url_root)  # open() works like urlopen(); a Request object can be passed in as well

    for item in cookie:
        print('Name = ' + item.name)
        print('Value = ' + item.value)

The code above keeps the cookies in a variable and then prints the name and value of each one. The result looks like this:

    Name = yf-page-g0
    Value = dc8d8d4964cd93a7c3bfa7640c1bd10c
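To see what the jar does without contacting any server, the following sketch puts a hand-made cookie into a CookieJar and asks the jar to attach it to a request. The domain example.com and the cookie name and value are placeholders, not values from the site used in this article.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()

# Hand-made cookie standing in for one a server would set (all values made up).
session_cookie = http.cookiejar.Cookie(
    version=0, name="sid", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)
jar.set_cookie(session_cookie)

# Iterating over a jar yields Cookie objects with .name and .value.
for item in jar:
    print("Name = " + item.name)
    print("Value = " + item.value)

# HTTPCookieProcessor calls add_cookie_header before each request is sent;
# here it is called by hand so the effect is visible.
request = urllib.request.Request("http://example.com/profile")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")
print(cookie_header)
```

The jar only attaches cookies whose domain and path match the request, which is why the request URL above uses the same placeholder domain as the cookie.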

Note: in Python 3 the opener can also be used like this (postdata and headers are built as in the example in 2) below):

    request = urllib.request.Request(url_root, postdata, headers)
    response = opener.open(request)

Or:

    urllib.request.install_opener(opener)
    request = urllib.request.Request(url_root, postdata, headers)
    response = urllib.request.urlopen(request)
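The Request object used in these variants can be inspected before it is sent. The sketch below assumes a placeholder URL and made-up form fields; it only shows how the data and headers arguments end up on the request.

```python
import urllib.parse
import urllib.request

url_root = "http://example.com/login"              # placeholder URL
values = {"name": "alice", "password": "secret"}   # hypothetical form fields

# The POST body must be bytes, so urlencode() the form and then encode() it.
postdata = urllib.parse.urlencode(values).encode()
headers = {"User-Agent": "Mozilla/5.0"}

request = urllib.request.Request(url_root, postdata, headers)

method = request.get_method()               # POST, because data was supplied
user_agent = request.get_header("User-agent")
print(method)
print(request.data)
print(user_agent)
```

Note that Request stores header names in capitalized form, so the lookup key is "User-agent" even though the dictionary used "User-Agent".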
2) Save cookies to a file
Above we kept the cookies in the cookie variable. What if we want to save them to a file?
For that we need a FileCookieJar object; here we use its subclass LWPCookieJar to save the cookies.

    import urllib.request, urllib.parse, urllib.error
    import http.cookiejar

    url_root = 'http://www.jobbole.com/login/'
    values = {'name': '******', 'password': '******'}
    postdata = urllib.parse.urlencode(values).encode()  # POST body must be bytes
    user_agent = r'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
    headers = {'User-Agent': user_agent}

    cookie_filename = 'cookie.txt'
    cookie = http.cookiejar.LWPCookieJar(cookie_filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)

    request = urllib.request.Request(url_root, postdata, headers)
    try:
        response = opener.open(request)
    except urllib.error.URLError as e:
        print(e.reason)

    cookie.save(ignore_discard=True, ignore_expires=True)  # save the cookies to cookie.txt
    for item in cookie:
        print('Name = ' + item.name)
        print('Value = ' + item.value)

Notes:

1. The different classes for writing cookies to a file:

FileCookieJar(filename): the base class for jars that load cookies from and save cookies to the file filename. Its own save() is not implemented, so in practice one of the subclasses below is used.

MozillaCookieJar(filename): a FileCookieJar that is compatible with the Mozilla cookies.txt file format.

LWPCookieJar(filename): a FileCookieJar that is compatible with the libwww-perl Set-Cookie3 file format.

2. The official explanation of the two parameters of the save method:

ignore_discard: save even cookies set to be discarded. That is, cookies marked to be dropped at the end of the session (session cookies) are written to the file as well.

ignore_expires: save even cookies that have expired. Note also that the file is overwritten if it already exists, so any cookies previously stored in it are replaced.

3. In Python 3, calling http.cookiejar.CookieJar(filename) directly fails with: self._policy._now = self._now = int(time.time()) AttributeError: 'str' object has no attribute '_now'. The first argument of CookieJar is a cookie policy, not a filename, so the string ends up being used as the policy. Change CookieJar to LWPCookieJar when a filename is needed.
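The effect of ignore_discard described in note 2 can be sketched offline. The helper names make_cookie and names_after_save, the domain example.com, and the cookie values are all hypothetical; the jar is written to a temporary directory rather than to cookie.txt.

```python
import http.cookiejar
import os
import tempfile
import time


def make_cookie(name, value, expires, discard):
    """Build a cookie for the placeholder domain example.com."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=discard,
        comment=None, comment_url=None, rest={},
    )


def names_after_save(filename, ignore_discard):
    """Save one session cookie and one persistent cookie, then list what the file holds."""
    jar = http.cookiejar.LWPCookieJar(filename)
    jar.set_cookie(make_cookie("session", "s1", expires=None, discard=True))
    jar.set_cookie(make_cookie("remember", "r1",
                               expires=int(time.time()) + 3600, discard=False))
    jar.save(ignore_discard=ignore_discard)

    # Load permissively so that the result reflects only what save() wrote.
    reloaded = http.cookiejar.LWPCookieJar(filename)
    reloaded.load(ignore_discard=True, ignore_expires=True)
    return sorted(c.name for c in reloaded)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cookie.txt")
    default_names = names_after_save(path, ignore_discard=False)
    all_names = names_after_save(path, ignore_discard=True)

print(default_names)
print(all_names)
```

Login state is often carried by a session cookie, which is why the examples in this article pass ignore_discard=True when saving.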

3) Load cookies from the file and use them
Now that the cookies are saved to a file, they can be read back later and used to visit the site, as follows:

    import urllib.request
    import urllib.parse
    import urllib.error
    import http.cookiejar

    cookie_filename = 'cookie.txt'
    cookie = http.cookiejar.LWPCookieJar(cookie_filename)
    cookie.load(cookie_filename, ignore_discard=True, ignore_expires=True)  # read the cookies saved earlier
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)

    get_url = 'http://www.jobbole.com/'  # use the cookies to request another URL
    get_request = urllib.request.Request(get_url)
    get_response = opener.open(get_request)
    print(get_response.read().decode())
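A script that reuses a cookie file usually also has to cope with the first run, when the file does not exist yet. The sketch below shows one possible way to handle that; load_cookie_jar is a hypothetical helper, and example.com and the cookie values are placeholders. It runs entirely offline against a temporary directory.

```python
import http.cookiejar
import os
import tempfile
import urllib.request


def load_cookie_jar(filename):
    """Return an LWPCookieJar filled from filename, or an empty jar if there is no file yet."""
    jar = http.cookiejar.LWPCookieJar(filename)
    if os.path.exists(filename):
        jar.load(ignore_discard=True, ignore_expires=True)
    return jar


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cookie.txt")

    # First run: no file on disk, so the jar starts out empty.
    first_run = load_cookie_jar(path)
    cookies_on_first_run = len(first_run)

    # Stand-in for the cookie a real login response would set.
    first_run.set_cookie(http.cookiejar.Cookie(
        version=0, name="sid", value="abc123",
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    ))
    first_run.save(ignore_discard=True, ignore_expires=True)

    # Later run: the cookie comes back from the file and is attached to a request.
    second_run = load_cookie_jar(path)
    request = urllib.request.Request("http://example.com/")
    second_run.add_cookie_header(request)
    restored_header = request.get_header("Cookie")

print(cookies_on_first_run)
print(restored_header)
```

In a real script the jar returned by load_cookie_jar would be wrapped in HTTPCookieProcessor and passed to build_opener, exactly as in the example above.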

http://blog.csdn.net/pipisorry/article/details/47905781
