Accessing Web Pages with Cookies in Python: An Example

Source: Internet
Author: User
Tags: http cookie

Cookies are pieces of data (usually encrypted) that a website stores on the user's machine in order to identify the user and track the session.
Before working with them, we first need to introduce the concept of an opener.

1. Opener

When you fetch a URL you use an opener (an instance of urllib2.OpenerDirector).

So far we have been using the default opener through urlopen. It can be thought of as a special opener instance whose arguments are simply url, data, and timeout.

That default opener does not handle cookies, so to use them we need to build a more general opener with cookie support.
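The idea of a more general opener can be sketched without touching the network. The article's code targets Python 2; this sketch assumes Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar. It only shows that build_opener returns an OpenerDirector and that the cookie handler ends up in its handler chain.

```python
import urllib.request
from http.cookiejar import CookieJar

# A CookieJar holds the cookies; HTTPCookieProcessor plugs it into an opener.
jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# build_opener returns an OpenerDirector with the cookie handler installed
# next to the default handlers.
is_opener_director = isinstance(opener, urllib.request.OpenerDirector)
handler_names = [type(h).__name__ for h in opener.handlers]

print(is_opener_director)
print("HTTPCookieProcessor" in handler_names)
```

Nothing is fetched here; requests sent later through opener.open would share the same jar.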

2. cookielib

The main role of the cookielib module is to provide objects that store cookies, so that they can be used together with the urllib2 module to access Internet resources.

The cookielib module is quite powerful: an object of its CookieJar class can capture cookies and resend them on subsequent requests, which makes it possible, for example, to simulate a login.

The main classes in this module are CookieJar, FileCookieJar, MozillaCookieJar, and LWPCookieJar.

Their relationship: CookieJar --derived--> FileCookieJar --derived--> MozillaCookieJar and LWPCookieJar
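The inheritance chain can be checked directly. This is a small sketch that assumes Python 3, where the module is named http.cookiejar rather than cookielib; the class names are the same.

```python
from http.cookiejar import CookieJar, FileCookieJar, LWPCookieJar, MozillaCookieJar

# FileCookieJar extends CookieJar; the two file formats extend FileCookieJar.
file_jar_is_cookie_jar = issubclass(FileCookieJar, CookieJar)
mozilla_is_file_jar = issubclass(MozillaCookieJar, FileCookieJar)
lwp_is_file_jar = issubclass(LWPCookieJar, FileCookieJar)

print(file_jar_is_cookie_jar, mozilla_is_file_jar, lwp_is_file_jar)
```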

1) Save cookies to a variable

First, we use a CookieJar object to capture the cookies and store them in a variable:

import urllib2
import cookielib

# Create a CookieJar instance to hold the cookies
cookie = cookielib.CookieJar()
# Create a cookie handler with urllib2's HTTPCookieProcessor
handler = urllib2.HTTPCookieProcessor(cookie)
# Build an opener from the handler
opener = urllib2.build_opener(handler)
# opener.open works like urllib2.urlopen; it also accepts a Request object
response = opener.open('http://www.baidu.com')
for item in cookie:
    print 'Name = ' + item.name
    print 'Value = ' + item.value

This stores the cookies in a variable and then prints the name and value of each one. The output looks like this:


Name = BAIDUID
Value = 485a446e92ba0af31e70b88bf2beacd0:fg=1
Name = BIDUPSID
Value = 485a446e92ba0af31e70b88bf2beacd0
Name = H_PS_PSSID
Value = 1450_18280_18879_17946_18205_18778_18559_17001_17072_15009_11563_18019
Name = PSTM
Value = 1454031065
Name = BDSVRTM
Value = 0
Name = BD_HOME
Value = 0
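The same capture step can be tried without contacting a real site. The sketch below assumes Python 3 (urllib.request and http.cookiejar in place of urllib2 and cookielib) and starts a throwaway HTTP server on 127.0.0.1; the handler class, the function name, and the session_id cookie are all hypothetical and exist only for this illustration.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieSettingHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: every GET response sets one cookie."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Set-Cookie", "session_id=abc123; Path=/")
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_cookies_from_local_server():
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(("127.0.0.1", 0), CookieSettingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # ignore any proxy settings
            urllib.request.HTTPCookieProcessor(jar),
        )
        opener.open("http://127.0.0.1:%d/" % server.server_port, timeout=5).read()
        # The jar now holds whatever the server sent in Set-Cookie.
        return {c.name: c.value for c in jar}
    finally:
        server.shutdown()
        server.server_close()


captured = fetch_cookies_from_local_server()
for name, value in captured.items():
    print("Name = " + name)
    print("Value = " + value)
```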

2) Save cookies to a file

In the method above the cookies are kept in the cookie variable. What if we want to save them to a file instead?

For that we need a FileCookieJar object. Here we use its subclass MozillaCookieJar to save the cookies:

import cookielib
import urllib2

# File in which to save the cookies: cookie.txt in the current directory
filename = 'cookie.txt'
# Create a MozillaCookieJar instance to hold the cookies and later write them to the file
cookie = cookielib.MozillaCookieJar(filename)
# Create a cookie handler with urllib2's HTTPCookieProcessor
handler = urllib2.HTTPCookieProcessor(cookie)
# Build an opener from the handler
opener = urllib2.build_opener(handler)
# Send a request; this works the same way as urllib2.urlopen
response = opener.open("http://www.baidu.com")
# Save the cookies to the file
cookie.save(ignore_discard=True, ignore_expires=True)

The official documentation explains the two parameters of the save method as follows:

ignore_discard: save even cookies set to be discarded.

ignore_expires: save even cookies that have expired. The file is overwritten if it already exists.

In other words, ignore_discard means cookies are saved even if they are marked to be discarded, and ignore_expires means cookies are saved even if they have already expired. In either case an existing file is overwritten. Here we set both to True.
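The effect of the two flags can be seen on hand-made cookies, with no network access. This sketch assumes Python 3 (http.cookiejar instead of cookielib); make_cookie, names_written, the cookie names, and the example.com domain are hypothetical helpers and values invented for the illustration.

```python
import os
import tempfile
import time
from http.cookiejar import Cookie, MozillaCookieJar


def make_cookie(name, expires, discard):
    """Build a hypothetical version-0 cookie for example.com."""
    return Cookie(
        version=0, name=name, value="v",
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=discard,
        comment=None, comment_url=None, rest={},
    )


def names_written(ignore_discard, ignore_expires):
    """Save three cookies with the given flags; return the names found in the file."""
    path = os.path.join(tempfile.mkdtemp(), "cookie.txt")
    now = int(time.time())
    jar = MozillaCookieJar(path)
    jar.set_cookie(make_cookie("persistent", now + 3600, discard=False))
    jar.set_cookie(make_cookie("session", None, discard=True))
    jar.set_cookie(make_cookie("expired", now - 3600, discard=False))
    jar.save(ignore_discard=ignore_discard, ignore_expires=ignore_expires)

    # Reload with both flags on so that nothing in the file is filtered out.
    reloaded = MozillaCookieJar()
    reloaded.load(path, ignore_discard=True, ignore_expires=True)
    return sorted(c.name for c in reloaded)


default_names = names_written(False, False)
with_discard = names_written(True, False)
with_both = names_written(True, True)
print(default_names, with_discard, with_both)
```

With both flags left at False only the persistent cookie is written; each flag adds the category of cookie it names.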

After running the script, the cookies are saved in the cookie.txt file, whose contents look like this:


# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookie_spec.html
# This is a generated file! Do not edit.

.baidu.com TRUE / FALSE 3601514761 BAIDUID 1b07eee0c5ce4b0ebaeba61c3c8f52c9:fg=1
.baidu.com TRUE / FALSE 3601514761 BIDUPSID 1b07eee0c5ce4b0ebaeba61c3c8f52c9
.baidu.com TRUE / FALSE H_PS_PSSID 1425_17757_18879_12825_18205_18778_17001_17073_15048_11808_10634
.baidu.com TRUE / FALSE 3601514761 PSTM 1454031113
www.baidu.com FALSE / FALSE BDSVRTM 0
www.baidu.com FALSE / FALSE BD_HOME 0

3) Load cookies from a file and use them

Now that the cookies are saved in a file, we can read them back later and use them to visit the site:


import cookielib
import urllib2

# Create a MozillaCookieJar instance
cookie = cookielib.MozillaCookieJar()
# Load the cookies from the file into the variable
cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
# Create the Request object
req = urllib2.Request("http://www.baidu.com")
# Build an opener with urllib2's build_opener
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open(req)
print response.read()

Imagine that cookie.txt holds the cookies from someone's logged-in Baidu session: by loading that file as shown above, we could access Baidu as that person's logged-in account.
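To see what a loaded jar actually contributes to a request, we can ask it to fill in the Cookie header without sending anything. The sketch assumes Python 3 (urllib.request and http.cookiejar); the token cookie and the example.com domain are hypothetical and written to a temporary cookie.txt just for this illustration.

```python
import os
import tempfile
import time
import urllib.request
from http.cookiejar import Cookie, MozillaCookieJar

# Write a hypothetical cookie file so the example has something to load.
path = os.path.join(tempfile.mkdtemp(), "cookie.txt")
saved = MozillaCookieJar(path)
saved.set_cookie(Cookie(
    version=0, name="token", value="xyz",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=int(time.time()) + 3600, discard=False,
    comment=None, comment_url=None, rest={},
))
saved.save()

# Load the file into a fresh jar, as a later script would.
loaded = MozillaCookieJar()
loaded.load(path)

# add_cookie_header attaches the matching cookies to the request;
# an opener built with HTTPCookieProcessor does this before sending.
request = urllib.request.Request("http://example.com/")
loaded.add_cookie_header(request)
cookie_header = request.get_header("Cookie")
print(cookie_header)
```

Only cookies whose domain and path match the request URL are added, which is why a file saved from one site does not leak into requests to another.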

4) Use cookies to simulate a website login

The following example uses cookies to simulate a login to a school's academic administration system:


import urllib
import urllib2
import cookielib

# Create a CookieJar instance to hold the cookies
cookie = cookielib.CookieJar()
# Create a cookie handler with urllib2's HTTPCookieProcessor
handler = urllib2.HTTPCookieProcessor(cookie)
# Build an opener from the handler
opener = urllib2.build_opener(handler)

postdata = urllib.urlencode({
    'stuid': '201200131012',
    'pwd': '23342321'
})
# Simulate the login; the returned cookies are stored in the cookie variable
loginurl = 'http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bks_login2.login'
result = opener.open(loginurl, postdata)
# Use the stored cookies to request the grade query page
gradeurl = 'http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bkscjcx.curscopre'
result = opener.open(gradeurl)
print result.read()
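The login request body is just the form fields in URL-encoded form. The sketch below assumes Python 3, where urlencode lives in urllib.parse and the data passed to opener.open must be bytes rather than a string; the field names and values are the ones from the example above.

```python
from urllib.parse import urlencode

# Same form fields as in the login example.
form_fields = {"stuid": "201200131012", "pwd": "23342321"}

# urlencode produces "key=value&key=value"; encode it to bytes for a POST body.
post_data = urlencode(form_fields).encode("ascii")
print(post_data)
```

Passing post_data as the second argument to opener.open turns the request into a POST, as in the login step above.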
