Python's cookielib: description and examples

Source: Internet
Author: User
Tags urlencode

cookielib Introduction

I. Core classes

Cookie

This class represents a single cookie, as defined by the Netscape and RFC 2965 cookie standards. Part of its constructor is shown below; many of the attributes should look familiar.

        self.domain_initial_dot = domain_initial_dot
        self.path = path
        self.path_specified = path_specified
        self.secure = secure
        self.expires = expires
        self.discard = discard
        self.comment = comment
        self.comment_url = comment_url
        self.rfc2109 = rfc2109

CookiePolicy

This class decides which cookies are accepted from a server and which are sent back to it. In other words, it ensures that the correct cookies are sent to the corresponding domain, and that cookies are only accepted from appropriate domains.


DefaultCookiePolicy

This class implements the CookiePolicy interface.
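To make the policy idea concrete, here is a minimal sketch. It assumes Python 3, where `cookielib` was renamed to `http.cookiejar` (the class names are the same in Python 2's `cookielib`). The domains are invented for illustration. It blocks one domain with `DefaultCookiePolicy` and hands the policy to a `CookieJar`:

```python
import http.cookiejar

# Refuse to accept cookies from, or return cookies to, anything under
# ads.example.com (a hypothetical domain chosen for illustration).
policy = http.cookiejar.DefaultCookiePolicy(blocked_domains=[".ads.example.com"])
jar = http.cookiejar.CookieJar(policy=policy)

blocked = policy.is_blocked("tracker.ads.example.com")
other_blocked = policy.is_blocked("www.example.com")
print(blocked, other_blocked)
```

A leading dot in a blocked domain matches every host under that domain; without the dot only the exact host is matched.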


CookieJar

A CookieJar is a collection of cookies. It can hold many Cookie objects and is the main object we work with. It offers a number of methods for finer-grained operations.
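The basic jar operations can be sketched as follows. This is an illustration only, written for Python 3's `http.cookiejar` (the renamed `cookielib`); the helper `make_cookie`, the host name and the cookie values are all made up for the example.

```python
import http.cookiejar


def make_cookie(name, value, domain):
    """Build a simple session cookie; every other field is an illustrative default."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie("theme", "dark", "www.example.com"))
jar.set_cookie(make_cookie("lang", "en", "www.example.com"))

names = sorted(cookie.name for cookie in jar)   # a jar is iterable
count = len(jar)                                # and has a length

# clear() takes optional domain, path and name arguments to narrow what is removed.
jar.clear("www.example.com", "/", "theme")
remaining = [cookie.name for cookie in jar]
print(names, count, remaining)
```

Calling `clear()` with no arguments would empty the whole jar instead of removing a single cookie.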


FileCookieJar

This class inherits from CookieJar. A plain CookieJar lives only in memory, whereas FileCookieJar subclasses can persist their data. The class defines three interfaces for this: save, load and revert.
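The three persistence calls can be tried out without any network access. The sketch below assumes Python 3's `http.cookiejar` and uses `LWPCookieJar` as the concrete file-backed jar; the helper `make_cookie`, the host and the values are hypothetical.

```python
import http.cookiejar
import os
import tempfile
import time


def make_cookie(name, value):
    """Persistent cookie for the made-up host www.example.com, valid for one hour."""
    return http.cookiejar.Cookie(
        0, name, value, None, False,
        "www.example.com", True, False,
        "/", True, False,
        int(time.time()) + 3600, False, None, None, {},
    )


path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

jar = http.cookiejar.LWPCookieJar(path)
jar.set_cookie(make_cookie("token", "saved"))
jar.save()                                   # write the jar to disk

jar.set_cookie(make_cookie("token", "changed-in-memory"))
jar.revert()                                 # drop in-memory state, reload from disk
after_revert = [(c.name, c.value) for c in jar]

fresh = http.cookiejar.LWPCookieJar(path)
fresh.load()                                 # a new jar can read the same file
loaded = [(c.name, c.value) for c in fresh]
print(after_revert, loaded)
```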


MozillaCookieJar & LWPCookieJar

These are two concrete implementation classes. Both inherit from FileCookieJar, which in turn inherits from CookieJar.
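The inheritance relationship can be checked directly from the interpreter. A small sketch, assuming Python 3's `http.cookiejar` (in Python 2 the same classes live in `cookielib`):

```python
import http.cookiejar as cj

# Method resolution order of MozillaCookieJar, from most to least specific.
hierarchy = [cls.__name__ for cls in cj.MozillaCookieJar.__mro__]
both_file_based = (issubclass(cj.MozillaCookieJar, cj.FileCookieJar)
                   and issubclass(cj.LWPCookieJar, cj.FileCookieJar))
print(hierarchy, both_file_based)
```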


II. Usage


Simple example

A simple usage example:

#!/usr/bin/env python
# encoding: utf-8


import requests
import cookielib


url = 'http://www.baidu.com/'
jar = cookielib.LWPCookieJar('cookie.txt')

# Try to load cookies
# A question to think about: why pass the ignore_discard argument?
try:
    jar.load(ignore_discard=True)
except IOError:
    # No usable cookie file yet (for example, on the first run)
    pass

# Create a session
s = requests.Session()

# Set headers and cookies
s.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36'}
s.cookies = jar

# Access the URL
r = s.get(url)

# Persist the cookies
jar.save(ignore_discard=True)

# Print the cookies
for item in jar:
    print 'cookie name: %s ---- value: %s' % (item.name, item.value)

This produces the following cookie file:

cat cookie.txt

#LWP-Cookies-2.0
Set-Cookie3: BAIDUID="2F5340B39928231AA09353CDAE3DA14D:FG=1"; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2083-07-09 16:27:51Z"; version=0
Set-Cookie3: BIDUPSID=2F5340B39928231AA09353CDAE3DA14D; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2083-07-09 16:27:51Z"; version=0
Set-Cookie3: H_PS_PSSID=14872_1457_14412_14509_14444_12826_10812_14430_12868_14871_12723_14962_14919_14902_15384_12095_13937_15963; path="/"; domain=".baidu.com"; path_spec; domain_dot; discard; version=0
Set-Cookie3: PSTM=1434892424; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2083-07-09 16:27:51Z"; version=0
Set-Cookie3: BDSVRTM=0; path="/"; domain="www.baidu.com"; path_spec; discard; version=0
Set-Cookie3: BD_HOME=0; path="/"; domain="www.baidu.com"; path_spec; discard; version=0
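The question raised in the example's comments (why pass `ignore_discard`?) can be answered with a small offline experiment. Session cookies, such as the ones marked `discard` in the file above, are skipped by `save()` and `load()` unless `ignore_discard=True` is passed. The sketch below assumes Python 3's `http.cookiejar`; the host name and the helper `names_after_round_trip` are invented for the illustration.

```python
import http.cookiejar
import os
import tempfile

session_cookie = http.cookiejar.Cookie(
    0, "BD_HOME", "0", None, False,
    "www.example.com", False, False,
    "/", True, False,
    None, True,            # expires=None, discard=True: a session cookie
    None, None, {},
)

folder = tempfile.mkdtemp()


def names_after_round_trip(ignore_discard):
    """Save one session cookie, reload the file, and report what survived."""
    path = os.path.join(folder, "cookies-%s.txt" % ignore_discard)
    jar = http.cookiejar.LWPCookieJar(path)
    jar.set_cookie(session_cookie)
    jar.save(ignore_discard=ignore_discard)
    reloaded = http.cookiejar.LWPCookieJar(path)
    reloaded.load(ignore_discard=ignore_discard)
    return [c.name for c in reloaded]


without_flag = names_after_round_trip(False)
with_flag = names_after_round_trip(True)
print(without_flag, with_flag)
```

Without the flag the session cookie never reaches the file, so a login that depends on it would be lost between runs.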

Creating a Cookie object

We can create a cookie ourselves by following the definition of the Cookie class:

import cookielib

# For reference, the signature of cookielib.Cookie:
#
# class Cookie:
#     def __init__(self, version, name, value,
#                  port, port_specified,
#                  domain, domain_specified, domain_initial_dot,
#                  path, path_specified,
#                  secure,
#                  expires,
#                  discard,
#                  comment,
#                  comment_url,
#                  rest,
#                  rfc2109=False,
#                  ):
#         .....

# Initialize a cookie

def createcookie(name, value, domain, expires=None):
    return cookielib.Cookie(
        version=None,
        name=name,
        value=value,
        port='80',
        port_specified=True,
        domain=domain,
        domain_specified=True,
        domain_initial_dot=False,
        path='/',
        path_specified=True,
        secure=False,
        expires=expires,
        discard=False,
        comment=None,
        comment_url=None,
        rest={},  # must be a dict; cookielib looks up non-standard attributes in it
        rfc2109=False
    )


# expires is a Unix timestamp (seconds since the epoch), passed as a number
new_cookie = createcookie('phpgao', 'laogao', 'www.phpgao.com', 1434977736)

# Add it to an existing cookie jar

mycookie = cookielib.CookieJar()
mycookie.set_cookie(new_cookie)
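Once a cookie is in the jar, the jar decides for each outgoing request whether to attach it. The sketch below shows this without any network traffic. It assumes Python 3 (`http.cookiejar` and `urllib.request` in place of `cookielib` and `urllib2`) and uses `example.com` hosts as stand-ins; `version=0` is used because Python 3 compares the version numerically.

```python
import http.cookiejar
import urllib.request

cookie = http.cookiejar.Cookie(
    version=0, name="phpgao", value="laogao",
    port=None, port_specified=False,
    domain="www.example.com", domain_specified=True, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)
jar = http.cookiejar.CookieJar()
jar.set_cookie(cookie)

matching = urllib.request.Request("http://www.example.com/index.html")
other = urllib.request.Request("http://other.example.org/")
jar.add_cookie_header(matching)   # no network traffic: this only edits the request
jar.add_cookie_header(other)

matching_header = matching.get_header("Cookie")
other_header = other.get_header("Cookie")
print(matching_header, other_header)
```

Only the request whose host and path match the cookie receives a `Cookie` header; the request to the unrelated host is left untouched.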

III. Extended reading


Combining the cookielib and urllib2 modules to simulate a website login


1. The cookielib module

The main role of the cookielib module is to provide objects that store cookies, so that it can be used together with the urllib2 module to access Internet resources. For example, a CookieJar object from this module can capture cookies and resend them on subsequent requests. The main objects in the cookielib module are CookieJar, FileCookieJar, MozillaCookieJar and LWPCookieJar. Their relationship is as follows: FileCookieJar derives from CookieJar, and MozillaCookieJar and LWPCookieJar both derive from FileCookieJar.



2. The urllib2 module


The most powerful part of the urllib2 module is arguably its opener, the OpenerDirector class. It manages a chain of handler classes, each of which deals with a particular protocol or a special function. The handler classes are:

BaseHandler
HTTPErrorProcessor
HTTPDefaultErrorHandler
HTTPRedirectHandler
ProxyHandler
AbstractBasicAuthHandler
HTTPBasicAuthHandler
ProxyBasicAuthHandler
AbstractDigestAuthHandler
ProxyDigestAuthHandler
AbstractHTTPHandler
HTTPHandler
HTTPCookieProcessor
UnknownHandler
FileHandler
FTPHandler
CacheFTPHandler

The cookielib module is generally used together with urllib2: a CookieJar is passed to urllib2.HTTPCookieProcessor(), and the resulting handler is passed to urllib2.build_opener().
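How the cookie handler ends up in the opener's handler chain can be inspected without making a request. This is a sketch for Python 3, where `urllib2` became `urllib.request` and `cookielib` became `http.cookiejar`; the variable names are illustrative.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# build_opener() adds the default handlers and then the ones passed in.
handler_names = sorted(type(h).__name__ for h in opener.handlers)
cookie_handlers = [h for h in opener.handlers
                   if isinstance(h, urllib.request.HTTPCookieProcessor)]
print(handler_names)
```

The opener keeps a reference to the same jar object, so cookies collected during requests made through the opener show up in `jar`.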

This makes it possible to simulate a website login in Python.

First, a demo that obtains a CookieJar instance:


#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib
import urllib2
import cookielib

# Get a CookieJar object (it holds the cookies locally)
cj = cookielib.CookieJar()
# Build a custom opener and bind it to the CookieJar object
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Install the opener; later calls to urlopen() will use the installed opener
urllib2.install_opener(opener)

url = "http://www.baidu.com"
urllib2.urlopen(url)
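To watch the jar capture a cookie and send it back, without depending on an external site, the same idea can be run against a throwaway local server. This is only a sketch under several assumptions: Python 3 (`urllib.request`, `http.cookiejar`), a hypothetical handler class `CookieEchoHandler` that sets a cookie named `sid` and echoes back whatever `Cookie` header it receives, and a loopback port chosen by the operating system.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: sets a cookie once, then echoes the Cookie header."""

    def do_GET(self):
        sent = self.headers.get("Cookie", "")
        body = sent.encode("utf-8")
        self.send_response(200)
        if not sent:
            self.send_header("Set-Cookie", "sid=abc123; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), CookieEchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),            # ignore any proxy settings
    urllib.request.HTTPCookieProcessor(jar),    # bind the opener to the jar
)

with opener.open(url) as first:
    first_body = first.read().decode("utf-8")
with opener.open(url) as second:
    second_body = second.read().decode("utf-8")

server.shutdown()
server.server_close()

captured = {c.name: c.value for c in jar}
print(repr(first_body), repr(second_body), captured)
```

The first response carries `Set-Cookie`, which the jar stores; the second request then includes the cookie automatically, which is exactly what a login session relies on.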





Next, access a website with the POST method (using urllib2 to simulate a POST request):


#!/usr/bin/env python
# coding=utf-8

import urllib2
import urllib
import cookielib

def login():
    email = raw_input("Please enter user name:")
    pwd = raw_input("Please enter password:")
    data = {"email": email, "password": pwd}  # login user name and password
    post_data = urllib.urlencode(data)  # encode the POST data in a form the server can decode
    cj = cookielib.CookieJar()  # get a CookieJar instance
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # Set our own User-Agent (can be used to imitate a browser, since some sites reject unknown clients)
    headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}
    website = raw_input('Please enter URL:')
    req = urllib2.Request(website, post_data, headers)
    content = opener.open(req)
    print content.read()  # Linux has no GBK encoding here, only UTF-8

if __name__ == '__main__':
    login()
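The encoding step in the login function is easy to inspect on its own. The sketch below uses Python 3 names (`urllib.parse.urlencode` and `urllib.request.Request` replace `urllib.urlencode` and `urllib2.Request`); the credentials and URL are made up, and no request is actually sent.

```python
import urllib.parse
import urllib.request

# Hypothetical credentials and URL, used only to show the encoding step.
form = {"email": "user@example.com", "password": "p@ss word"}
post_data = urllib.parse.urlencode(form)

# In Python 3 the request body must be bytes, so encode the string first.
req = urllib.request.Request(
    "http://www.example.com/login",
    data=post_data.encode("ascii"),
    headers={"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"},
)
method = req.get_method()

print(post_data)   # email=user%40example.com&password=p%40ss+word
print(method)      # POST, because a body is attached
```

Attaching `data` is what turns the request into a POST; a `Request` built without it is sent as GET.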


Note: in testing, this example only worked on sites such as Renren and Kaixin. It could not log in to Alipay, Baidu Netdisk, or even our school's academic affairs system, and produced the following error message:


Traceback (most recent call last):
  File "login.py", line, in <module>
    login()
  File "login.py", line, in login
    content=opener.open(req)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 405: Method Not Allowed


These sites probably do not accept this request method from the client as written; I do not know the exact reason. The program also cannot get past a site's CAPTCHA automatically, so treat it purely as a way to learn the principle.
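Rather than letting the traceback end the program, the error can be caught and reported. The following is a rough sketch for Python 3 (`urllib.error.HTTPError` corresponds to `urllib2.HTTPError`). The function `describe_failure` is a hypothetical helper, and the error object is constructed by hand so the example runs offline.

```python
import io
import urllib.error
from email.message import Message


def describe_failure(error):
    """Turn an HTTPError into a short, readable message."""
    if error.code == 405:
        return "server rejected the HTTP method: %s" % error.reason
    return "request failed with status %d" % error.code


# Build the error by hand so the sketch runs offline; in real code it would be
# raised by opener.open(req) and caught with `except urllib.error.HTTPError`.
simulated = urllib.error.HTTPError(
    "http://www.example.com/login", 405, "Method Not Allowed",
    Message(), io.BytesIO(b""),
)
try:
    raise simulated
except urllib.error.HTTPError as err:
    message = describe_failure(err)
    status = err.code
print(message)
```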

Next, let's look at an example of simulating a login with Python (from: http://www.nowamagic.net/academy/detail/1302882)


#!/usr/bin/python
# -*- coding: utf-8 -*-

import urllib2
import urllib
import cookielib
import re

auth_url = 'http://www.nowamagic.net/'
home_url = 'http://www.nowamagic.net/'
# Login user name and password
data = {
    "username": "nowamagic",
    "password": "pass"
}
# Encode the form data with urllib
post_data = urllib.urlencode(data)
# Request headers to send
headers = {
    "Host": "www.nowamagic.net",
    "Referer": "http://www.nowamagic.net"
}
# Initialize a CookieJar to handle cookies
cookiejar = cookielib.CookieJar()
# Create an opener bound to the CookieJar
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
# Log in to obtain the cookies
req = urllib2.Request(auth_url, post_data, headers)
result = opener.open(req)
# Access the home page; the cookies are sent automatically
result = opener.open(home_url)
# Show the result
print result.read()

1. Use existing cookies to access the website


import os
import cookielib, urllib2

# url, postdata and header are placeholders to be supplied by the caller
ckjar = cookielib.MozillaCookieJar(os.path.join(r'C:\Documents and Settings\tom\Application Data\Mozilla\Firefox\Profiles\h5m61j1i.default', 'cookies.txt'))
# Read the existing cookies from the file; creating the jar alone does not load them
ckjar.load()

req = urllib2.Request(url, postdata, header)

req.add_header('User-Agent',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(ckjar))

f = opener.open(req)
htm = f.read()
f.close()
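The cookie file read by `MozillaCookieJar` is plain text in the Netscape `cookies.txt` format, so a small one can be written by hand to see what loading does. A minimal sketch, assuming Python 3's `http.cookiejar`; the host, cookie name and value are invented, and the file is created in a temporary directory.

```python
import http.cookiejar
import os
import tempfile

# A tiny hand-written file in the Netscape cookies.txt format. Fields are
# tab-separated: domain, subdomain flag, path, secure flag, expiry, name, value.
# The host, cookie name and value are invented; 4102444800 is 2100-01-01 UTC.
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")
with open(cookie_file, "w") as handle:
    handle.write("# Netscape HTTP Cookie File\n")
    handle.write(".example.com\tTRUE\t/\tFALSE\t4102444800\tsession\tabc123\n")

ckjar = http.cookiejar.MozillaCookieJar(cookie_file)
before_load = len(ckjar)
ckjar.load()   # constructing the jar does not read the file; load() does

loaded = [(c.domain, c.name, c.value, c.expires) for c in ckjar]
print(before_load, loaded)
```

The jar is empty until `load()` is called, which is why an opener built on a freshly constructed jar sends no cookies from the file.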



2. Visit the website to get cookies and save the cookies in the cookie file


import cookielib, urllib2

# url, postdata, header and filename are placeholders to be supplied by the caller
req = urllib2.Request(url, postdata, header)
req.add_header('User-Agent',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')

ckjar = cookielib.MozillaCookieJar(filename)
ckproc = urllib2.HTTPCookieProcessor(ckjar)

opener = urllib2.build_opener(ckproc)

f = opener.open(req)
htm = f.read()
f.close()

# Write every cookie to the file, including session and expired ones
ckjar.save(ignore_discard=True, ignore_expires=True)



3. Generate cookies using the specified parameters and use this cookie to access the Web site


import cookielib, urllib, urllib2

# url is a placeholder to be supplied by the caller
cookiejar = cookielib.CookieJar()
urlopener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
values = {'redirect': '', 'email': 'abc@abc.com',
          'password': 'password', 'rememberme': '', 'submit': 'OK, Let Me In!'}
data = urllib.urlencode(values)

# Log in with a POST request; the opener stores the returned cookies
request = urllib2.Request(url, data)
response = urlopener.open(request)
print response.info()
page = response.read()

# Request the page again; the stored cookies are sent automatically
request = urllib2.Request(url)
response = urlopener.open(request)
page = response.read()
print page


In addition, here are some useful urllib2 features:

1. geturl():

This returns the real URL that was fetched. It is useful because urlopen (or the opener object) may have followed a redirect, so the URL you end up with may differ from the one you requested.

URL redirection (also called URL forwarding or domain forwarding) is the technique of sending a user to a different address when they visit a given one. It is often used to turn a long URL into a shorter one, since long URLs are hard to remember and share. It also helps when a site on free web hosting has to change its address, so that users who were not told about the move do not assume the site has closed. In such cases a redirection service can be used. The technique lets one web page be reached through several different Uniform Resource Locators (URLs).

>>> import urllib2
>>> url = "http://www.baidu.com"
>>> req = urllib2.Request(url)
>>> response = urllib2.urlopen(req)
>>> response.geturl()
'http://www.baidu.com'
>>> print response.info()
Date: Fri, Mar 2014 03:30:01 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: Close
Vary: Accept-Encoding
Set-Cookie: BAIDUID=AF7C001FCA87716A52B353C500FC45DB:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BDSVRTM=0; path=/
Set-Cookie: H_PS_PSSID=1466_5225_5288_5723_4261_4759_5659; path=/; domain=.baidu.com
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Expires: Fri, Mar 2014 03:29:06 GMT
Cache-Control: private
Server: BWS/1.1
BDPAGETYPE: 1
BDQID: 0xea1372bf0001780d
BDUSERID: 0


By default, urllib2 automatically follows redirects for HTTP 3xx status codes, with no manual configuration needed. To detect whether a redirect occurred, just check whether the URL of the response matches the URL of the request.


import urllib2
my_url = 'http://www.google.cn'
response = urllib2.urlopen(my_url)
# A redirect happened if the final URL differs from the requested one
redirected = response.geturl() != my_url
print redirected

my_url = 'http://rrurl.cn/b1UZuP'
response = urllib2.urlopen(my_url)
redirected = response.geturl() != my_url
print redirected
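The same check can be reproduced without relying on external sites by pointing the opener at a small local server. This is a sketch only, assuming Python 3 (`urllib.request` in place of `urllib2`); `RedirectingHandler` and `was_redirected` are hypothetical names, and the server listens on a loopback port picked by the operating system.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new, anything else returns 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"hello"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def was_redirected(opener, url):
    """Return True when the final URL differs from the requested one."""
    with opener.open(url) as response:
        return response.geturl() != url


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxies

redirect_results = (was_redirected(opener, base + "/old"),
                    was_redirected(opener, base + "/new"))
server.shutdown()
server.server_close()
print(redirect_results)
```

The request to `/old` is followed to `/new` automatically, so its final URL differs from the requested one; the direct request to `/new` does not change.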

Debug log

When using urllib2, you can turn on the debug log as shown below. The contents of the HTTP exchange are then printed to the screen, which makes debugging easier and can sometimes save you the work of capturing packets.


import urllib2
httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')
