cookielib Introduction
I. Core classes
Cookie
This class implements the cookie standards defined by Netscape and RFC 2965; an instance represents a single cookie. A fragment of its code is shown below; many of these attributes should look familiar:
self.domain_initial_dot = domain_initial_dot
self.path = path
self.path_specified = path_specified
self.secure = secure
self.expires = expires
self.discard = discard
self.comment = comment
self.comment_url = comment_url
self.rfc2109 = rfc2109
CookiePolicy
The main job of this class is to govern the sending and receiving of cookies; that is, it makes sure each cookie is only sent back to the domain it belongs to, and vice versa.
DefaultCookiePolicy
This class implements the CookiePolicy interface.
CookieJar
CookieJar is a collection of cookies: it can hold many Cookie instances and is the main object we operate on. It offers a number of methods that support more detailed operations.
FileCookieJar
This class inherits from CookieJar. A plain CookieJar lives out its whole life cycle in memory; FileCookieJar subclasses add data persistence by defining three interfaces: save, load, and revert.
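As a sketch of the save/load/revert interfaces, here is a round trip through an LWPCookieJar. It is written with Python 3's name for the module, http.cookiejar (under Python 2, use "import cookielib" instead); the make_cookie helper, cookie names, and temp-file path are our own illustration, not part of the library:

```python
import os
import tempfile
import http.cookiejar as cookielib  # Python 2: "import cookielib"

def make_cookie(name, value, domain):
    # Build a Cookie by hand; all of these arguments are required.
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=2000000000, discard=False,
        comment=None, comment_url=None, rest={})

path = os.path.join(tempfile.mkdtemp(), 'cookie.txt')
jar = cookielib.LWPCookieJar(path)
jar.set_cookie(make_cookie('sid', 'abc123', 'www.example.com'))
jar.save()                       # write the jar to cookie.txt

fresh = cookielib.LWPCookieJar(path)
fresh.load()                     # read it back into a new jar
print([c.name for c in fresh])   # ['sid']

fresh.set_cookie(make_cookie('tmp', 'xyz', 'www.example.com'))
fresh.revert()                   # drop unsaved changes, reload from disk
print(len(fresh))                # 1
```

revert() is the less obvious of the three: it clears the jar and reloads whatever was last saved, discarding any cookies added since.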
MozillaCookieJar & LWPCookieJar
Two concrete implementation classes; their inheritance chain is:
CookieJar -> FileCookieJar -> MozillaCookieJar / LWPCookieJar
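The hierarchy can be confirmed directly from the module itself (shown under Python 3, where cookielib was renamed http.cookiejar):

```python
import http.cookiejar as cookielib  # Python 2: "import cookielib"

# FileCookieJar extends CookieJar, and both concrete jars extend FileCookieJar
assert issubclass(cookielib.FileCookieJar, cookielib.CookieJar)
assert issubclass(cookielib.MozillaCookieJar, cookielib.FileCookieJar)
assert issubclass(cookielib.LWPCookieJar, cookielib.FileCookieJar)
print("hierarchy confirmed")
```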
II. Usage
A simple example
A simple piece of code showing it in use:
#!/usr/bin/env python
# encoding: utf-8
import requests
import cookielib

url = 'http://www.baidu.com/'
jar = cookielib.LWPCookieJar('cookie.txt')
# Try to load cookies
# A question: why pass the ignore_discard argument?
try:
    jar.load(ignore_discard=True)
except Exception:
    pass
# Create a session
s = requests.Session()
# Set headers and cookies
s.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36'}
s.cookies = jar
# Access the URL
r = s.get(url)
# Persist the cookies
jar.save(ignore_discard=True)
# Print the cookies
for item in jar:
    print 'Cookie name: %s ---- value: %s' % (item.name, item.value)
We get the following cookies:
cat cookie.txt
#LWP-Cookies-2.0
Set-Cookie3: BAIDUID="2f5340b39928231aa09353cdae3da14d:FG=1"; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2083-07-09 16:27:51Z"; version=0
Set-Cookie3: BIDUPSID=2f5340b39928231aa09353cdae3da14d; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2083-07-09 16:27:51Z"; version=0
Set-Cookie3: H_PS_PSSID=14872_1457_14412_14509_14444_12826_10812_14430_12868_14871_12723_14962_14919_14902_15384_12095_13937_15963; path="/"; domain=".baidu.com"; path_spec; domain_dot; discard; version=0
Set-Cookie3: PSTM=1434892424; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2083-07-09 16:27:51Z"; version=0
Set-Cookie3: BDSVRTM=0; path="/"; domain="www.baidu.com"; path_spec; discard; version=0
Set-Cookie3: BD_HOME=0; path="/"; domain="www.baidu.com"; path_spec; discard; version=0
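This output also answers the question posed in the code above: the entries carrying the discard flag are session cookies, and both save() and load() silently skip such cookies unless ignore_discard=True is passed. A small sketch (Python 3 naming, http.cookiejar instead of cookielib; the session_cookie helper, cookie name, and domain are made up for illustration):

```python
import os
import tempfile
import http.cookiejar as cookielib  # Python 2: "import cookielib"

def session_cookie(name, value):
    # discard=True marks a session cookie, like BDSVRTM in the output above
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain='www.example.com', domain_specified=True,
        domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})

path = os.path.join(tempfile.mkdtemp(), 'cookie.txt')
jar = cookielib.LWPCookieJar(path)
jar.set_cookie(session_cookie('BDSVRTM', '0'))

jar.save()                        # session cookie is silently dropped
empty = cookielib.LWPCookieJar(path)
empty.load()
print(len(empty))                 # 0 -- nothing was written

jar.save(ignore_discard=True)     # now the session cookie is written
kept = cookielib.LWPCookieJar(path)
kept.load(ignore_discard=True)    # ...and must also be loaded with the flag
print(len(kept))                  # 1
```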
Generating a Cookie object
Following the class definition, we can construct a cookie by hand:
import cookielib

class Cookie:
    def __init__(self, version, name, value,
                 port, port_specified,
                 domain, domain_specified, domain_initial_dot,
                 path, path_specified,
                 secure,
                 expires,
                 discard,
                 comment,
                 comment_url,
                 rest,
                 rfc2109=False,
                 ):
        ...
# Initialize a cookie
def create_cookie(name, value, domain, expires=None):
    return cookielib.Cookie(
        version=0,
        name=name,
        value=value,
        port='80',
        port_specified=True,
        domain=domain,
        domain_specified=True,
        domain_initial_dot=False,
        path='/',
        path_specified=True,
        secure=False,
        expires=expires,
        discard=False,
        comment=None,
        comment_url=None,
        rest={},
        rfc2109=False
    )

new_cookie = create_cookie('phpgao', 'laogao', 'www.phpgao.com', '1434977736')
# Add it to an existing cookie jar
my_cookie = cookielib.CookieJar()
my_cookie.set_cookie(new_cookie)
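A cookie added this way can be read back simply by iterating the jar. A minimal sketch (Python 3 naming; create_cookie mirrors the hypothetical helper above):

```python
import http.cookiejar as cookielib  # Python 2: "import cookielib"

def create_cookie(name, value, domain, expires=None):
    # Same shape as the helper above; rest must be a dict
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port='80', port_specified=True,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=expires, discard=False,
        comment=None, comment_url=None, rest={})

jar = cookielib.CookieJar()
jar.set_cookie(create_cookie('phpgao', 'laogao', 'www.phpgao.com'))

# Iterating a CookieJar yields Cookie objects
found = {c.name: c.value for c in jar}
print(found['phpgao'])   # laogao
```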
III. Extended reading
Combining the cookielib and urllib2 modules to simulate a web site login
1. The cookielib module
The primary role of the cookielib module is to provide objects that can store cookies, so that it can be used together with the urllib2 module to access Internet resources. For example, an object of this module's CookieJar class can capture cookies and resend them on subsequent connection requests. The main objects in the cookielib module are CookieJar, FileCookieJar, MozillaCookieJar and LWPCookieJar. Their relationship is: CookieJar -> FileCookieJar -> MozillaCookieJar / LWPCookieJar.
2. The urllib2 module
The most powerful part of the urllib2 module is without doubt its opener: the module's OpenerDirector class. It manages a number of handler classes, each of which serves a particular protocol or a special function. The handler classes include:
BaseHandler
HTTPErrorProcessor
HTTPDefaultErrorHandler
HTTPRedirectHandler
ProxyHandler
AbstractBasicAuthHandler
HTTPBasicAuthHandler
ProxyBasicAuthHandler
AbstractDigestAuthHandler
ProxyDigestAuthHandler
AbstractHTTPHandler
HTTPHandler
HTTPCookieProcessor
UnknownHandler
FileHandler
FTPHandler
CacheFTPHandler
The cookielib module is generally used together with the urllib2 module, mainly by passing urllib2.HTTPCookieProcessor() as an argument to the urllib2.build_opener() function.
This makes it possible to simulate a web site login with Python.
First, a demo that obtains a CookieJar instance:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib
import urllib2
import cookielib

# Get a CookieJar object (cookies are kept locally in memory)
cj = cookielib.CookieJar()
# Build a custom opener and bind it to the CookieJar object
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Install the opener; subsequent calls to urlopen() will use it
urllib2.install_opener(opener)

url = "http://www.baidu.com"
urllib2.urlopen(url)
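The same flow can be exercised without touching baidu.com by pointing the opener at a throwaway local server (written with the Python 3 names, urllib.request for urllib2 and http.cookiejar for cookielib; the server, port choice, and cookie name are our own test fixture):

```python
import threading
import http.cookiejar
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hand out a cookie on every request
        self.send_response(200)
        self.send_header('Set-Cookie', 'sid=abc123; Path=/')
        self.send_header('Content-Length', '2')
        self.end_headers()
        self.wfile.write(b'ok')

    def log_message(self, *args):
        pass  # keep the output quiet

server = HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)

urllib.request.urlopen('http://127.0.0.1:%d/' % server.server_port)
print([c.name for c in cj])   # ['sid'] -- captured automatically
server.shutdown()
```

The point is that nothing in the request code mentions cookies: HTTPCookieProcessor intercepts the Set-Cookie header and files it into the jar on its own.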
Then a function that accesses a web site with the POST method (simulating a POST submission with urllib2):
#!/usr/bin/env python
# coding=utf-8

import urllib2
import urllib
import cookielib

def login():
    email = raw_input("Please enter user name: ")
    pwd = raw_input("Please enter password: ")
    data = {"email": email, "password": pwd}  # login user name and password
    post_data = urllib.urlencode(data)  # urlencode the POST data so the server can decode it
    cj = cookielib.CookieJar()  # get a CookieJar instance
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # Set your own User-Agent (disguises the client; keeps some sites from rejecting the request)
    headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}
    website = raw_input('Please enter URL: ')
    req = urllib2.Request(website, post_data, headers)
    content = opener.open(req)
    print content.read()  # Linux has no gbk encoding, only utf-8

if __name__ == '__main__':
    login()
Note: when this example was tested, it worked only for sites like Renren and Kaixin; sites such as Alipay, Baidu Pan, and even our school's academic system could not be logged into and displayed the following error message:
Traceback (most recent call last):
  File "login.py", line, in <module>
    login()
  File "login.py", line, in login
    content = opener.open(req)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 405: Method Not Allowed
Presumably these sites simply do not accept this kind of request from a client, though I do not know the exact reason. The program also cannot get past sites that require CAPTCHA verification, so treat it purely as a study of the principle.
Next, let's look at some more examples of simulating a login with Python (from: http://www.nowamagic.net/academy/detail/1302882):
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib2
import urllib
import cookielib
import re

auth_url = 'http://www.nowamagic.net/'
home_url = 'http://www.nowamagic.net/'
# Login user name and password
data = {
    "username": "nowamagic",
    "password": "pass"
}
# Encode the data with urllib
post_data = urllib.urlencode(data)
# Request headers to send
headers = {
    "Host": "www.nowamagic.net",
    "Referer": "http://www.nowamagic.net"
}
# Initialize a CookieJar to handle cookies
cookie_jar = cookielib.CookieJar()
# Instantiate a global opener
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
# Get the cookies
req = urllib2.Request(auth_url, post_data, headers)
result = opener.open(req)
# Visit the home page; the cookie information is carried along automatically
result = opener.open(home_url)
# Show the result
print result.read()
1. Use existing cookies to access a web site
import os
import cookielib, urllib2

ckjar = cookielib.MozillaCookieJar(os.path.join(
    r'C:\Documents and Settings\tom\Application Data\Mozilla\Firefox\Profiles\h5m61j1i.default',
    'cookies.txt'))
req = urllib2.Request(url, postdata, header)
req.add_header('User-Agent',
               'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(ckjar))
f = opener.open(req)
htm = f.read()
f.close()
2. Visit a web site, obtain its cookies, and save them to a cookie file
import cookielib, urllib2

req = urllib2.Request(url, postdata, header)
req.add_header('User-Agent',
               'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
ckjar = cookielib.MozillaCookieJar(filename)
ckproc = urllib2.HTTPCookieProcessor(ckjar)
opener = urllib2.build_opener(ckproc)
f = opener.open(req)
htm = f.read()
f.close()
ckjar.save(ignore_discard=True, ignore_expires=True)
3. Generate cookies with the given parameters, then use this cookie to access a web site
import cookielib, urllib, urllib2

cookiejar = cookielib.CookieJar()
urlopener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
values = {'redirect': '', 'email': 'abc@abc.com',
          'password': 'password', 'rememberme': '', 'submit': 'OK, Let Me In!'}
data = urllib.urlencode(values)
request = urllib2.Request(url, data)
response = urlopener.open(request)
print response.info()
page = response.read()

request = urllib2.Request(url)
response = urlopener.open(request)
page = response.read()
print page
In addition, some notes on urllib2 methods:
1. geturl():
This returns the URL actually retrieved, which is useful because urlopen (or the opener object) may have followed a redirect; the URL obtained may therefore differ from the one requested.
URL redirection (also called web site redirection or domain forwarding) is the technique of sending a visitor on to another address when a given address is requested. It is often used to turn a long URL into a short one: a long address is hard to remember and hard to spread, and on free web hosting the address may have to change, leaving uninformed visitors thinking the site has closed. A redirection service solves this by letting the same web page be reached through several different URLs.
>>> import urllib2
>>> url = "http://www.baidu.com"
>>> req = urllib2.Request(url)
>>> response = urllib2.urlopen(req)
>>> response.geturl()
'http://www.baidu.com'
>>> print response.info()
Date: Fri, Mar 2014 03:30:01 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
Set-Cookie: BAIDUID=AF7C001FCA87716A52B353C500FC45DB:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BDSVRTM=0; path=/
Set-Cookie: H_PS_PSSID=1466_5225_5288_5723_4261_4759_5659; path=/; domain=.baidu.com
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Expires: Fri, Mar 2014 03:29:06 GMT
Cache-Control: private
Server: BWS/1.1
BDPAGETYPE: 1
BDQID: 0xea1372bf0001780d
BDUSERID: 0
By default, urllib2 automatically follows redirects (HTTP 3xx response codes) for us, with no manual configuration. To detect whether a redirect occurred, simply check whether the response URL and the request URL are the same.
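The check can be seen end to end with a tiny local server that redirects /old to /new (written with the Python 3 name urllib.request instead of urllib2; the server, paths, and port are our own test fixture):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/old':
            # 302 response: urllib follows it automatically
            self.send_response(302)
            self.send_header('Location', '/new')
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'landed')

    def log_message(self, *args):
        pass  # keep the output quiet

server = HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

my_url = 'http://127.0.0.1:%d/old' % server.server_port
response = urllib.request.urlopen(my_url)
# geturl() reports the final URL, after the redirect was followed
print(response.geturl().endswith('/new'))   # True
server.shutdown()
```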
import urllib2

my_url = 'http://www.google.cn'
response = urllib2.urlopen(my_url)
redirected = response.geturl() != my_url
print redirected

my_url = 'http://rrurl.cn/b1UZuP'
response = urllib2.urlopen(my_url)
redirected = response.geturl() != my_url
print redirected
Debug log
When using urllib2, you can turn on the debug log as follows. The contents of the request and response are then printed to the screen, which is convenient for debugging and can sometimes save you a packet capture:
import urllib2

http_handler = urllib2.HTTPHandler(debuglevel=1)
https_handler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(http_handler, https_handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')