1---urllib2 is a powerful Python module for accessing network resources, similar in function to the urllib module
The urllib2 module ships with the Python standard library, so it does not need to be downloaded separately. It can be thought of as an upgraded, more elaborate version of urllib.
It is the module to reach for when, for example:
accessing a network resource requires HTTP authentication,
cookie information is required,
or you need to access Web resources the way a normal browser does.
2---urllib2 module introduction
1) Setting a timeout:
import urllib2
test = urllib2.urlopen('http://www.iplaypy.com/', timeout=15)
# Two arguments: the URL and the timeout in seconds, set to 15 for this test
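A timeout is only useful if the caller deals with the request actually timing out. Here is a minimal sketch of one way to do that. The helper name `fetch` and the choice to return None on failure are assumptions made for illustration, and the `except ImportError` fallback is also an addition so that the sketch runs on Python 3, where urllib2 lives on as urllib.request.

```python
import socket

try:
    import urllib2                      # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3

def fetch(url, timeout=15):
    """Return the response body, or None if the request fails or times out.

    fetch() is a hypothetical helper, not part of urllib2.
    """
    try:
        response = urllib2.urlopen(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except (urllib2.URLError, socket.timeout) as err:
        # URLError covers connection problems; socket.timeout covers a
        # server that accepts the connection but stalls while replying.
        print("request failed: %s" % err)
        return None
```

A call such as `fetch('http://www.iplaypy.com/', timeout=15)` then gives back either the page body or None, without an unhandled exception.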
2) Adding header information to a request
header = {"User-Agent": "mozilla-firefox24.0"}  # a dict
request = urllib2.Request(url, headers=header)  # urlopen(url, header) would send the dict as POST data instead
urllib2.urlopen(request)
Adding headers this way lets you mimic a browser, which is very practical for network resources that block crawlers.
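To check which headers will go out without touching the network, you can build the Request object and inspect it. The sketch below assumes a hypothetical helper `build_request`; the extra `Accept` header is only there to illustrate `add_header()`. The `except ImportError` fallback is an addition so the code also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2                      # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3

def build_request(url, user_agent):
    """Build a Request carrying a browser-like User-Agent (hypothetical helper)."""
    request = urllib2.Request(url, headers={"User-Agent": user_agent})
    # Headers can also be added one at a time after construction.
    request.add_header("Accept", "text/html")
    return request

request = build_request("http://www.iplaypy.com/", "mozilla-firefox24.0")

# Request stores header names capitalized as "User-agent", so look it up that way.
print(request.get_header("User-agent"))
# No data was attached, so this request would be sent as a GET.
print(request.get_method())
```

Passing `request` to `urllib2.urlopen()` is what actually sends it.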
3) Getting the HTTP status code of a page with urllib2
import urllib2
test = urllib2.urlopen("http://baidu.com/")
test.getcode()  # test.code holds the same value
This returns the status code of Baidu's page; 200 means the request succeeded and the page content was retrieved.
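One thing to watch for: when the server answers with an error status such as 404 or 500, urlopen() does not return a response but raises HTTPError, and the status code sits on the exception. A minimal sketch that covers both cases follows. The helper name `status_code` is an assumption for illustration, and the `except ImportError` fallback is an addition so that it also runs on Python 3 (urllib.request).

```python
try:
    import urllib2                      # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3

def status_code(url, timeout=15):
    """Return the HTTP status code for url, including error codes (hypothetical helper)."""
    try:
        response = urllib2.urlopen(url, timeout=timeout)
    except urllib2.HTTPError as err:
        # Error responses such as 404 or 500 arrive as an exception;
        # the status code is available as err.code.
        return err.code
    try:
        return response.getcode()
    finally:
        response.close()
```

With this, `status_code("http://baidu.com/")` gives a number either way, instead of a traceback for a missing page.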
4) Handling cookies with urllib2
import urllib2
import cookielib
cookie = cookielib.CookieJar()  # note that the C and J in CookieJar are uppercase
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.baidu.com')
for item in cookie:
    if item.name == "Some--cookie_item_name":
        print item.value
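The cookie lookup loop is the kind of thing worth wrapping in a small function. The sketch below does that; `make_cookie_opener` and `cookie_value` are hypothetical helper names, not part of urllib2 or cookielib. The `except ImportError` fallback is an addition so the code also runs on Python 3, where the two modules are called urllib.request and http.cookiejar.

```python
try:
    import urllib2                      # Python 2, as used in this post
    import cookielib
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3
    import http.cookiejar as cookielib

def make_cookie_opener():
    """Return (opener, jar); cookies set by any response opened through
    the opener are collected in the jar. Hypothetical helper."""
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    return opener, jar

def cookie_value(jar, name, default=None):
    """Return the value of the first cookie called name, or default."""
    for item in jar:
        if item.name == name:
            return item.value
    return default
```

After `opener, jar = make_cookie_opener()` and `opener.open('http://www.baidu.com')`, a call like `cookie_value(jar, "Some--cookie_item_name")` returns that cookie's value, or None if the site did not set it.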
5) urlopen()----the entry function: it gets the OpenerDirector object and calls opener.open()
The default OpenerDirector object is stored in the module variable _opener, singleton style
build_opener()----creates an OpenerDirector object from the handlers passed to it
install_opener()---saves an OpenerDirector object in the variable _opener, to be used as the default opener
class OpenerDirector
class Request---an information object that stores the URL and its related parameters, including headers, data, and proxy, and is used to pass them along
class HTTPHandler---inheritance: BaseHandler-->AbstractHTTPHandler-->HTTPHandler
It calls httplib.HTTPConnection to carry out the HTTP processing
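The pieces listed above can be poked at directly, with no network access. The sketch below makes some assumptions: the POST data `q=python` is made up for illustration, `_opener` is a private module variable that is read here only to show where the default opener ends up, and the `except ImportError` fallback is an addition so the code also runs on Python 3 (urllib.request).

```python
try:
    import urllib2                      # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3

# build_opener() returns an OpenerDirector holding the default handlers
# plus any that are passed in.
opener = urllib2.build_opener(urllib2.HTTPHandler())
print(isinstance(opener, urllib2.OpenerDirector))

# install_opener() stores it in the module-level _opener, which urlopen()
# uses from then on.
urllib2.install_opener(opener)
print(urllib2._opener is opener)

# The handler inheritance chain: BaseHandler --> AbstractHTTPHandler --> HTTPHandler
print(issubclass(urllib2.HTTPHandler, urllib2.AbstractHTTPHandler))
print(issubclass(urllib2.AbstractHTTPHandler, urllib2.BaseHandler))

# A Request only carries the URL, data and headers; nothing is sent until
# it is handed to an opener. Attaching data turns it into a POST.
request = urllib2.Request("http://www.baidu.com/", data=b"q=python",
                          headers={"User-Agent": "mozilla-firefox24.0"})
print(request.get_method())
```

The four checks should each print True and the last line POST. In real code, call `install_opener()` once at startup, since it changes what every later `urlopen()` call uses.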
17.3.12--urllib2 module