I. urllib module introduction
import urllib
Let's start with a small example. Printing the result shows that urlopen returns an object wrapping a socket connection. The examples in this article use Python 2.
Example one:
import urllib
url = r'http://www.baidu.com'
fp = urllib.urlopen(url)
print fp
>>>
<addinfourl at 43317888 whose fp = <socket._fileobject object at 0x02947530>>
>>>
1. Basic operation
The object returned by urlopen provides the following methods:
read(), readline(), readlines(), close(): used exactly like the same methods on a file object
info(): returns an httplib.HTTPMessage object holding the header information sent back by the remote server
getcode(): returns the HTTP status code for an HTTP request; 200 means the request completed successfully, 404 means the requested URL was not found
geturl(): returns the requested URL
Example two:
import urllib
url = r'http://www.baidu.com'
fp = urllib.urlopen(url)
# print fp.read()       # print the page content as one large string, the same as right-clicking and viewing the page source
# print fp.readline()   # print a single line
# print fp.readlines()  # print the whole content as a list of lines
print fp.info()         # print the header information
print fp.getcode()      # print the returned status code
print fp.geturl()       # print the requested URL
fp.close()              # close the connection
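For readers on Python 3, the same methods can be tried without touching a real site. The following is only a sketch under assumptions that are not in the original: it uses urllib.request (the Python 3 home of urlopen), a throwaway local server, and a hypothetical HelloHandler class that always answers with two lines of text.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real site: always serves two lines of text."""

    def do_GET(self):
        payload = b"first line\nsecond line\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines


# Port 0 lets the operating system choose a free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# An empty ProxyHandler ignores any system proxy settings, so the request
# goes straight to the local server. opener.open() returns the same kind
# of response object as urlopen().
opener = build_opener(ProxyHandler({}))
fp = opener.open(url)
first = fp.readline()                     # one line, as bytes
rest = fp.readlines()                     # the remaining lines, as a list
status = fp.getcode()                     # HTTP status code
content_type = fp.info()["Content-Type"]  # header lookup on the message object
final_url = fp.geturl()                   # URL that was actually fetched
fp.close()

server.shutdown()
server.server_close()
print(first, rest, status, content_type, final_url)
```

Note that in Python 3 the body comes back as bytes rather than str, so it usually needs a decode() before being treated as text.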
2. urllib.urlretrieve
Temporary storage: urllib.urlretrieve(url)
Local storage: urllib.urlretrieve(url, 'path of the local file')
Example three:
import urllib
url = r'http://www.baidu.com'
filename = urllib.urlretrieve(url)
print type(filename)  # a tuple
print filename        # the first element is the temporary file path, the second is the server's response header information
filename1 = urllib.urlretrieve(url, 'baidu.html')  # creates a baidu.html file in the same directory
>>>
<type 'tuple'>
('C:\\users\\zhao\\appdata\\local\\temp\\tmpzjmhu6',
>>>
3. urllib.urlcleanup
Clears the cache (temporary files) generated by urllib.urlretrieve()
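The temporary-file and local-file forms, and the cleanup step, can be sketched in Python 3 as follows. This is an illustration under assumptions, not the original example: it imports from urllib.request (where Python 3 keeps urlretrieve and urlcleanup as legacy functions), and it fetches an inline data: URL instead of a remote page so that no network is needed. The file name hello.txt is made up.

```python
import os
import tempfile
from urllib.request import urlcleanup, urlretrieve

# Assumption: an inline data: URL stands in for a remote page.
url = "data:text/plain,hello"

# Temporary storage: no target path, so a temporary file is created.
filename, headers = urlretrieve(url)
with open(filename, "rb") as f:
    content = f.read()
existed_before = os.path.exists(filename)

# Local storage: the content is written to the path we pass in.
target = os.path.join(tempfile.mkdtemp(), "hello.txt")  # hypothetical file name
saved_path, _ = urlretrieve(url, target)

# urlcleanup() removes the temporary files left by earlier urlretrieve() calls;
# files saved to an explicit path are not touched.
urlcleanup()
exists_after = os.path.exists(filename)

print(content, existed_before, exists_after, os.path.exists(saved_path))
```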
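4. urllib.quote and urllib.quote_plus
The difference is whether the / character is encoded: quote leaves / as it is by default, while quote_plus encodes it (and also turns spaces into +).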
Example four:
import urllib
url = r'http://www.baidu.com/!@#/'
# URL encoding
print urllib.quote(url)
print urllib.quote_plus(url)
# URL decoding
print urllib.unquote(urllib.quote(url))
print urllib.unquote_plus(urllib.quote_plus(url))
>>>
http%3A//www.baidu.com/%21%40%23/
http%3A%2F%2Fwww.baidu.com%2F%21%40%23%2F
http://www.baidu.com/!@#/
http://www.baidu.com/!@#/
>>>
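The same encoding and decoding round trip in Python 3 looks like this. It is a sketch that assumes the functions are imported from urllib.parse, which is where Python 3 keeps them; the "a b" string is an extra illustration of how the two functions treat spaces.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

url = "http://www.baidu.com/!@#/"

encoded = quote(url)            # '/' is left alone by default
encoded_plus = quote_plus(url)  # '/' is encoded as %2F

print(encoded)       # http%3A//www.baidu.com/%21%40%23/
print(encoded_plus)  # http%3A%2F%2Fwww.baidu.com%2F%21%40%23%2F

# Decoding restores the original string in both cases.
print(unquote(encoded))
print(unquote_plus(encoded_plus))

# Spaces show the other difference: %20 versus +.
print(quote("a b"), quote_plus("a b"))  # a%20b a+b
```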
5. Sending GET and POST requests with urllib
The two methods differ as follows (pasted from Baidu Zhidao; I think the summary is fairly comprehensive):
① GET retrieves data from the server; POST sends data to the server.
② GET appends the parameters to the URL named by the form's action attribute as name=value pairs, one per form field, so they are visible in the URL. POST sends the form fields and their contents in the body of the HTTP request to the URL named by the action attribute, so the user does not see them.
③ With GET, the server reads the values with Request.QueryString; with POST, it reads the submitted data with Request.Form. (These are the ASP names; PHP uses $_GET and $_POST.)
④ GET can carry only a small amount of data, commonly quoted as no more than 2KB, because the data travels in the URL. POST can carry a large amount of data and is generally not restricted by default, although in theory the maximum is 80KB in IIS4 and 100KB in IIS5.
⑤ GET is less secure than POST because the data is exposed in the URL, but it executes more efficiently than POST.
Suggestions:
① GET is less secure than POST, so if the data includes confidential information, submit it with POST.
② Use GET for data queries, and POST for adding, modifying, or deleting data.
Example five:
Suppose we have a local test.php with the following code:
<?php
print_r("Get params:\n");
var_dump($_GET);
print_r("Post params:\n");
var_dump($_POST);
Now send the GET and POST parameters in one request:
import urllib
url = r'http://127.0.0.1:8888/test.php'
get_params = urllib.urlencode({'name': 'zhzhgo', 'age': 25})
post_params = urllib.urlencode({'i': 1, 'j': 2})
# print get_params, type(get_params)  # age=25&name=zhzhgo <type 'str'>
# The query string carries the GET parameters; the second argument is sent as the POST body
fp = urllib.urlopen(url + "?" + get_params, post_params)
print fp.read()
>>>
Get params:
array(2) {
  ["age"]=>
  string(2) "25"
  ["name"]=>
  string(6) "zhzhgo"
}
Post params:
array(2) {
  ["i"]=>
  string(1) "1"
  ["j"]=>
  string(1) "2"
}
>>>
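What decides between GET and POST is simply whether a data argument is supplied. The Python 3 sketch below makes that visible without running a server: it builds the requests but never sends them. It assumes the Python 3 module layout (urllib.parse.urlencode and urllib.request.Request); the URL and parameters are the ones from example five.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://127.0.0.1:8888/test.php"
get_params = urlencode({"name": "zhzhgo", "age": 25})
# Python 3 expects the request body as bytes, not str.
post_params = urlencode({"i": 1, "j": 2}).encode("ascii")

get_req = Request(url + "?" + get_params)                # no data: GET
post_req = Request(url + "?" + get_params, post_params)  # data given: POST

print(get_params)             # name=zhzhgo&age=25
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(post_req.data)          # b'i=1&j=2'
```

Passing either object to urlopen would then send the request. As in the Python 2 example, the POST request still carries the GET parameters in its query string.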
II. urllib2 module
urllib and urllib2 are usually used together. They differ as follows:
① urllib2 can accept a Request object, which lets you set the headers of a URL request; urllib can only accept a URL;
② urllib provides the urlencode method for building the request parameter string; urllib2 does not;
③ urllib2 has only quote, with no quote_plus or related methods;
④ urllib2 has no urlretrieve method.
Example six:
# Use urllib2's Request class to set the request headers and skip the login prompt
import urllib, urllib2
import base64
url = r'http://127.0.0.1:8080/test.html'
# strip() removes whitespace characters by default ('\n', '\r', '\t', ' '); here it removes the trailing newline
base64string = base64.encodestring('admin:123456').strip()
# Note the space after "Basic": the header value is "Basic <credentials>"
authheader = "Basic " + base64string
req = urllib2.Request(url)
req.add_header('Authorization', authheader)
urllib2.urlopen(req)
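The header built in example six can be checked without sending anything. The sketch below is a Python 3 variant under stated assumptions: it uses urllib.request.Request in place of urllib2.Request, and base64.b64encode in place of encodestring (which newer Python 3 releases no longer provide). The credentials are the sample ones from the example.

```python
import base64
from urllib.request import Request

url = "http://127.0.0.1:8080/test.html"
credentials = "admin:123456"  # sample credentials from example six

# b64encode works on bytes and adds no trailing newline, so no strip() is needed.
token = base64.b64encode(credentials.encode("ascii")).decode("ascii")
authheader = "Basic " + token

req = Request(url)
req.add_header("Authorization", authheader)

# Inspect the header that would be sent; the request itself is never issued here.
print(req.get_header("Authorization"))  # Basic YWRtaW46MTIzNDU2
```

Base64 is an encoding, not encryption, so a header like this only protects the password when the connection itself is encrypted.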
III. urlparse module
The urlparse module mainly splits a URL into 6 parts (scheme, netloc, path, params, query, fragment) and returns them as a tuple
Example seven:
import urlparse
url = r'http://www.baidu.com/user/index.html;18?name=zhzhgo&age=25!#8888'
res = urlparse.urlparse(url)
print res
print res[0]
print res[1]
>>>
ParseResult(scheme='http', netloc='www.baidu.com', path='/user/index.html', params='18', query='name=zhzhgo&age=25!', fragment='8888')
http
www.baidu.com
>>>
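In Python 3 the same split is available from urllib.parse. The sketch below assumes that module layout and adds two things the original example does not show: access to the parts by name, and urlunparse to put the six parts back together.

```python
from urllib.parse import urlparse, urlunparse

url = "http://www.baidu.com/user/index.html;18?name=zhzhgo&age=25!#8888"
res = urlparse(url)

# The result is a 6-item named tuple, so index and attribute access both work.
print(res[0], res.scheme)  # http http
print(res[1], res.netloc)  # www.baidu.com www.baidu.com
print(res.path)            # /user/index.html
print(res.params)          # 18
print(res.query)           # name=zhzhgo&age=25!
print(res.fragment)        # 8888

# urlunparse reverses the split.
rebuilt = urlunparse(res)
print(rebuilt == url)
```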
This article is from the "Today's efforts, tomorrow's success!" blog. Please keep this attribution: http://zhzhgo.blog.51cto.com/10497096/1678954