Questions about urllib and urllib2 in Python


In Python 3, the Python 2 modules urllib and urllib2 were merged and restructured into a single urllib package. The main split is:
1. urllib.request
1) urllib.request.Request(url, data=None, headers={}, method=None)

from urllib import request

url = r'http://www.lagou.com/zhaopin/Python/?labelWords=label'
headers = {
    'User-Agent': r'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  r'Chrome/45.0.2454.85 Safari/537.36 115Browser/6.0.3',
    'Referer': r'http://www.lagou.com/zhaopin/Python/?labelWords=label',
    'Connection': 'keep-alive',
}
req = request.Request(url, headers=headers)
page = request.urlopen(req).read()
page = page.decode('utf-8')

The headers dictionary is used to wrap the request header:
User-Agent: this header carries several pieces of information: browser name and version, operating system name and version, and default language.
Referer: can be used to prevent hotlinking; some sites that serve images check the Referer to verify that the request comes from their own pages.
Connection: indicates the state of the connection and records the session state (e.g. keep-alive).
2) urllib.request.urlopen(url, data=None, [timeout,]*, cafile=None, capath=None, cadefault=False, context=None)
url: the URL to open, e.g. urllib.request.urlopen('https://www.baidu.com/')

from urllib import request

response = request.urlopen(r'http://python.org/')  # <http.client.HTTPResponse object at 0x00000000048bc908>
page = response.read()
page = page.decode('utf-8')

# Why not page.encode('utf-8') here?
# decode() converts a string in some other encoding to Unicode; e.g. str1.decode('gb2312') converts the GB2312-encoded string str1 to Unicode.
# encode() converts a Unicode string to some other encoding; e.g. str2.encode('gb2312') converts the Unicode string str2 to GB2312.
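The distinction matters in Python 3 because read() returns bytes, not str. A minimal sketch (the byte string is just an illustrative UTF-8 sequence):

raw = b'\xe4\xbd\xa0\xe5\xa5\xbd'   # UTF-8 bytes (Chinese for "hello")
text = raw.decode('utf-8')          # bytes -> str: decode another encoding into Unicode text
back = text.encode('gb2312')        # str -> bytes: encode Unicode text into another encoding
assert back.decode('gb2312') == text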
The object returned by urlopen provides these methods:
read()  # reads the entire response
readline()  # reads one line at a time
readlines()  # reads all remaining lines into a list, one line per element; for a large response this uses a lot of memory
fileno()  # returns the integer file descriptor (FD), which can be used for low-level OS I/O operations
close()  # closes the connection
info()  # returns an HTTPMessage object representing the header information returned by the remote server
getcode()  # returns the HTTP status code; for an HTTP request, 200 means the request completed successfully, 404 means the URL was not found
geturl()  # returns the URL of the request
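A minimal sketch exercising a few of these methods (python.org is just an example URL):

from urllib import request

response = request.urlopen('http://python.org/')
print(response.geturl())    # the URL actually retrieved, after any redirects
print(response.getcode())   # 200 on success
print(response.info())      # HTTPMessage holding the response headers
page = response.read()      # bytes of the whole body
response.close()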

data: the data submitted to the server via POST
timeout: sets the timeout for accessing the site, in seconds (see the sketch below)
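A minimal sketch of the timeout parameter, assuming a 5-second limit; a connection that times out surfaces as urllib.error.URLError (or socket.timeout during the read):

from urllib import error, request

try:
    response = request.urlopen('http://python.org/', timeout=5)
    page = response.read().decode('utf-8')
except error.URLError as e:
    print('request failed:', e.reason)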
3) urllib.request.ProxyHandler()

data = {
    'first': 'true',
    'pn': 1,
    'kd': 'Python',
}
proxy = request.ProxyHandler({'http': '5.22.195.215:80'})  # set the proxy
opener = request.build_opener(proxy)                       # build an opener that uses it
request.install_opener(opener)                             # install the opener globally
data = parse.urlencode(data).encode('utf-8')
page = opener.open(url, data).read()
page = page.decode('utf-8')
return page

2. urllib.request.urlretrieve(url[, filename[, reporthook[, data]]]):

The urlretrieve method downloads remote data directly to a local file. The filename parameter specifies the local save path (if it is omitted, urllib generates a temporary file to hold the data). The reporthook parameter is a callback function that is triggered when the connection to the server is established and again as each data block is transferred; we can use this callback to display the current download progress, as in the example below. The data parameter is data to POST to the server. The method returns a tuple of two elements (filename, headers): filename is the local path, and headers is the server's response header. The following example crawls the HTML of the Sina homepage to the local file d:/sina.html while displaying the download progress.

from urllib import request

def cbk(a, b, c):
    """Callback function
    @a: number of data blocks transferred so far
    @b: size of each data block
    @c: size of the remote file
    """
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print('%.2f%%' % per)

url = 'http://www.sina.com.cn'
local = 'd://sina.html'
request.urlretrieve(url, local, cbk)


3. urllib.parse
  1) urllib.parse.urlencode()  # encodes the submitted data as a URL query string; call .encode() on the result to get bytes (see the sketch below)
4. urllib.error, among several other submodules  # exceptions raised on request errors
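A minimal sketch of urlencode(); the dictionary values are the same illustrative ones used above:

from urllib import parse

data = {'first': 'true', 'pn': 1, 'kd': 'Python'}
query = parse.urlencode(data)     # 'first=true&pn=1&kd=Python'
body = query.encode('utf-8')      # bytes, as urlopen's data parameter requires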

A complete example putting these pieces together:

from urllib import error, parse, request

def get_page(url):
    headers = {
        'User-Agent': r'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      r'Chrome/45.0.2454.85 Safari/537.36 115Browser/6.0.3',
        'Referer': r'http://www.lagou.com/zhaopin/Python/?labelWords=label',
        'Connection': 'keep-alive',
    }
    data = {
        'first': 'true',
        'pn': 1,
        'kd': 'Python',
    }
    data = parse.urlencode(data).encode('utf-8')
    req = request.Request(url, headers=headers)
    page = None
    try:
        page = request.urlopen(req, data=data).read()
        page = page.decode('utf-8')
        print(page)
    except error.HTTPError as e:
        print(e.code)                      # e.code is an attribute, not a method
        print(e.read().decode('utf-8'))
    return page

get_page('https://www.baidu.com/')

