MOOC "Python web crawler and Information extraction" learning process note "Requests library" the first week of 1-3


1. Getting the HTML source of the Baidu homepage:

>>> import requests
>>> r = requests.get("http://www.baidu.com")
>>> r.status_code        # check the status code: 200 means the request succeeded, anything else means it failed
200
>>> r.encoding = 'utf-8' # change the encoding to utf-8
>>> r.text               # print the page content
>>> r.headers            # print the response headers


2. The main methods of the Requests library (a usage sketch follows this list):

requests.request() constructs a request; it is the base method that underpins all of the methods below

requests.get() is the main method for fetching an HTML page; it corresponds to HTTP GET

requests.head() fetches only the header information of an HTML page; it corresponds to HTTP HEAD

requests.post() submits a POST request to an HTML page; it corresponds to HTTP POST

requests.put() submits a PUT request to an HTML page; it corresponds to HTTP PUT

requests.patch() submits a partial-modification request to an HTML page; it corresponds to HTTP PATCH

requests.delete() submits a delete request to an HTML page; it corresponds to HTTP DELETE
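
As a minimal sketch of how the remaining methods are called (httpbin.org is assumed here as a public HTTP test service; it is not part of the original notes):

>>> import requests
>>> r = requests.head("http://httpbin.org/get")                  # HEAD: headers only, no body
>>> r.headers
>>> r.text                                                       # empty string, since HEAD returns no body
''
>>> payload = {"key1": "value1", "key2": "value2"}
>>> r = requests.post("http://httpbin.org/post", data=payload)   # POST submits form data
>>> r = requests.put("http://httpbin.org/put", data=payload)     # PUT replaces the resource
>>> r = requests.patch("http://httpbin.org/patch", data=payload) # PATCH modifies it locally
>>> r = requests.delete("http://httpbin.org/delete")             # DELETE removes it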

requests.get()

r = requests.get("URL")  # get("URL") constructs a Request object that asks the server for the resource
# r is a Response object containing the server's resources
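
To make the two objects concrete, a quick check in the interpreter (using the same Baidu homepage as above):

>>> import requests
>>> r = requests.get("http://www.baidu.com")
>>> type(r)      # the return value is a Response object
<class 'requests.models.Response'>
>>> r.request    # the (prepared) Request object that produced this response
<PreparedRequest [GET]>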

3. Common properties of the Response object (here, r):

r.status_code    the status code returned by the HTTP request; 200 means the connection succeeded, 404 indicates failure... anything other than 200 is a failure

r.text    the string form of the HTTP response body, i.e. the page content at the URL

r.encoding    the encoding of the response content, as guessed from the HTTP headers

r.apparent_encoding    the encoding of the response content, as deduced from the content itself (a fallback encoding)

r.content    the binary form of the HTTP response body
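
Because r.content is raw bytes rather than text, it is what you save when the resource is not a page at all; a minimal sketch (the image URL below is a made-up placeholder):

>>> import requests
>>> r = requests.get("http://example.com/sample.jpg")  # hypothetical image URL
>>> with open("sample.jpg", "wb") as f:                # write the raw bytes to disk
...     f.write(r.content)
...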

>>> import requests
>>> r = requests.get("http://www.baidu.com")
>>> r.status_code
200
>>> r.text               # garbled (mojibake)
>>> r.encoding           # the encoding found via the charset field of the HTTP header; if no charset is present, the encoding is assumed to be ISO-8859-1
'ISO-8859-1'
>>> r.apparent_encoding  # the correct encoding of the page, obtained by analyzing the content
'utf-8'
>>> r.encoding = 'utf-8'
>>> r.text               # now prints the page in the format we want

4. A general code framework for crawling web pages

1. What is the general code framework for crawling web pages?

It is a piece of code that crawls web pages accurately and reliably.

Exception handling in the Requests library:

r.raise_for_status() raises a requests.HTTPError if the status code is not 200

>>> import requests
>>> def getHTMLText(url):
...     try:
...         r = requests.get(url, timeout=30)
...         r.raise_for_status()              # if the status is not 200, raise HTTPError
...         r.encoding = r.apparent_encoding
...         return r.text
...     except:
...         return "An exception occurred"
...
>>> if __name__ == "__main__":
...     url = "http://www.baidu.com"
...     print(getHTMLText(url))

The getHTMLText() function above is the general code framework for crawling web pages.
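
To see the exception branch fire, call the framework with an unreachable address (the URL below is a deliberately invalid placeholder):

>>> print(getHTMLText("http://www.nosuchdomain-example.invalid"))
An exception occurred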


MOOC "Python web crawler and Information extraction" learning process note "Requests library" the first week of 1-3

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.