Python crawler: HTTP protocol, requests library

Source: Internet
Author: User
Tags: python, web crawler
HTTP protocol:

HTTP (Hypertext Transfer Protocol) is a stateless application-layer protocol based on a request-response model. A URL is the Internet path used to access a resource over HTTP, and each URL corresponds to one data resource.
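As a small illustration of how a URL identifies a resource, the standard-library urllib.parse module can split one into its parts. The URL below is a made-up example, not one taken from the article.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "http://www.example.com:8080/docs/index.html?lang=en"

parts = urlsplit(url)
print(parts.scheme)    # protocol used to reach the resource
print(parts.hostname)  # server that holds the resource
print(parts.port)      # port on that server
print(parts.path)      # path of the resource on the server
print(parts.query)     # optional query string
```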

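The HTTP protocol operates on resources through methods such as GET (retrieve a resource), HEAD (retrieve only the response headers of a resource), POST (submit new data to a resource), PUT (store a resource at the URL, replacing what is there), PATCH (partially update a resource), and DELETE (delete a resource).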

The Requests library provides all the basic request methods for HTTP. Official Introduction:

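The 6 main methods of the Requests library are: requests.get(), requests.head(), requests.post(), requests.put(), requests.patch(), and requests.delete().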
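A minimal sketch of how these functions line up with the HTTP methods. It only builds the requests with requests.Request(...).prepare() and never sends them, so it runs without network access. The URL and the describe() helper are hypothetical names introduced for this example.

```python
import requests

# Hypothetical resource URL; nothing is sent over the network in this sketch.
URL = "http://www.example.com/resource"

HTTP_METHODS = ("GET", "HEAD", "POST", "PUT", "PATCH", "DELETE")

def describe(method):
    """Return (function name, prepared HTTP method, prepared URL)."""
    func = getattr(requests, method.lower())            # e.g. requests.get
    prepared = requests.Request(method, URL).prepare()  # built, never sent
    return func.__name__, prepared.method, prepared.url

for m in HTTP_METHODS:
    print(describe(m))
```

In real use each function is called directly, for example requests.get(url, timeout=5), and returns a Response object.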

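Exceptions raised by the Requests library include requests.ConnectionError, requests.HTTPError, requests.URLRequired, requests.TooManyRedirects, requests.ConnectTimeout, and requests.Timeout.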
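One possible way to tell these exceptions apart is sketched below. The fetch_status() helper is a hypothetical name, and the sample call uses a string with no scheme so that Requests rejects it before touching the network.

```python
import requests

def fetch_status(url):
    """Return the status code, or the name of the exception that was raised."""
    try:
        r = requests.get(url, timeout=5)
        r.raise_for_status()
        return r.status_code
    except requests.exceptions.Timeout:
        return "Timeout"
    except requests.exceptions.ConnectionError:
        return "ConnectionError"
    except requests.exceptions.HTTPError:
        return "HTTPError"
    except requests.exceptions.RequestException as exc:
        # Base class of the exceptions above; catches everything else.
        return type(exc).__name__

# A URL without a scheme is rejected before any network access.
print(fetch_status("not-a-url"))
```

Because all of these classes derive from requests.exceptions.RequestException, a crawler that does not care about the exact cause can catch that single base class instead.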

Two important objects of the Requests library are Request and Response, which correspond to each other. A Request object can represent a request made with any of the HTTP methods. The Response object contains all the information returned by the server and also keeps the information about the request that produced it.

Properties of the Response object include r.status_code, r.text, r.encoding, r.apparent_encoding, and r.content.

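Here r.encoding is guessed from the HTTP response headers: if the Content-Type header of a text response carries no charset, the encoding is assumed to be ISO-8859-1. r.apparent_encoding is instead inferred from the content of the response body.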
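The effect of the header-based guess can be sketched offline. The example below calls requests.utils.get_encoding_from_headers on hand-written headers and then decodes a hand-built Response. Setting the private _content attribute is only a trick for this illustration, not something normal crawler code should do, and the sample text is made up.

```python
import requests
from requests.utils import get_encoding_from_headers

# No charset in the header: Requests falls back to ISO-8859-1 for text content.
print(get_encoding_from_headers({"content-type": "text/html"}))
# Charset present: it is used as-is.
print(get_encoding_from_headers({"content-type": "text/html; charset=utf-8"}))

# Build a Response by hand (assumption: writing to the private _content
# attribute is only a way to avoid a real network request here).
original = "中文网页"
r = requests.Response()
r._content = original.encode("utf-8")

r.encoding = "ISO-8859-1"   # what the header-based guess would give
garbled = r.text
r.encoding = "utf-8"        # the encoding the body was really written in
decoded = r.text
print(garbled == original, decoded == original)
```

With a real response, assigning r.apparent_encoding to r.encoding plays the role of the manual "utf-8" assignment above.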

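r.raise_for_status() checks r.status_code and raises an HTTPError exception when the request failed (a 4xx or 5xx status), so the status code does not have to be compared with 200 by hand.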
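The behaviour of raise_for_status() can be checked without a server by setting the status code on a hand-built Response. This is only a sketch: the check() helper is a hypothetical name, and real responses come from calls such as requests.get().

```python
import requests

def check(status_code):
    """Return True if raise_for_status() accepts the code, False if it raises."""
    r = requests.Response()       # hand-built response, no network involved
    r.status_code = status_code
    try:
        r.raise_for_status()
        return True
    except requests.exceptions.HTTPError:
        return False

for code in (200, 204, 302, 404, 500):
    print(code, check(code))
```

Only the 4xx and 5xx codes raise. Since Requests follows redirects by default, a successful page fetch normally ends with status 200.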

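HTTP protocol vs. Requests library: each HTTP method maps one-to-one to the Requests function with the same lowercase name, for example GET to requests.get() and POST to requests.post().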

Common code framework for crawling Web pages:

    import requests

    def gethtmltext(url):
        try:
            r = requests.get(url, timeout=5)
            # Raises HTTPError if the status is a 4xx or 5xx error code.
            r.raise_for_status()
            # Decode the body with the encoding inferred from its content.
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            return 'produces an exception'

For example, fetch the PMCAFF home page:

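    import requests

    def gethtmltext(url):
        try:
            # The timeout value is missing in the source; 5 seconds is
            # assumed here, as in the framework above.
            r = requests.get(url, timeout=5)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            return 'generate exception'

    if __name__ == '__main__':
        url = '...'  # PMCAFF home page URL (elided in the source)
        print(gethtmltext(url))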

Operating environment: Mac, Python 3.6, PyCharm 2016.2

Reference: China University MOOC course "Python Web Crawler and Information Extraction"

-----End-----

Author: Du Wangdan, Internet product manager. WeChat official account: Du Wangdan.
