Python crawler: the HTTP protocol and the Requests library

HTTP protocol:

HTTP stands for Hypertext Transfer Protocol. A URL is the Internet path used to access a resource over HTTP, and each URL corresponds to one data resource.
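To make the link between a URL and a resource more concrete, the following minimal sketch splits a URL into its parts with the standard library's urllib.parse. Only the host comes from the example later in this article; the port, path, and query string are invented for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URL: the host is the one fetched later in this article,
# the port, path and query are made up for illustration.
url = "https://www.pmcaff.com:443/article/index?page=2"

parts = urlsplit(url)

print(parts.scheme)    # protocol used to reach the resource
print(parts.hostname)  # server that holds the resource
print(parts.port)      # TCP port on that server
print(parts.path)      # location of the resource on the server
print(parts.query)     # extra parameters sent along with the request
```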

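HTTP operations on resources:

  • GET: request the resource at the URL.
  • HEAD: request only the response headers for the resource at the URL.
  • POST: submit new data to the resource at the URL.
  • PUT: store a resource at the URL, replacing whatever was there.
  • PATCH: partially update the resource at the URL.
  • DELETE: delete the resource stored at the URL.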

The Requests library provides functions for all of the basic HTTP request methods. Documentation: http://www.python-requests.org/en/master

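Six main methods of the Requests library:

  • requests.get(): fetch a page.
  • requests.head(): fetch only the headers of a page.
  • requests.post(): submit data to a page.
  • requests.put(): submit data that replaces the page.
  • requests.patch(): submit a partial modification to the page.
  • requests.delete(): ask the server to delete the page.

All six are thin wrappers around requests.request(), which builds and sends the request.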

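Requests library exceptions:

  • requests.ConnectionError: network connection failure, such as a DNS lookup failure or a refused connection.
  • requests.HTTPError: the server returned an HTTP error status.
  • requests.URLRequired: no valid URL was supplied.
  • requests.TooManyRedirects: the maximum number of redirects was exceeded.
  • requests.ConnectTimeout: connecting to the remote server timed out.
  • requests.Timeout: the request as a whole timed out.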
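As a rough illustration of how these exceptions can be told apart, here is a small sketch. The function name describe_failure and its labels are hypothetical; the exception classes are the ones Requests exports. The sample call uses a URL without a scheme, which Requests rejects before any network traffic is sent.

```python
import requests


def describe_failure(url):
    """Return a short label for what went wrong, or 'ok' on success."""
    try:
        r = requests.get(url, timeout=5)
        r.raise_for_status()
        return "ok"
    except requests.Timeout:
        # Also covers ConnectTimeout, which is a subclass of Timeout.
        return "timed out"
    except requests.ConnectionError:
        return "connection failed"
    except requests.HTTPError:
        return "bad HTTP status"
    except requests.RequestException:
        # Base class of every exception that Requests itself raises.
        return "other request error"


# No scheme ("https://") in the URL, so the request is never sent.
print(describe_failure("www.pmcaff.com"))
```

The order of the except clauses matters: more specific classes have to come before requests.RequestException, otherwise the base class would catch everything first.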

The Requests library has two important objects: Request and Response. A Request object describes the outgoing request and supports all of the request methods. A Response object contains everything the server returned, together with the request that produced it (available as r.request).
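The Request side can be inspected without touching the network. The sketch below builds a Request, prepares it, and prints what would be sent; the query parameter is a made-up example. After a real call such as requests.get(), the same kind of prepared object is what r.request holds.

```python
import requests

# Build a Request without sending it; the "q" parameter is invented.
req = requests.Request("GET", "https://www.pmcaff.com/", params={"q": "python"})
prepared = req.prepare()

print(prepared.method)  # HTTP method that would be sent
print(prepared.url)     # final URL with the query string encoded
```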

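Attributes of the Response object:

  • r.status_code: the HTTP status code of the response; 200 means success.
  • r.text: the response body decoded to a string.
  • r.encoding: the encoding guessed from the HTTP headers.
  • r.apparent_encoding: the encoding inferred from the response content.
  • r.content: the response body as raw bytes.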

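r.encoding is taken from the charset field of the Content-Type header. If the header of a text response contains no charset, Requests assumes ISO-8859-1, which may not match the page; in that case r.apparent_encoding, inferred from the content itself, is usually the safer choice.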
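The header-based guess can be reproduced offline with requests.utils.get_encoding_from_headers, the helper Requests uses to set r.encoding. The sketch below also shows why the fallback matters, using a sample Chinese string chosen only for illustration.

```python
from requests.utils import get_encoding_from_headers

# Header-based guess: this is where r.encoding gets its initial value.
with_charset = get_encoding_from_headers({"content-type": "text/html; charset=utf-8"})
without_charset = get_encoding_from_headers({"content-type": "text/html"})
print(with_charset)
print(without_charset)

# UTF-8 bytes decoded with the ISO-8859-1 fallback turn into mojibake.
original = "中文网页"
raw = original.encode("utf-8")
garbled = raw.decode("ISO-8859-1")
restored = raw.decode("utf-8")
print(garbled == original)   # the fallback decoding does not round-trip
print(restored == original)  # the correct encoding does
```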

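r.raise_for_status() raises an HTTPError exception if r.status_code is a 4xx or 5xx error code and does nothing otherwise, so there is no need to compare r.status_code with 200 by hand.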
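The behavior can be checked offline with a hand-built Response object. This is for illustration only, since a real Response comes back from a call such as requests.get(); the helper name status_is_error is hypothetical.

```python
import requests


def status_is_error(status_code):
    """Return True if raise_for_status() raises for this status code."""
    # Offline illustration: normally the Response is returned by requests.get().
    r = requests.Response()
    r.status_code = status_code
    try:
        r.raise_for_status()
        return False
    except requests.HTTPError:
        return True


for code in (200, 304, 404, 500):
    print(code, status_is_error(code))
```

Note that a status such as 304 does not raise even though it is not 200; only the 4xx and 5xx ranges are treated as errors.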

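Comparison between the HTTP protocol and the Requests library: the HTTP methods and the Requests functions correspond one to one. GET is sent by requests.get(), HEAD by requests.head(), POST by requests.post(), PUT by requests.put(), PATCH by requests.patch(), and DELETE by requests.delete().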
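A short sketch of this correspondence: each HTTP method name, lowercased, is also the name of a module-level function in Requests. The names HTTP_METHODS and mapping are chosen here for illustration.

```python
import requests

HTTP_METHODS = ["GET", "HEAD", "POST", "PUT", "PATCH", "DELETE"]

# Look up the Requests function whose name matches each HTTP method.
mapping = {method: getattr(requests, method.lower()) for method in HTTP_METHODS}

for method, func in mapping.items():
    print(method, "-> requests." + func.__name__ + "()")
```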

The general code framework for crawling webpages:

    import requests

    def getHtmlText(url):
        try:
            r = requests.get(url, timeout=30)
            # Raises requests.HTTPError if the status code is 4xx or 5xx.
            r.raise_for_status()
            # Decode the body with the encoding detected from its content.
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            return 'exception occurred'

For example, to obtain the PMCAFF homepage information:

    import requests

    def getHtmlText(url):
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            return 'exception occurred'

    if __name__ == '__main__':
        url = 'https://www.pmcaff.com/'
        print(getHtmlText(url))
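To exercise both the success path and the failure path without depending on an external site, the same framework can be pointed at a throwaway local server. The sketch below is one way to do that with the standard library's http.server. DemoHandler, run_demo, and the page content are hypothetical, and the function takes a Session argument (an adaptation, not part of the original framework) so that proxy environment variables can be ignored for localhost.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: '/' returns a small page, anything else 404."""

    def do_GET(self):
        if self.path == "/":
            body = b"<html><body>hello crawler</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def get_html_text(url, session):
    """Same framework as above, but using a caller-supplied session."""
    try:
        r = session.get(url, timeout=5)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "exception occurred"


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        return (get_html_text(base + "/", session),
                get_html_text(base + "/missing", session))
    finally:
        session.close()
        server.shutdown()
        server.server_close()


ok_page, missing_page = run_demo()
print(ok_page)       # body of the page served at '/'
print(missing_page)  # the 404 is turned into the fallback string
```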

Environment for the code above: Mac, Python 3.6, and PyCharm 2016.2

Reference: the course "Python web crawler and information extraction" on China University MOOC

----- End -----

Author: du wangdan, Public Account: du wangdan, Internet product manager.
