Python Crawler Knowledge Points: The Request

Last Update:2018-07-29 Source: Internet

Author: User

Request

A request consists of four parts: the request method, the request URL, the request headers, and the request body.
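To make the four parts concrete, here is a minimal sketch that assembles the text of an HTTP/1.1 request by hand. The helper name `build_raw_request`, the host `example.com`, the path `/login`, and the form values are all made up for illustration; real crawlers normally let a library build this message.

```python
def build_raw_request(method, path, headers, body=""):
    """Assemble the text of an HTTP/1.1 request from its four parts."""
    # Request line: method + URL (path) + protocol version.
    lines = [f"{method} {path} HTTP/1.1"]
    # Request headers, one "Name: value" pair per line.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    if body:
        # Tell the server how many bytes of body follow.
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # A blank line separates the headers from the request body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


# Hypothetical login form submission.
raw = build_raw_request(
    "POST",
    "/login",
    {
        "Host": "example.com",
        "Content-Type": "application/x-www-form-urlencoded",
    },
    "name=alice&password=12345",
)
print(raw)
```

The printed text shows the method and URL on the first line, then the headers, then an empty line, then the body.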

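Request Method:

The common methods are GET and POST.

The main differences between GET and POST are:

The parameters of a GET request are included in the URL and are visible there. The URL of a POST request does not normally contain the parameters; the data is usually submitted through a form and carried in the request body.

The data submitted with a GET request is limited in practice by the URL length that browsers and servers accept (a figure of 1024 bytes is often quoted, although the HTTP specification itself sets no limit), while POST has no such limit.

Other methods include HEAD, PUT, DELETE, CONNECT, OPTIONS, and TRACE.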
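The difference in where GET and POST carry their parameters can be sketched with the standard library's `urllib`. The URL `https://example.com/search` and the parameter names are placeholders, and nothing is actually sent over the network here; the request objects are only constructed and inspected.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "python", "page": "2"}
query = urlencode(params)  # URL-encoded "key=value" pairs joined by "&"

# GET: the parameters are appended to the URL; there is no body.
get_req = Request("https://example.com/search?" + query)

# POST: the URL stays clean; the same parameters travel in the body as bytes.
post_req = Request("https://example.com/search", data=query.encode("ascii"))

for req in (get_req, post_req):
    print(req.get_method(), req.full_url, req.data)
```

`urllib` picks the method from the presence of `data`: with no body the request is a GET, with a body it becomes a POST.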

Request URL:

This is the URL of the resource we want to request.

Request Header:

Request headers carry additional information for the server to use. The more important ones include Cookie, Referer, and User-Agent.

Accept: A request header field that specifies which types of content the client can accept.

Accept-Language: Specifies the languages that the client can accept.

Accept-Encoding: Specifies the content encodings that the client can accept.

Host: Specifies the host and port number of the requested resource, that is, the location of the origin server or gateway for the requested URL. Starting with HTTP/1.1, every request must include this header.

Cookie: Often referred to in the plural as cookies, this is data that a website stores on the user's machine to identify the user for session tracking. Its main function is to maintain the current session. For example, after we enter a user name and password and log in to a website successfully, the server uses a session to save the login state. Each time we refresh the page or request other pages of the site, we find that we are still logged in, and this is thanks to cookies. The cookies contain information that identifies the corresponding session on the server. Each time the browser requests a page of the site, it adds the cookies to the request headers and sends them to the server. The server identifies the user from the cookies and determines that the current state is logged in, so the response is the page content that is visible only after login.

Referer: Identifies the page from which the request was sent. The server can read this information and act on it, for example for traffic-source statistics or hotlink protection.

User-Agent: Called UA for short, this is a special string header that lets the server identify the client's operating system and version and its browser and version. Adding it when writing a crawler lets the crawler present itself as a browser; without it, the request is likely to be identified as coming from a crawler.

Content-Type: Also known as the Internet media type or MIME type. In HTTP message headers it indicates the media type of the data in a specific request. More mappings can be found in this reference table: http://tool.oschina.net/commons

File name extension	Content-Type (MIME type)
.html, .htx, .htm	text/html
.gif	image/gif
.json	application/json
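As a small illustration of this extension-to-type mapping, Python's standard `mimetypes` module can guess a Content-Type from a file name. The file names below are invented, and the result can vary with the MIME database available on the system, so treat this as a sketch and not as an authoritative lookup.

```python
import mimetypes

# Hypothetical file names; only the extension matters for the guess.
for filename in ("index.html", "logo.gif", "data.json"):
    content_type, _encoding = mimetypes.guess_type(filename)
    print(filename, "->", content_type)
```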

The relationship between Content-Type and the way POST submits data:

Content-Type	How data is submitted
application/x-www-form-urlencoded	Form data
multipart/form-data	Form file upload
application/json	Serialized JSON data
text/xml	XML data

When constructing a POST request, use the correct Content-Type; otherwise the server may not handle the submission properly and the POST may not get a normal response.
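The pairing of body format and Content-Type might look like the following with `urllib`. The endpoint `https://example.com/submit` and the payload fields are assumptions made for the example, and the requests are only built, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

payload = {"name": "alice", "age": "30"}

# Form data: URL-encoded body with the matching Content-Type.
form_req = Request(
    "https://example.com/submit",
    data=urlencode(payload).encode("utf-8"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# JSON data: the same payload serialized as JSON, declared as application/json.
json_req = Request(
    "https://example.com/submit",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

for req in (form_req, json_req):
    # urllib stores header names capitalized as "Content-type".
    print(req.get_header("Content-type"), req.data)
```

If the declared Content-Type and the actual body format disagree, the server will typically fail to parse the submission, which is the failure the text warns about.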

Therefore, the request headers are an important part of a request, and in most cases they need to be set when writing a crawler.
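A minimal sketch of setting request headers in a crawler, using the standard library's `urllib.request.Request`. The URL, the User-Agent string, the Referer, and the cookie value are all placeholder values chosen for illustration; a real crawler would substitute values appropriate to the target site.

```python
from urllib.request import Request

# Placeholder header values, not taken from any real session.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
    "Referer": "https://example.com/",
    "Cookie": "sessionid=abc123",
}

req = Request("https://example.com/profile", headers=headers)

# Inspect what would be sent; urllib.request.urlopen(req) would send it.
for name, value in req.header_items():
    print(f"{name}: {value}")
```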

Request Body:

The request body generally carries the form data of a POST request; for a GET request, the request body is empty.

This article is excerpted from Cui Qingcai's "Python3 Network crawler Development Combat"

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
