Python crawler Knowledge Point--Request

Source: Internet
Author: User

Request

A request consists of four parts: the request method, the request URL, the request headers, and the request body.

Request Method:

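The most common methods are GET and POST.

The main differences between GET and POST are:

  • The parameters of a GET request are included in the URL and are visible there. The URL of a POST request does not contain the data; the data is usually submitted as a form and carried in the request body.
  • The amount of data a GET request can submit is limited in practice, because browsers and servers cap the URL length (1024 bytes is the commonly quoted figure, though the actual limit varies by implementation). POST has no such limit.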
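The difference in where the parameters travel can be sketched with the standard library. This is a minimal illustration that only builds the request objects and sends nothing; the URL `https://example.com/search` and the parameter names are assumptions made up for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical parameters, used only for illustration.
params = {"q": "python", "page": "2"}
query = urlencode(params)

# GET: the parameters are appended to the URL and the body stays empty.
get_req = Request("https://example.com/search?" + query, method="GET")

# POST: the URL carries no parameters; the encoded form data is the body.
post_req = Request(
    "https://example.com/search",
    data=query.encode("ascii"),
    method="POST",
)

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Passing either object to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch runs without network access.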

Other methods include HEAD, PUT, DELETE, CONNECT, OPTIONS, and TRACE.

Request URL:

The request URL (Uniform Resource Locator) identifies the resource we want to request.

Request Header:

Request headers carry additional information for the server to use. The more important ones include Cookie, Referer, and User-Agent.

Accept: A request header field that specifies which types of content the client can accept.

Accept-Language: Specifies the languages the client can accept.

Accept-Encoding: Specifies the content encodings the client can accept.

Host: Specifies the host and port number of the requested resource, that is, the location of the origin server or gateway for the requested URL. Starting with HTTP/1.1, every request must include this header.
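As a rough illustration of how the Host value relates to the request URL, the sketch below derives it with `urllib.parse.urlsplit`. The helper name `host_header` and the example URLs are hypothetical; HTTP clients normally fill in this header themselves, and the sketch does not handle IPv6 literals.

```python
from urllib.parse import urlsplit

# Default ports that are usually omitted from the Host header.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header(url):
    """Return a Host header value for the given URL (illustrative helper)."""
    parts = urlsplit(url)
    if parts.port is None or parts.port == DEFAULT_PORTS.get(parts.scheme):
        return parts.hostname
    return f"{parts.hostname}:{parts.port}"


print(host_header("http://example.com/index.html"))
print(host_header("http://example.com:8080/index.html"))
```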

Cookie: Often referred to in the plural, cookies are data that a website stores on the user's machine to identify the user and track the session. Their main function is to maintain the current session. For example, after we enter a user name and password and log in to a website, the server uses a session to save the login state. Each time we refresh the page or request another page on the site, we find that we are still logged in; this is thanks to cookies. A cookie carries information that identifies our session on the server. Each time the browser requests a page on the site, it adds the cookies to the request headers and sends them to the server. The server identifies us by the cookies, determines that the current state is logged in, and returns the page content that is visible only after login.
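The round trip described above can be sketched with the standard library's `http.cookies` module. The `Set-Cookie` value below, including the `sessionid` name, is invented for illustration; a real site chooses its own cookie names and attributes.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie value that a server might return after login.
set_cookie_value = "sessionid=abc123; Path=/; HttpOnly"

# Parse the cookie the way a client would when storing it.
jar = SimpleCookie()
jar.load(set_cookie_value)

# On later requests, only the name=value pairs go back in the Cookie header;
# attributes such as Path and HttpOnly stay on the client side.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)

print("Cookie:", cookie_header)
```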

Referer: Identifies the page from which the request was sent. The server can read this information and act on it, for example for traffic-source statistics or hotlink protection.

User-Agent: Often abbreviated as UA, this is a special string header that lets the server identify the client's operating system and its version, and the browser and its version. Adding this header when writing a crawler lets the crawler present itself as a browser; without it, the request is likely to be identified as coming from a crawler.
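A minimal sketch of setting these headers on a request with `urllib.request` follows. The helper name `build_request`, the User-Agent string, and the URLs are assumptions chosen for illustration; the request is only constructed, not sent.

```python
from urllib.request import Request


def build_request(url, referer=None, cookie=None):
    """Build a request with browser-like headers (illustrative helper)."""
    headers = {
        # Example browser-style UA string; any realistic value could be used.
        "User-Agent": (
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
    if referer:
        headers["Referer"] = referer
    if cookie:
        headers["Cookie"] = cookie
    return Request(url, headers=headers)


req = build_request(
    "https://example.com/page/2",
    referer="https://example.com/page/1",
)

# urllib normalizes the capitalization of header names, so compare in lowercase.
sent = {name.lower(): value for name, value in req.header_items()}
print(sent)
```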

Content-Type: Also known as the Internet media type or MIME type. In HTTP message headers, it indicates the media type of the content carried by a specific request. For more mappings, see the reference table at http://tool.oschina.net/commons

File name extension      Content-Type (MIME type)
.html, .htx, .htm        text/html
.gif                     image/gif
.json                    application/json
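The same kind of lookup is available in Python's standard `mimetypes` module, which guesses a media type from a file name. The file names below are placeholders; the module relies on its built-in table plus any system MIME configuration, so results for less common extensions can vary between machines.

```python
import mimetypes

# Placeholder file names, used only to show the extension lookup.
for filename in ("index.html", "logo.gif", "data.json"):
    content_type, _encoding = mimetypes.guess_type(filename)
    print(filename, "->", content_type)
```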

The relationship between Content-Type and the way POST submits data:

Content-Type                           How data is submitted
application/x-www-form-urlencoded      Form data
multipart/form-data                    Form file upload
application/json                       Serialized JSON data
text/xml                               XML data

When you construct a POST request, you need to use the correct Content-Type; otherwise the server may not handle the submitted data correctly and the request may not get a proper response.
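To make the pairing of Content-Type and body encoding concrete, the sketch below builds the same payload as form data and as JSON. The URL `https://example.com/api/users` and the payload fields are hypothetical, and the requests are only constructed, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

URL = "https://example.com/api/users"  # hypothetical endpoint
payload = {"name": "alice", "age": 30}

# Form submission: key=value pairs joined with "&".
form_req = Request(
    URL,
    data=urlencode(payload).encode("utf-8"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# JSON submission: the same payload serialized as a JSON document.
json_req = Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(form_req.data)
print(json_req.data)
```

The two bodies differ byte for byte, which is why a server that expects one format will typically fail to parse the other when the Content-Type header does not match.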

==> Therefore, the request headers are an important part of the request, and in most cases they need to be set when writing a crawler.

Request Body:

The request body usually carries the form data of a POST request; for a GET request, the request body is empty.

This article is excerpted from Cui Qingcai's "Python3 Network crawler Development Combat".
