Python Crawler Knowledge Point: Response

Source: Internet
Author: User

Response:

A response consists of three parts: the response status code, the response headers, and the response body.
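To make the three parts concrete, here is a minimal sketch that starts a throwaway local server and reads the status code, headers, and body from its response. It uses only the standard library; `DemoHandler`, `fetch_parts`, and `demo` are hypothetical names chosen for this illustration, and a real crawler would request a remote site instead of a local one.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that always answers with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)  # response status code
        self.send_header("Content-Type", "text/html; charset=utf-8")  # response headers
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # response body

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_parts(host, port, path="/"):
    """Send a GET request and return (status code, headers dict, body bytes)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()


def demo():
    """Serve one request from a local server and return the three response parts."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_parts("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns a tuple whose three elements map one-to-one onto the three parts of a response described above.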

Response Status Code:
**Table 2-3 Common error codes and their causes**
| Status Code | Description | Details |
| --- | --- | --- |
| 100 | Continue | The requester should continue with the request. The server returns this code to indicate that it has received the first part of the request and is waiting for the rest. |
| 101 | Switching Protocols | The requester has asked the server to switch protocols, and the server has confirmed and is ready to switch. |
| 200 | Success | The server has successfully processed the request. Typically, this means the server returned the requested web page. |
| 201 | Created | The request was successful and the server created a new resource. |
| 202 | Accepted | The server has accepted the request but has not yet processed it. |
| 203 | Non-Authoritative Information | The server has successfully processed the request, but the information returned may come from another source. |
| 204 | No Content | The server successfully processed the request but did not return any content. |
| 205 | Reset Content | The server successfully processed the request and asks the requester to reset the document view. |
| 206 | Partial Content | The server successfully processed a partial GET request. |
| 300 | Multiple Choices | The server can perform several actions for the request. The server can select an action based on the requester (user agent) or provide a list of actions for the requester to choose from. |
| 301 | Moved Permanently | The requested page has been permanently moved to a new location. When the server returns this response (to a GET or HEAD request), the requester is automatically forwarded to the new location. |
| 302 | Moved Temporarily | The server is currently responding to the request from a different location, but the requester should continue to use the original location for future requests. |
| 303 | See Other | The server returns this code when the requester should make a separate GET request to a different location to retrieve the response. |
| 304 | Not Modified | The requested page has not been modified since the last request. When the server returns this response, the page content is not returned. |
| 305 | Use Proxy | The requester can access the requested page only through a proxy. If the server returns this response, it also indicates which proxy the requester should use. |
| 307 | Temporary Redirect | The server is currently responding to the request from a different location, but the requester should continue to use the original location for future requests. |
| 400 | Bad Request | The server does not understand the syntax of the request. |
| 401 | Unauthorized | The request requires authentication. The server may return this response for pages that require login. |
| 403 | Forbidden | The server refused the request. |
| 404 | Not Found | The server could not find the requested web page. |
| 405 | Method Not Allowed | The method specified in the request is not allowed. |
| 406 | Not Acceptable | The requested page cannot be returned with the content characteristics specified in the request. |
| 407 | Proxy Authentication Required | Similar to 401 (Unauthorized), but specifies that the requester must authenticate with the proxy. |
| 408 | Request Timeout | The server timed out while waiting for the request. |
| 409 | Conflict | The server encountered a conflict while completing the request. The server must include information about the conflict in the response. |
| 410 | Gone | The server returns this response if the requested resource has been permanently deleted. |
| 411 | Length Required | The server does not accept requests without a valid Content-Length header field. |
| 412 | Precondition Failed | The server did not meet one of the preconditions that the requester set in the request. |
| 413 | Request Entity Too Large | The server cannot process the request because the request entity is larger than the server is able to handle. |
| 414 | Request-URI Too Long | The requested URI (usually a URL) is too long for the server to process. |
| 415 | Unsupported Media Type | The format of the request is not supported by the requested page. |
| 416 | Requested Range Not Satisfiable | The server returns this status code if the page cannot provide the requested range. |
| 417 | Expectation Failed | The server does not meet the requirements of the Expect request header field. |
| 500 | Internal Server Error | The server encountered an error and could not complete the request. |
| 501 | Not Implemented | The server does not have the capability to complete the request. For example, this code may be returned when the server does not recognize the request method. |
| 502 | Bad Gateway | The server, acting as a gateway or proxy, received an invalid response from the upstream server. |
| 503 | Service Unavailable | The server is currently unavailable (because of overload or maintenance downtime). Typically, this is only a temporary state. |
| 504 | Gateway Timeout | The server, acting as a gateway or proxy, did not receive a response from the upstream server in time. |
| 505 | HTTP Version Not Supported | The server does not support the HTTP protocol version used in the request. |
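The first digit of a status code gives its class, which is often all a crawler needs in order to decide what to do next. The sketch below uses the standard library's `http.HTTPStatus` to look up the standard reason phrase. `describe_status` and `should_retry` are hypothetical helpers, and the set of codes treated as retryable is an assumed policy, not a rule taken from the table.

```python
from http import HTTPStatus

# The hundreds digit determines the class of a status code.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

# Assumed policy: these codes usually signal a temporary condition.
RETRYABLE = {408, 502, 503, 504}


def describe_status(code):
    """Return (category, standard reason phrase) for an HTTP status code."""
    category = CATEGORIES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code is not registered in http.HTTPStatus
        phrase = "Unknown"
    return category, phrase


def should_retry(code):
    """Decide whether a crawler might retry the request later (assumed policy)."""
    return code in RETRYABLE
```

For example, `describe_status(404)` returns `("client error", "Not Found")`, while `should_retry(503)` is `True` because a 503 is typically temporary.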
Response header

The response headers contain the server's information about the response, such as Content-Type, Server, and Set-Cookie. Some common headers are described briefly below.

    • Date: Identifies the time at which the response was generated.
    • Last-Modified: Specifies the time at which the resource was last modified.
    • Content-Encoding: Specifies the encoding of the response content.
    • Server: Contains information about the server, such as its name and version number.
    • Content-Type: The document type. It specifies the type of data returned. For example, text/html means an HTML document, application/x-javascript means a JavaScript file, and image/jpeg means an image.
    • Set-Cookie: Sets cookies. The Set-Cookie response header tells the browser to store this content in its cookies and to send those cookies with the next request.
    • Expires: Specifies when the response expires, which lets a proxy server or browser keep the loaded content in its cache. If the resource is accessed again, it can be loaded directly from the cache, reducing server load and shortening load time.
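The headers listed above can be read with the standard library alone. The following is a minimal sketch, assuming the headers have already been collected into a plain dict. `summarize_headers` is a hypothetical helper, and a dict can hold only one Set-Cookie value, which is a simplification because real responses may send several.

```python
from email.message import Message
from email.utils import parsedate_to_datetime
from http.cookies import SimpleCookie


def summarize_headers(headers):
    """Extract a few crawler-relevant facts from a dict of response headers."""
    summary = {}

    # Content-Type: split the media type from its optional charset parameter.
    if "Content-Type" in headers:
        msg = Message()
        msg["Content-Type"] = headers["Content-Type"]
        summary["media_type"] = msg.get_content_type()
        summary["charset"] = msg.get_content_charset()

    # Set-Cookie: collect the cookies to send back on later requests.
    if "Set-Cookie" in headers:
        jar = SimpleCookie()
        jar.load(headers["Set-Cookie"])
        summary["cookies"] = {name: morsel.value for name, morsel in jar.items()}

    # Date and Expires: how long the response may be served from a cache.
    if "Date" in headers and "Expires" in headers:
        generated = parsedate_to_datetime(headers["Date"])
        expires = parsedate_to_datetime(headers["Expires"])
        summary["fresh_for_seconds"] = (expires - generated).total_seconds()

    return summary
```

The difference between Expires and Date is the window during which a cache may reuse the response without contacting the server again.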

Response body

The response body is the most important part, because it carries the actual data of the response. When a web page is requested, the response body is the HTML code of the page; when an image is requested, the response body is the binary data of the image. The response body is what a crawler mainly parses.
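Because the body arrives as raw bytes, a crawler usually consults the Content-Encoding and Content-Type headers before parsing it. The sketch below is one possible approach under stated assumptions: `decode_body` is a hypothetical helper, only gzip compression is handled, and UTF-8 is assumed when no charset is given.

```python
import gzip


def decode_body(raw, content_type="", content_encoding=""):
    """Return text for text/* bodies and unchanged bytes for everything else."""
    # Undo transfer compression first (only gzip is handled in this sketch).
    if content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)

    media_type, _, params = content_type.partition(";")
    if not media_type.strip().lower().startswith("text/"):
        return raw  # e.g. image/jpeg: keep the binary data as-is

    # Look for a charset parameter; fall back to UTF-8 (an assumption).
    charset = "utf-8"
    for param in params.split(";"):
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset" and value.strip():
            charset = value.strip().strip('"')

    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # unknown charset name
        return raw.decode("utf-8", errors="replace")
```

An HTML page comes back as a `str` ready for a parser, while an image stays as `bytes` that can be written straight to a file.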

This article is excerpted from Cui Qingcai's "Python3 Network Crawler Development Combat".
