A Free Trial That Lets You Build Big!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service
This article is based on RFC2616 (http/1.1 specification), reference
Typically HTTP messages include client-to-server request messages and server-to-client response messages. These two types of messages consist of a starting line, one or more header fields, a blank line that is just the end of the head field, and an optional message body. The header fields of HTTP include the general header, the request header, the response header, and the four parts of the entity header. Each header field consists of a domain name, a colon (:), and a domain value of three parts. Domain names are case-insensitive, you can add any number of whitespace before the domain value, and the header field can be expanded to multiple lines, at the beginning of each line, with at least one space or tab.
Universal header domain (general headers)
The generic header domain contains the header fields that both the request and response messages support, providing the most basic information related to the message, and the generic header domain contains
Connection allows clients and servers to specify options related to request/response connections
Date and time flag indicating when the message was created
Mime-version gives the MIME version used by the sending side
Trailer If the message uses a chunked transmission encoding (chunked transfer encoding), you can use this header to list the header set in the message trailer (Trailer) section
Transfer-encoding tells the receiving end to ensure the reliable transmission of the message, the use of the message encoding method
Upgrade gives the new version and protocol that the sending side might want to "upgrade"
Via shows the intermediary node through which the message was passed (proxy, web un)
The expansion of the universal header domain requires both parties to support this extension, and if there is an unsupported universal header domain, it will generally be handled as the entity header domain. The following is a brief introduction to several common header domains used in UPnP messages.
Cache-control Header Field
CACHE-CONTROL Specifies the caching mechanism that requests and responses follow. Setting Cache-control in a request message or response message does not modify the caching process in another message processing process. Cache directives on request include No-cache, No-store, Max-age, Max-stale, Min-fresh, only-if-cached, directives in response messages including public, private, No-cache, no- Store, No-transform, Must-revalidate, Proxy-revalidate, Max-age. The instructions in each message have the following meanings:
Public indicates that the response can be cached by any buffer.
Private indicates that the entire or partial response message for a single user cannot be shared with the cache. This allows the server to simply describe a partial response message for the user, and this response message is not valid for another user's request.
No-cache indicates that a request or response message cannot be cached
No-store is used to prevent the inadvertent release of important information. Sending in the request message will make the request and response messages do not use the cache.
Max-age indicates that the client can receive a response that is not longer than the specified time (in seconds).
Min-fresh indicates that the client can receive a response that is less than the current time plus a specified time.
Max-stale indicates that the client can receive a response message that exceeds the timeout period. If you specify a value for the Max-stale message, the client can receive a response message that exceeds the specified value for the timeout period.
Date Header Field
The Date header field represents the time the message was sent, and the time description format was defined by RFC822. For example, Date:mon,31dec200104:25:57gmt. The time described by date represents the world standard, which translates into local time and needs to know the time zone in which the user is located.
pragma header field
The pragma header domain is used to contain implementation-specific instructions, most commonly pragma:no-cache. In the http/1.1 protocol, it has the same meaning as Cache-control:no-cache.
The first behavior of the request message is in the following format:
Methodsprequest-urisphttp-versioncrlfmethod indicates that the field is case-sensitive for the method Request-uri completed, including options,, POST, PUT, DELETE , TRACE. The method get and head should be supported by all common Web servers, and the implementation of all other methods is optional. The GET method retrieves the information identified by the Request-uri. The head method also retrieves the information identified by the Request-uri, but does not return the body of the message when the response is available. The Post method can request that the server receive entity information contained in the request, and can be used to submit the form, sending messages to newsgroups, BBS, mail groups, and databases.
The SP represents a space. Request-uri follows the URI format, where the word Cheweishing (*) indicates that the request is not used for a particular resource address, but rather for the server itself. Http-version represents the supported HTTP version, for example, http/1.1. The CRLF represents a newline carriage return character. The request header domain allows the client to pass additional information about the request or about the client to the server. The Request header field may contain the following fields Accept, Accept-charset, accept-encoding, Accept-language, Authorization, from, Host, If-modified-since, If-match, If-none-match, If-range, If-range, If-unmodified-since, Max-forwards, Proxy-authorization, Range, Referer, User-agent. Extensions to the request header domain are supported by both parties, and if an unsupported request header domain exists, it will generally be handled as the entity header domain.
A typical request message:
User-agent:mozilla/4.04[en] (win95;i; NAV)
The first line in the previous example indicates that the HTTP client (possibly a browser, downloader) obtains the file under the specified URL through the Get method. The brown portion represents the information for the Request header field, and the green section represents the General header section.
Host Header Field
The host header domain specifies the intenet host and port number of the requesting resource, and must represent the location of the originating server or gateway that requested the URL. The http/1.1 request must contain the host header domain or the system will return with a 400 status code.
referer Header Field
The Referer header domain allows the client to specify the source resource address of the request URI, which allows the server to generate a fallback list that can be used to log in, optimize the cache, and so on. He also allows the abolition or wrong connection to be traced for maintenance purposes. If the requested URI does not have its own URI address, Referer cannot be sent. If you specify a partial URI address, this address should be a relative address.
Range Header Field
The Range header field can request one or more child ranges of an entity. For example
Represents the first 500 bytes: bytes=0-499
Represents a second 500 byte: bytes=500-999
Represents the last 500 bytes: bytes=-500
Represents the range after 500 bytes: bytes=500-
First and last byte: Bytes=0-0,-1
Specify several ranges at the same time: bytes=500-600,601-999
However, the server can ignore this request header, and if the unconditional get contains a range request header, the response is returned as a status code of 206 (partialcontent) instead of a (OK).
user-agent Header Field
The contents of the User-agent header domain contain the user information that made the request.
The first behavior of the response message is in the following format:
Http-version represents the supported HTTP version, for example, http/1.1. Status-code is a result code of three numbers. Reason-phrase provides a simple text description for Status-code. Status-code is mainly used for machine automatic identification, reason-phrase is mainly used to help users understand. The first number of Status-code defines the category of the response, and the latter two numbers do not have a role to classify. The first number can take 5 different values:
1XX: Information response class, which indicates receipt of request and continues processing
2XX: Handle the successful response class, indicating that the action was successfully received, understood, and accepted
3XX: Redirect Response class, must accept further processing in order to complete the specified action
4XX: Client error, client request contains syntax error or is not executed correctly
5XX: Server error, servers do not correctly execute a correct request
The Response header field allows the server to pass additional information that cannot be placed on the status line, which primarily describes the server's information and Request-uri further information. The Response header field contains age, location, proxy-authenticate, public, Retry-after, Server, Vary, Warning, and Www-authenticate. The expansion of the response header field is required for both sides of the communication, and if there is an unsupported response header field, it will generally be handled as the Entity header field.
A typical response message:
The first line in the previous example represents an HTTP service-side response to a GET method. The brown part represents the Response header field information, the green part represents the General header section, and the red part represents the Entity header field information.
Location response Header
The location response header is used to redirect the recipient to a new URI address.
Server response Header
The server response header contains software information for the originating server that processed the request. This field can contain multiple product identifiers and annotations, and product identities are generally sorted by importance.
Both the request message and the response message can contain entity information, which generally consists of entity header fields and entities. The Entity header field contains the original information about the entity, including allow, Content-base, content-encoding, Content-language, Content-length, Content-location, CONTENT-MD5, Content-range, Content-type, Etag, Expires, Last-modified, Extension-header. Extension-header allows clients to define new entity headers, but these domains may not be recognized by the recipient. An entity can be a coded stream of bytes encoded by content-encoding or Content-type, whose length is defined by content-length or Content-range.
Content-type Solid Head
The Content-type entity header is used to indicate the media type of the entity to the receiver, specify the entity media type that the head method sends to the receiver, or the request media type that the Get method sends Content-range entity header
The Content-range entity header is used to specify the insertion position of a part of the entire entity, and he also indicates the length of the entire entity. When the server returns a partial response to the customer, it must describe the extent of the response coverage and the entire length of the entity. General format:
For example, the transfer header is in the form of a 500-byte secondary field: content-range:bytes0-499/1234 If an HTTP message contains this section (for example, a response to a range request or an overlapping request to a range of ranges), Content-range represents the range of the transfer, The content-length represents the number of bytes actually transferred.
last-modified Solid Head
|Allow||Which request methods are supported by the server (such as GET, post, etc.).|
|Content-encoding||The encoding (Encode) method of the document. The content type specified by the Content-type header can be obtained only after decoding. Using gzip to compress documents can significantly reduce the download time of HTML documents. Java's gzipoutputstream can be easily gzip compressed, but only on Unix Netscape and IE 4, ie 5 on Windows. Therefore, the servlet should check whether the browser supports gzip by looking at the accept-encoding header (that is, Request.getheader ("accept-encoding")). Returns the gzip-compressed HTML page for a browser that supports gzip, returning a normal page for another browser.|
|Content-length||Represents the content length. This data is only required if the browser is using a persistent HTTP connection. If you want to take advantage of the persistent connection, you can write the output document to Bytearrayoutputstram, look at its size when done, then put that value into the Content-length header and finally pass the Bytearraystream.writeto ( Response.getoutputstream () Send content.|
|Content-type||Indicates what MIME type the following document belongs to. The servlet defaults to Text/plain, but it usually needs to be explicitly specified as text/html. Because Content-type is often set up, HttpServletResponse provides a dedicated method Setcontenttyep.|
|Date||The current GMT time. You can use Setdateheader to set this header to avoid the hassle of converting the time format.|
|Expires||When should I think that the document has expired so that it is no longer cached?|
|Last-modified||The last modification time of the document. The customer can provide a date through the If-modified-since request header, which is treated as a conditional get, and only documents that have been modified later than the specified time are returned, otherwise a 304 (not Modified) state is returned. Last-modified can also be set using the Setdateheader method.|
|Location||Indicates where the customer should go to extract the document. Location is usually not set directly, but by HttpServletResponse's Sendredirect method, which sets the status code to 302.|
|Refresh||Indicates how much time the browser should refresh the document, in seconds. In addition to refreshing the current document, you can also pass SetHeader ("Refresh", "5; Url=http://host/path ") lets the browser read the specified page.
Note This functionality is usually done by setting the HTML page in the head area of the
Note that the meaning of refresh is "refresh this page after n seconds or visit the specified page" instead of "refresh this page every n seconds or visit the specified page". Therefore, continuous refresh requires a refresh header to be sent each time, and sending a 204 status code prevents the browser from continuing to refresh, whether it is using the refresh header or the
Note that the refresh header is not part of the HTTP 1.1 formal specification, but rather an extension, but both Netscape and IE support it.
|Server||Server name. The servlet generally does not set this value, but is set by the Web server itself.|
|Set-cookie||Sets the cookie associated with the page. The servlet should not use Response.setheader ("Set-cookie", ...), but should use the dedicated method Addcookie provided by HttpServletResponse. See below for a discussion of cookie settings.|
|Www-authenticate||What type of authorization information should the customer provide in the authorization header? This header is required in an answer that contains a 401 (unauthorized) status line. For example, Response.setheader ("Www-authenticate", "BASIC realm=\" Executives\ "").
Note that the servlet generally does not handle this, but instead gives the Web server a special mechanism to control access to password-protected pages (for example,. htaccess).
Python crawler: HTTP request Header (header) detailed
Start building with 50+ products and up to 12 months usage for Elastic Compute Service