Full HTTP protocol Overview

Source: Internet
Author: User
Tags response code rfc domain server

HTTP (Hypertext Transfer Protocol) is a stateless, application-layer protocol built on a request/response model and usually carried over a TCP connection. The mature versions are HTTP/1.0 and HTTP/1.1, and HTTP/1.1 adds a persistent connection mechanism. The vast majority of web development consists of web applications built on top of HTTP.
HTTP sends content in plaintext and provides no data encryption. An attacker who intercepts the traffic between a web browser and a web server can read it directly, so HTTP is not suitable for transmitting sensitive information such as credit card numbers or passwords.
HTTPS (Hypertext Transfer Protocol over Secure Socket Layer) is a security-oriented HTTP channel; put simply, it is the secure version of HTTP. HTTPS adds the SSL protocol (since succeeded by TLS) underneath HTTP. SSL relies on certificates to verify the identity of the server and encrypts the communication between the browser and the server.

1. HTTP URL

A URL is a special type of URI that contains enough information to look up a resource in the following format:

http[s]://host[":"port][abs_path]

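    • http indicates that the network resource is located through the HTTP protocol; https indicates that HTTPS is used;
    • host is a legal Internet host domain name or IP address;
    • port is an optional port number; when it is omitted the default is used (80 for http, 443 for https);
    • abs_path is the path of the requested resource.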

If abs_path is not given in the URL, it must be given as "/" when the URL is used as the request URI. Browsers usually do this automatically.
Input: www.baidu.com
The browser automatically converts it to: https://www.baidu.com/
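
The defaulting rule above can be sketched in Python with the standard-library `urllib.parse.urlsplit`. The helper name `normalize` and the `DEFAULT_PORTS` table are assumptions made for this illustration; they are not part of the original article.

```python
from urllib.parse import urlsplit

# Default ports per scheme (assumed table for this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize(url):
    """Return (scheme, host, port, path), applying the HTTP URL defaults."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"          # an empty abs_path becomes "/"
    return parts.scheme, parts.hostname, port, path

print(normalize("https://www.baidu.com"))
print(normalize("http://www.guet.edu.cn:8080/index.html"))  # port 8080 is hypothetical
```

The sketch only handles the two schemes discussed here and expects a URL that already carries its scheme; turning a bare `www.baidu.com` into a full URL is left to the browser, as described above.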

2. HTTP requests

An HTTP request consists of three parts: the request line, the message headers, and the request body.

Request-Line<CRLF>
Header-Name: header-value<CRLF>
Header-Name: header-value<CRLF>    // one or more headers, each ending with <CRLF>
<CRLF>
body    // request body

1. The request line begins with a method token, followed by the Request-URI and the protocol version, each separated by a space, in the following format:

Method Request-URI HTTP-Version CRLF

Where Method is the request method, Request-URI is a Uniform Resource Identifier, HTTP-Version is the HTTP protocol version of the request, and CRLF is a carriage return followed by a line feed (apart from the terminating CRLF, a bare CR or LF character is not allowed).
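
As a minimal sketch of this format, the following Python function splits a request line into its three fields and rejects a bare CR or LF. The function name `parse_request_line` is an assumption for illustration, and a real server would validate the method and version further.

```python
def parse_request_line(line):
    """Split 'Method SP Request-URI SP HTTP-Version CRLF' into its parts."""
    if not line.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    core = line[:-2]
    if "\r" in core or "\n" in core:
        raise ValueError("bare CR or LF is not allowed inside the request line")
    parts = core.split(" ")
    if len(parts) != 3:
        raise ValueError("expected exactly three space-separated fields")
    method, uri, version = parts
    return method, uri, version

print(parse_request_line("GET /form.html HTTP/1.1\r\n"))
```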

2. There are several request methods (method names are all uppercase). Each is explained below:

    • GET: requests the resource identified by the Request-URI
    • POST: submits data to the server to be accepted as new data for the resource identified by the Request-URI
    • HEAD: requests only the response headers for the resource identified by the Request-URI
    • PUT: asks the server to store a resource and use the Request-URI as its identifier
    • DELETE: asks the server to delete the resource identified by the Request-URI
    • TRACE: asks the server to echo back the request it received, mainly for testing or diagnostics
    • CONNECT: reserved in HTTP/1.1 for use with a proxy that can switch to acting as a tunnel
    • OPTIONS: queries the capabilities of the server, or the options and requirements associated with a resource

Application examples:
GET method: when a web page is accessed by entering a URL in the browser's address bar, the browser uses the GET method to fetch the resource from the server, e.g.:
GET /form.html HTTP/1.1 (CRLF)

The POST method asks the server to accept the data attached to the request and is often used to submit forms, e.g.:

POST /reg.jsp HTTP/ (CRLF)
Accept:image/gif,image/x-xbit,... (CRLF)
...
HOST:www.guet.edu.cn (CRLF)
Content-Length:21 (CRLF)
Connection:Keep-Alive (CRLF)
Cache-Control:no-cache (CRLF)
(CRLF)         // this CRLF marks the end of the message headers; everything before it is header content
user=jeffrey&pwd=1234  // the submitted data starts here
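
To make the layout of such a message concrete, here is a hedged Python sketch that serializes a form POST into raw bytes and computes Content-Length from the encoded body. The function name `build_post`, the choice of HTTP/1.1, and the added Content-Type header are assumptions of this sketch rather than details taken from the example above.

```python
from urllib.parse import urlencode

def build_post(path, host, form):
    """Serialize a form POST as raw HTTP/1.1 request bytes."""
    body = urlencode(form).encode("ascii")
    head = [
        f"POST {path} HTTP/1.1",          # version assumed for this sketch
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",   # length of the body in bytes
        "Connection: Keep-Alive",
    ]
    # A blank line (CRLF CRLF) separates the headers from the body.
    return "\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body

raw = build_post("/reg.jsp", "www.guet.edu.cn", {"user": "jeffrey", "pwd": "1234"})
print(raw.decode("ascii"))
```

Deriving Content-Length from the encoded body, instead of typing it by hand, keeps the header and the body from drifting apart.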

The HEAD method is almost identical to GET: the HTTP headers in the response to a HEAD request carry the same information as those returned for a GET request, but no body is sent. With this method, information about the resource identified by the Request-URI can be obtained without transferring the entire resource. It is commonly used to test whether a hyperlink is valid, whether it is accessible, and whether it has been updated recently.
3. Request headers
See blog: http://blog.csdn.net/u010487568/article/details/17394089.
Common request headers
Accept
The Accept request header field specifies which media types the client accepts. E.g. Accept: image/gif indicates that the client wants a resource in GIF image format; Accept: text/html indicates that the client wants HTML text.
Accept-Charset
The Accept-Charset request header field specifies the character sets the client accepts, e.g. Accept-Charset: iso-8859-1,gb2312. If the field is not set in the request message, any character set is acceptable by default.
Accept-Encoding
The Accept-Encoding request header field is similar to Accept, but specifies the acceptable content encodings, e.g. Accept-Encoding: gzip, deflate. If the field is not set in the request message, the server assumes that the client accepts any content encoding.
Accept-Language
The Accept-Language request header field is similar to Accept, but specifies a natural language, e.g. Accept-Language: zh-cn. If the field is not set in the request message, the server assumes that the client accepts any language.
Authorization
The Authorization request header field is mainly used to prove that the client has permission to view a resource. When a browser accesses a page and receives a 401 (Unauthorized) response from the server, it can send a request containing the Authorization request header field and ask the server to validate it.
Host (required when sending an HTTP/1.1 request)
The Host request header field specifies the Internet host and port number of the requested resource, and is usually extracted from the HTTP URL, e.g.:
We enter in the browser: http://www.guet.edu.cn/index.html
The request message sent by the browser includes the Host request header field, as follows:
Host:www.guet.edu.cn
The default port number 80 is used here. If a port number is specified, it becomes: Host:www.guet.edu.cn:<port number>
User-Agent: identifies the browser (client software) making the request.
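
The Host rule described above (omit the default port, include any other) can be sketched as follows. The function name `host_header` and port 8080 in the second call are hypothetical, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit

def host_header(url):
    """Build the Host header value for a URL, omitting default ports."""
    parts = urlsplit(url)
    default = {"http": 80, "https": 443}.get(parts.scheme)
    if parts.port is None or parts.port == default:
        return parts.hostname
    return f"{parts.hostname}:{parts.port}"

print("Host:", host_header("http://www.guet.edu.cn/index.html"))
print("Host:", host_header("http://www.guet.edu.cn:8080/index.html"))
```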

3. HTTP response

After receiving and interpreting the request message, the server returns an HTTP response message. The HTTP response also consists of three parts: the status line, the message headers, and the response body:

Status-Line<CRLF>
Header-Name: header-value<CRLF>
Header-Name: header-value<CRLF>    // one or more headers, each ending with <CRLF>
<CRLF>
body    // response body
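
A minimal Python sketch of taking such a message apart is shown below. The function name `split_message` and the sample response are assumptions for illustration; repeated header fields simply overwrite each other here, which a real parser would need to handle.

```python
def split_message(raw):
    """Split raw HTTP message bytes into (start_line, headers, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("missing blank line between headers and body")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body

raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
print(split_message(raw))
```

The same function works for requests, since both message types share the start line, headers, blank line, body layout.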

1. The status line has the following format:

HTTP-Version Status-Code Reason-Phrase CRLF

Where HTTP-Version is the HTTP protocol version used by the server, Status-Code is the response status code sent back by the server, and Reason-Phrase is a textual description of the status code.
The status code consists of three digits. The first digit defines the class of the response and has five possible values:
1XX: Informational – the request has been received and processing continues
2XX: Success – the request has been successfully received, understood, and accepted
3XX: Redirection – further action must be taken to complete the request
4XX: Client Error – the request contains a syntax error or cannot be fulfilled
5XX: Server Error – the server failed to fulfill a valid request
Common status codes, reason phrases, and descriptions:
200 OK // the client request succeeded
400 Bad Request // the client request has a syntax error and cannot be understood by the server
401 Unauthorized // the request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden // the server received the request but refuses to serve it
404 Not Found // the requested resource does not exist, e.g. a wrong URL was entered
500 Internal Server Error // the server encountered an unexpected error
503 Service Unavailable // the server is currently unable to process the request and may recover after some time
e.g. HTTP/1.1 200 OK (CRLF)
See blog http://blog.csdn.net/u010487568/article/details/17149589.
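
The status line format and the five classes can be sketched in a few lines of Python. The names `CLASSES` and `parse_status_line` are assumptions for this illustration.

```python
# Response classes keyed by the first digit of the status code.
CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}

def parse_status_line(line):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase' and classify the code."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError("status code must be three digits")
    return version, int(code), reason, CLASSES[code[0]]

print(parse_status_line("HTTP/1.1 200 OK\r\n"))
print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```

Splitting at most twice keeps reason phrases that contain spaces, such as "Not Found", in one piece.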
2. Response headers
See blog: http://blog.csdn.net/u010487568/article/details/17394089.
Common response headers
Location
The Location response header field redirects the recipient to a new location. It is commonly used when a domain name changes.
Server
The Server response header field contains information about the software the server uses to process the request. It is the counterpart of the User-Agent request header field. An example of the Server response header field:
Server: Apache-Coyote/1.1
WWW-Authenticate
The WWW-Authenticate response header field must be included in a 401 (Unauthorized) response message. It tells the client which authentication scheme to use when it resends the request with an Authorization header field for the server to validate.
E.g. WWW-Authenticate: Basic realm="Basic Auth test!" // the server uses the Basic authentication scheme for the requested resource.
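
For the Basic scheme named in that challenge, the client answers with an Authorization header whose value is the Base64 encoding of `username:password`. The sketch below shows this under the assumption of UTF-8 credentials; the function name `basic_authorization` is hypothetical, and the credentials are reused from the earlier POST example purely for illustration.

```python
import base64

def basic_authorization(username, password):
    """Build the Authorization header value for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

print("Authorization:", basic_authorization("jeffrey", "1234"))
```

Base64 is an encoding, not encryption, so Basic authentication has the same plaintext weakness as HTTP itself unless it is sent over HTTPS.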

4. HTTP general headers and entity headers

The most important parts of the HTTP protocol are the requests and responses described earlier, with their request headers and response headers. In addition, the protocol defines general headers and entity headers, as well as extension headers that support other recommendations and user customization.

1. General headers

A small number of header fields, the general headers, apply to both request and response messages. They describe the message being transmitted, not the entity being transferred.

    • Cache-Control: specifies caching directives. Directives are unidirectional (a directive that appears in a response does not necessarily appear in the request) and independent (the directives of one message do not affect how another message is cached). HTTP/1.0 uses the similar Pragma header field. Cache directives in a request include no-cache (a cached copy must not be used without revalidation), no-store, max-age, max-stale, min-fresh, and only-if-cached. Cache directives in a response include public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, and s-maxage. E.g. to instruct the IE browser (client) not to cache a page, the server-side JSP program can be written as: response.setHeader("Cache-Control", "no-cache"); // response.setHeader("Pragma", "no-cache"); has the same effect, and the two are usually used together. This code sets the following general header field in the response message that is sent: Cache-Control: no-cache
    • Date: indicates the date and time at which the message was generated
    • Connection: allows options to be specified for the connection, e.g. "keep-alive" to keep the connection persistent, or "close" to notify the server to close the connection once the response is complete. Keep-alive is supported by HTTP/1.1 and is also its default behavior.
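
Because a Cache-Control value is a comma-separated list of directives, some with an argument, a receiver has to split it before acting on it. A minimal Python sketch follows; the function name `parse_cache_control` is an assumption, and quoted arguments that themselves contain commas are not handled.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument-or-None}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives

print(parse_cache_control("private, max-age=600, must-revalidate"))
```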
2. Entity headers

Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but the two are not necessarily sent together: a message may carry only entity header fields. Entity headers define metainformation about the entity body or, when there is no entity body, about the resource identified by the request.
Common entity headers

    • Content-Encoding: used as a modifier to the media type. Its value indicates which additional content encodings have been applied to the entity body, and therefore which decoding mechanisms must be applied to obtain the media type referenced by the Content-Type header field. Content-Encoding is typically used to compress the document, e.g. Content-Encoding: gzip
    • Content-Language: describes the natural language of the resource. If the field is not set, the entity content is assumed to be intended for readers of all languages. E.g. Content-Language: zh-cn
    • Content-Length: indicates the length of the entity body, expressed as a decimal number of bytes.
    • Content-Type: specifies the media type of the entity body sent to the recipient, e.g. Content-Type: text/html;charset=ISO-8859-1 or Content-Type: text/html;charset=GB2312
    • Last-Modified: indicates the date and time at which the resource was last modified.
    • Expires: gives the date and time after which the response is considered expired.
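
How these entity headers relate to the bytes actually sent can be sketched as follows. The function name `entity_headers` and the sample text are assumptions for illustration; the point is that Content-Length counts bytes after the charset and any content encoding have been applied.

```python
import gzip

def entity_headers(text, charset="utf-8", compress=False):
    """Return (headers, body) for a text entity, optionally gzip-compressed."""
    body = text.encode(charset)
    headers = {"Content-Type": f"text/html;charset={charset}"}
    if compress:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    # Content-Length counts the bytes of the body as sent, not characters.
    headers["Content-Length"] = str(len(body))
    return headers, body

headers, body = entity_headers("<p>你好</p>", charset="gb2312")
print(headers)
```

The sample string has 9 characters but occupies 11 bytes in GB2312, because each of the two Chinese characters takes two bytes.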
3. Extension headers

Some recommended extensions to HTTP are defined in separate RFC documents, such as Cookie, Set-Cookie, Referer, and Content-Disposition. Each addresses an important class of problems and is described in its own RFC. These headers are implemented in mainstream modern browsers. In addition, the protocol supports user-defined extension headers whose names and meanings are chosen by the user to suit the application, which makes HTTP very flexible.

5. HTTP Caching

Servers can cache the responses sent to users, and browsers cache the responses to their own requests. Both rely on the cache negotiation policies of the HTTP protocol.

    • Expires policy: Expires is a response header field set by the web server. It tells the browser that, until the expiration time, it may take the data directly from the browser cache without sending the request again. However, Expires dates from HTTP/1.0 (as does Pragma), and browsers now use HTTP/1.1 by default, so its role is largely superseded. One drawback of Expires is that the expiration time it returns is server time: if the client clock differs significantly from the server clock (for example because the clocks are out of sync or set to different time zones), the error is large. HTTP/1.1 therefore uses Cache-Control: max-age=<seconds> instead.
    • Cache-Control policy (HTTP/1.1): Cache-Control serves the same purpose as Expires: it indicates how long the current resource is valid and controls whether the browser takes the data directly from its cache or sends the request to the server again. Cache-Control offers more options and finer-grained settings, and if both are set, it takes priority over Expires.
    • Last-Modified/If-Modified-Since: this response header and request header are used together with Cache-Control. They carry the resource's last update time, from which the interval up to the present is computed and matched against the expiration time set in Cache-Control to determine whether the cache is out of date.
    • ETag/If-None-Match: an ETag is a unique identifier for a resource on the server, generated automatically by the server or by the developer, that allows more precise cache control. When Last-Modified and ETag are used together, the server gives priority to the ETag. ETag/If-None-Match should also be used together with Cache-Control.
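
The priority of max-age over Expires can be sketched as a freshness check. This is a simplified illustration, not a full cache: the function name `is_fresh` is an assumption, the age of the response is approximated as the time since it was stored, and the Age and Date headers are ignored.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

def is_fresh(headers, stored_at, now):
    """Decide whether a cached response may be reused without revalidation.

    headers: dict of response header fields; stored_at/now: aware datetimes.
    """
    cache_control = headers.get("Cache-Control", "")
    for item in cache_control.split(","):
        name, _, arg = item.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            # max-age takes priority over Expires and needs only the client clock.
            return now - stored_at < timedelta(seconds=int(arg))
    if "Expires" in headers:
        return now < parsedate_to_datetime(headers["Expires"])
    return False

stored = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = format_datetime(stored + timedelta(hours=1), usegmt=True)
headers = {"Cache-Control": "max-age=60", "Expires": expires}
print(is_fresh(headers, stored, stored + timedelta(seconds=30)))  # within max-age
print(is_fresh(headers, stored, stored + timedelta(minutes=5)))   # past max-age; Expires is not consulted
```

Because max-age is a duration measured on the client, the check no longer depends on the server and client clocks agreeing, which is the drawback of Expires noted above.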

The parameters are described in detail below:

The Cache-Control value can be public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, or max-age. The directives have the following meanings:
public indicates that the response may be cached by any cache.
private indicates that all or part of the response message is intended for a single user and must not be stored by a shared cache. This lets the server mark a response as meant for one user only; it is not valid for other users' requests.
no-cache indicates that the request or response must not be served from cache without revalidation. It does not literally mean "do not cache", so be careful not to take the name at face value.
no-store prevents important information from being released inadvertently. When sent in a request message, neither the request nor its response is stored in a cache.
max-age indicates that the client is willing to accept a response whose age is no greater than the specified time (in seconds).
min-fresh indicates that the client is willing to accept a response that will still be fresh for at least the specified time (in seconds).
max-stale indicates that the client is willing to accept a response that has exceeded its expiration time. If a value is given, the client accepts a response that has exceeded its expiration time by no more than that many seconds.
Last-Modified: indicates when the resource in this response was last modified. When the web server responds to a request, it tells the browser the resource's last modification time.
ETag: when the web server responds to a request, it tells the browser the unique identifier of the current resource on the server (the generation rule is decided by the server). In Apache, the ETag value is by default obtained by hashing the file's inode number (INode), size, and last modification time (MTime).
If-None-Match: when a resource has expired (according to the max-age given in Cache-Control) and the resource was delivered with an ETag, the browser sends the request to the web server again with the If-None-Match header carrying the ETag value. On receiving the request, the web server compares the If-None-Match value with the validator of the requested resource and decides whether to return 200 or 304.
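
The 200-or-304 decision can be sketched from the server's side as follows. The names `make_etag` and `respond` are assumptions, the ETag here is derived from a content hash (only one possible rule, since servers choose their own), and weak validators (`W/"..."`) are not handled.

```python
import hashlib

def make_etag(body):
    """Derive an ETag from the content (one possible rule among many)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(request_headers, body):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    candidates = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, {"ETag": etag}, b""       # the client's copy is still valid
    return 200, {"ETag": etag}, body          # send the full resource

status, headers, _ = respond({}, b"hello")
print(status, headers["ETag"])
print(respond({"If-None-Match": headers["ETag"]}, b"hello")[0])    # unchanged content
print(respond({"If-None-Match": headers["ETag"]}, b"changed")[0])  # content has changed
```

A 304 response carries no body, so the browser reuses its cached copy and only the headers travel over the network.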

HTTP/1.1 provides both Last-Modified and ETag to work with Cache-Control because Last-Modified alone has limitations. The modification time recorded by Last-Modified is only accurate to the second, so if a file is modified several times within one second, the time cannot reflect every change. Some files are regenerated periodically without their content changing, yet Last-Modified changes, so the cached copy cannot be used. In addition, the server may not obtain the file modification time accurately, or its clock may be inconsistent with that of a proxy server.

Summary of caching and user behavior:

User action | Expires/Cache-Control | Last-Modified/ETag
Enter in address bar | Effective | Effective
Follow a page link | Effective | Effective
Open a new window | Effective | Effective
Back and forward | Effective | Effective
F5 / Refresh button | Invalid (browser resets max-age=0) | Effective
Ctrl+F5 refresh | Invalid (browser resets Cache-Control: no-cache) | Invalid (the request headers drop this option)

6. HTTP resumable transfer (range requests)

The GET method in HTTP/1.1 supports requesting only part of a resource. The client asks for a part with the Range request header; the response uses status code 206 (Partial Content) and the Content-Range header to identify which part of the resource is being returned. Format:
Content-Range: bytes 306320-604047/606060
This indicates that the resource has 606060 bytes in total and that the current response carries bytes 306320 through 604047. A client can request several parts of a large file at the same time over multiple threads to download the file concurrently; download tools such as FlashGet and Xunlei (Thunder) work on this principle.
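
The following Python sketch shows both halves of this mechanism: parsing a Content-Range value and dividing a resource into Range header values for a concurrent download. It assumes the `bytes` range unit that HTTP/1.1 uses on the wire; the function names `parse_content_range` and `split_ranges` and the choice of four parts are illustrative only.

```python
import re

def parse_content_range(value):
    """Parse 'bytes first-last/total' into three integers."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value.strip())
    if not match:
        raise ValueError("unsupported Content-Range: " + value)
    first, last, total = map(int, match.groups())
    return first, last, total

def split_ranges(total, parts):
    """Split a resource of `total` bytes into `parts` Range header values."""
    size = -(-total // parts)  # ceiling division
    return [f"bytes={start}-{min(start + size, total) - 1}"
            for start in range(0, total, size)]

print(parse_content_range("bytes 306320-604047/606060"))
print(split_ranges(606060, 4))
```

Byte positions are zero-based and inclusive at both ends, so the last byte of a 606060-byte resource is number 606059.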

HTTP underlies the most popular web applications, and many other applications are implemented on top of it, so there are many aspects of the protocol worth paying attention to.
