Reprint Source: Talking about HTTP protocol
Introduction
There are many good articles on the web that introduce HTTP, and some cover its details well. This article therefore does not delve into those details; instead it classifies the elements of the HTTP protocol from a higher-level, more structured perspective.
Definition and History of HTTP
Transferring data over a network raises three questions:
1. How does the client know where to find the content?
2. Once the client knows where the content is, how does it retrieve it?
3. In what form must the content be delivered so that the client can recognize it?
On the web, three different technologies answer these three questions: Uniform Resource Identifiers (URIs), the Hypertext Transfer Protocol (HTTP), and the Hypertext Markup Language (HTML). Most web developers are very familiar with URIs and HTML. HTTP, however, is so heavily encapsulated by many web technologies that it is the least familiar of the three.
HTTP is a transfer protocol that, like HTML, has evolved over time. The currently popular HTTP 1.1 is the third version of the protocol.
HTTP 0.9
HTTP 0.9, the first version of the HTTP protocol, is very limited. A request is a single line: the GET method followed by the address of the requested document.
With such a simple request there is no POST method and there are no HTTP headers, so the HTTP clients of that era could receive only one type of content: plain text. And if the requested information could not be found, there were no error codes such as 404 or 500.
Although HTTP 0.9 looks weak, it was able to meet the needs of its era.
HTTP 1.0
As the need for web applications grew after 1996, HTTP 0.9 could no longer meet the requirements. The biggest change in HTTP 1.0 is the introduction of the POST method, which lets a client send data to the server through an HTML form; this is one of the foundations of web applications. Another big change is the introduction of HTTP headers. With them, HTTP can return error codes, and the content carried by HTTP is no longer limited to plain text but can be images, animations, and many other formats.
In addition, a connection is allowed to stay open so that one TCP connection can carry several exchanges, although by default HTTP 1.0 closes the connection once the data has been transmitted.
HTTP 1.1
HTTP 1.1 was standardized in RFC 2616, published in June 1999. The step from HTTP 1.0 to HTTP 1.1 is not as revolutionary as the step from HTTP 0.9 to HTTP 1.0, but it still brings many enhancements.
First, the Host header was added. For example, a request to visit my blog:
GET /Careyson HTTP/1.1
Host: www.cnblogs.com
With the Host header, the GET line only needs a relative path. This may look like mere syntactic sugar, but in fact it makes it possible for a single host on the web to serve multiple domains. Without it, multiple domain names that point to the same IP address could not be told apart.
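The role of the Host header can be illustrated with a minimal Python sketch. It is not how any particular web server is implemented; the SITES table, the second domain, and the function names are assumptions made up for the illustration.

```python
def build_request(path, host):
    """Return a minimal HTTP/1.1 GET request as text."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def host_of(raw_request):
    """Return the value of the Host header in a raw request, or None."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip()
    return None


# Hypothetical table: several domain names served from one IP address.
SITES = {"www.cnblogs.com": "blog site", "example.org": "another site"}


def pick_site(raw_request):
    """Choose the site to serve based only on the Host header."""
    return SITES.get(host_of(raw_request), "default site")
```

Two requests with the same path but different Host values reach the same machine, and the Host value alone decides which site answers.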
In addition, the Range header was introduced. It allows a client to download only part of the content over HTTP, which makes multi-threaded downloads possible.
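As a rough sketch of how a multi-threaded downloader might use this, the Python function below splits a file of known size into Range header values, one per worker. The function name and the even-split strategy are assumptions for illustration; real download tools may divide the work differently.

```python
def byte_ranges(total_size, parts):
    """Split total_size bytes into contiguous Range header values."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty ranges
    chunk, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = chunk + (1 if i < extra else 0)
        end = start + length - 1  # the end position in a Range is inclusive
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges
```

Each worker would send its own value in a Range header and the pieces would be joined in order afterwards.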
It is also worth mentioning that in HTTP 1.1 connections are kept alive by default; this concept is described in detail below.
Network Hierarchy of HTTP
All transmission on the Internet goes through TCP/IP, and HTTP is no exception: it is an application-layer protocol in the TCP/IP model. Figure 1 shows where HTTP sits in the network stack.
Figure 1. Levels of HTTP in TCP/IP
As can be seen, HTTP is built on TCP at the transport layer, and TCP is an end-to-end, connection-oriented protocol. End-to-end here can be understood as process-to-process communication. HTTP therefore has to establish a TCP connection before it starts a transfer, and establishing a TCP connection requires the so-called three-way handshake, as shown in Figure 2.
Figure 2. Three-way handshake for TCP connections
After the three-way handshake, the TCP connection is established and HTTP can start transmitting. An important concept here is the persistent connection: the TCP connection is not closed until the HTTP transfer is complete. In HTTP 1.1 this is the default behavior (it is controlled by the Connection header). What it means for an HTTP transfer to be complete is best seen through a concrete example.
For example, I visit my blog and use Fiddler to capture the corresponding requests and responses, as shown in Figure 3.
Figure 3. Capturing requests and responses with Fiddler
As can be seen, although I only visited my blog, what is fetched is not just one HTML page. While parsing the HTML, the browser issues further HTTP requests to the server for any additional content it finds it needs, such as Common2.css, the second request in Figure 3. The 19 HTTP requests above need only one TCP connection; this is called a persistent connection. Completing all of these requests is what is meant by an HTTP transfer being complete.
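The idea of several requests sharing one TCP connection can be sketched with Python's standard library. This is only an illustration under assumptions: it starts a throwaway local server on 127.0.0.1, and the handler, the function name, and the paths are made up. The server records the client's source port for each request; if every request arrives from the same port, they all used the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_client_ports = []  # one entry per request handled by the server


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open by default

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_over_one_connection(paths):
    """Request every path over a single client connection."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read fully before the next request is sent.
            bodies.append(response.read().decode("ascii"))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies
```

A browser does something similar when it fetches a page and then its style sheets and images, although real browsers may also open several connections in parallel.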
HTTP Request
An HTTP request is the information a web client sends to a web server. It consists of three parts:
1. Request line
2. HTTP headers
3. Content
A typical request line looks like this:
GET www.cnblogs.com HTTP/1.1
The form of the request line is fixed and consists of three parts: the first is the request method, the second is the request URL, and the third is the HTTP version.
The second part of an HTTP request, the HTTP headers, can contain three kinds of headers: 1. request headers 2. general headers 3. entity headers
Because GET requests usually carry no content entity, they generally have no entity headers.
The third part, the content, is present in POST requests but not in GET requests, because a GET request carries no entity.
Let us look at these three parts in a concrete POST request. I put a button on an ordinary ASPX page; submitting it generates a POST request, as shown in Figure 4.
Figure 4. HTTP requests are made up of three parts
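The three parts can also be separated programmatically. The Python sketch below is a simplified parser written for illustration only; the sample request, including its path, host, and form field, is invented and is not the request captured in Figure 4.

```python
def parse_request(raw):
    """Split a raw HTTP request into request line, headers, and content."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line ends the headers
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return (method, target, version), headers, body


# Illustrative POST request; the path and the form field are made up.
SAMPLE = (
    "POST /Default.aspx HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 14\r\n"
    "\r\n"
    "Button1=Submit"
)
```

Calling parse_request(SAMPLE) yields the request line as a tuple, the headers as a dictionary with lower-cased names, and the content as a string.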
HTTP Request Methods
Although GET and POST are the only methods most of us commonly see, HTTP actually defines many more request methods, such as PUT, DELETE, HEAD, CONNECT, and TRACE. I will not go into detail here; please look them up yourself.
Let us focus on GET and POST. The web is full of explanations of the difference between them, but many miss the point. The biggest difference is that a POST request has the third part, the content, while a GET request does not. As their names suggest, GET is used to fetch content from the server. It is possible to send information to the server through the query string, but doing so goes against the original intent of GET; the information in the query string should only act as parameters that select the content to fetch. POST is the way for a client to send content to the server, which is why it has the third part of the request: the content.
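To make the contrast concrete, here is a small Python sketch that sends the same parameters once as a GET and once as a POST. The function names, the path, and the host are assumptions for illustration, and the requests are only built as text, not sent.

```python
from urllib.parse import urlencode


def build_get(path, host, params):
    """Parameters travel in the query string; there is no content part."""
    query = urlencode(params)
    return f"GET {path}?{query} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def build_post(path, host, params):
    """Parameters travel in the content part, described by entity headers."""
    body = urlencode(params)  # urlencode output is plain ASCII
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )
```

In the GET version everything after the blank line is empty, while in the POST version the same data appears there as the content.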
HTTP Response
When a web server receives an HTTP request, it processes the request (the processing may simply return a static page, or it may be done by a platform such as ASP.NET, PHP, or JSP) and returns a corresponding HTTP response. An HTTP response is structurally similar to an HTTP request and is also made up of three parts:
1. Status line
2. HTTP headers
3. Returned content
First look at the status line, which itself has three parts. The first part is the HTTP version, the second is the response status code, and the third is a description of the status code, so the second and third parts can also be regarded as one.
There is little to say about the HTTP version, but the status code deserves a mention. Explanations of what each specific HTTP status code means are easy to find online, so here I only give the classification.
- Informational (100-199)
- Success (200-299)
- Redirection (300-399)
- Client error (400-499)
- Server error (500-599)
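The status line and the classification above can be sketched in a few lines of Python. The function names are made up for the illustration, and the sample status line in the usage note is a generic example rather than one taken from the article's figures.

```python
def parse_status_line(line):
    """Split a status line into HTTP version, status code, and description."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code):
    """Map a status code to one of the five classes listed above."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]  # the first digit decides the class
```

For example, parse_status_line("HTTP/1.1 404 Not Found") gives the code 404, which status_class places in the client error class.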
The headers in an HTTP response include: 1. response headers 2. general headers 3. entity headers.
The third part, the HTTP response content, is the information asked for by the HTTP request. It can be HTML or, for example, a picture. When I visit Baidu, the HTTP response is as shown in Figure 5.
Figure 5. A typical HTTP response
The response in Figure 5 is HTML, but it can of course be of other types, such as a picture, as shown in Figure 6.
Figure 6. HTTP response content is a picture
This raises a question: since the content of an HTTP response is not only HTML but can also be other types, how does the browser know how to process what it receives?
This is determined by the media type, which is given by the Content-Type HTTP header. In Figure 5 it is text/html, and in Figure 6 it is image/jpeg.
The format of a media type is type/subtype. In Figure 5, for example, html is the subtype and text is the type.
The IANA (Internet Assigned Numbers Authority) maintains the registry of top-level media types. The 8 top-level types covered here are:
- application (for example: application/vnd.ms-excel)
- audio (for example: audio/mpeg)
- image (for example: image/png)
- message (for example: message/http)
- model (for example: model/vrml)
- multipart (for example: multipart/form-data)
- text (for example: text/html)
- video (for example: video/quicktime)
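How a client might take a Content-Type value apart can be sketched as follows. This is a simplified illustration in Python, not a complete parser for the header grammar; the function name is made up, and the set simply repeats the eight top-level types listed above.

```python
TOP_LEVEL_TYPES = {
    "application", "audio", "image", "message",
    "model", "multipart", "text", "video",
}


def parse_content_type(value):
    """Split a Content-Type value into type, subtype, and parameters."""
    media_type, *param_parts = [part.strip() for part in value.split(";")]
    top, _, sub = media_type.partition("/")
    params = {}
    for part in param_parts:
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return top.lower(), sub.lower(), params
```

A browser-like client could then choose a handler based on the type and subtype, for example rendering text/html as a page and image/jpeg as a picture.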
HTTP Header
An HTTP header is just a label. For example, if I add this code to an ASPX page:
Response.AddHeader("Test Head", "Test value");
the corresponding information can be captured in Fiddler, as shown in Figure 7.
Figure 7. HTTP Header
It is not difficult to see that HTTP headers are not strict; each is just a label. If the browser recognizes a header according to some standard (such as the browser's own conventions or web standards), it interprets it; otherwise the unrecognized header is ignored by the browser. The same is true for servers. If you write your own browser, you can interpret the header above to have any effect you want.
Below we discuss the standard HTTP headers. I will not describe the role of each header in detail, since there are already many articles about that online; please look them up yourself. HTTP headers can be divided into four categories according to their function.
General Header
A general header can appear in an HTTP request or in an HTTP response. General headers describe the HTTP exchange itself. Examples are the Connection header, which states whether the connection is persistent; the Date header, which gives the date the message was sent; the Keep-Alive header, which describes how long the TCP connection is kept open; and the Cache-Control header, used for cache control.
Entity Header
An entity header describes the content carried by an HTTP message. It can appear in an HTTP POST request or in an HTTP response. Content-Type and Content-Length in Figure 5 and Figure 6 are both entity headers; they describe the type and size of the entity. Other entity headers include Content-Language, Content-MD5, and Content-Encoding, which describe the entity, and Expires and Last-Modified, which control caching of the entity.
Request Header
A request header is sent by the client to the server to help the server better satisfy the client's request. Request headers can appear only in HTTP requests. Examples are the Accept header, which tells the server which response content types the client accepts; the Cookie header, which sends cookies; the Host header, which names the domain of the requested host; the If-Match, If-Modified-Since, and If-None-Match headers, used for caching; the Range header, used to fetch only part of the response content; and the Referer header, which identifies the page from which the request originated.
Response Header
An HTTP response header describes the HTTP response itself. It does not describe the third part of the HTTP response, the message content (that is the job of the entity headers). Examples are the Refresh header, for periodic refresh; the Retry-After header, which on a 503 error tells the client when to retry; the Server header, which shows server information; the Set-Cookie header, which sets a cookie; and the Accept-Ranges header, which tells the client that partial requests are supported.
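The four categories can be summarized in a small lookup table. The Python sketch below only restates the examples given in this article, with header names written in their standard spelling and lower-cased for comparison; it is not a complete list of HTTP headers, and the function name is made up.

```python
# Header names taken from the examples in this article, lower-cased.
HEADER_CATEGORIES = {
    "general": {"connection", "date", "keep-alive", "cache-control"},
    "entity": {
        "content-type", "content-length", "content-language",
        "content-md5", "content-encoding", "expires", "last-modified",
    },
    "request": {
        "accept", "cookie", "host", "if-match", "if-modified-since",
        "if-none-match", "range", "referer",
    },
    "response": {
        "refresh", "retry-after", "server", "set-cookie", "accept-ranges",
    },
}


def categorize(header_name):
    """Return the category of a header, or 'unrecognized' if it is unknown."""
    name = header_name.strip().lower()  # header names are case-insensitive
    for category, names in HEADER_CATEGORIES.items():
        if name in names:
            return category
    return "unrecognized"
```

A custom label such as the "Test Head" header from the earlier ASPX example falls into the unrecognized case, which matches the point that unknown headers are simply ignored.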
Maintaining State
It is also worth noting that the HTTP protocol is stateless. A server that receives HTTP requests does not know whether they come from the same client or from different clients; every request looks the same to the server. Additional means are therefore needed so that the server can tell which client a request comes from. This is shown in Figure 8.
Figure 8. The server does not know that request 1 and request 2 are from the same client
Maintaining State with Cookies
To solve this problem, HTTP maintains state through cookies. If cookies are used for state control, the requests in Figure 8 become those in Figure 9.
Figure 9. With cookies, the server can clearly know that request 2 and request 1 are from the same client
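The exchange in Figure 9 can be sketched in Python with the standard http.cookies module. This is a toy model under assumptions: headers are passed around as plain dictionaries instead of going over the network, and the cookie name session_id, the SESSIONS table, and the visit counter are invented for the illustration.

```python
import uuid
from http.cookies import SimpleCookie

SESSIONS = {}  # hypothetical server-side store: session id -> visit count


def handle_request(request_headers):
    """Return (response_headers, visit_count) for one request."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    session_id = cookie["session_id"].value if "session_id" in cookie else None
    response_headers = {}
    if session_id not in SESSIONS:
        # Unknown client: create a session and ask the client to remember it.
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = 0
        response_headers["Set-Cookie"] = f"session_id={session_id}"
    SESSIONS[session_id] += 1
    return response_headers, SESSIONS[session_id]
```

On the first request the server answers with Set-Cookie. When the client sends that value back in a Cookie header, the server recognizes it and the visit count for that client increases.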
Maintaining State Through Form Variables
Besides cookies, form variables can be used to maintain state. ASP.NET, for example, maintains state through a hidden input field called ViewState:
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMjA0OTM4MTAwNGRkXUfhlDv1Cs7/qhBlyZROCzlvf5U=" />
The principle is similar to cookies, except that the information accompanying each request and response is carried in a form variable.
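The general idea of carrying state in a hidden form field can be sketched in Python. This is not the actual ViewState format used by ASP.NET; the field name __STATE, the JSON plus Base64 encoding, and the function names are assumptions chosen only to show the round trip.

```python
import base64
import html
import json


def render_hidden_state(state):
    """Serialize state into a hidden input that is sent back with the form."""
    encoded = base64.b64encode(json.dumps(state).encode("utf-8")).decode("ascii")
    return f'<input type="hidden" name="__STATE" value="{html.escape(encoded)}" />'


def read_hidden_state(posted_value):
    """Recover the state from the value posted back by the browser."""
    return json.loads(base64.b64decode(posted_value).decode("utf-8"))
```

The server writes the field into the page, the browser posts it back unchanged with the form, and the server decodes it to restore the state. A real implementation would also need to protect the value against tampering.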
Maintaining State Through the Query String
The principle is the same as in the two methods above: the query string sends information to the server by appending it to the end of the requested address, and it is usually used together with forms. A typical query string looks like this:
www.xxx.com/xxx.aspx?var1=value&var2=value2
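On the server side, the variables can be read back out of the address. The Python sketch below uses the standard urllib.parse module; the function name is made up, and the http:// scheme in the usage note is added only so that the address is a complete URL.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url):
    """Return the query string variables of a URL as a dictionary of lists."""
    return parse_qs(urlsplit(url).query)
```

For the address http://www.xxx.com/xxx.aspx?var1=value&var2=value2 this returns var1 mapped to ["value"] and var2 mapped to ["value2"]. The values are lists because a variable may appear more than once in a query string.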
Summary
This article looks at the HTTP protocol from a relatively high level. It does not dig into the details but gives a systematic overview of the overall framework of HTTP. For more details about HTTP, please search Bing or refer to the relevant books. :-)