Source: http://www.cnblogs.com/cuihongyu3503319/archive/2008/07/24/1250248.html
Anyone engaged in Web development cannot do without HTTP (Hypertext Transfer Protocol). To understand HTTP, knowing HTML is not enough: the HTTP message header is a part that cannot be ignored.
Anyone who has done socket programming knows that splitting a message into a "message header" and a "message body" is a very common way to design a communication protocol. The message header tells the other party what the message is for, and the message body carries the details. The same is true for messages transmitted over HTTP: each HTTP message is divided into an HTTP header and an HTTP body. The header is required; the body is optional. When we open a web page, right-click it, and select "View Source", the code we see is the HTTP message body. Where is the message header? The IE browser does not show us this part, but we can see it by capturing the data packets or by other means.
Here is a simple example:
First, create a very simple web page with only one line of content:
<html><body>Hello World</body></html>
Put it on a Web server, such as IIS, and then request this page (http://localhost:8080/simple.htm) in the IE browser. When we request this page, the browser actually performs the following four tasks:
1. Parse the entered address and break it down into the protocol name, host name, port, and object path. For this address, the result is as follows:
Protocol name: HTTP
Host name: localhost
Port: 8080
Object path: /simple.htm
2. Combine the parts above with local information and encapsulate them into an HTTP request packet.
3. Use TCP to connect to the specified port of the host (localhost, 8080) and send the encapsulated packet.
4. Wait for the server to return data, parse the returned data, and display it.
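The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not what IE actually does internally: the function names build_get_request and fetch are made up for this example, the request carries far fewer headers than a real browser sends, and it asks the server to close the connection so that the end of the reply is easy to detect.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url):
    """Steps 1 and 2: split the URL and wrap the parts in a minimal GET request."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80      # assumption: fall back to the HTTP default port
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Connection: close",     # simplification: no keep-alive in this sketch
        "",                      # the blank line that ends the header
        "",
    ]
    return host, port, "\r\n".join(lines).encode("ascii")


def fetch(url):
    """Steps 3 and 4: connect over TCP, send the request, read until the server closes."""
    host, port, request = build_get_request(url)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


host, port, request = build_get_request("http://localhost:8080/simple.htm")
print(host, port)
print(request.decode("ascii"))
```

Calling fetch("http://localhost:8080/simple.htm") only works if a server is really listening on that port, so the sketch does not call it by itself.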
From the captured packets, we can see that the HTTP request generated by the browser is as follows:
GET /simple.htm HTTP/1.1 <CR>
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */* <CR>
Accept-Language: zh-cn <CR>
Accept-Encoding: gzip, deflate <CR>
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727) <CR>
Host: localhost:8080 <CR>
Connection: Keep-Alive <CR>
<CR>
To make it clear, I have written "<CR>" wherever a line ends (on the wire, each line ends with a carriage return followed by a line feed). Note the last line, which is empty apart from its line ending: this blank line is the separator between the message header and the message body specified by HTTP. Everything after the first blank line is the message body; this request packet has no message body.
The first line of the message starts with "GET", the HTTP method we use. Other possible methods include "POST". A GET message normally has no message body, whereas a POST message has one, whose content is the data being posted. Next, /simple.htm is the object we want to request, and HTTP/1.1 indicates that version 1.1 of the protocol is used.
The second line (Accept) lists the content types acceptable to the browser, and the third and fourth lines give the accepted languages and encodings. The fifth line (User-Agent) describes the client, including the browser type and operating system. Many websites can display the browser and operating system version you are using because they read it from this header.
The sixth line (Host) gives the host and port we requested, and the seventh line (Connection) indicates that keep-alive is used, that is, the connection is not closed immediately after the data has been transmitted.
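To make the structure of the request concrete, here is a small Python sketch that takes such a packet apart. The function name parse_request is invented for this example, the sample request is shortened to three header lines, and the parsing is deliberately naive (no validation, no repeated headers), so treat it as an illustration rather than a complete parser.

```python
def parse_request(raw):
    """Split a raw HTTP request into request line, header fields, and body."""
    # The first blank line (CR LF CR LF) separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, requested object, protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers, body


raw = (
    b"GET /simple.htm HTTP/1.1\r\n"
    b"Accept-Language: zh-cn\r\n"
    b"Host: localhost:8080\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)
print(headers["host"], headers["connection"])
print(len(body))  # a GET request like this one has an empty body
```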
After receiving such a data packet, the server processes it according to its content, for example by checking whether the object "/simple.htm" exists. If it does, the server decides how to handle it based on the server settings. For an .htm file, the server can return the content directly without any complicated processing. Before returning it, however, the server must add an HTTP message header.
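The server side of this exchange can be sketched roughly as follows. This is not how IIS works internally: the DOCUMENTS dictionary stands in for the files on disk, build_response is a name made up for this example, and only two headers are generated.

```python
# Hypothetical in-memory "site": maps object paths to file contents.
DOCUMENTS = {
    "/simple.htm": b"<html><body>Hello World</body></html>",
}


def build_response(target):
    """Look up the requested object and prepend an HTTP header to its content."""
    body = DOCUMENTS.get(target)
    if body is None:
        status, content_type, body = "404 Not Found", "text/plain", b"Not Found"
    else:
        status, content_type = "200 OK", "text/html"
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line between header and body
    )
    return head.encode("ascii") + body


found = build_response("/simple.htm")
missing = build_response("/missing.htm")
print(found.decode("ascii"))
print(missing.decode("ascii"))
```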
The complete HTTP message sent from the server is as follows:
HTTP/1.1 200 OK <CR>
Server: Microsoft-IIS/5.1 <CR>
X-Powered-By: ASP.NET <CR>
Date: Fri, 03 Mar 2006 06:34:03 GMT <CR>
Content-Type: text/html <CR>
Accept-Ranges: bytes <CR>
Last-Modified: Fri, 03 Mar 2006 06:33:18 GMT <CR>
ETag: "5ca4f75b8c3ec61:9ee" <CR>
Content-Length: 37 <CR>
<CR>
<html><body>Hello World</body></html>
Similarly, I use "<CR>" to mark the line endings. As you can see, this message is also divided into two parts: the message header and the message body. The message body is the HTML code we wrote earlier.
The first line of the message header starts with "HTTP/1.1", which again indicates the protocol used. "200 OK" is the HTTP status code: "200" indicates that the request succeeded. Other common codes are 404 (the object was not found), 500 (a server error), and 403 (access is forbidden, for example when directory browsing is not allowed).
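As a small illustration, the status line can be split into its three parts, and the codes mentioned above can be looked up in Python's standard http module. The helper name parse_status_line is made up for this example.

```python
from http import HTTPStatus


def parse_status_line(line):
    """Split a status line into protocol version, numeric code, and reason phrase."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(version, code, reason)

# The standard reason phrases for the codes mentioned in the text.
for c in (200, 403, 404, 500):
    print(c, HTTPStatus(c).phrase)
```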
The second line (Server) names the web server software, here IIS 5.1. The third line (X-Powered-By) is an extra ASP.NET note with no practical use. The fourth line (Date) is the time at which the request was processed. The fifth line (Content-Type) is the type of the returned content, which the browser uses to decide how to process the message body. For example, for text/html, as here, the browser uses its HTML parser; for image/jpeg, it uses its JPEG decoder.
The last line of the message header, "Content-Length", gives the length of the message body in bytes, counted from the first byte after the blank line. Once the browser has received that number of bytes, it considers the message complete.
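The role of Content-Length can be shown with a short Python sketch. An io.BytesIO object stands in for the network connection here, and read_response is a name invented for this example; a real client reading from a socket would have to loop until it has collected the full number of bytes. The trailing bytes b"EXTRA" are made-up data representing whatever follows on a kept-alive connection.

```python
import io


def read_response(stream):
    """Read one response: header lines up to the blank line, then Content-Length bytes."""
    status_line = stream.readline().decode("iso-8859-1").rstrip("\r\n")
    headers = {}
    while True:
        line = stream.readline().decode("iso-8859-1").rstrip("\r\n")
        if not line:  # the blank line marks the end of the header
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", 0))
    body = stream.read(length)  # stop after exactly the announced number of bytes
    return status_line, headers, body


stream = io.BytesIO(
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 37\r\n"
    b"\r\n"
    b"<html><body>Hello World</body></html>"
    b"EXTRA"  # made-up bytes that belong to whatever comes next on the connection
)
status_line, headers, body = read_response(stream)
leftover = stream.read()
print(status_line)
print(len(body), body)
print(leftover)
```

Because the body is cut off after the announced length, the bytes that follow it stay in the stream for the next message.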