HTTP message format
The HTTP/1.0 [RFC 1945] and HTTP/1.1 [RFC 2616] specifications define the HTTP message format. HTTP messages fall into two classes: request messages and response messages. We introduce each in turn.
HTTP Request Message
The following is a typical HTTP Request Message:
GET /somedir/page.html HTTP/1.1
Host: www.yesky.com
Connection: close
User-Agent: Mozilla/4.0
Accept-Language: zh-CN
(an additional carriage return and line feed)
We can learn a lot by looking closely at this simple request message. First, the message is written in plain ASCII text. Second, the message consists of five lines, each ending with a carriage return and a line feed, and the last line is followed by an additional carriage return and line feed. In general, a request message can have many more lines or as few as one. The first line of the request message is called the request line, and the subsequent lines are called header lines. The request line has three fields: the method field, the URL field, and the HTTP version field. The method field can take several values, including GET, POST, and HEAD. The vast majority of HTTP request messages use the GET method, which the browser uses to request an object; the requested object is identified in the URL field. In this example, the browser is requesting the object /somedir/page.html. The version field is self-explanatory: in this example, the browser implements HTTP/1.1.
Now let's look at the header lines in this example. The header line Host: www.yesky.com specifies the host on which the requested object resides. By including the header line Connection: close, the browser tells the server that it does not want to use a persistent connection; the server should close the connection after sending the requested object. So although the browser that generated this request message implements HTTP/1.1, it does not want to use a persistent connection here. The User-Agent: header line specifies the user agent, that is, the type of browser making the request. In this example, the user agent is Mozilla/4.0, a version of the Netscape browser. This header line is useful because the server can send different versions of the same object to different types of user agents (all of these versions are addressed by the same URL). Finally, the Accept-Language: header line indicates that the user prefers to receive a Simplified Chinese version of the requested object if one exists; otherwise, the server should send its default version. Accept-Language: is just one of the many content negotiation headers available in HTTP.
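To make the structure of the request message concrete, the following is a minimal Python sketch that splits a request into its request line and header lines. The function name parse_request and the variable names are our own choices for illustration, and the sketch assumes a well-formed message; it is not a complete HTTP parser.

```python
def parse_request(raw: str):
    """Split a request message into its request line fields, headers, and body."""
    # The header section ends at the first blank line (CRLF followed by CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The request line has three space-separated fields.
    method, url, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return (method, url, version), headers, body


# The example request from this section, with explicit CRLF line endings.
SAMPLE_REQUEST = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.yesky.com\r\n"
    "Connection: close\r\n"
    "User-Agent: Mozilla/4.0\r\n"
    "Accept-Language: zh-CN\r\n"
    "\r\n"
)

request_line, headers, body = parse_request(SAMPLE_REQUEST)
print(request_line)       # the method, URL, and version fields
print(headers["host"])    # the host that stores the requested object
```

Because a GET request carries no entity body, the body returned for this sample is an empty string.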
Next, let's take a look at the general format of the request message.
Figure 2: General HTTP request message format
The request message example above conforms to this format, but the general format also includes an entity body that follows the header lines (and the additional carriage return and line feed). The entity body is not used with the GET method, but it is used with the POST method. The POST method is suited to cases where the user fills in a form, for example when entering search terms into the Google search engine. After the user submits the form, the browser still requests a web page from the server, just as it does when the user clicks a hyperlink, but the specific content of that page depends on the values the user entered in the form fields. If the browser makes this request with the POST method, the entity body of the request message contains the values the user entered in each form field. The HEAD method is similar to the GET method. The difference is that the server leaves the requested object out of its response to a HEAD request; the rest of the response is the same as the response to a GET request. The HEAD method is often used by HTTP server software developers for debugging.
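As a rough illustration of how form values end up in the entity body of a POST request, here is a short Python sketch. The host www.example.com, the path /search, and the field name q are hypothetical, and the sketch assumes the common application/x-www-form-urlencoded encoding for form data, which this article does not otherwise discuss.

```python
from urllib.parse import urlencode


def build_post_request(host: str, path: str, fields: dict) -> bytes:
    """Build an HTTP/1.1 POST request whose entity body carries the form fields."""
    # Encode the form fields as name=value pairs joined by "&".
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # Content-Length gives the size of the entity body in bytes.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # the blank line that separates the header lines from the body
    )
    return head.encode("ascii") + body


# Hypothetical host, path, and form field, used only for illustration.
message = build_post_request("www.example.com", "/search", {"q": "http message format"})
print(message.decode("ascii"))
```

The entity body appears after the blank line, which is exactly the part of the general format that a GET request leaves empty.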
HTTP Response Message
The following is a typical HTTP Response Message:
HTTP/1.1 200 OK
Connection: close
Date: Thu, 13 Oct 2005 03:17:33 GMT
Server: Apache/2.0.54 (UNIX)
Last-Modified: Mon, 22 Jun 1998 09:23:24 GMT
Content-Length: 6821
Content-Type: text/html
(data data data ...)
This response message has three parts: an initial status line, six header lines, and an entity body containing the requested object. The status line has three fields: the protocol version field, the status code field, and the reason phrase field. The status line in this example indicates that the server is using HTTP/1.1 and that everything is OK (that is, the server found the requested object and is sending it).
Now let's look at the header lines in this example. The server uses the Connection: close header line to tell the client that it will close the TCP connection after sending the message. The Date: header line gives the date and time at which the server created and sent the response message. Note that this is not the time the object was created or last modified; it is the time at which the server retrieved the object from its file system, inserted it into the response message, and sent it. The Server: header line indicates that the message was generated by an Apache server; it is analogous to the User-Agent: header line in the HTTP request message. The Last-Modified: header line gives the date and time at which the object was created or last modified. The Last-Modified: header is critical for object caching, whether the cache is on the local client host or on a network cache server (that is, a proxy server). The Content-Length: header line gives the number of bytes in the object being sent. The Content-Type: header line indicates that the object in the entity body is HTML text. The type of an object is officially indicated by the Content-Type: header, not by the file extension.
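The three parts of a response message can be separated with a few lines of Python. The sketch below is a simplified illustration, not a complete parser: the name parse_response is our own, and the sample uses a tiny 13-byte body (with a matching Content-Length) in place of the full object from the example above.

```python
def parse_response(raw: bytes):
    """Split a response message into status line fields, headers, and entity body."""
    # The status line and header lines end at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line has three fields: version, status code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many bytes of entity body belong to this message.
    length = int(headers.get("content-length", len(body)))
    return (version, int(code), reason), headers, body[:length]


# A shortened version of the example response; the body here is illustrative.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Date: Thu, 13 Oct 2005 03:17:33 GMT\r\n"
    b"Server: Apache/2.0.54 (UNIX)\r\n"
    b"Last-Modified: Mon, 22 Jun 1998 09:23:24 GMT\r\n"
    b"Content-Length: 13\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>hello</p>\n"
)

status, headers, body = parse_response(SAMPLE_RESPONSE)
print(status)
print(headers["content-type"], len(body))
```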
Note: When an HTTP/1.1 server receives an HTTP/1.0 request, it does not use a persistent connection. Instead, it closes the TCP connection after sending the requested object. This is necessary because an HTTP/1.0 client expects the server to close the connection immediately.
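The rule in this note, combined with the Connection: close header discussed earlier, can be expressed as a small decision function. This is only a sketch of the behavior described in this article: the function name server_should_close is hypothetical, and the sketch deliberately ignores other Connection header values that real servers may also handle.

```python
def server_should_close(request_version: str, connection_header: str = "") -> bool:
    """Return True if the server should close the TCP connection after responding."""
    # An HTTP/1.0 client expects the connection to be closed after the response.
    if request_version == "HTTP/1.0":
        return True
    # An HTTP/1.1 client can still opt out of persistence with "Connection: close".
    return connection_header.strip().lower() == "close"


print(server_should_close("HTTP/1.0"))
print(server_should_close("HTTP/1.1"))
print(server_should_close("HTTP/1.1", "Close"))
```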
Next, let's look at the general format of the response message, shown in Figure 3. The response message example above fully conforms to this format. The status code and reason phrase in the response message indicate the result of processing the corresponding request. Some common status codes and their reason phrases are listed below:
Figure 3: General Response Message format
● 200 OK: the request succeeded, and the requested information is returned in the response message.
● 301 Moved Permanently: the requested object has been moved permanently. The new URL is given in the Location: header of the response message. The client software automatically requests the new URL.
● 400 Bad Request: a generic error status code indicating that the server could not understand the request.
● 404 Not Found: the requested document does not exist on the server.
● 505 HTTP Version Not Supported: the server does not support the requested HTTP protocol version.
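The status codes listed above and their standard reason phrases can be looked up with the HTTPStatus enumeration in the Python standard library. The helper status_line below is our own illustrative function for assembling a status line; it is a sketch, not part of any server.

```python
from http import HTTPStatus


def status_line(code: int, version: str = "HTTP/1.1") -> str:
    """Assemble a status line from the version, status code, and reason phrase."""
    return f"{version} {code} {HTTPStatus(code).phrase}"


# Print the status line for each status code listed above.
for code in (200, 301, 400, 404, 505):
    print(status_line(code))
```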
How can you see a real HTTP response message? This is very simple. You can use the nc tool to connect to your favorite server (nc, or Netcat, is a tool for establishing TCP connections between hosts and is popular with hackers), and then type a one-line request message for an object on the server. For example, you can enter the following:
nc www.yesky.com 80
GET /index.shtml HTTP/1.0
(After entering the second line, press Enter twice.) This opens a TCP connection to port 80 of the host www.yesky.com and then sends an HTTP GET command. You should see a response message that contains the base HTML file of the yesky homepage. If you want to see only the HTTP message lines without receiving the object itself, replace GET above with HEAD. Finally, examine the response message you get back.
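The same experiment can be approximated in Python with the standard socket module. The sketch below is a minimal illustration under a few assumptions: the function names are our own, the request is a single request line with no header lines (as in the nc example), and the server is assumed to close the connection after responding, as expected for an HTTP/1.0 request.

```python
import socket


def build_request(method: str, path: str, version: str = "HTTP/1.0") -> bytes:
    """Build a one-line request message ended by the blank line (pressing Enter twice)."""
    return f"{method} {path} {version}\r\n\r\n".encode("ascii")


def fetch(host: str, request: bytes, port: int = 80) -> bytes:
    """Open a TCP connection, send the request, and read until the server closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request)
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def split_message(raw: bytes):
    """Separate the status line and header lines from the entity body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body
```

With a working network connection, calling split_message(fetch("www.yesky.com", build_request("HEAD", "/index.shtml"))) should return only the status line and header lines, because the response to a HEAD request carries no object. Whether that particular host still answers this way is not something this sketch can guarantee.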
So far we have discussed a number of header lines that can be used in HTTP request and response messages. The HTTP specifications (especially HTTP/1.1) define many more header lines that can be inserted by browsers, Web servers, and network cache servers.
With the nc tool we have complete control over the header lines included in the request message. But how does a browser decide which header lines to include in a request message? And how does a Web server decide which header lines to include in a response message? A browser generates the header lines of a request message based on its user agent type and the HTTP version it supports (an HTTP/1.0 browser naturally does not generate HTTP/1.1 header lines), the user's browser configuration (such as the preferred language), and other factors. Web servers behave similarly: there are different products, versions, and configurations, all of which affect the header lines included in the response message.
The header lines discussed in this article are only a small subset of those that can be used in HTTP request and response messages. Many more are defined in the HTTP specifications; for details, see the RFC documents.