This article is reproduced from http://blog.csdn.net/kfanning/article/details/6062118/
An HTTP exchange consists of two parts: a request and a response. When you enter a URL in a Web browser, the browser creates and sends a request that contains the URL you entered and some information about the browser itself. When the server receives this request, it returns a response that includes information about the request and the data at the specified URL, if any. The browser then parses the response and displays the Web page (or other resource).
HTTP request
The format of the HTTP request is as follows:
<request-line>
<headers>
<blank line>
[<request-body>]
In an HTTP request, the first line must be a request line, which describes the type of request, the resource to be accessed, and the HTTP version used. It is followed by a header section that gives the server additional information. After the headers comes a blank line, and after that you can add any additional data (called the body).
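The layout described above can be sketched in a few lines of Python. This is only an illustration: the helper name build_request is hypothetical, and it assumes header names and values are plain ASCII.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines end with CRLF; an empty line closes the header section,
    # even when no body follows.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_request("GET", "/", {"Host": "www.baidu.com", "Connection": "keep-alive"})
print(raw.decode("ascii"))
```

The optional body is simply appended after the blank line, which is why the blank line has to be present even for a request without a body.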
HTTP defines multiple request types, but usually we are concerned only with GET and POST requests. Whenever you enter a URL in a Web browser, the browser sends a GET request to the server for that URL, telling the server which resource to fetch and return. The GET request for www.baidu.com is as follows:
GET / HTTP/1.1
Host: www.baidu.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Connection: keep-alive
The first part of the request line indicates that this is a GET request. The second part is a slash (/), indicating that the root of the domain is requested. The last part indicates that HTTP version 1.1 is used (the other option is 1.0). So where is the request sent? That is what the second line specifies.
The second line is the first header of the request, Host. The Host header indicates the destination of the request. Combining Host with the slash (/) from the previous line tells the server that the request is for www.baidu.com/ (HTTP 1.1 requires the Host header; the original version 1.0 does not). The third line contains the User-Agent header, which both server-side and client-side scripts can access and which is an important basis for browser detection logic. Its value is defined by the browser you are using (in this case, Firefox 1.0.1) and is sent automatically with every request. The last line is the Connection header, which browsers usually set to keep-alive (though it can be set to other values). Notice that there is a blank line after the last header. This blank line is required even if there is no request body.
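To make the roles of the request line and the Host header concrete, here is a minimal parsing sketch in Python. The function name parse_request is hypothetical, and the sketch assumes a well-formed request with CRLF line endings; a real server would validate far more.

```python
def parse_request(raw):
    """Split raw request bytes into (method, target, version, headers, body)."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw = (b"GET / HTTP/1.1\r\n"
       b"Host: www.baidu.com\r\n"
       b"Connection: keep-alive\r\n"
       b"\r\n")
method, target, version, headers, body = parse_request(raw)
print(method, target, version)
# Host plus the request target identifies the resource being requested.
print(headers["host"] + target)
```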
To send parameters for a GET request, you must append the additional information to the URL itself. The format is similar to the following:
URL?name1=value1&name2=value2&...&nameN=valueN
This information is called the query string, and it is copied into the request line of the HTTP request as follows:
GET /books/?name=Professional%20Ajax HTTP/1.1
Host: www.baidu.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Connection: keep-alive
Note that to use the text "Professional Ajax" as a URL parameter, its contents must be encoded, replacing the space with %20. This is called URL encoding, and it is used in many places in HTTP (JavaScript provides built-in functions to handle URL encoding and decoding). The name-value pairs are separated by &. Most server-side technologies decode this data automatically and provide some logical way to access the values. Of course, how the data is used is up to the server.
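The article mentions JavaScript's built-in functions; as a comparable sketch, Python's standard urllib.parse module can encode and decode a query string. The /books/ path and the name parameter come from the example above; the variable names are only illustrative.

```python
from urllib.parse import quote, urlencode, urlsplit, parse_qs

# Percent-encode a single value: the space becomes %20.
encoded = quote("Professional Ajax")

# Build a whole query string. quote_via=quote is passed so that spaces
# become %20; by default urlencode would write them as "+".
query = urlencode({"name": "Professional Ajax"}, quote_via=quote)
target = "/books/?" + query

# What a server-side framework typically does for you: decode the
# name-value pairs back into a mapping.
decoded = parse_qs(urlsplit(target).query)

print(encoded)
print(target)
print(decoded)
```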
On the other hand, the POST request provides some additional information to the server in the request body. Typically, when an online form is filled in and submitted, the data that is filled in will be sent to the server in the form of a POST request.
The following is a typical POST request:
POST / HTTP/1.1
Host: www.baidu.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: keep-alive

name=Professional%20Ajax&publisher=Wiley
As you can see above, a POST request differs from a GET request in several ways. First, GET at the beginning of the request line is replaced with POST to indicate a different request type. The Host and User-Agent headers are still there, followed by two new lines. The Content-Type header states how the request body is encoded. Browsers normally send form data encoded as application/x-www-form-urlencoded, the MIME type for simple URL encoding. The Content-Length header gives the number of bytes in the request body. After the Connection header comes a blank line, followed by the request body. As in most browser POST requests, the body is given as simple name-value pairs, where name is Professional Ajax and publisher is Wiley. Query string parameters in a URL are organized in the same format.
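As a rough illustration of how the body and the Content-Length header relate, the following Python sketch encodes the two form fields from the example and counts the resulting bytes. The variable names are illustrative, and the User-Agent header is left out for brevity.

```python
from urllib.parse import urlencode, quote

fields = {"name": "Professional Ajax", "publisher": "Wiley"}

# URL-encode the form fields; the result is the request body.
body = urlencode(fields, quote_via=quote).encode("ascii")

headers = {
    "Host": "www.baidu.com",
    "Content-Type": "application/x-www-form-urlencoded",
    # Content-Length is the number of bytes in the body, not characters.
    "Content-Length": str(len(body)),
    "Connection": "keep-alive",
}

head = "POST / HTTP/1.1\r\n"
head += "".join(f"{name}: {value}\r\n" for name, value in headers.items())
head += "\r\n"  # blank line between the headers and the body
request = head.encode("ascii") + body

print(request.decode("ascii"))
```

Computing the length from the encoded bytes rather than typing it by hand keeps the header and the body consistent when the form data changes.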
Here are some of the most common request headers:
Accept: The MIME types the browser can accept.
Accept-Charset: The character sets the browser can accept.
Accept-Encoding: The content encodings the browser can decode, such as gzip. A servlet can return a gzip-encoded HTML page to a browser that supports gzip. In many cases this can reduce download time by a factor of 5 to 10.
Accept-Language: The languages the browser prefers, used when the server can provide more than one language version.
Authorization: Authorization information, typically sent in reply to a WWW-Authenticate header sent by the server.
Connection: Indicates whether a persistent connection is required. If the servlet sees the value "keep-alive" here, or sees that the request uses HTTP 1.1 (where connections are persistent by default), it can take advantage of the persistent connection to significantly reduce download time when the page contains multiple elements (such as applets and pictures). To do this, the servlet needs to send a Content-Length header in the response; the simplest implementation is to write the content to a ByteArrayOutputStream first and compute its size before actually sending the content.
Content-Length: The length of the request message body.
Cookie: One of the most important request headers, as discussed in the following chapter, "Cookie Processing".
From: The email address of the request's sender. It is used by some special Web clients but not by browsers.
Host: The host and port from the original URL.
If-Modified-Since: The requested content is returned only if it has been modified after the specified date; otherwise the server returns a 304 "Not Modified" response.
Pragma: A value of "no-cache" means that the server must return a fresh document, even if it is a proxy server that has a local copy of the page.
Referer: The URL of the page from which the user reached the currently requested page.
User-Agent: The browser type. This value is useful if the content returned by the servlet depends on the browser type.
UA-Pixels, UA-Color, UA-OS, UA-CPU: Nonstandard request headers sent by some versions of Internet Explorer to indicate screen size, color depth, operating system, and CPU type.
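Of the headers above, If-Modified-Since is the one whose logic is easiest to get backwards, so here is a hedged Python sketch of the decision a server might make. The function name status_for and the dates are made up for illustration; a real server also has to consider other conditional headers.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200  # unconditional request: send the full content
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    # HTTP dates have one-second resolution, so drop microseconds first.
    unchanged = last_modified.replace(microsecond=0) <= since
    return 304 if unchanged else 200


# Hypothetical resource, last changed on 1 December 2005 at noon UTC.
last_modified = datetime(2005, 12, 1, 12, 0, 0, tzinfo=timezone.utc)

print(status_for("Sat, 31 Dec 2005 23:59:59 GMT", last_modified))
print(status_for("Tue, 01 Nov 2005 00:00:00 GMT", last_modified))
print(status_for(None, last_modified))
```

The first call yields 304 because the client's copy is newer than the last change; the other two yield 200 and would be answered with the full content.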
HTTP response
The format of the HTTP response is similar to the format of the request, as shown below:
<status-line>
<headers>
<blank line>
[<response-body>]
As you can see, the only real difference in a response is that the first line contains status information instead of request information. The status line describes the state of the requested resource by providing a status code. Here is an example of an HTTP response:
HTTP/1.1 200 OK
Date: Sat, Dec 2005 23:59:59 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 122

<title>Wrox Homepage</title>
<body>
<!-- body goes here -->
</body>
In this example, the status line gives the HTTP status code 200 and the message OK. The status line always contains a status code and a corresponding short message to avoid confusion. The most commonly used status codes are:
200 (OK): The resource was found and everything is fine.
304 (Not Modified): The resource has not been modified since the last request. This is commonly used by browser caching mechanisms.
401 (Unauthorized): The client does not have permission to access the resource. This usually causes the browser to ask the user for a user name and password to log on to the server.
403 (Forbidden): The client failed to get authorization. This usually happens when an incorrect user name or password is entered after a 401.
404 (Not Found): The requested resource does not exist at the specified location.
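As a small sketch, a status line can be split into its three parts, and Python's standard http.HTTPStatus enum can supply the short message for each of the codes listed above. The helper name parse_status_line is hypothetical.

```python
from http import HTTPStatus


def parse_status_line(line):
    """Split a status line into (version, numeric code, short message)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(version, code, reason)

# Look up the standard short message for each code discussed above.
phrases = {c: HTTPStatus(c).phrase for c in (200, 304, 401, 403, 404)}
for c, phrase in phrases.items():
    print(c, phrase)
```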
After the status line come some headers. Typically, the server returns a header named Date that gives the date and time the response was generated (the server usually returns some information about itself as well, although this is not required). The next two headers should look familiar: they are the same Content-Type and Content-Length as in the POST request. In this example, the Content-Type header specifies the MIME type HTML (text/html) with the encoding ISO-8859-1 (the standard encoding for U.S. English resources). The response body contains the HTML source of the requested resource (although it may also contain plain text or binary data for other resource types). The browser displays this data to the user.
Note that nothing in the response indicates the type of request it answers, but this does not matter to the server. The client knows what type of data each type of request will return and decides how to use that data.
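To show how a client might use the Content-Type header, here is a minimal Python sketch that extracts the MIME type and charset and then decodes the body bytes. The helper name decode_body is hypothetical, and falling back to ISO-8859-1 when no charset is given is an assumption made for this example only.

```python
from email.message import Message


def decode_body(content_type, body):
    """Return (mime_type, text) for a response body and its Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Assumed fallback for this sketch when the header names no charset.
    charset = msg.get_content_charset() or "iso-8859-1"
    return msg.get_content_type(), body.decode(charset)


# b"\xe9" is the ISO-8859-1 byte for the letter e with an acute accent.
mime_type, text = decode_body("text/html;charset=iso-8859-1", b"<title>caf\xe9</title>")
print(mime_type)
print(text)
```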