Http://www.360doc.cn/article/3554006_144394033.html
The WWW works on the client/server computing model: a Web browser (client) and a Web server (server) communicate using the Hypertext Transfer Protocol (HTTP). HTTP is an application-layer protocol between the Web browser and the Web server that runs on top of TCP/IP; it is a generic, stateless, object-oriented protocol.
An HTTP exchange consists of four steps:
(1) Connection: the Web browser establishes a connection to the Web server by opening a virtual file called a socket; once the socket is open, the connection has been established.
(2) Request: the Web browser submits a request to the Web server through the socket. HTTP requests are typically GET or POST commands (POST is used to pass form parameters). A GET command has the form GET path/filename HTTP/1.0, where path/filename is the file being accessed and HTTP/1.0 indicates the HTTP version used by the Web browser.
(3) Answer: after the Web browser submits the request, it is delivered to the Web server via HTTP. The Web server receives it, processes the transaction, and passes the result back to the Web browser via HTTP, so that the requested page is displayed in the browser.
(4) Close connection: when the answer is finished, the Web browser and the Web server disconnect, so that other Web browsers can connect to the Web server.
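The four steps above can be sketched with a raw socket in Python. This is a minimal illustration, not how any particular browser is implemented: the function name fetch, the 5-second timeout, and the choice of HTTP/1.0 (so that the server normally closes the connection once the answer is complete) are assumptions made for the example.

```python
import socket


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Perform one HTTP/1.0 exchange and return the raw answer bytes."""
    # (1) Connection: open a TCP socket to the server.
    with socket.create_connection((host, port), timeout=5) as sock:
        # (2) Request: request line, one header, then a blank line.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # (3) Answer: read until the server closes its side.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # (4) Close connection: leaving the with-block closes the socket.
    return b"".join(chunks)
```

Calling fetch("www.example.com") would return the status line, the headers, and the body as one byte string; the host name here is only a placeholder.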
Commonly used HTTP header information includes:
① "HTTP/1.0 200 OK" is the first line of the Web server's answer; it lists the HTTP version the server is running and the answer code. The code "200 OK" indicates that the request completed successfully.
② "MIME-Version: 1.0" indicates the version of the MIME types in use.
③ "Content-Type: type" is a very important header; it indicates the MIME type of the HTTP body. For example, "Content-Type: text/html" indicates that the transmitted data is an HTML document.
④ "Content-Length: length" indicates the length of the HTTP body in bytes.
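As a rough illustration of how these header lines are read, the sketch below splits the head of an answer into the status line and a dictionary of headers. The function name parse_head and the sample answer are made up for this example; header names are lowercased because HTTP header names are case-insensitive.

```python
def parse_head(raw: bytes):
    """Return (version, status, reason, headers) from a raw HTTP answer."""
    # Everything before the first blank line is the head.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers


# Hypothetical sample answer built from the four headers listed above.
sample = (b"HTTP/1.0 200 OK\r\n"
          b"MIME-Version: 1.0\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"<html></html>")

version, status, reason, headers = parse_head(sample)
print(version, status, reason)                                # HTTP/1.0 200 OK
print(headers["content-type"], headers["content-length"])     # text/html 13
```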
Second, the HTTP communication process
When we type "www.baidu.com" into the browser's address bar and press Enter, all we see directly is that the corresponding Web page opens. So how do the client and the server communicate behind the scenes?
1. Automatic URL parsing
An HTTP URL contains enough information to locate a resource. Its basic format is http://host[":" port][abs_path], where http indicates that the network resource is to be located via the HTTP protocol; host is a valid hostname or IP address; port specifies a port number, 80 by default; and abs_path specifies the URI of the requested resource. If abs_path is not given in the URL, it must be sent as "/" when used as the request URI, which the browser normally does automatically.
For example, if you enter www.163.com, the browser automatically converts it to http://www.163.com/
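The completion rules described above can be sketched with Python's standard urllib.parse module. The function name normalize is hypothetical, and the sketch only covers the http scheme discussed here (it assumes port 80 and does not handle https).

```python
from urllib.parse import urlsplit


def normalize(address: str):
    """Return (host, port, request_uri), filling in the defaults a browser would."""
    if "://" not in address:
        address = "http://" + address      # missing scheme: assume HTTP
    parts = urlsplit(address)
    host = parts.hostname
    port = parts.port or 80                # default HTTP port
    path = parts.path or "/"               # empty abs_path becomes "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


print(normalize("www.163.com"))  # ('www.163.com', 80, '/')
```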
2. Acquiring IP, establishing a TCP connection
After you enter "http://www.xxx.com/" in the browser's address bar and submit it, the browser first looks in the local DNS cache; if an entry exists, the IP address is returned directly. If not, the lookup is passed to the gateway DNS server, and the IP address it finds is returned to the browser.
Once the IP address has been obtained, the browser initiates a TCP connection to it with a three-way handshake, and sends the HTTP request to the server after the connection is established.
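A program can perform the same two steps through the operating system's resolver and socket API. The sketch below is only an approximation of what a browser does: the function names resolve and connect and the 5-second timeout are assumptions, and the caching behaviour described above is left to the operating system.

```python
import socket


def resolve(host: str, port: int = 80):
    """Return the distinct IP addresses the system resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect(host: str, port: int = 80) -> socket.socket:
    """Resolve host and open a TCP connection (the three-way handshake
    is carried out by the operating system inside this call)."""
    return socket.create_connection((host, port), timeout=5)
```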
3. The client browser makes an HTTP request to the server
Once the TCP connection is established, the Web browser sends a request command to the Web server, followed by further information in the form of headers. The browser then sends a blank line to notify the server that it has finished sending the headers.
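The order described here (request command, headers, blank line) can be made concrete with a small helper that assembles the bytes a client would send. The name build_request and the example header values are illustrative only.

```python
def build_request(method: str, path: str, headers: dict) -> bytes:
    """Assemble a request: request line, header lines, then a blank line."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The two empty strings produce the CRLF that ends the last header
    # and the blank line that tells the server the headers are finished.
    return "\r\n".join(lines + ["", ""]).encode("ascii")


request = build_request("GET", "/", {"Host": "www.example.com",
                                     "Connection": "keep-alive"})
print(request.decode("ascii"))
```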
4. The Web server answers and sends the data to the browser
After the client makes a request to the server, the server sends an answer back to the client:
HTTP/1.1 200 OK
The first part of the answer is the protocol version number and the response status code. Just as the client sends information about itself along with the request, the server sends data about itself and about the requested document along with the answer.
After the Web server has sent the headers to the browser, it sends a blank line to indicate that the headers have ended, and then sends the actual data requested by the user in the format described by the Content-Type answer header.
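To show how the blank line and the Content-Type header are used together, the sketch below separates the head from the body and decodes the body with the charset named in Content-Type. The function name decode_body is hypothetical, and falling back to ISO-8859-1 when no charset is given is an assumption of this example.

```python
from email.message import Message


def decode_body(raw: bytes) -> str:
    """Split a raw answer at the blank line and decode the body as text."""
    head, _, body = raw.partition(b"\r\n\r\n")
    content_type = "application/octet-stream"
    for line in head.split(b"\r\n")[1:]:          # skip the status line
        name, _, value = line.decode("iso-8859-1").partition(":")
        if name.strip().lower() == "content-type":
            content_type = value.strip()
    # Reuse the standard library's MIME parameter parsing for "charset=".
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or "iso-8859-1"
    return body.decode(charset)
```

This sketch ignores Content-Encoding and Transfer-Encoding, which would have to be undone before decoding a real answer.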
5. The Web server shuts down the TCP connection
In general, once the Web server has sent the requested data to the browser, it closes the TCP connection. However, if the browser or the server includes this line in its headers:
Connection: keep-alive
the TCP connection remains open after the data is sent, so the browser can continue to send requests over the same connection.
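The decision whether to keep the connection open can be written as a small predicate. Note that the description above matches HTTP/1.0 behaviour; in HTTP/1.1 connections are persistent by default unless "Connection: close" is sent. The function name keeps_alive is hypothetical, and the headers dictionary is assumed to use lowercase keys.

```python
def keeps_alive(version: str, headers: dict) -> bool:
    """Decide whether the TCP connection stays open after this message."""
    tokens = {t.strip().lower()
              for t in headers.get("connection", "").split(",")}
    if "close" in tokens:
        return False            # explicit request to close
    if "keep-alive" in tokens:
        return True             # explicit request to stay open
    # No Connection header: HTTP/1.1 defaults to persistent, HTTP/1.0 does not.
    return version == "HTTP/1.1"
```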
Third, an example analysis of HTTP communication
Here we introduce a tool, HTTP Analyzer, which analyzes HTTP/HTTPS data streams in real time. It captures HTTP/HTTPS protocol data as it is exchanged and can display a great deal of information (including headers, content, cookies, query strings, submitted data, and redirect URLs); it also provides buffer information, conversation clean-up, HTTP status information, and other filtering options. It is a very useful development tool for analysis, debugging, and diagnostics.
Below we visit http://www.google.cn/ and let HTTP Analyzer capture the packets, in order to analyze how the browser and the server communicate.
1. Run HTTP Analyzer and select Action > Start from the menu to begin capturing packets.
2. Enter http://www.google.cn/ in the browser. Once the Web page has opened, select Action > Stop in HTTP Analyzer to stop capturing; the tool now shows the captured packets in detail. The captured information is shown in the screenshots:
• The captured packets and their header information
• The HTML body content of one request
• The cookie information for the request, if any
• The entire packet for a single request, including header information and body text
You will find that although only one hyperlink was clicked in the browser, multiple packets were sent. That is because the Web page we requested references many pictures, music, movies, and other resources. The HTML returned by the server does not contain the picture data itself, only links to the pictures; when the browser interprets the page and encounters a picture's URL, it sends the server a separate request for that picture.
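The following sketch illustrates why one page leads to several requests: it scans an HTML document for img tags and lists the absolute URLs that would have to be fetched separately. The class and function names, and the sample page, are invented for this example; a real browser also follows scripts, stylesheets, and other embedded resources.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the absolute URL of every <img src=...> in a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page's own URL.
                self.urls.append(urljoin(self.base_url, src))


def image_requests(base_url: str, html: str):
    """Return the extra URLs a browser would request for the page's pictures."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.urls


page = '<html><body><img src="/logo.gif"><img src="a/b.png"></body></html>'
print(image_requests("http://www.example.com/index.html", page))
```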
Let's analyze the HTTP request and response messages in detail:
1) HTTP request message. After the client has established a TCP connection to the server, it sends a request message such as:
[1] GET / HTTP/1.1
[2] Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/x-silverlight, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */* (the list of content types the client can accept)
[3] Accept-Language: zh-cn (the language the client can interpret: Simplified Chinese)
[4] UA-CPU: x86
[5] Accept-Encoding: gzip, deflate (the content encodings the client can decode)
[6] User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon; EmbeddedWB 14.52 from: http://www.bsalsa.com/ EmbeddedWB 14.52; .NET CLR 2.0.50727; InfoPath.1; CIBA) (the client's browser model)
[7] Host: www.google.cn (the host of the requested page)
[8] Connection: Keep-Alive (keep the TCP connection open)
[9]
The request message consists of four parts:
• Request line (method, URI, protocol/version): line [1] above. "GET" is the request method, "/" is the requested URI, and "HTTP/1.1" is the protocol and its version. HTTP requests can use a variety of methods; GET and POST are the most common.
• Request headers: lines [2]-[8], which contain much useful information about the client environment and the request body.
• Blank line: line [9]. The blank line between the request headers and the request body is very important: it indicates that the request headers have ended and that the request body follows.
• Request body: it can contain information submitted by the user in query-string form, such as a user name and password. This request has none.
One thing to note here is the difference between the GET and POST request methods.
GET is the default HTTP request method, and it is routinely used to submit form data. However, form data submitted with GET is only simply encoded and is sent to the Web server as part of the URL, so submitting form data with GET carries a security risk. URL length is also limited in practice; the figure given here is about 1 KB, although the actual limit depends on the browser and the server rather than on HTTP itself.
POST is an alternative to GET, used primarily to submit form data to the Web server, especially large amounts of data. When form data is submitted with POST, it is sent to the Web server in the request body rather than as part of the URL, which overcomes the drawbacks of GET: the data is not exposed in the URL, and the amount of data is not restricted by URL length. Therefore, for security reasons and out of respect for user privacy, POST is usually used for form submission.
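The difference is easiest to see by building the same form submission both ways. In the sketch below the function names, the /login path, the host, and the field values are all hypothetical. Note that POST only keeps the data out of the URL; without HTTPS the body still travels in plain text.

```python
from urllib.parse import urlencode


def form_as_get(host: str, path: str, fields: dict) -> bytes:
    """GET: the encoded form data becomes part of the URL."""
    query = urlencode(fields)
    return (f"GET {path}?{query} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n").encode("ascii")


def form_as_post(host: str, path: str, fields: dict) -> bytes:
    """POST: the same encoded data is carried in the request body."""
    body = urlencode(fields).encode("ascii")
    head = (f"POST {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n").encode("ascii")
    return head + body


form = {"user": "alice", "password": "s3cret"}
print(form_as_get("www.example.com", "/login", form).decode("ascii"))
print(form_as_post("www.example.com", "/login", form).decode("ascii"))
```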
2) HTTP response message. It is similar to the request, for example:
[1] HTTP/1.1 200 OK
[2] Cache-Control: private, max-age=0
[3] Date: Fri, 07:53:36 GMT
[4] Expires: -1
[5] Content-Type: text/html; charset=UTF-8
[6] Set-Cookie: pref=id=cc4a31ab6792ef2c:nw=1:tm=1235721216:lm=1235721216:s=q1hqbu-1kdamawk-; expires=Sun, 27-Feb-2011 07:53:36 GMT; path=/; domain=.google.cn
[7] Content-Encoding: gzip
[8] Server: gws
[9] Transfer-Encoding: chunked
[10]
[11] DDC
The response message likewise consists of four corresponding parts:
• Status line (protocol, status code, description): line [1]. HTTP/1.1 is the protocol version, and 200 OK indicates that the server successfully processed the client's request; 200 is the HTTP answer code for success. An HTTP answer code consists of 3 digits, and the first digit defines the class of the code:
1xx - Informational: the Web browser's request has been received and is being processed further.
2xx - Successful: the request was correctly received, understood, and processed, for example 200 OK.
3xx - Redirection: the request is not yet complete, and the client must take further action to complete it.
4xx - Client error: the request submitted by the client contains an error, for example 404 Not Found, which means that the document referenced in the request does not exist.
5xx - Server error: the server was unable to complete the processing of the request, for example 500 Internal Server Error.
• Response headers: like the request headers, they describe the server's capabilities and give details about the response data.
• Blank line: a blank line must also be present between the response headers and the response body; it indicates that the response headers have ended and that the response body follows.
• Response body.
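The rule that the first digit of the answer code defines its class can be expressed directly in code. The function name status_class is made up for this sketch; the reason phrases come from Python's standard http.HTTPStatus enumeration.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Map a three-digit answer code to its class via the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CLASSES[code // 100]


for code in (200, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```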