The World Wide Web (WWW) is an application system that uses the Internet as its transmission medium. Its basic unit of transmission is the web page. The WWW follows the client/server computing model: it consists of a web browser (the client) and a web server (the server), and the two communicate using the Hypertext Transfer Protocol (HTTP). HTTP is an application-layer protocol between web browser and web server that runs on top of TCP/IP. It is a generic, stateless, object-oriented protocol.
HTTP works in four steps:
(1) Connection: the web browser establishes a connection with the web server by opening a socket, which the operating system treats much like a file. Once the socket is open, the connection has been established.
(2) Request: the web browser submits a request to the web server through the socket. The request is usually a GET or POST command (POST is used to pass form parameters). The GET command has the format GET path/filename HTTP/1.0, where the path and file name identify the file being accessed and HTTP/1.0 is the HTTP version used by the web browser.
(3) Response: the request travels to the web server over HTTP. The web server processes it and returns the result to the web browser over HTTP, and the browser displays the requested page.
For example, if the client connects to www.mycompany.com:8080/mydir/index.html, it sends the command GET /mydir/index.html HTTP/1.0. The web server named www.mycompany.com looks for the file index.html in the directory mydir of the site's file space. If the file is found, the web server sends its content to the browser. To tell the browser what type of content is being transmitted, the web server first sends some HTTP header information and then the content itself (the HTTP body). The header and the body are separated by a blank line.
Common HTTP header information:
① HTTP/1.0 200 OK: this is the first line of the web server's response. It gives the HTTP version the server is using and the response code. The code "200 OK" means the request was completed.
② MIME-Version: 1.0: indicates the MIME version.
③ Content-Type: this header is very important. It indicates the MIME type of the HTTP body. For example, Content-Type: text/html means the transmitted data is an HTML document.
④ Content-Length: indicates the length of the HTTP body in bytes.
(4) Close the connection: after the response is complete, the web browser and the web server disconnect, so that the server can accept connections from other web browsers.
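The four steps above can be tried out with a raw socket. The following is only a minimal sketch: DemoHandler is a hypothetical stand-in for a real web server, running on the local machine on a port picked by the operating system, and fetch is an illustrative helper name rather than anything from the original article.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the web server; always returns a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()          # writes the blank line after the headers
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass


def fetch(host, port, path):
    # (1) Connection: open a TCP socket to the server.
    with socket.create_connection((host, port), timeout=5) as sock:
        # (2) Request: a GET command, then a blank line to end the headers.
        sock.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
        # (3) Response: read until the server closes its side.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # (4) Close: leaving the "with" block closes the socket.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body


server = HTTPServer(("127.0.0.1", 0), DemoHandler)   # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
head, body = fetch("127.0.0.1", server.server_address[1], "/mydir/index.html")
server.shutdown()
server.server_close()

print(head.splitlines()[0])   # the status line
print(body.decode())          # the HTTP body
```

Because the request says HTTP/1.0, the demo server closes the connection after the response, which is what lets the client simply read until end of stream.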
HTTP protocol analysis
I. HTTP protocol overview
HTTP is a standard for requests and responses between a client and a server, usually carried over TCP. The client is the end user and the server is the website. Using a web browser, web crawler, or other tool, the client initiates an HTTP request to a specified port on the server (port 80 by default). Such a client is called a user agent. The responding server stores resources such as HTML files and images, and is called the origin server. Between the user agent and the origin server there may be several intermediaries, such as proxies, gateways, or tunnels. Although TCP/IP is the most popular protocol suite on the Internet, HTTP does not require it or the layers it is built on. In fact, HTTP can be implemented on top of any other Internet protocol or on other networks. HTTP only assumes that the underlying protocol provides reliable transport, so any protocol that gives this guarantee can carry it.
II. HTTP communication process
What happens when we type "www.baidu.com" into the browser's address bar and press Enter? All we see is the page opening, so how do the client and the server communicate behind the scenes?
1. Automatic URL resolution
An HTTP URL contains enough information to locate a resource. Its basic format is http://host[":"port][abs_path]. Here http means the resource is to be located through the HTTP protocol; host is a valid host domain name or IP address; port is a port number, 80 by default; abs_path is the URI of the requested resource. If the URL gives no abs_path, the request URI must be given as "/". The browser normally does this automatically.
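The completion rules can be sketched with Python's standard urllib.parse module. The function name normalize_http_url is made up for illustration, and assuming plain HTTP when no scheme is typed is a simplification of what real browsers do.

```python
from urllib.parse import urlsplit


def normalize_http_url(text):
    """Return (host, port, abs_path) for what a user typed in the address bar."""
    if "://" not in text:
        text = "http://" + text          # assumption: plain HTTP when no scheme is typed
    parts = urlsplit(text)
    host = parts.hostname
    port = parts.port or 80              # default HTTP port
    abs_path = parts.path or "/"         # an empty path becomes "/"
    if parts.query:
        abs_path += "?" + parts.query
    return host, port, abs_path


print(normalize_http_url("www.163.com"))
print(normalize_http_url("http://www.mycompany.com:8080/mydir/index.html"))
```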
For example, if you enter www.163.com, the browser automatically converts it to http://www.163.com/
2. Obtain the IP address and set up a TCP connection
After "http://www.xxx.com/" is entered in the address bar and submitted, the browser first looks up the domain name in the local DNS cache. If an entry is found, its IP address is used directly. If not, the query goes on to the DNS server configured for the gateway, and once the IP address is found it is returned to the browser.
With the IP address in hand, the browser establishes a TCP connection to that address through the three-way handshake. Once the connection is established, it sends an HTTP request to the server.
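Both steps are available from Python's socket module. This is a rough sketch with illustrative function names: the lookup is delegated to the system resolver (which typically consults local caches and the hosts file before querying a DNS server, depending on configuration), and the three-way handshake is carried out by the operating system during the connect call.

```python
import socket


def resolve(host, port=80):
    """Ask the system resolver for the IP addresses of a host name."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return [info[4][0] for info in infos]


def connect(host, port=80, timeout=5):
    """Open a TCP connection; the OS performs the three-way handshake here."""
    return socket.create_connection((host, port), timeout=timeout)


print(resolve("localhost"))
```

Calling connect("www.xxx.com") would combine both steps, since create_connection resolves the name itself before connecting.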
3. The client browser sends an HTTP request to the server
Once the TCP connection is established, the web browser sends a request command to the web server, followed by further information in the form of headers. The browser then sends a blank line to tell the server that the headers are complete.
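As a sketch of what goes over the wire, the helper below (build_request is a hypothetical name) assembles the request command, the header lines, and the terminating blank line. HTTP lines end with CRLF, so the blank line is simply an extra CRLF.

```python
def build_request(method, path, host, headers=None):
    """Assemble a request: request line, header lines, then a blank line."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; the extra CRLF is the blank line that ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/", "www.google.cn", {"Connection": "keep-alive"})
print(request.decode("ascii"))
```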
4. The web server responds and sends data to the browser.
After the client sends its request, the server returns a response to the client, beginning with a status line such as:
HTTP/1.1 200 OK
The first part of the response is the protocol version number and the response status code. Just as the client sends information about itself along with the request, the server sends information about itself and the requested document along with the response.
After the web server has sent the headers to the browser, it sends a blank line to mark the end of the headers, and then sends the data the user actually requested, in the format described by the Content-Type response header.
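The client has to undo this framing when the response arrives. A minimal sketch, assuming the whole response is already in memory and ignoring repeated headers (a later header with the same name overwrites an earlier one); parse_response and the sample bytes are illustrative only.

```python
def parse_response(raw):
    """Split raw response bytes into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")       # the blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(status), reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<html></html>")
version, status, reason, headers, body = parse_response(raw)
print(version, status, reason, headers["content-type"], len(body))
```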
5. The web server closes the TCP connection.
In general, once the web server has sent the requested data to the browser, it closes the TCP connection. However, if the browser or the server adds the following line to its headers:
Connection: keep-alive
the TCP connection remains open after the response has been sent, and the browser can continue to send requests over the same connection. Keeping the connection open saves the time needed to set up a new connection for each request and also saves network bandwidth.
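Connection reuse can be observed with Python's standard http.client and http.server modules. This is a sketch under assumptions: KeepAliveHandler is a hypothetical local test server, and the check relies on the fact that two requests arriving from the same client port must have travelled over the same TCP connection. Note that in HTTP/1.1 connections are persistent by default, so the explicit header mainly matters for HTTP/1.0 peers.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_client_ports = []   # one entry per request: the client's TCP source port


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"    # http.server only keeps connections open in 1.1 mode

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # marks where the body ends
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):    # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
for path in ("/a", "/b"):
    conn.request("GET", path, headers={"Connection": "keep-alive"})
    conn.getresponse().read()        # read the whole body before reusing the connection
conn.close()
server.shutdown()
server.server_close()

# Same source port for both requests means a single TCP connection was reused.
print(len(set(seen_client_ports)) == 1)
```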
III. Analyzing HTTP communication by example
First, a word about HTTP Analyzer, a tool for analyzing HTTP/HTTPS data streams in real time. It captures HTTP/HTTPS traffic as it happens and displays a wide range of information (including headers, content, cookies, query strings, submitted data, and redirect URLs). It also provides buffer information, clearing of session content, HTTP status information, and other filtering options. It is a very useful development tool for analysis, debugging, and diagnosis.
Below we visit http://www.google.cn/ and use HTTP Analyzer to capture packets, in order to analyze the communication between the browser and the server.
1. Run HTTP Analyzer and select Action, Start to begin capturing packets.
2. Enter http://www.google.cn/ in the browser. After the page has opened, select Action, Stop in HTTP Analyzer to stop the capture. The tool then shows detailed information about the captured packets:
- The packet capture results and header information
- The HTML body of a request
- Whether cookies are present in a request
- The complete packet of a request, including header and body information
You will notice that although only one link was opened in the browser, several packets were sent. This is because when the requested page contains images, music, video, and other resources, the HTML returned by the server does not contain the image data itself but only the link to each image. When the browser parses an image URL, it sends another request to the server for that image.
Next we analyze the HTTP request and response messages in detail:
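To see where the extra requests come from, the sketch below scans a page for embedded resources the way a browser might. ResourceCollector, the sample HTML, and the www.example.com URLs are all made up for illustration; a real browser handles many more tags and attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect the URLs of embedded resources that need their own requests."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Relative links are resolved against the URL of the page itself.
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))


page = '<html><body><img src="/images/logo.gif"><img src="banner.jpg"></body></html>'
collector = ResourceCollector("http://www.example.com/news/index.html")
collector.feed(page)
print(collector.resources)   # each URL here would trigger one more HTTP request
```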
1) HTTP request message. After a TCP connection is established between the client and the server, the client sends a request message to the server, for example:
[1] GET / HTTP/1.1
[2] Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/x-silverlight, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */* (types the client can recognize)
[3] Accept-Language: zh-cn (language the client can interpret: Simplified Chinese)
[4] UA-CPU: x86
[5] Accept-Encoding: deflate (encodings the client can interpret)
[6] User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon; EmbeddedWB 14.52 from: http://www.bsalsa.com/ EmbeddedWB 14.52; .NET CLR 2.0.50727; InfoPath.1; CBA) (client browser model)
[7] Host: www.google.cn (host of the requested page)
[8] Connection: Keep-Alive (keep the TCP connection open)
[9]
The request message consists of four parts:
- Request line (method, URI, protocol/version): line [1] above. GET is the request method, / is the URI, and HTTP/1.1 is the protocol and its version. HTTP defines several request methods, of which GET and POST are the most common.
- Request headers: lines [2] to [8], which carry useful information about the client environment and the request body.
- Blank line: line [9]. The blank line between the request headers and the request body is essential: it indicates that the headers have ended and the body follows.
- Request body: it can contain information submitted by the user, such as a user name and password. The example above has no request body.
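The four parts can be separated programmatically. The sketch below is illustrative only: parse_request is a hypothetical helper, and the sample is an invented POST to www.example.com with made-up form fields, chosen so that the request body is not empty.

```python
def parse_request(raw):
    """Split raw request bytes into request line, headers, and body."""
    head, blank, body = raw.partition(b"\r\n\r\n")      # the blank line
    if not blank:
        raise ValueError("no blank line found: headers are incomplete")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, uri, version = request_line.split(" ")       # request line
    headers = {}
    for line in header_lines:                             # request headers
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return (method, uri, version), headers, body          # body may be empty


# Hypothetical login form submitted with POST; the field names are made up.
raw = (b"POST /login HTTP/1.1\r\n"
       b"Host: www.example.com\r\n"
       b"Content-Type: application/x-www-form-urlencoded\r\n"
       b"Content-Length: 26\r\n"
       b"\r\n"
       b"user=alice&password=secret")
request_line, headers, body = parse_request(raw)
print(request_line, headers["Host"], body)
```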
The GET and POST request methods deserve a closer look.
GET is the default HTTP request method. Form data can be submitted with GET, but the data is merely encoded and sent to the web server as part of the URL. Submitting form data with GET is therefore a security risk, and the amount of data is limited because URL length is limited (traditionally quoted as no more than 1 KB).
POST is an alternative to GET, used mainly to submit form data to the web server, especially large amounts of data. POST overcomes some shortcomings of GET: the form data is not part of the request URL but is sent to the web server in the request body, so it does not appear in the URL and is not restricted by URL length, although without HTTPS it still travels unencrypted. For security and out of respect for user privacy, forms are therefore usually submitted with POST.
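The difference is easiest to see by putting the same form data into both kinds of request. A small sketch, with invented form fields and the placeholder host www.example.com:

```python
from urllib.parse import urlencode

form = {"user": "alice", "q": "http protocol"}   # hypothetical form fields
encoded = urlencode(form)                          # form encoding: spaces become "+"

# GET: the encoded form data becomes the query string of the request URI.
get_request = (
    f"GET /search?{encoded} HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "\r\n"
)

# POST: the same data travels in the request body, described by two headers.
post_request = (
    "POST /search HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded.encode('ascii'))}\r\n"
    "\r\n"
    f"{encoded}"
)

print(get_request)
print(post_request)
```

In the GET version the data is visible in the first line of the request, which is also what ends up in browser history and server logs; in the POST version the first line carries only the path.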
2) HTTP response message. Its structure is similar to that of the request, for example:
[1] HTTP/1.1 200 OK
[2] Cache-Control: private, max-age=0
[3] Date: Fri, 27 Feb 2009 07:53:36 GMT
[4] Expires: -1
[5] Content-Type: text/html; charset=UTF-8
[6] Set-Cookie: PREF=ID=cc4a31ab6792ef2c:NW=1:TM=1235721216:LM=1235721216:S=q1hQBu-1KdamAWK-; expires=Sun, 27-Feb-2011 07:53:36 GMT; path=/; domain=.google.cn
[7] Content-Encoding: gzip
[8] Server: gws
[9] Transfer-Encoding: chunked
[10]
[11] DDC
The response information is composed of four parts:
- Status line: HTTP/1.1 is the protocol version, and 200 OK means the server has successfully processed the client's request. 200 is the HTTP response code. A response code consists of three digits, and the first digit defines its class:
1xx - Informational: the request has been received and is being processed.
2xx - Success: the request was correctly received, understood, and processed, for example 200 OK.
3xx - Redirection: the request is not yet complete and the client must take further action.
4xx - Client error: the request submitted by the client contains an error, for example 404 Not Found, which means the document referenced in the request does not exist.
5xx - Server error: the server was unable to process the request, for example 500 Internal Server Error.
- Response headers: like the request headers, they describe the server's capabilities and give details about the response data.
- Blank line: as in the request, a blank line must separate the response headers from the response body. It marks the end of the headers, after which the body follows.
- Response body: the web page content returned by the server.
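Since the class of a response code depends only on its first digit, a client can classify codes with integer division. A minimal sketch using the standard http.HTTPStatus enum for the reason phrases; the CLASSES table and the describe helper are illustrative names.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return the class of a status code and its standard reason phrase, if known."""
    category = CLASSES.get(code // 100, "unknown")   # first digit selects the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:                               # code not in the standard registry
        phrase = ""
    return category, phrase


for code in (200, 302, 404, 500):
    print(code, *describe(code))
```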
With the description above and the hands-on verification with the tool, you should now have a general understanding of the HTTP protocol and its communication process.
How HTTP works (repost)