When you type "http://www.cnblogs.com/" into the browser's address bar and press Enter, the cnblogs (Blog Park) homepage appears. (That is stating the obvious; you take it for granted.) As a developer, and especially as a web developer, I think you need to understand what happens in between. What exactly do the browser and the server do? How does the server handle the request? How does the browser display the web page to the user? ...
There are many questions and details here. Frankly, explaining every one of them thoroughly would fill at least ten books; there is no bottom to how deep you can dig. In addition, different web servers and server-side programming languages implement and process things differently (although the essence is the same). In this article I will explain some of the essentials of web development, based on knowledge of the HTTP protocol. Whether you develop in .NET, Java EE, PHP, or anything else, you cannot get away from these essentials. I hope you come away from this article with something new. My own knowledge and experience are limited, so mistakes are inevitable; I ask readers to forgive them.
What is the HTTP protocol (Hypertext Transfer Protocol)?
A protocol is a set of rules that both parties follow. HTTP is the specification for "communication" between a browser and a server. When we browse a social-network space or scroll through a microblog, we are using HTTP, and of course its uses go far beyond these.
I had always heard that HTTP is an "application layer protocol" built on top of TCP/IP. This is not hard to understand. If you took a computer networks course at university, you will know the OSI seven-layer reference model (I learned it by rote). If you have done any socket programming, you will know that TCP and UDP are both widely used communication protocols (connection establishment, the three-way handshake, and so on, which are not the focus of this article).
Since TCP and UDP are already widely used network communication protocols, why do we need yet another protocol, HTTP?
I have written a simple web server myself, and what follows is my own reasoning (not necessarily accurate). UDP is unreliable: it guarantees neither delivery nor ordering, so it clearly struggles to meet the needs of web applications.
TCP is connection-oriented and uses a three-way handshake. It is reliable, but it has its own drawback. Consider this: ordinary C/S (client/server) software has at most a few thousand clients connected at the same time, whereas for a B/S (browser/server) website it is quite common to have 100,000 people online at once. If 100,000 clients all kept their connections to the server open, how could the server bear the load?
This is how HTTP came about. It is built on TCP's reliable connections. Put simply, once a request has been answered, the server closes the connection immediately and frees the resources. This keeps resources available while still benefiting from TCP's reliability. (This describes the short-lived connection model of early HTTP; HTTP/1.1 keeps connections open by default, but the server can still close them when it needs to.)
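The idea of "one request, one response, then close" can be sketched with plain TCP sockets. The following is only an illustration under my own assumptions, not how a real web server is written: the names serve_once, fetch and demo are made up for this example, the server answers a single request on a local port chosen by the operating system, and the request is assumed to be small enough to arrive in one read.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Accept one connection, answer one request, then close it."""
    conn, _addr = listener.accept()
    with conn:
        conn.recv(4096)  # read the request (assumed to fit in one read)
        body = b"hello"
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: " + str(len(body)).encode() + b"\r\n"
            b"Connection: close\r\n"
            b"\r\n" + body
        )
    # Leaving the with-block closes the connection and frees its resources.

def fetch(host: str, port: int) -> bytes:
    """Send one GET request over TCP and read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(
            b"GET / HTTP/1.1\r\n"
            b"Host: " + host.encode() + b"\r\n"
            b"Connection: close\r\n"
            b"\r\n"
        )
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

def demo() -> bytes:
    """Run the one-shot server in a thread and fetch from it once."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)           # do not wait forever if no client arrives
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()
    try:
        return fetch("127.0.0.1", port)
    finally:
        worker.join()
        listener.close()
```

Calling demo() returns the raw bytes of the response, status line and headers included. The client knows the response is complete because the server closes the connection, which is exactly the resource-freeing behaviour described above.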
Because of this, HTTP is often described as "stateless", meaning that the server does not know what your client has been doing. This is largely a matter of performance. It is also why sessions and similar mechanisms were introduced later.
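To make the relationship between statelessness and sessions more concrete, here is a minimal sketch of a cookie-based session. Everything in it is an assumption made for illustration: the cookie name sid, the in-memory SESSIONS dictionary and the handle function are hypothetical, and real frameworks add expiry, signing and persistent storage.

```python
import uuid

# Hypothetical in-memory session store: session id -> data kept on the server.
SESSIONS = {}

def handle(cookie_header):
    """Handle one request.

    cookie_header is the value of the request's Cookie header, or None.
    Returns (Set-Cookie value or None, number of visits in this session).
    """
    sid = None
    if cookie_header:
        # A Cookie header looks like "name1=value1; name2=value2".
        for part in cookie_header.split(";"):
            name, _, value = part.strip().partition("=")
            if name == "sid":
                sid = value

    set_cookie = None
    if sid not in SESSIONS:
        # Unknown client: HTTP itself remembers nothing, so issue a new id.
        sid = uuid.uuid4().hex
        SESSIONS[sid] = {"visits": 0}
        set_cookie = "sid=" + sid

    SESSIONS[sid]["visits"] += 1
    return set_cookie, SESSIONS[sid]["visits"]
```

The first call, handle(None), returns a new sid cookie and a visit count of 1. If the client sends that cookie back on its next request, the server recognises it and the count becomes 2; without the cookie, every request looks like a brand-new visitor.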
Hands-on preparation:
For network monitoring on Windows there is a good piece of software called Sniffer, a packet-sniffing tool that many "hackers" also use. For studying the HTTP protocol, I recommend a tool called HttpWatch. (Unfortunately it is a paid tool; what to do about that is up to you.) After installation you can open it directly from the Tools menu in IE (Firefox is now supported as well).
Click Record to start monitoring and logging HTTP traffic. I will not describe the Stop, Clear and other buttons here. As an example, I recorded a visit to the Main.aspx page; the capture clearly shows the details of the HTTP messages.
To learn the HTTP protocol, you mainly need to understand HTTP requests and responses (and, of course, GET, POST and the other request methods, status codes, URIs, MIME types, and so on).

First, look at the HTTP request message (what the browser sends to the server):

An HTTP request represents the data that the client browser sends to the server. A complete HTTP request message consists of a request line, several message headers (request headers), a blank line, and the entity content.

Request line: describes the request method, the name of the requested resource, and the HTTP version number. For example: GET /book/java.html HTTP/1.1

Request headers (message headers) carry information such as the host name the client is requesting and the client's environment:
Accept: tells the server which data types the client supports (for example: Accept: text/html,image/*)
Accept-Charset: tells the server which character encodings the client uses
Accept-Encoding: tells the server which data compression formats the client supports
Accept-Language: the client's locale
Host: the host name through which the client wants to reach the server
If-Modified-Since: tells the server the time of the client's cached copy of the resource
Referer: tells the server which resource the client came from (used for hotlink protection)
User-Agent: tells the server about the client's software environment (operating system, browser version, and so on)
Cookie: carries the client's cookie information to the server
Connection: tells the server whether to keep the connection open after the request completes
Date: tells the server the time of the current request

(blank line)

Entity content: the entity data that the browser sends to the server over HTTP. For example: name=dylan&id=110 (with a GET request the values are passed to the server in the URL; with a POST request they are sent in the entity content, typically from a form).

Then look at the HTTP response message (what the server returns to the browser):

An HTTP response represents the data that the server sends back to the client. It consists of a status line, several message headers, and the entity content.

Response headers (message headers) include:
Location: used together with the 302 status code to tell the client where to go next
Server: tells the browser the type of server
Content-Encoding: tells the browser the compression format of the data
Content-Length: tells the browser the length of the returned data
Content-Type: tells the browser the type of the returned data
Last-Modified: tells the browser when the resource was last modified (used for caching)
Refresh: tells the browser to refresh at a given interval
Content-Disposition: tells the browser to open the data as a download. For example:
    Context.Response.AddHeader("Content-Disposition", "attachment; filename=aa.jpg");
    Context.Response.WriteFile("aa.jpg");
Transfer-Encoding: tells the browser the encoding used to transfer the data
ETag: a cache-related header (allows real-time updates)
Expires: tells the browser how long to cache the returned resource; -1 or 0 means it is not cached
Cache-Control: tells the browser not to cache the data (no-cache)
Pragma: tells the browser not to cache the data (no-cache)
Connection: whether to disconnect once the response is complete (close / keep-alive)
Date: tells the browser the time of the server's response
Status line: for example, HTTP/1.1 200 OK (the protocol version is 1.1, the response status code is 200, and the result is OK).
Entity content (entity body): the response carries content that the browser can parse, such as HTML, plain text, images, and so on.
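The structure of the two messages described above can be checked with a few lines of code. This is a minimal sketch, assuming well-formed messages with CRLF line endings; parse_message is a hypothetical helper written for this example, and the Host value and HTML body are made-up sample data, not a capture from a real server.

```python
from urllib.parse import parse_qs

def parse_message(raw):
    """Split a raw HTTP message into (start line, headers dict, body)."""
    # The blank line (CRLF CRLF) separates the headers from the entity content.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]  # request line or status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

# A POST request: the form values travel in the entity content.
form = "name=dylan&id=110"
request = (
    "POST /book/java.html HTTP/1.1\r\n"
    "Host: www.cnblogs.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(form)}\r\n"
    "\r\n" + form
)

# A response: status line, headers, blank line, entity content.
html = "<h1>Hello</h1>"
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(html)}\r\n"
    "\r\n" + html
)

req_line, req_headers, req_body = parse_message(request)
method, path, version = req_line.split(" ", 2)
fields = parse_qs(req_body)  # form values as a dict of lists

status_line, resp_headers, resp_body = parse_message(response)
http_version, status_code, reason = status_line.split(" ", 2)
```

Header names are lower-cased here because HTTP header names are case-insensitive. The same function handles both messages, since a request and a response differ only in their first line.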
Once you understand the HTTP request and response messages above, I believe you understand the HTTP protocol well enough. For more specific details, you can refer to the HTTP RFC documents.
The approximate steps are: the browser sends a request to the server; the server receives the request, processes it, packages a response message, and returns it to the browser. After receiving the response message, the browser's rendering engine parses the DOM tree and renders the page, the JavaScript engine parses and executes scripts, and plug-ins do their own work ... For the principles of browser rendering and parsing, you can refer to http://kb.cnblogs.com/page/129756/
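The request/processing/response cycle can be tried out locally with Python's standard library. This is a sketch under my own assumptions rather than a model of any particular web server: HelloHandler and round_trip are names invented for the example, the server listens on a local port chosen by the operating system, and the "browser" is simply a urllib client that does no rendering.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Server side: receive the request, process it, build the response."""

    def do_GET(self):
        body = f"<h1>You asked for {self.path}</h1>".encode("utf-8")
        self.send_response(200)                                     # status line
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                                          # blank line
        self.wfile.write(body)                                      # entity content

    def log_message(self, format, *args):
        pass  # keep the example quiet: no per-request logging

def round_trip(path="/index.html"):
    """Start a local server, send one GET request, return what came back."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks one
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}{path}"
        # An empty ProxyHandler makes sure the request goes straight to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.headers["Content-Type"], resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

Calling round_trip("/index.html") returns the status code, the Content-Type header and the HTML text. A real browser would go on from here to parse that HTML into a DOM tree and render it.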
Frankly, the essence of the web is nothing more than request, processing, and response. No web server and no server-side programming language can get away from this. The browser, for its part, parses the HTML, images and other static content and presents them to the user, while its script engine executes the script code and does whatever that code asks (DOM operations, CSS property changes, sending AJAX requests, and so on).
In my view the browser is really just a special kind of client, and the B/S architecture is a special kind of C/S architecture. It is worth asking how different web servers and programming languages receive a user's HTTP request, how they process it, and how they respond. Taking the familiar ASP.NET as an example, I used a decompiler to look at the source code (Microsoft really has wrapped things up thoroughly) and analyze it from the bottom up.
Due to space constraints, I cannot dissect ASP.NET, the IIS web server, and their underlying implementation any further here, because Microsoft's ASP.NET technology stack is huge and complex. I will continue to update this series of articles, and readers are welcome to follow along.
http://www.cnblogs.com/dinglang/archive/2012/02/11/2346430.html
The HTTP protocol and the essence of the Web (repost)