From the moment a user enters a website address to the moment the page is displayed, what happens in between? The question is both simple and complex.
The approximate process is summarized as follows:
1. Enter the address
2. The browser looks up the IP address for the domain name (a DNS lookup that checks, in order, the browser cache, the system cache, the router cache, ...)
3. The browser sends an HTTP request to the Web server
4. The server responds with a permanent redirect (from http://example.com to http://www.example.com)
5. The browser follows the redirect
6. The server processes the request
7. The server returns an HTTP response
8. The browser displays the HTML
9. The browser sends requests for the resources embedded in the HTML (images, audio, video, CSS, JS, etc.)
10. The browser sends an asynchronous request
The key steps in this process are analyzed in detail below:
(1) The browser finds the IP address of the domain name
The first step is to work out the IP address of the domain being visited.
The DNS lookup process is as follows:
* Browser cache – The browser caches DNS records for a period of time. The operating system does not tell the browser the time-to-live of each DNS record, so each browser caches records for its own fixed period (ranging from 2 to 30 minutes).
* System cache – If the record is not found in the browser cache, the browser makes a system call (gethostbyname in Windows), which can return the record from the system cache.
* Router cache – Next, the query is sent on to the router, which generally has its own DNS cache.
* ISP DNS cache – The next place checked is the cache of the ISP's DNS server. The record can usually be found there.
* Recursive search – Your ISP's DNS server starts a recursive search, from the root name server, through the .com top-level name server, to the name server for the domain itself. The DNS server usually has the .com name servers in its cache, so a query to the root name server is normally not necessary.
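The layered lookup described in the list above can be sketched as a chain of caches that are consulted in order. This is only an illustration: the `resolve` function, the cache dictionaries, and the addresses (taken from the 192.0.2.x range reserved for documentation) are assumptions made for the example, not how any particular browser or operating system implements it.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each layer is a (label, lookup) pair; lookup returns an IP string or None.
Layer = Tuple[str, Callable[[str], Optional[str]]]


def resolve(name: str, layers: List[Layer]) -> Optional[Tuple[str, str]]:
    """Ask each cache layer in order and return (label, ip) for the first hit."""
    for label, lookup in layers:
        ip = lookup(name)
        if ip is not None:
            return label, ip
    return None  # no layer knew the name; a real resolver would now go recursive


# Hypothetical cache contents, for illustration only.
browser_cache: Dict[str, str] = {}
system_cache: Dict[str, str] = {"example.com": "192.0.2.10"}
router_cache: Dict[str, str] = {"example.org": "192.0.2.20"}

LAYERS: List[Layer] = [
    ("browser cache", browser_cache.get),
    ("system cache", system_cache.get),
    ("router cache", router_cache.get),
]
```

Calling `resolve("example.com", LAYERS)` stops at the system cache, because the browser cache is empty in this sketch. A name that no layer knows falls through and returns `None`.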
(2) The browser sends an HTTP request to the Web server
Dynamic pages such as the Facebook home page expire from the browser cache almost as soon as they are opened, so they certainly cannot be read from the cache. The browser therefore sends a request to the server that hosts Facebook:
GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=ww[...]; c_user=2101[...]
The GET request names the URL to fetch: "http://facebook.com/". The browser identifies itself (User-Agent header) and states what types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open so that it can be reused for later requests.
The request also contains the cookies that the browser holds for this domain. As you probably already know, cookies are key-value pairs that track the state of a website across different page requests. Cookies can therefore store the name of the logged-in user, a secret value assigned to the user by the server, and some of the user's settings. They are stored in a text file on the client and sent to the server with every request.
Many tools let you view raw HTTP requests and the corresponding responses. The author prefers Fiddler, and of course there are other tools such as Firebug. These tools are very helpful when optimizing a website.
In addition to GET requests, there is another type of request, the POST request, which is typically used to submit forms. A GET request passes its parameters in the URL (e.g.: http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just after the headers.
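The difference in where the parameters travel can be made concrete with a small sketch that assembles both kinds of request as text. The helper names `build_get` and `build_post` are made up for this example, and the request line uses a plain path rather than the full URL shown in the captured request above.

```python
from urllib.parse import urlencode


def build_get(host: str, path: str, params: dict) -> str:
    """GET: parameters travel in the query string of the request line."""
    return (
        f"GET {path}?{urlencode(params)} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    )


def build_post(host: str, path: str, params: dict) -> str:
    """POST: parameters travel in the body, after the blank line that ends the headers."""
    body = urlencode(params)  # urlencode output is ASCII, so len() equals the byte count
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )
```

With the puzzle URL from the text, `build_get("robozzle.com", "/puzzle.aspx", {"id": 85})` puts `id=85` on the request line. `build_post` with the same arguments puts it after the blank line instead.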
The trailing slash in a URL like "http://facebook.com/" matters. In this case, the browser can safely add the slash. For an address such as "http://example.com/folderOrFile", however, the browser cannot tell whether folderOrFile is a folder or a file, so it cannot add the slash automatically. The browser then requests the address without the slash, and the server responds with a redirect, which costs an unnecessary round trip.
(3) Establishing the connection for the HTTP request
Establishing a TCP connection: Before any HTTP exchange begins, the Web browser must establish a connection to the Web server over the network. This is done with TCP, which together with IP forms the basis of the Internet. The two are known as the TCP/IP protocol suite, which is why the Internet is also called a TCP/IP network. HTTP is an application-layer protocol that sits above TCP, and a higher-layer protocol can only operate once the lower-layer connection exists, so the TCP connection is established first. The default TCP port for HTTP is 80. TCP provides a reliable, connection-oriented service and uses a three-way handshake to establish the connection.
First handshake: Host A sends the server a packet with the SYN flag set (SYN=1) and a randomly generated sequence number, for example seq=1234567. From SYN=1, host B knows that A is asking to establish a connection.
Second handshake: Host B receives the request and confirms it by sending A a packet with SYN=1, ACK=1, ack number = (host A's seq + 1), and its own randomly generated seq=7654321.
Third handshake: Host A checks that the ack number is correct, that is, the seq it sent first plus 1, and that the ACK flag is 1. If so, host A sends a packet with ack number = (host B's seq + 1) and ACK=1. Once host B has confirmed these values, the connection is established.
With the three-way handshake complete, host A and host B start transmitting data.
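The number bookkeeping in the three steps can be traced with a short simulation. It is a sketch under simplifying assumptions: no real packets are sent, the `Segment` class and `handshake` function are invented for the example, and the 32-bit wraparound of real sequence numbers is ignored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: int      # SYN flag
    ack: int      # ACK flag
    seq: int      # sequence number
    ack_num: int  # acknowledgment number (meaningful only when ack == 1)


def handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments of a successful handshake, checking each step."""
    # 1. A -> B: SYN carrying A's initial sequence number.
    syn = Segment(syn=1, ack=0, seq=client_isn, ack_num=0)

    # 2. B -> A: SYN+ACK, acknowledging A's seq + 1 and carrying B's own seq.
    syn_ack = Segment(syn=1, ack=1, seq=server_isn, ack_num=syn.seq + 1)

    # A verifies that B acknowledged exactly its seq + 1.
    assert syn_ack.ack == 1 and syn_ack.ack_num == client_isn + 1

    # 3. A -> B: ACK, acknowledging B's seq + 1.
    final_ack = Segment(syn=0, ack=1, seq=client_isn + 1, ack_num=syn_ack.seq + 1)

    # B verifies the acknowledgment before treating the connection as open.
    assert final_ack.ack == 1 and final_ack.ack_num == server_isn + 1
    return [syn, syn_ack, final_ack]
```

With the numbers from the text, `handshake(1234567, 7654321)` yields a second segment that acknowledges 1234568 and a third that acknowledges 7654322.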
Once a TCP connection is established, the Web browser sends a request command to the Web server.
After the browser sends its request command, it also sends further information to the Web server in the form of headers, and then sends a blank line to tell the server that the headers are complete.
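How a receiver can use that blank line to find the end of the headers is sketched below. The function name `split_request` is an assumption for the example, and real servers handle many more cases (repeated headers, size limits, malformed lines).

```python
def split_request(raw: bytes):
    """Split a raw request into (request_line, headers, rest) at the first blank line."""
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        # No blank line yet, so the sender has not finished its headers.
        raise ValueError("headers are not terminated by a blank line")
    lines = head.decode("iso-8859-1").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return request_line, headers, rest
```

Anything after the blank line (`rest`) is the request body. It is empty for a typical GET.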
(4) The server responds with a permanent redirect
HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT; path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 05:09:51 GMT
Content-Length: 0
The server responds with a 301 Moved Permanently response, telling the browser to go to "http://www.facebook.com/" instead of "http://facebook.com/".
Why does the server insist on a redirect rather than directly sending the Web content that the user wants to see? There are several interesting answers to this question.
One reason has to do with search engine rankings. If a page has two addresses, such as http://www.igoro.com/ and http://igoro.com/, a search engine may treat them as two different sites, each with fewer incoming links and therefore a lower ranking. Search engines understand what a 301 permanent redirect means, so they combine the links to the address with www and the address without www into a single site ranking.
Another reason is that multiple addresses for the same content are not cache-friendly. When a page has several names, it may appear several times in caches.
(5) The browser follows the redirect
The browser now knows that "http://www.facebook.com/" is the correct address to visit, so it sends another GET request:
GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=xw[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com
The headers have the same meaning as in the first request.
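The redirect-following step can be sketched as a small loop. This is a simplified illustration, not a browser's actual logic: `fetch` is a hypothetical callable standing in for the network, and the hop limit of 5 is an arbitrary choice for the example.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(current_url: str, status: int, headers: dict):
    """Return the URL to request next, or None if the response is not a redirect."""
    location = headers.get("location")
    if status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the current URL.
        return urljoin(current_url, location)
    return None


def follow(start_url: str, fetch, max_hops: int = 5) -> str:
    """Follow redirects until a non-redirect response; fetch(url) -> (status, headers)."""
    url = start_url
    for _ in range(max_hops):
        status, headers = fetch(url)
        target = next_url(url, status, headers)
        if target is None:
            return url
        url = target
    raise RuntimeError("too many redirects")
```

Given a `fetch` that answers 301 with `Location: http://www.facebook.com/` for the bare domain and 200 otherwise, `follow("http://facebook.com/", fetch)` ends at the www address.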
(6) The server "processes" the request
The server receives the GET request, processes it, and returns a response.
This may look like a straightforward task, but a lot of interesting things happen along the way, even on a simple site like the author's blog, let alone on a large-scale site like Facebook.
* Web server software – The Web server software (such as IIS or Apache) receives the HTTP request and decides which request handler should be executed to handle it. A request handler is a program (in ASP.NET, PHP, Ruby, ...) that reads the request and generates the HTML for the response.
In the simplest case, the request handlers are stored in a file hierarchy that mirrors the address structure of the website. The address http://example.com/folder1/page1.aspx then maps to the file /httpdocs/folder1/page1.aspx. The Web server software can also be configured to map addresses to request handlers manually, so that the public address of page1.aspx can be http://example.com/folder1/page1.
* Request handler – The request handler reads the request, its parameters, and its cookies. It reads and possibly updates some data stored on the server. Then the request handler generates an HTML response.
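The mapping from address to handler file described above can be sketched in a few lines. The `/httpdocs` root and the two addresses come from the text. The `REWRITES` table and the `handler_path` function are hypothetical stand-ins for a server's configuration, not IIS or Apache behavior.

```python
import posixpath
from urllib.parse import urlsplit

DOC_ROOT = "/httpdocs"  # document root from the example in the text
REWRITES = {"/folder1/page1": "/folder1/page1.aspx"}  # hypothetical manual mapping


def handler_path(url: str) -> str:
    """Map a request URL to the file that would handle it."""
    path = urlsplit(url).path or "/"
    path = REWRITES.get(path, path)        # apply a manual mapping if one exists
    normalized = posixpath.normpath(path)  # collapse "." and ".." segments
    return DOC_ROOT + normalized
```

Both `http://example.com/folder1/page1.aspx` and the shorter `http://example.com/folder1/page1` resolve to `/httpdocs/folder1/page1.aspx` in this sketch.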
All dynamic sites face an interesting challenge: how to store data. Small sites usually have a single SQL database to store their data, but a site that stores a large amount of data and/or has many visitors has to find a way to split the database across multiple machines. Solutions include sharding (splitting a table across multiple databases based on the primary key), replication, and the use of simplified databases with weakened consistency semantics.
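As a minimal illustration of sharding by primary key, the sketch below picks a database by taking the key modulo the number of shards. The shard names and the modulo rule are assumptions chosen for simplicity. The text does not say which scheme any particular site uses.

```python
SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database names


def shard_for(user_id: int) -> str:
    """Pick the database that holds the row with this primary key."""
    return SHARDS[user_id % len(SHARDS)]
```

The modulo rule is easy to follow, but changing the number of shards moves most keys to a different database, which is one reason larger systems tend to use more elaborate schemes.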
Delegating work to batch jobs is a cheap technique for keeping data up to date. For example, Facebook has to update the news feed in a timely fashion, but the data backing the "People you may know" feature may only need to be updated nightly (this is the author's guess; how the feature is actually implemented is not known). Batch job updates leave some of the less important data stale, but they can make data updates faster and simpler.
(7) The server sends back an HTML response
HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, Jan 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 09:05:55 GMT
[...]
The entire response is 35 kB, most of it in the binary blob at the end, which has been trimmed here.
The Content-Encoding header tells the browser that the response body is compressed with the gzip algorithm. After decompressing the blob, you can see the HTML you would expect:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html [...] lang="en" id="facebook" [...]>
...
In addition to compression, the headers specify whether and how the page may be cached, which cookies to set (none in this response), privacy information, and so on.
Note that the Content-Type header is set to "text/html". The header tells the browser to render the response content as HTML instead of, for example, downloading it as a file. The browser decides how to interpret the response based on the header, but it also considers other factors, such as the extension of the URL.
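How the two headers guide the handling of the body can be sketched as follows. The `decode_body` function and the fallback charset are assumptions for this example. It also assumes the body has already been reassembled from its chunks, which the Transfer-Encoding header in the response above would require.

```python
import gzip


def decode_body(headers: dict, body: bytes) -> str:
    """Undo Content-Encoding, then decode text using the charset from Content-Type."""
    if headers.get("content-encoding", "").lower() == "gzip":
        body = gzip.decompress(body)

    # Pull "charset=..." out of a value such as "text/html; charset=utf-8".
    charset = "iso-8859-1"  # assumed fallback for this sketch
    for part in headers.get("content-type", "").split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')
    return body.decode(charset)
```

A body produced with `gzip.compress(html.encode("utf-8"))` and the two headers from the response above decodes back to the original `html` string.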
(8) The browser starts to display the HTML
Even before the browser has received the entire HTML document, it starts to display the page.
(9) The browser sends requests for objects embedded in the HTML
As the browser displays the HTML, it notices tags that require content from other addresses. The browser then sends a GET request to retrieve each of these files.
Here are a few of the URLs that are fetched during a visit to facebook.com:
* Images
http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
...
* CSS style sheets
http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
...
* JavaScript files
http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
...
Each of these addresses goes through a process similar to the one for the HTML page. The browser looks up the domain name in DNS, sends a request, follows redirects, and so on.
However, unlike dynamic pages, static files can be cached by the browser. Some files may be read directly from the cache without contacting the server at all. The server's response includes an expiration time for each static file, so the browser knows how long it may cache it. In addition, each response may contain an ETag header that works like a version number (an entity tag for the requested resource). If the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.
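One common way for a browser and server to use ETags is a conditional request: the browser sends the tag it holds in an If-None-Match header, and the server answers 304 Not Modified without a body if the tag still matches. The sketch below simulates the server side. The hashing scheme, the `max-age` value, and the function names are assumptions for illustration, and this revalidation flow is a different mechanism from aborting a transfer mid-stream as the paragraph describes.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive a version tag from the content (one possible scheme among many)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def serve_static(content: bytes, request_headers: dict):
    """Return (status, headers, body); skip the body when the client's copy is current."""
    etag = make_etag(content)
    response_headers = {"etag": etag, "cache-control": "max-age=3600"}
    if request_headers.get("if-none-match") == etag:
        return 304, response_headers, b""  # Not Modified: browser reuses its cached copy
    return 200, response_headers, content
```

A first call with no request headers returns 200 and the file. Repeating the call with the returned `etag` value in `if-none-match` returns 304 and an empty body, and a changed file yields 200 again.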
Can you guess what "fbcdn.net" in these addresses stands for? A safe bet is "Facebook content delivery network". Facebook uses a content delivery network (CDN) to distribute static files such as images, style sheets, and JavaScript files. As a result, these files are replicated in many CDN data centers around the world.
Static content often accounts for most of a site's bandwidth and can easily be replicated across a CDN. Websites usually use a third-party CDN. For example, Facebook's static files are hosted by Akamai, the largest CDN provider.
For example, when you ping static.ak.fbcdn.net, you may get a response from an akamai.net server. Interestingly, if you ping again, the response may come from a different server, which shows the load balancing at work behind the scenes.
(10) The browser sends an asynchronous (AJAX) request
In the spirit of Web 2.0, the client stays in contact with the server after the page is displayed.
Take the Facebook chat feature as an example: it keeps contacting the server to update, in a timely fashion, which of your friends are shown as online (lit) or offline (gray). To update these statuses, the JavaScript code running in the browser sends an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request sent to a specific address. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.
This pattern is usually called "AJAX", which stands for "Asynchronous JavaScript and XML", even though there is no particular reason why the server has to respond in XML format. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.
Among other things, Fiddler lets you see the asynchronous requests sent by the browser. In fact, you can not only watch these requests passively, but also modify and resend them. The fact that AJAX requests are so easy to spoof is really frustrating for developers of online games with scoreboards. (Of course, please don't cheat like that.)
The Facebook chat feature provides an example of an interesting problem with AJAX: pushing data from the server to the client. Because HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see whether any new messages have arrived.
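The server's side of such a poll can be sketched with a per-client message queue. The first function answers immediately, as in plain polling. The second shows a variant in which the server holds the request open for a while before answering empty-handed. The function names and the queue-per-client design are assumptions for this example, not a description of Facebook's chat server.

```python
import queue


def poll_once(inbox: "queue.Queue[str]"):
    """Plain polling: answer immediately, with None when nothing is waiting."""
    try:
        return inbox.get_nowait()
    except queue.Empty:
        return None


def long_poll(inbox: "queue.Queue[str]", timeout: float):
    """Held-open variant: wait up to `timeout` seconds for a message before answering."""
    try:
        return inbox.get(timeout=timeout)
    except queue.Empty:
        return None
```

With plain polling, most answers are `None` and each one costs a full request. Holding the request open trades those empty answers for a connection that stays idle until there is something to say.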
Long polling is an interesting technique for reducing the load on the server in these situations. If the server has no new messages when it is polled, it simply does not send a response back. If a message for this client arrives before the request times out, the server finds the outstanding request and returns the message as its response.
Summary
Hopefully, after reading this article, you have a better idea of how the different pieces of the Web work together.