Original: http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/
As a software developer, you certainly have a high-level picture of how web applications work and which technologies are involved: the browser, HTTP, HTML, web servers, request handlers, and so on.
This article takes a deeper look at the sequence of events that takes place behind the scenes when you enter a URL.
1. You enter a URL into the browser
2. The browser finds the IP address of the domain name
The first step in navigation is to find the IP address of the domain being visited. The DNS lookup proceeds as follows:
- Browser cache – The browser caches DNS records for some time. Interestingly, the operating system does not tell the browser the time-to-live of each DNS record, so each browser caches records for its own fixed duration (ranging from 2 to 30 minutes).
- OS cache – If the browser cache does not contain the required record, the browser makes a system call (gethostbyname in Windows). The operating system has its own cache.
- Router cache – The request then continues to your router, which typically has its own DNS cache.
- ISP DNS cache – The next place checked is the cache of the ISP's DNS server. In most cases the record is found there.
- Recursive search – Your ISP's DNS server begins a recursive search, from the root nameserver, through the .com top-level nameserver, to Facebook's nameserver. Normally the DNS server already has the addresses of the .com nameservers in its cache, so a query to the root nameserver is not necessary.
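The lookup order above can be sketched as a chain of caches that are consulted one after another. This is only an illustration, not how any real browser or resolver is implemented: the `DnsCache` class, the `resolve` function, and the 192.0.2.x addresses are assumptions made for the example, and the recursive search is replaced by a stand-in function.

```python
import time


class DnsCache:
    """A tiny TTL cache standing in for one layer (browser, OS, router, ISP)."""

    def __init__(self, name):
        self.name = name
        self._records = {}  # hostname -> (ip, expiry timestamp)

    def get(self, hostname, now=None):
        now = time.time() if now is None else now
        entry = self._records.get(hostname)
        if entry is None or entry[1] <= now:
            return None  # missing or expired
        return entry[0]

    def put(self, hostname, ip, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._records[hostname] = (ip, now + ttl_seconds)


def resolve(hostname, caches, recursive_lookup):
    """Check each cache in order; fall back to the recursive search."""
    for cache in caches:
        ip = cache.get(hostname)
        if ip is not None:
            return ip, cache.name
    return recursive_lookup(hostname), "recursive"


browser, system = DnsCache("browser"), DnsCache("system")
system.put("example.com", "192.0.2.10", ttl_seconds=300)

# Stand-in for the real recursive search (hypothetical fixed answer).
fake_recursive = lambda hostname: "192.0.2.99"

answer = resolve("example.com", [browser, system], fake_recursive)
miss = resolve("example.org", [browser, system], fake_recursive)
```

Here `answer` is served by the system cache, while `miss` falls through every layer to the stand-in recursive lookup.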
DNS recursive lookups are as follows:
One worrying thing about DNS is that an entire domain, such as wikipedia.org or facebook.com, appears to map to a single IP address. Fortunately, there are several ways to mitigate this bottleneck:
- Round-robin DNS is a solution in which the DNS lookup returns multiple IP addresses rather than just one. For example, at the time of writing facebook.com mapped to four IP addresses.
- A load balancer is a piece of hardware that listens on a particular IP address and forwards requests to other servers. Major sites typically use expensive, high-performance load balancers.
- Geographic DNS improves scalability by mapping a domain name to different IP addresses depending on the client's geographic location. This works well for hosting static content, because the different servers do not have to keep shared state in sync.
- Anycast is a routing technique in which a single IP address maps to multiple physical servers. Unfortunately, anycast does not fit well with TCP and is rarely used in that scenario.
Most DNS servers themselves use anycast to achieve high availability and low latency for DNS lookups.
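As a rough illustration of the round-robin idea from the list above, the sketch below rotates the order of the returned addresses on every lookup, so clients that simply pick the first address end up spread across the servers. The `RoundRobinRecord` class and the 192.0.2.x addresses are invented for the example; real DNS servers may implement rotation differently.

```python
from collections import deque


class RoundRobinRecord:
    """Holds several addresses for one name and rotates them per answer."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return all addresses, then rotate so the next answer starts elsewhere."""
        current = list(self._addresses)
        self._addresses.rotate(-1)
        return current


record = RoundRobinRecord(["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"])

# A naive client always connects to the first address in the answer.
first_choices = [record.answer()[0] for _ in range(5)]
```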
3. The browser sends an HTTP request to the Web server
You can be fairly sure that the Facebook homepage will not be served from the browser cache, because dynamic pages expire either very quickly or immediately (with the expiry date set in the past).
So, the browser sends this request to the Facebook server:
GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; Locale=en_us; Lsd=ww[...]; C_user=2101[...]
The GET request names the URL to fetch: "http://facebook.com/". The browser identifies itself (User-Agent header) and states what types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for further requests.
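To make the structure of such a request concrete, here is a minimal sketch that splits a raw request into its request line and headers. The `parse_request` helper is written for this example only and handles just the simple case shown; it is not a complete HTTP parser.

```python
def parse_request(raw):
    """Split a raw HTTP request head into (method, target, version, headers)."""
    lines = raw.strip().split("\r\n")
    method, target, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


raw_request = (
    "GET http://facebook.com/ HTTP/1.1\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Connection: Keep-Alive\r\n"
    "Host: facebook.com\r\n"
    "\r\n"
)
method, target, version, headers = parse_request(raw_request)
```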
The request also contains the cookies that the browser has stored for this domain. As you probably already know, cookies are key-value pairs that track the state of a website across different page requests. Cookies can store the name of the logged-in user, a secret token that the server assigned to the user, some of the user's settings, and so on. Cookies are stored as text on the client and sent to the server with every request.
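As a small illustration of cookies as key-value pairs, the sketch below parses a Cookie header with Python's standard `http.cookies` module. The cookie names and values are made up for the example and are not Facebook's real cookies.

```python
from http.cookies import SimpleCookie

# Hypothetical Cookie header, in the "name=value; name=value" form a browser sends.
cookie_header = "locale=en_US; c_user=2101; theme=dark"

jar = SimpleCookie()
jar.load(cookie_header)

# Flatten the parsed cookies into a plain dictionary of names to values.
cookies = {name: morsel.value for name, morsel in jar.items()}
```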
A variety of tools let you view raw HTTP requests and the corresponding responses. The author prefers Fiddler, but there are other tools as well, such as Firebug. These tools are a great help when optimizing a site.
In addition to GET requests, another type of request you may be familiar with is the POST request, typically used to submit forms. A GET request passes its parameters via the URL (e.g.: http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just after the headers.
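The difference can be sketched with Python's standard `urllib`, which builds request objects without sending anything over the network. Whether the server in the example actually accepts a POST to this address is not known; the POST variant is purely illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"id": "85"}

# GET: the parameters travel in the URL's query string.
get_request = Request("http://robozzle.com/puzzle.aspx?" + urlencode(params))

# POST: the same parameters travel in the request body instead.
post_request = Request(
    "http://robozzle.com/puzzle.aspx",
    data=urlencode(params).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
```

Neither request is sent here; the objects only show where the parameters end up.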
The trailing slash in the URL "http://facebook.com/" is important. In this case, the browser can safely add the slash. For a URL such as "http://example.com/folderOrFile", however, the browser cannot add a slash automatically, because it is not clear whether folderOrFile is a folder or a file. In that case the browser visits the URL without the slash, and the server responds with a redirect, resulting in an unnecessary round trip.
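The rule can be expressed in a few lines: add the slash only when the URL has no path at all. The `add_root_slash` helper below is a sketch written for this article, not the logic of any particular browser.

```python
from urllib.parse import urlsplit, urlunsplit


def add_root_slash(url):
    """Add "/" only when the URL has no path at all; other paths are left alone."""
    parts = urlsplit(url)
    if parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


bare_domain = add_root_slash("http://facebook.com")
ambiguous = add_root_slash("http://example.com/folderOrFile")
```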
4. The Facebook server responds with a permanent redirect
This is the response that the Facebook server sent back to the browser:
HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
pre-check=0
Expires: Sat, 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT;
path=/; domain=.facebook.com; HttpOnly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 05:09:51 GMT
Content-Length: 0
The server responded with a 301 Moved Permanently response, telling the browser to go to "http://www.facebook.com/" instead of "http://facebook.com/".
Why does the server insist on a redirect rather than immediately sending the page that the user wants to see? There are several interesting reasons.
One reason has to do with search engine rankings. If a page has two URLs, such as http://www.igoro.com/ and http://igoro.com/, a search engine may treat them as two different sites, each with fewer incoming links and therefore a lower ranking. Search engines understand permanent redirects (301) and will combine the links pointing to the www and non-www addresses into a single ranking.
Another reason is that multiple URLs for the same content are not cache-friendly. When a piece of content has several names, it may appear several times in caches.
5. The browser follows the redirect
Now the browser knows that "http://www.facebook.com/" is the correct URL to visit, so it sends another GET request:
GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=xw[...]; C_user=21[...]; X-referer=[...]
Host: www.facebook.com
The headers have the same meaning as in the first request.
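The redirect-following behaviour of steps 4 and 5 can be sketched as a simple loop. To keep the example runnable without a network, the "web" is simulated by a dictionary; `FAKE_WEB` and `fetch_following_redirects` are names invented for this sketch, and real browsers handle more status codes and edge cases.

```python
from urllib.parse import urljoin

# Simulated server: URL -> (status code, headers, body).
FAKE_WEB = {
    "http://facebook.com/": (301, {"Location": "http://www.facebook.com/"}, ""),
    "http://www.facebook.com/": (200, {"Content-Type": "text/html"}, "<html>...</html>"),
}


def fetch_following_redirects(url, fetch=FAKE_WEB.get, max_redirects=5):
    """Follow 301/302 responses until a non-redirect response arrives."""
    visited = [url]
    for _ in range(max_redirects + 1):
        status, headers, body = fetch(url)
        if status not in (301, 302):
            return status, body, visited
        url = urljoin(url, headers["Location"])  # Location may be relative
        visited.append(url)
    raise RuntimeError("too many redirects")


status, body, visited = fetch_following_redirects("http://facebook.com/")
```

The cap on redirects guards against two URLs that redirect to each other forever.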
6. The server "handles" the request
The server receives the GET request, processes it, and sends back a response.
This may seem like a straightforward task, but a lot of interesting things happen along the way, even on a simple site like the author's blog, let alone a massively scalable site like Facebook.
- Web Server Software
The web server software (e.g. IIS or Apache) receives the HTTP request and decides which request handler should be executed to handle it. A request handler is a program (written in ASP.NET, PHP, Ruby, ...) that reads the request and generates the HTML for the response. In the simplest case, request handlers are stored in a file hierarchy whose structure mirrors the URL structure of the site. For example, the URL http://example.com/folder1/page1.aspx maps to the file /httpdocs/folder1/page1.aspx. The web server software can also be configured so that URLs are mapped to request handlers manually, so that the public URL of page1.aspx could be http://example.com/folder1/page1.
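Both mapping styles can be sketched in a few lines. The `ROUTES` table and the `handler_path` function are hypothetical and only illustrate the idea; real web servers such as IIS or Apache have their own configuration mechanisms.

```python
import posixpath
from urllib.parse import urlsplit

DOCUMENT_ROOT = "/httpdocs"

# Manually configured routes: public path -> handler file (hypothetical table).
ROUTES = {"/folder1/page1": "/folder1/page1.aspx"}


def handler_path(url):
    """Map a URL to the file of the request handler that should serve it."""
    path = urlsplit(url).path
    path = ROUTES.get(path, path)  # an explicit route wins if present
    # Normalise the path so that ".." segments cannot escape the document root.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    return DOCUMENT_ROOT + clean


by_file = handler_path("http://example.com/folder1/page1.aspx")
by_route = handler_path("http://example.com/folder1/page1")
```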
- Request handler
The request handler reads the request, its parameters, and its cookies. It reads and possibly updates some data stored on the server. Then the request handler generates an HTML response.
One interesting challenge that every dynamic site faces is how to store its data. Small sites often keep their data in a single SQL database, but sites that store a large amount of data and/or have many visitors have to find a way to split the database across multiple machines. Solutions include sharding (splitting a table across multiple databases based on the primary key), replication, and the use of simplified databases with weakened consistency semantics.
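As a minimal sketch of sharding by primary key, the function below picks a database from a stable hash of the key. The shard names are hypothetical, and this simple modulo scheme is only one possible approach; it is not a description of how Facebook or any other site actually shards its data.

```python
import zlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database names


def shard_for(primary_key):
    """Pick a shard from a stable hash of the primary key."""
    # crc32 is used instead of the built-in hash(), which is randomised
    # per process for strings and would send the same key to different shards.
    digest = zlib.crc32(str(primary_key).encode("utf-8"))
    return SHARDS[digest % len(SHARDS)]


user_shard = shard_for(2101)
used_shards = {shard_for(key) for key in range(1000)}
```

One design consequence worth noting: with plain modulo hashing, changing the number of shards moves most keys to a different shard.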
One technique for keeping data updates cheap is to defer some of the work to a batch job. For example, Facebook has to update the news feed in a timely fashion, but the data behind the "People you may know" feature may only need to be updated nightly (this is the author's guess; how the feature is actually implemented is not known). Batch updates leave some of the less important data stale, but they can make data updates much faster and simpler.
7. The server sends back an HTML response
Here is the response that the server generated and sent back:
HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
pre-check=0
Expires: Sat, 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 09:05:55 GMT
[...]
The entire response is 35 kB, most of it in the binary blob at the end, which has been trimmed here.
The Content-Encoding header tells the browser that the response body is compressed with the gzip algorithm. After decompressing the blob, you see the HTML you would expect:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en" id="facebook" class="no_js">
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-language" content="en" />
...
In addition to compression, the headers specify whether and how to cache the page, which cookies to set (none in this response), privacy information, and so on.
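The effect of gzip compression on a response body can be tried out with Python's standard `gzip` module. The HTML below is a made-up, highly repetitive page, so the compression ratio is not representative of real sites.

```python
import gzip

# A made-up page; repetitive markup compresses particularly well.
html = b"<html><body>" + b"<p>hello</p>" * 200 + b"</body></html>"

compressed = gzip.compress(html)        # what the server would put on the wire
restored = gzip.decompress(compressed)  # what the browser does on receipt
```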
Notice the header that sets Content-Type to "text/html". This header instructs the browser to render the response content as HTML instead of, say, downloading it as a file. The browser uses the header to decide how to interpret the response, but it also considers other factors, such as the extension of the URL.
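A rough sketch of that decision is shown below, using the header parsing that ships with Python's `email` package. The `should_render_inline` rule and its list of media types are assumptions made for the example; real browsers apply considerably more elaborate logic.

```python
from email.message import Message


def describe_content_type(header_value):
    """Return (media type, charset) parsed from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def should_render_inline(header_value):
    """Hypothetical rule: render text types in the page, download the rest."""
    media_type, _ = describe_content_type(header_value)
    return media_type in {"text/html", "text/plain"}


page_type = describe_content_type("text/html; charset=utf-8")
```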
8. The browser begins rendering the HTML
Even before the browser has received the entire HTML document, it begins rendering the page:
9. The browser sends requests for objects embedded in the HTML
As the browser renders the HTML, it notices tags that require fetching content from other URLs. The browser sends a GET request to retrieve each of these files.
Here are a few of the URLs retrieved during a visit to facebook.com:
- Images
http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
...
- CSS style sheets
http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
...
- JavaScript files
http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
...
Each of these URLs goes through a process similar to the one the HTML page went through: the browser looks up the domain name in DNS, sends a request to the URL, follows redirects, and so on.
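How a browser might discover these embedded URLs can be sketched with Python's standard `html.parser`. The page and the static.example.com addresses are invented for the example, and the `ResourceCollector` class only looks at three kinds of tags, far fewer than a real browser.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs of images, style sheets, and scripts referenced by a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.urls.append(attrs["href"])


page = """
<html><head>
<link rel="stylesheet" href="http://static.example.com/site.css">
<script src="http://static.example.com/app.js"></script>
</head><body><img src="http://static.example.com/logo.gif"></body></html>
"""
collector = ResourceCollector()
collector.feed(page)
```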
However, unlike dynamic pages, static files can be cached by the browser. Some of the files may be served straight from the cache without contacting the server at all. The browser knows how long to cache a particular file because the response that returned it contained an Expires header. In addition, each response may contain an ETag header, which works like a version number: if the browser sees the ETag of a version of the file that it already has, it can stop the transfer immediately.
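The two mechanisms can be sketched together. In HTTP, the version check is normally done with a conditional request: the browser sends the ETag it holds in an If-None-Match header, and the server answers 304 Not Modified with an empty body if the file has not changed. The functions and the cache-entry layout below are simplifications invented for this example.

```python
def server_respond(current_etag, if_none_match=None):
    """Return (status, body): 304 with no body when the client's copy is current."""
    if if_none_match is not None and if_none_match == current_etag:
        return 304, b""
    return 200, b"file contents"


def browser_fetch(cache_entry, now, current_etag):
    """cache_entry is None or a dict with 'body', 'etag', 'expires' (a timestamp)."""
    if cache_entry and now < cache_entry["expires"]:
        return "cache", cache_entry["body"]        # fresh: no request at all
    etag = cache_entry["etag"] if cache_entry else None
    status, body = server_respond(current_etag, if_none_match=etag)
    if status == 304:
        return "revalidated", cache_entry["body"]  # stale but unchanged
    return "downloaded", body


entry = {"body": b"old copy", "etag": "v1", "expires": 100}
fresh = browser_fetch(entry, now=50, current_etag="v1")
unchanged = browser_fetch(entry, now=200, current_etag="v1")
changed = browser_fetch(entry, now=200, current_etag="v2")
```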
Can you guess what "fbcdn.net" in these URLs stands for? A safe bet is "Facebook content delivery network". Facebook uses a content delivery network (CDN) to distribute static content such as images, style sheets, and JavaScript files. As a result, these files are copied to many CDN data centers around the world.
Static content often accounts for the bulk of a site's bandwidth and can easily be replicated across a CDN. Sites often use a third-party CDN provider instead of operating a CDN themselves. For example, Facebook's static files are hosted by Akamai, the largest CDN provider.
As a demonstration, when you ping static.ak.fbcdn.net, you may get a response from an akamai.net server. Interestingly, if you ping it again, the response may come from a different server, which shows the load balancing that happens behind the scenes.
10. The browser sends an asynchronous (AJAX) request
In the spirit of Web 2.0, the client continues to communicate with the server even after the page is rendered.
Take Facebook chat as an example: it keeps updating the list of your logged-in friends as they come and go. To update this list, the JavaScript code executing in the browser has to send an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request sent to a special URL. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.
This pattern is often referred to as "AJAX", which stands for "Asynchronous JavaScript and XML", even though there is no particular reason why the server has to format the response as XML. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.
Among other things, Fiddler lets you view the asynchronous requests sent by the browser. In fact, you can not only observe these requests passively, but also modify and resend them. The fact that AJAX requests are so easy to spoof causes a lot of grief for developers of online games with scoreboards. (Of course, please don't cheat that way.)
Facebook chat provides an interesting example of a problem with AJAX: pushing data from the server to the client. Because HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see whether any new messages have arrived.
Long polling is an interesting technique for reducing the load on the server in these scenarios. If the server does not have any new messages when polled, it simply does not send a response back. If a message for this client arrives before the request times out, the server finds the outstanding request and returns the message as the response.
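The server side of long polling can be sketched with a blocking queue: the handler holds the request open until a message arrives or the timeout passes. This is a simplified single-client illustration with invented names (`long_poll`, `inbox`), not a description of Facebook's chat implementation.

```python
import queue
import threading


def long_poll(inbox, timeout_seconds):
    """Server side: hold the request until a message arrives or the timeout passes."""
    try:
        return inbox.get(timeout=timeout_seconds)
    except queue.Empty:
        return None  # nothing arrived; the client is expected to poll again


inbox = queue.Queue()

# Simulate a chat message arriving shortly after the client started waiting.
threading.Timer(0.05, inbox.put, args=["hi there"]).start()

delivered = long_poll(inbox, timeout_seconds=2.0)   # returns as soon as the message lands
timed_out = long_poll(inbox, timeout_seconds=0.05)  # no further message: gives up
```

Compared with polling every few seconds, the client makes far fewer requests, and a new message is delivered as soon as it arrives rather than at the next poll.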
Conclusion
Hopefully, after reading this article, you have a better idea of how the different pieces of the web work together, and of what really happens when you type a URL into the browser's address bar and press Enter.