Original: http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/
As a software developer, you certainly have a high-level picture of how web applications work and which technologies are involved: the browser, HTTP, HTML, web servers, request handlers, and so on.
This article takes a closer look at the sequence of events that takes place behind the scenes when you navigate to a URL.
1. You enter a URL into the browser
2. The browser finds the IP address of the domain name
The first step in navigation is to work out the IP address for the domain name being visited. The DNS lookup proceeds as follows:
- Browser cache – The browser caches DNS records for some time. Interestingly, the operating system does not tell the browser the time-to-live of each DNS record, so each browser caches records for its own fixed duration (ranging from 2 to 30 minutes).
- OS cache – If the browser cache does not contain the desired record, the browser makes a system call (gethostbyname in Windows). The operating system has its own cache.
- Router cache – The request continues on to your router, which typically has its own DNS cache.
- ISP DNS cache – The next place checked is the cache of the ISP's DNS server. The record can usually be found there.
- Recursive search – Your ISP's DNS server begins a recursive search, from the root name server, through the .com top-level name server, to Facebook's name server. Normally the DNS server already has the names of the .com name servers in its cache, so a hit to the root name server is not necessary.
[Figure: diagram of a recursive DNS lookup]
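The lookup order described above can be sketched as a chain of caches that is checked nearest-first. This is only a simulation under stated assumptions: the cache contents, the `recursive_search` stand-in, and the addresses (taken from the 192.0.2.0/24 documentation range) are all made up for illustration, and no real DNS traffic is involved.

```python
# Hypothetical cache layers, checked in the order the article lists them.
CACHE_LAYERS = [
    ("browser cache", {}),
    ("OS cache", {}),
    ("router cache", {}),
    ("ISP DNS cache", {"facebook.com": "192.0.2.10"}),  # made-up record
]


def recursive_search(name):
    """Stand-in for the ISP resolver walking root -> .com -> authoritative server."""
    authoritative_records = {"example.com": "192.0.2.20"}  # made-up records
    return authoritative_records.get(name)


def resolve(name):
    """Return (address, where_it_was_found), checking the nearest cache first."""
    for label, cache in CACHE_LAYERS:
        if name in cache:
            return cache[name], label
    address = recursive_search(name)
    if address is not None:
        # On the way back, every layer remembers the answer.
        for _, cache in CACHE_LAYERS:
            cache[name] = address
    return address, "recursive search"
```

Resolving the same name a second time is answered by the browser cache, which is why only the first visit pays for the full lookup.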
One worrying thing about DNS is that an entire domain such as wikipedia.org or facebook.com appears to map to a single IP address. Fortunately, there are several ways to mitigate this bottleneck:
- Round-robin DNS is a solution in which the DNS lookup returns multiple IP addresses rather than just one. For example, facebook.com actually maps to four IP addresses.
- A load balancer is a piece of hardware that listens on a particular IP address and forwards requests to other servers. Major sites typically use expensive, high-performance load balancers.
- Geographic DNS improves scalability by mapping a domain name to different IP addresses depending on the client's geographic location. This works well for static content, where the different servers do not have to keep shared state in sync.
- Anycast is a routing technique in which a single IP address maps to multiple physical servers. Unfortunately, anycast does not fit well with TCP, so it is rarely used in that scenario.
Most DNS servers themselves use anycast to achieve efficient, low-latency DNS lookups.
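As an illustration of the round-robin idea from the list above, here is a minimal sketch of a record that rotates its address list on every answer, so that clients picking the first address spread across the servers. The class name and the four addresses are hypothetical; real DNS servers implement rotation in their own ways.

```python
from collections import deque


class RoundRobinRecord:
    """A name that maps to several addresses, rotated on every answer."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return the current ordering, then rotate left by one for the next query."""
        current = list(self._addresses)
        self._addresses.rotate(-1)
        return current


# Four made-up addresses from the documentation range.
record = RoundRobinRecord(["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"])
first_choices = [record.answer()[0] for _ in range(5)]
```

After five queries, `first_choices` has cycled through all four addresses and started again at the first one.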
3. The browser sends an HTTP request to the Web server
Dynamic pages such as the Facebook home page expire from the browser cache very quickly, if not immediately, so you can be fairly sure the page will not be served from the cache.
So the browser sends this request to the Facebook server:
GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=ww[...]; c_user=2101[...]
The GET request names the URL to fetch: "http://facebook.com/". The browser identifies itself (User-Agent header) and states what types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for further requests.
The request also contains the cookies that the browser has stored for this domain. As you probably already know, cookies are key-value pairs that track the state of a website across different page requests. Cookies can therefore store the name of the logged-in user, a secret number assigned to the user by the server, some of the user's settings, and so on. Cookies are stored in a text file on the client and are sent to the server with every request.
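To make the structure of such a request concrete, the following sketch splits a raw request into its request line, headers, and cookies. It assumes a simplified request with made-up cookie values (`locale`, `theme`); the helper names `parse_request` and `parse_cookies` are hypothetical, and a real server would handle many more edge cases.

```python
from http.cookies import SimpleCookie

# A shortened request in the same shape as the one above; cookie values are made up.
RAW_REQUEST = (
    "GET http://facebook.com/ HTTP/1.1\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Connection: Keep-Alive\r\n"
    "Host: facebook.com\r\n"
    "Cookie: locale=en_US; theme=dark\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw HTTP request into (method, target, version, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers, body


def parse_cookies(cookie_header):
    """Turn a Cookie header value into a plain dict of key-value pairs."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return {name: morsel.value for name, morsel in jar.items()}


method, target, version, headers, body = parse_request(RAW_REQUEST)
cookies = parse_cookies(headers["cookie"])
```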
There are many tools that let you view raw HTTP requests and the corresponding responses. The author prefers Fiddler, but there are other tools such as Firebug. These tools are a great help when optimizing a site.
In addition to GET requests, there is another type of request, the POST request, which is typically used to submit forms. A GET request sends its parameters in the URL (e.g. http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just after the headers.
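The difference between the two request types can be seen by building both with Python's standard library without sending anything over the network. Note that POSTing `id=85` to this URL is purely an assumption for illustration; the source only shows the GET form.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"id": "85"}
encoded = urlencode(params)  # "id=85"

# GET: the parameters travel in the URL's query string.
get_request = Request("http://robozzle.com/puzzle.aspx?" + encoded)

# POST: the same parameters travel in the request body instead.
# (Hypothetical: the article does not say this page accepts POST.)
post_request = Request("http://robozzle.com/puzzle.aspx", data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

`urllib.request.Request` switches its method to POST as soon as a body is supplied, which mirrors the rule described above.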
The trailing slash in the URL "http://facebook.com/" is important. In this case, the browser can safely add the slash. For a URL of the form "http://example.com/folderOrFile", the browser cannot add a slash automatically, because it is not clear whether folderOrFile is a folder or a file. In such cases, the browser visits the URL without the slash, and the server responds with a redirect, resulting in an unnecessary round trip.
4. The Facebook server responds with a permanent redirect
This is the response that the Facebook server sent back to the browser:
HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
    pre-check=0
Expires: Sat, 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT;
    path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 05:09:51 GMT
Content-Length: 0
The server responds with a 301 permanent redirect response to the browser so that the browser accesses "http://www.facebook.com/" rather than "http://facebook.com/".
Why does the server have to redirect rather than directly send the Web content that the user wants to see? There are many interesting answers to this question.
One reason has to do with search engine rankings. If a page has two URLs, such as http://www.igoro.com/ and http://igoro.com/, a search engine may treat them as two different sites, each with fewer incoming links and therefore a lower ranking. Search engines understand permanent redirects (301) and combine the incoming links from both the www and non-www addresses into a single ranking.
Another reason is that multiple URLs for the same content are not cache-friendly. When a page has several names, it may end up in caches several times.
5. The browser follows the redirect
The browser now knows that "http://www.facebook.com/" is the correct URL to visit, so it sends another GET request:
GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=xw[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com
The headers have the same meaning as in the first request.
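The redirect-following behaviour of steps 4 and 5 can be sketched against a fake in-memory server. Everything here is a simulation: `FAKE_SERVER`, the `fetch` helper, and the placeholder HTML body are assumptions made for illustration, and the redirect limit of 5 is an arbitrary choice rather than any browser's documented value.

```python
# A fake server: URL -> (status code, headers, body). No network is involved.
FAKE_SERVER = {
    "http://facebook.com/": (301, {"Location": "http://www.facebook.com/"}, ""),
    "http://www.facebook.com/": (
        200,
        {"Content-Type": "text/html; charset=utf-8"},
        "<html>placeholder page</html>",
    ),
}

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch(url, max_redirects=5):
    """Request url, following Location headers; return (status, body, visited URLs)."""
    visited = []
    for _ in range(max_redirects + 1):
        visited.append(url)
        status, headers, body = FAKE_SERVER[url]
        if status in REDIRECT_CODES and "Location" in headers:
            url = headers["Location"]  # issue a second request to the new address
            continue
        return status, body, visited
    raise RuntimeError("too many redirects")


status, body, visited = fetch("http://facebook.com/")
```

The `visited` list records both requests, which makes the extra round trip caused by the redirect visible.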
6. The server "handles" the request
The server receives the GET request, processes it, and sends back a response.
This may seem like a straightforward task, but a lot of interesting things happen along the way, even on a simple site like the author's blog, let alone on a massively scalable site like Facebook.
- Web server software
Web server software (such as IIS or Apache) receives the HTTP request and decides which request handler should handle it. A request handler is a program (in ASP.NET, PHP, Ruby, ...) that reads the request and generates the HTML for the response. In the simplest case, request handlers are stored in a file hierarchy that mirrors the URL structure of the site, so that the URL http://example.com/folder1/page1.aspx maps to the file /httpdocs/folder1/page1.aspx. The web server software can also be configured to map URLs to request handlers manually, so that the public URL of page1.aspx could be http://example.com/folder1/page1.
- Request handler
The request handler reads the request, its parameters, and its cookies. It reads and possibly updates some data stored on the server. Then the request handler generates an HTML response.
One interesting difficulty that every dynamic website faces is how to store data. Smaller sites often have a single SQL database to store their data, but sites that store a large amount of data and/or have many visitors have to find a way to split the database across multiple machines. Solutions include sharding (splitting up a table across multiple databases based on the primary key), replication, and the use of simplified databases with weakened consistency semantics.
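As a rough illustration of sharding by primary key, the sketch below routes each row to one of several databases using the key modulo the number of shards. The shard names and the modulo scheme are assumptions chosen for simplicity; the article does not say how any particular site assigns shards, and production systems often use more elaborate schemes.

```python
# Hypothetical database names, one per shard.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id):
    """Pick the database that stores the row with this integer primary key."""
    return SHARDS[user_id % len(SHARDS)]


def group_by_shard(user_ids):
    """Group a batch of keys so each database is queried only once."""
    groups = {}
    for user_id in user_ids:
        groups.setdefault(shard_for(user_id), []).append(user_id)
    return groups


batches = group_by_shard([8, 85, 86, 12, 3])
```

Because the shard is a pure function of the key, any web server can work out where a row lives without consulting a central directory.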
One technique for keeping data updates cheap is to defer some of the work to a batch job. For example, Facebook has to update the news feed in a timely fashion, but the data backing the "People you may know" feature may only need to be updated nightly (this is the author's guess; it is not known how the feature is actually implemented). Batch job updates leave some less important data stale, but they can make data updates much faster and simpler.
7. The server sends back an HTML response
Here is the response that the server generated and sent back:
HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
    pre-check=0
Expires: Sat, 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 09:05:55 GMT
[compressed binary body, trimmed] [...]
The entire response is 35 kB, most of it in the binary blob at the end, which has been trimmed here.
The Content-Encoding header tells the browser that the response body is compressed with the gzip algorithm. After decompressing the blob, you see the HTML you would expect:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en" id="facebook" class="no_js">
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-language" content="en" />
...
In addition to compression, the headers specify whether and how to cache the page, any cookies to set (none in this response), privacy information, and so on.
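The gzip round trip that Content-Encoding describes can be reproduced with Python's standard library. The sample page and the helper names `encode_body` and `decode_body` are made up for illustration; the point is only that the client undoes the compression named in the header before decoding the text.

```python
import gzip

# A made-up, repetitive page; repetitive markup is what makes HTML compress well.
html = '<!DOCTYPE html><html lang="en">' + "<p>hello</p>" * 200 + "</html>"


def encode_body(text):
    """What the server does before sending: UTF-8 encode, then gzip."""
    return gzip.compress(text.encode("utf-8"))


def decode_body(blob, content_encoding, charset="utf-8"):
    """What the client does: undo the Content-Encoding, then decode the text."""
    if content_encoding.lower() == "gzip":
        blob = gzip.decompress(blob)
    return blob.decode(charset)


blob = encode_body(html)
restored = decode_body(blob, "gzip")
print(len(html.encode("utf-8")), "bytes before,", len(blob), "bytes after compression")
```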
Notice the header that sets Content-Type to "text/html". The header instructs the browser to render the response content as HTML instead of, say, downloading it as a file. The browser uses the header to decide how to interpret the response, but it also considers other factors, such as the extension of the URL.
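A minimal sketch of the render-or-download decision, assuming it is driven by the media type alone: the `RENDERABLE` set and the `action_for` helper are hypothetical, and real browsers also weigh the other factors mentioned above. The header parsing itself uses the standard library's `email.message.Message`, which understands the `type/subtype; parameter=value` syntax.

```python
from email.message import Message

# Hypothetical list of media types this imaginary browser can display inline.
RENDERABLE = {"text/html", "text/plain", "image/gif", "image/jpeg"}


def describe(content_type_header):
    """Return (media type, charset) parsed from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = content_type_header
    return message.get_content_type(), message.get_content_charset()


def action_for(content_type_header):
    """Decide whether to render the body or offer it as a download."""
    media_type, _ = describe(content_type_header)
    return "render" if media_type in RENDERABLE else "download"


print(describe("text/html; charset=utf-8"), action_for("text/html; charset=utf-8"))
print(action_for("application/octet-stream"))
```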
8. The browser begins rendering the HTML
Even before the browser has received the entire HTML document, it begins rendering the page.
9. The browser sends requests for objects embedded in the HTML
As the browser renders the HTML, it notices tags that require fetching other URLs. The browser sends a GET request to retrieve each of these files.
Here are a few of the URLs that a visit to facebook.com retrieved:
- Images
http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
...
- CSS style sheets
http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
...
- JavaScript files
http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
...
Each of these URLs goes through a process similar to the one the HTML page went through. So the browser looks up the domain name in DNS, sends a request to the URL, follows redirects, and so on.
However, unlike dynamic pages, static files allow the browser to cache them. Some of the files can be served from the cache without contacting the server at all. The browser knows how long to cache a particular file because the response that returned the file contained expiration information (for example, an Expires header). Additionally, each response may contain an ETag header that works like a version number: if the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.
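In standard HTTP, the ETag check is usually done with a conditional request: the browser sends the ETag it holds in an If-None-Match header, and the server answers 304 Not Modified with an empty body when the version matches. The sketch below simulates that exchange in memory; the `serve` function, the hash-based ETag, and the `max-age` value are assumptions for illustration, not a description of how Facebook's servers work.

```python
import hashlib


def make_etag(content):
    """Derive a version tag from the file content (one possible scheme)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def serve(content, request_headers):
    """Return (status, headers, body) for a static file request."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is current: send no body at all.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "max-age=3600"}, content


stylesheet = b"body { margin: 0 }"
first = serve(stylesheet, {})                                   # cold cache
second = serve(stylesheet, {"If-None-Match": first[1]["ETag"]})  # revalidation
third = serve(b"body { margin: 1px }", {"If-None-Match": first[1]["ETag"]})  # file changed
```

Only the first and third responses carry the file; the second costs a round trip but almost no bandwidth.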
Try to guess what "fbcdn.net" in the URLs stands for. A safe bet is "Facebook content delivery network". Facebook uses a content delivery network (CDN) to distribute static content such as images, style sheets, and JavaScript files. The files are therefore copied to many data centers around the world.
Static content often accounts for the bulk of a site's bandwidth and can easily be replicated across a CDN. Sites often use a third-party CDN provider. For example, Facebook's static files are hosted by Akamai, the largest CDN provider.
As a demonstration, when you ping static.ak.fbcdn.net, you may get a response from an akamai.net server. Interestingly, if you ping it again, the response may come from a different server, which shows the load balancing that happens behind the scenes.
10. The browser sends an asynchronous (AJAX) request
In the spirit of Web 2.0, the client continues to communicate with the server even after the page is rendered.
For example, Facebook chat keeps updating the list of your logged-in friends as they come and go. To update that list, the JavaScript executing in your browser sends an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request that goes to a special URL. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.
This pattern is often referred to as "AJAX", which stands for "Asynchronous JavaScript and XML", even though there is no particular reason why the server has to format the response as XML. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.
Among other things, Fiddler lets you view the asynchronous requests sent by your browser. In fact, you can not only observe the requests passively, but also modify and resend them. The fact that AJAX requests are this easy to fake is sad news for developers of online games with scoreboards. (Of course, please don't cheat that way.)
Facebook chat provides an example of an interesting problem with AJAX: pushing data from the server to the client. Since HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see whether any new messages have arrived.
Long polling is an interesting technique for reducing server load in these scenarios. If the server has no new messages when it is polled, it simply does not send a response back. If a message for this client arrives within the timeout period, the server finds the outstanding request and returns the message in the response.
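The long-polling idea can be sketched with a blocking queue standing in for the open HTTP request. This is a single-process simulation under stated assumptions: the `ChatServer` class and its methods are hypothetical, and a real server would hold one outstanding request per client over the network.

```python
import queue
import threading


class ChatServer:
    """Holds a poll open until a message arrives or the timeout elapses."""

    def __init__(self):
        self._inbox = queue.Queue()

    def post(self, message):
        """A new message arrives for the waiting client."""
        self._inbox.put(message)

    def long_poll(self, timeout):
        """Block for up to `timeout` seconds; return the message or None."""
        try:
            return self._inbox.get(timeout=timeout)
        except queue.Empty:
            return None  # timed out: the client simply polls again


server = ChatServer()
empty_poll = server.long_poll(timeout=0.05)  # nothing arrives in time

# A message arrives 0.05 s into the next poll, which is answered as soon as it lands.
threading.Timer(0.05, server.post, args=["hi"]).start()
answered_poll = server.long_poll(timeout=2.0)
```

Compared with polling every few seconds, the client makes far fewer requests while still receiving the message almost as soon as it exists.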
Conclusion
Hopefully this article gives you a better idea of how the different pieces of the web work together.
http://www.cnblogs.com/wenanry/archive/2010/02/25/1673368.html
I happened to come across the nine front-end questions posed by "Cold Winter", and on reflection I realized that I only partly understood the first, second, and ninth. I took the opportunity to organize my own knowledge. Because I did not know much about the HTTP protocol or how DNS resolves a URL, I looked into the whole process, from the URL request to the page loading in the browser, and from HTML parsing to rendering. After discussing it with several friends, I organized the process as follows:
- The user enters the URL, and the browser looks up the IP address for the domain name
- The browser sends an HTTP request to the server; if the server returns a redirect such as 301, the browser sends the request again to the address in the Location header of the response
- The server accepts the request, processes it, generates the HTML, and returns it to the browser; the HTML may be compressed
- The browser receives the response, decompresses it first if it is compressed, and then parses and renders the page
The process of parsing rendering is mainly divided into the following steps:
- Parsing HTML
- Building the DOM tree
- Attaching CSS styles to the DOM tree to construct the render tree
- Layout
- Draw
Parsing and building the DOM tree
We discuss the first two steps together, because the browser also performs them together. The browser has a dedicated HTML parser that parses the HTML and builds the DOM tree as it goes. Here we focus on two kinds of DOM elements: styles (link, style) and script files. Because the browser parses the document from top to bottom, parsing is blocked when either kind of element is encountered, and only continues once the external resource has been loaded and then parsed or executed. The order of styles and scripts also affects parsing, mainly because script execution may modify the HTML (for example through document.write), and the CSS styles of DOM nodes may affect the result of script execution. In my tests, I reached the following four conclusions:
1) An external style blocks the execution of subsequent scripts until the external style has been loaded and parsed.
<!DOCTYPE html> [...]
2) An external style does not block the loading of subsequent external scripts, but it does block their execution.
<!DOCTYPE html> [...]
var loadTime = document.createElement('div');
loadTime.innerText = document.currentScript.src + ' executed @ ' + window.performance.now();
loadTime.style.color = 'blue';
document.body.appendChild(loadTime);
As the waterfall diagram shows, the external script is loaded in parallel with the external style, but it does not start executing until the external style has loaded.
3) If the subsequent external script has the async attribute (defer in IE), the external style blocks neither the loading nor the execution of the script.
<!DOCTYPE html> [...]
The waterfall diagram shows that the loading and execution of the external script are not blocked by the link.
4) A dynamically created link tag does not block the loading or execution of a dynamically created script, regardless of whether the script tag has the async attribute; for scripts that are not created dynamically, the three conclusions above still apply.
<!DOCTYPE html> [...]
[Figure: the final page structure]
The waterfall diagram and the page results show that the dynamically created external script is not blocked by the link.
link and style tags are parsed into DOM nodes. The browser also creates a CSSStyleSheet object (in C++ code) for each style sheet; this object inherits from StyleSheet, and it is the same kind of object whether it comes from a style element or a link element. The object has the following important properties and methods:
- cssRules: the CSS rules in the style sheet
- type: a string giving the style sheet type. For CSS style sheets, this string is "text/css".
- href: the URL of the style sheet if it was loaded through link, otherwise null
- insertRule(rule, index): inserts a rule string at the specified position in the cssRules collection. Older versions of IE do not support this method, but support a similar addRule() method.
- deleteRule(index): removes the rule at the specified position from the cssRules collection. Older versions of IE do not support this method, but support a similar removeRule() method.
The collection of all style sheets in the document can be accessed through document.styleSheets. The CSSStyleSheet object for a style or link DOM element can be accessed through element.sheet, or through element.styleSheet in IE.
Once the HTML has been parsed and the DOM tree has been built, the DOMContentLoaded event is fired, and scripts can then manipulate the DOM nodes.
Building a rendering tree
After the HTML has been parsed, the browser starts building the render tree. The main task is to apply CSS styles to the DOM nodes; the WebKit engine calls this process attachment, and other browsers use different terms. This is where the CSS cascade, familiar to front-end engineers, comes in.
First, declarations are sorted by importance, from low to high:
- Browser declarations
- User normal declarations
- Author normal declarations
- Author important declarations
- User important declarations
Among declarations of the same importance, priority is determined by the specificity of the CSS selector. Specificity consists of four parts, S-I-C-E:
- S is 1 if the declaration comes from an inline style attribute;
- I increases by 1 for each ID selector;
- C increases by 1 for each class, pseudo-class, or attribute selector;
- E increases by 1 for each element or pseudo-element selector.
Specificities are compared part by part from left to right, much like comparing two strings.
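The left-to-right comparison maps directly onto tuple comparison, as the following sketch shows. The `Specificity` type and `winning_declaration` helper are hypothetical names; the tie-break used here (the later declaration wins when specificities are equal) is the standard CSS rule, which the text above does not spell out.

```python
from typing import NamedTuple


class Specificity(NamedTuple):
    s: int  # inline style attribute
    i: int  # ID selectors
    c: int  # class, pseudo-class, and attribute selectors
    e: int  # element and pseudo-element selectors


def winning_declaration(declarations):
    """declarations: list of (Specificity, value) in source order.

    Tuples compare field by field from left to right; on a tie the later
    declaration wins, so ">=" lets a later equal candidate replace the best.
    """
    best = declarations[0]
    for candidate in declarations[1:]:
        if candidate[0] >= best[0]:
            best = candidate
    return best[1]


color = winning_declaration([
    (Specificity(0, 0, 1, 1), "blue"),   # e.g. p.note
    (Specificity(0, 1, 0, 0), "red"),    # e.g. #intro
    (Specificity(0, 0, 2, 0), "green"),  # e.g. .note.warning
])
```

A single ID outranks any number of classes, which is why `Specificity(0, 1, 0, 0)` beats `Specificity(0, 0, 10, 0)`.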
Each node of the render tree is the CSS box of its DOM node, and the type of box depends on the display property of the DOM node: a block element generates a block box, and an inline element generates an inline box. Every render tree node has a corresponding DOM node, but not every DOM node has a corresponding render tree node; a DOM node whose display property is none has no render tree node. The position of a node in the render tree is also not necessarily the same as its position in the DOM tree, for example for floated and absolutely positioned elements.
[Figure: the render tree and the corresponding DOM tree]
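The relationship between the two trees can be sketched with plain dictionaries standing in for DOM nodes. This is a deliberately simplified model: the node format and `build_render_tree` are assumptions for illustration, only the display property is considered, and the repositioning of floated and absolutely positioned elements is ignored.

```python
def build_render_tree(dom_node):
    """Return a render tree node for dom_node, or None if it generates no box."""
    display = dom_node.get("style", {}).get("display", "inline")
    if display == "none":
        return None  # no box is generated, and the whole subtree is skipped
    children = []
    for child in dom_node.get("children", []):
        rendered = build_render_tree(child)
        if rendered is not None:
            children.append(rendered)
    return {"tag": dom_node["tag"], "box": display + " box", "children": children}


dom = {
    "tag": "body",
    "style": {"display": "block"},
    "children": [
        {"tag": "div", "style": {"display": "block"}},
        {"tag": "span", "style": {"display": "none"}},  # present in the DOM only
        {"tag": "em"},                                   # inline by default here
    ],
}
render_tree = build_render_tree(dom)
```

The resulting render tree has two children where the DOM has three, matching the point that not every DOM node has a render tree node.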
Layout
After the render tree has been constructed, the browser performs layout, calculating the size and position of every render tree node. Some readers may ask: the styles were already attached to the DOM nodes, so is that style information not enough to determine the sizes? The answer is that the style information described above only exists in memory and has not yet been put to use; the browser works out the actual size and position of each render tree node from the actual size of the window, for example when resolving a margin of auto.
Layout is a recursive process: starting from the root render node, the browser recursively traverses the child nodes and computes their geometry. The details are fairly complex and I do not know them well; interested readers can consult other material.
Draw
Once the layout is complete, the rendering tree is drawn and displayed on the screen. For each rendering tree node, the main drawing order is as follows:
- Background color
- Background image
- Border
- Child render tree nodes
- Outline
If there are mistakes in this article, corrections are welcome.
http://www.cnblogs.com/dojo-lzz/p/3983335.html
B/S client and server interaction (repost)