From entering a URL to the page being returned

Source: Internet
Author: User
Tags: domain name server, anycast, browser cache, domain server

After you enter a URL, you see the Baidu homepage. How does all of this happen?

I had thought about this before and came back to it recently, because I am sure I will be asked this kind of question sooner or later, or will ask it myself: starting from typing a query string into the Baidu search box, how does what you need get returned?

So what is the process? (I later wrote this question up in a blog post.)

There are all kinds of answers on the Internet.

The first, a simple one, goes like this:

First step: the client issues a domain name resolution request and sends it to the local domain name server.
Second step: when the local domain name server receives the request, it first queries its local cache. If the cache has a matching record, the local domain name server returns the result directly.
Third step: if the local cache has no such record, the local domain name server sends the request to a root name server, which returns the address of the top-level domain server (a child of the root) responsible for the queried domain.
Fourth step: the local domain name server then sends the request to the name server returned in the previous step. That server checks its own cache and, if it has no record, returns the address of the relevant lower-level name server.
Fifth step: repeat the fourth step until the correct record is found.
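The five steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the server names, the zone layout, and the address 192.0.2.10 are made up, and no real network traffic is involved.

```python
# Toy name-server hierarchy. Every server name and address here is hypothetical.
SERVERS = {
    "root": {"referrals": {"com": "tld-com"}, "records": {}},
    "tld-com": {"referrals": {"example.com": "ns-example"}, "records": {}},
    "ns-example": {"referrals": {}, "records": {"www.example.com": "192.0.2.10"}},
}

local_cache = {}  # cache of the local domain name server


def query(server, name):
    """Ask one server; return ("answer", ip) or ("referral", next_server)."""
    zone = SERVERS[server]
    if name in zone["records"]:
        return "answer", zone["records"][name]
    # Choose the most specific referral whose suffix matches the name.
    matches = [s for s in zone["referrals"] if name == s or name.endswith("." + s)]
    if not matches:
        raise LookupError(f"{server} knows nothing about {name}")
    return "referral", zone["referrals"][max(matches, key=len)]


def resolve(name):
    """Local name server: check the cache, otherwise walk down from the root."""
    if name in local_cache:          # second step: cache hit
        return local_cache[name], []
    server, path = "root", []        # third step: start at the root
    while True:                      # fourth and fifth steps: follow referrals
        path.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            local_cache[name] = value
            return value, path
        server = value
```

Calling `resolve("www.example.com")` twice shows the difference between a full walk from the root and a cache hit: the second call returns an empty path because no server had to be contacted.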
The second is more detailed. It is in English, but I believe you can follow it easily:
1. Enter the URL in the address bar.
2. A request is sent to the DNS server based on your network configuration.
3. DNS resolves the domain name to its real IP address.
4. A request (with a complete HTTP header) is sent to the default HTTP port, 80, of the server identified by the IP from step 3 (assuming we don't specify another port).
5. The server checks its listening ports and forwards the request to the app listening on that port (let's say Nginx here) or to another server (in which case the server from step 3 acts as a load balancer).
6. Nginx tries to match the URL against its configuration and either serves a static page directly or invokes the corresponding script interpreter (e.g. PHP/Python) or another app to get the dynamic content (with DB queries or other logic).
7. An HTML document is sent back to the browser with a complete HTTP response header.
8. The browser parses the HTML into a DOM using its parser.
9. External resources (JS/CSS/images/Flash/videos...) are requested in sequence (or not?).
10. JS is executed by the JS engine.
11. CSS is rendered by the CSS engine, and the HTML's display is adjusted based on the CSS (also in sequence or not?).
12. If there is an iframe in the DOM, the same process (steps 1-12) is executed separately for it.
The third:
1. The browser checks its cache; if the requested object is in the cache and is fresh, skip to #9.
2. The browser asks the OS for the server's IP address.
3. The OS makes a DNS lookup and returns the IP address to the browser.
4. The browser opens a TCP connection to the server (this step is much more complex with HTTPS).
5. The browser sends the HTTP request through the TCP connection.
6. The browser receives the HTTP response and may close the TCP connection or reuse it for another request.
7. The browser checks whether the response is a redirect (3xx status codes), an authorization request (401), an error (4xx and 5xx), etc.; these are handled differently from normal responses (2xx).
8. If cacheable, the response is stored in the cache.
9. The browser decodes the response (e.g. if it is gzipped).
10. The browser determines what to do with the response (e.g. is it an HTML page, an image, a sound clip?).
11. The browser renders the response, or offers a download dialog for unrecognized types.
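Step 7 of the list above amounts to a dispatch on the status code. A minimal sketch, assuming only the categories named in that step (the function name and the returned labels are illustrative, not browser internals):

```python
def classify_status(code):
    """Group an HTTP status code the way step 7 describes."""
    if 200 <= code < 300:
        return "normal"
    if 300 <= code < 400:
        return "redirect"
    if code == 401:                 # checked before the general 4xx case
        return "authorization request"
    if 400 <= code < 600:
        return "error"
    return "other"
```

The 401 check comes before the generic 4xx/5xx branch because an authorization request is handled differently from an ordinary client error.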
And there are others.
For all the words, the basic meaning of these answers is correct. They are fine for a written test, but in an interview the interviewer will certainly ask about the details, which requires us to study the process seriously.
OK, let's take a closer look at the whole process. Here I refer to an article by an expert from abroad, "What really happens when you navigate to a URL"; the link is in the references at the end of this article.

Let's get started. The translation below follows the one by the author of reference 4.

1. You enter a URL into the browser

2. The browser looks up the IP address for the domain name

The first step in navigation is to find the IP address for the domain name being visited. The DNS lookup proceeds as follows:

    • Browser cache – The browser caches DNS records for a period of time. Interestingly, the operating system does not tell the browser how long each DNS record may be kept, so each browser caches records for a fixed time of its own (ranging from 2 to 30 minutes).
    • System cache – If the required record is not found in the browser cache, the browser makes a system call (gethostbyname on Windows), which consults the records in the operating system's cache.
    • Router cache – Next, the query is sent on to the router, which generally has its own DNS cache.
    • ISP DNS cache – The next place checked is the cache of the ISP's DNS server. The record can usually be found here.
    • Recursive search – The ISP's DNS server begins a recursive search, from the root name server, through the .com top-level name server, to Facebook's name server. The DNS server usually has the .com name servers in its cache, so the trip to the root name server is usually unnecessary.
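The layered lookup above can be sketched as a chain of caches with expiry times. This is a simplified model: the class and function names, the TTL values, and the address 192.0.2.1 are assumptions for illustration, and real browsers and operating systems are more involved.

```python
class TtlCache:
    """One cache layer (browser, OS, router, ISP) with a fixed time-to-live."""

    def __init__(self, name, ttl_seconds):
        self.name = name
        self.ttl = ttl_seconds
        self._entries = {}  # hostname -> (ip, expiry time)

    def get(self, host, now):
        entry = self._entries.get(host)
        if entry is not None and entry[1] > now:
            return entry[0]
        return None

    def put(self, host, ip, now):
        self._entries[host] = (ip, now + self.ttl)


def lookup(host, layers, recursive_search, now):
    """Check each layer in order; fall back to a recursive search."""
    for depth, layer in enumerate(layers):
        ip = layer.get(host, now)
        if ip is not None:
            source = layer.name
            break
    else:
        depth, source = len(layers), "recursive search"
        ip = recursive_search(host)
    # Fill every layer that missed, so the next lookup stops earlier.
    for layer in layers[:depth]:
        layer.put(host, ip, now)
    return ip, source
```

Passing `now` explicitly keeps the sketch deterministic; with a 120-second browser layer and a 300-second OS layer, a lookup at 200 seconds misses the browser cache and is answered by the OS cache.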

[Figure: DNS recursive lookup]

One worrying thing about DNS is that an entire domain such as wikipedia.org or facebook.com appears to map to a single IP address. Fortunately, there are several ways to mitigate this bottleneck:

    • Round-robin DNS is a solution in which the DNS lookup returns multiple IP addresses rather than just one. For example, facebook.com actually maps to four IP addresses.
    • A load balancer is a piece of hardware that listens on a specific IP address and forwards requests to the servers in a cluster. Large sites typically use expensive, high-performance load balancers.
    • Geographic DNS improves scalability by mapping a domain name to different IP addresses depending on the user's geographic location. The different servers cannot keep shared state in sync this way, but it works well for serving static content.
    • Anycast is a routing technique that maps a single IP address to multiple physical hosts. Unfortunately, anycast does not fit well with TCP, so it is rarely used in that scenario.

Most DNS servers themselves use anycast to achieve efficient, low-latency DNS lookups.
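Round-robin DNS, the first technique in the list above, can be sketched as a record that rotates its address list on every answer. The class name and the four addresses are placeholders, not Facebook's real records.

```python
class RoundRobinRecord:
    """A DNS record with several addresses, rotated on each answer."""

    def __init__(self, addresses):
        self._addresses = list(addresses)
        self._offset = 0

    def answer(self):
        # Return all addresses, starting one position later each time.
        rotated = self._addresses[self._offset:] + self._addresses[:self._offset]
        self._offset = (self._offset + 1) % len(self._addresses)
        return rotated
```

Clients usually try the first address in the answer, so rotating the list spreads successive clients across the servers.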

3. The browser sends an HTTP request to the web server

Dynamic pages such as the Facebook homepage expire from the browser cache very quickly after they are opened, so there is no doubt that the page cannot be read from the cache.

So the browser sends the following request to the server that hosts Facebook:

GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=ww[...]; c_user=2101[...]

The GET request names the URL to fetch: "http://facebook.com/". The browser identifies itself (User-Agent header) and states what types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for later requests.
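To make the structure of such a request concrete, here is a small sketch that assembles and parses the request text. The helper names `build_request` and `parse_request` are made up for this example; real clients also handle bodies, repeated headers, and many edge cases that are ignored here.

```python
def build_request(method, url, headers):
    """Assemble a request line plus headers, terminated by a blank line."""
    lines = [f"{method} {url} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_request(raw):
    """Split raw request text into method, target, version, and headers."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, target, version, headers
```

Header names are lowercased on parsing because HTTP header names are case-insensitive, which is also why the casing of a captured request can vary between tools.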

The request also contains the cookies that the browser has stored for this domain. As you may already know, cookies are key-value pairs that track the state of a website across different page requests. Cookies can thus store the login name, a secret value assigned to the user by the server, some of the user's settings, and so on. Cookies are stored as text files on the client computer and sent to the server with every request.

There are many tools for viewing raw HTTP requests and their responses. The author prefers Fiddler, and there are others such as Firebug. These tools are a great help when optimizing a website.

In addition to GET requests, there is another type of request, POST, which is typically used to submit forms. A GET request passes its parameters in the URL (e.g. http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just after the headers.
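The difference between the two request types can be shown with Python's standard `urllib`, building the request objects without sending anything. Sending `id=85` as a POST body to that address is purely hypothetical and only mirrors the example URL.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"id": "85"}
base_url = "http://robozzle.com/puzzle.aspx"

# GET: the parameters travel in the URL's query string.
get_request = Request(base_url + "?" + urlencode(params))

# POST: the same parameters travel in the request body (hypothetical here).
post_request = Request(base_url, data=urlencode(params).encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

`Request` picks the method from whether a body is present, which matches the rule described above: parameters in the URL for GET, parameters after the headers for POST.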

The trailing slash in a URL like "http://facebook.com/" matters. In this case, the browser can safely add the slash. For a URL like "http://example.com/folderOrFile", however, the browser cannot add a slash automatically, because it is not clear whether folderOrFile is a folder or a file. In that case, the browser requests the URL without the slash and the server responds with a redirect, resulting in an unnecessary round trip.
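The rule can be sketched as follows: a slash is added only when the URL has no path at all. The function name `add_safe_slash` is invented for this example, and the sketch does not try to reproduce everything a real browser does when normalizing a URL.

```python
from urllib.parse import urlsplit, urlunsplit


def add_safe_slash(url):
    """Add a trailing slash only when the URL has an empty path."""
    parts = urlsplit(url)
    if parts.path == "":
        parts = parts._replace(path="/")
    # A non-empty path is left alone: it may name a file or a folder.
    return urlunsplit(parts)
```

A bare host gains the slash, while "http://example.com/folderOrFile" comes back unchanged and is left for the server to redirect if it turns out to be a folder.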

4. The Facebook server responds with a permanent redirect

This is the response that the Facebook server sent back to the browser:

HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, [...] 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT; path=/; domain=.facebook.com; HttpOnly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, [...] 05:09:51 GMT
Content-Length: 0

The server responds with a 301 Moved Permanently response, telling the browser to go to "http://www.facebook.com/" instead of "http://facebook.com/".

Why does the server insist on a redirect instead of directly sending the web content that the user wants to see? There are several interesting answers to this question.

One reason has to do with search engine rankings. If a page has two URLs, such as http://www.igoro.com/ and http://igoro.com/, a search engine may treat them as two different sites, each with fewer incoming links and therefore a lower ranking. Search engines understand what a 301 permanent redirect means, so they combine the links to the addresses with and without www into the ranking of a single site.

Another reason is that using multiple URLs for the same content is not cache-friendly. When a page has several names, it may end up in the cache several times.

5. The browser follows the redirect

Now the browser knows that "http://www.facebook.com/" is the correct URL to visit, so it sends another GET request:

GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=xw[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com

The headers have the same meaning as in the previous request.
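Steps 4 and 5 together form a small loop: request, check for a redirect status, follow the Location header, repeat. A minimal sketch, with a dictionary standing in for the servers (the `fetch` callback, the `FAKE_SITE` table, and the hop limit of 5 are assumptions for this example):

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)

# Stand-in for real servers: URL -> (status code, response headers).
FAKE_SITE = {
    "http://facebook.com/": (301, {"Location": "http://www.facebook.com/"}),
    "http://www.facebook.com/": (200, {}),
}


def follow_redirects(url, fetch, max_hops=5):
    """Request url and follow Location headers until a non-redirect arrives."""
    visited = [url]
    for _ in range(max_hops):
        status, headers = fetch(url)
        if status not in REDIRECT_CODES:
            return status, visited
        url = headers["Location"]
        visited.append(url)
    raise RuntimeError("too many redirects")
```

With `fetch=FAKE_SITE.__getitem__`, the call for "http://facebook.com/" visits both URLs and ends with status 200. The hop limit protects against redirect loops.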

6. The server 'handles' the request

The server receives the GET request, processes it, and sends back a response.

This may seem like a straightforward task, but a lot of interesting things happen along the way, even on a simple site like the author's blog, let alone on a large-scale site like Facebook.

    • Web server software
      The web server software (such as IIS or Apache) receives the HTTP request and decides which request handler should be executed to handle it. A request handler is a program (in ASP.NET, PHP, Ruby, ...) that reads the request and generates the HTML for the response.

      In the simplest case, the request handlers are stored in a file hierarchy whose structure mirrors the URL structure of the website. For example, the URL http://example.com/folder1/page1.aspx maps to the file /httpdocs/folder1/page1.aspx. The web server software can also be configured to map URLs to request handlers manually, so that the public URL of page1.aspx can be http://example.com/folder1/page1.

    • Request handler
      The request handler reads the request, its parameters, and its cookies. It reads and possibly updates some data stored on the server. Then the request handler generates an HTML response.
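The mapping from URL to handler file described above can be sketched as follows. The `/httpdocs` root and the two URLs come from the text; the `ROUTES` table and the function name are hypothetical, and real servers such as IIS or Apache configure this in their own ways.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

DOC_ROOT = PurePosixPath("/httpdocs")

# Manual URL-to-handler mapping (hypothetical configuration).
ROUTES = {"/folder1/page1": "/folder1/page1.aspx"}


def handler_path(url):
    """Map a URL to the file that should handle it."""
    path = urlsplit(url).path
    path = ROUTES.get(path, path)          # manual mapping wins if present
    candidate = DOC_ROOT / path.lstrip("/")
    if ".." in candidate.parts:            # refuse paths that climb out of the root
        raise ValueError("path escapes the document root")
    return str(candidate)
```

Both http://example.com/folder1/page1.aspx and the shorter http://example.com/folder1/page1 resolve to /httpdocs/folder1/page1.aspx in this sketch.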

Every dynamic website faces one interesting challenge: how to store data. Smaller sites often have a single SQL database to store their data, while sites that store a large amount of data and/or have many visitors must find a way to split the database across multiple machines. Solutions include sharding (splitting a table across multiple databases based on the primary key), replication, and the use of simplified databases with weakened consistency semantics.
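Sharding by primary key can be illustrated with a few in-memory dictionaries standing in for separate databases. Using the key modulo the shard count is one simple placement rule chosen for this sketch; it is not a claim about how Facebook or any particular site shards its data.

```python
class ShardedTable:
    """Rows spread over several stores, chosen by primary key."""

    def __init__(self, shard_count):
        # Each dict stands in for a database on a separate machine.
        self.shards = [{} for _ in range(shard_count)]

    def shard_index(self, primary_key):
        return primary_key % len(self.shards)

    def put(self, primary_key, row):
        self.shards[self.shard_index(primary_key)][primary_key] = row

    def get(self, primary_key):
        return self.shards[self.shard_index(primary_key)].get(primary_key)
```

Because the shard is computed from the key alone, a read goes straight to one machine without consulting the others. The trade-off of the modulo rule is that changing the shard count moves most keys.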

One technique for keeping data updates cheap is to defer some of the work to a batch job. For example, Facebook has to update the news feed in a timely fashion, but the data behind the "People you may know" feature may only need to be updated nightly (this is the author's guess; how the feature is actually implemented is unknown). Batch updates leave some of the less important data stale, but they make data updates faster and simpler.

7. The server sends back an HTML response

Here is the response that the server generated and sent back:

HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, [...] 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, [...] 09:05:55 GMT

[...]

The entire response is 35 kB; most of it is the byte blob at the end, which is trimmed here.

The Content-Encoding header tells the browser that the response body is compressed with the gzip algorithm. After decompressing the blob, you see the HTML you would expect:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en" id="facebook" class="no_js">
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-language" content="en" />
...

In addition to compression, the headers specify whether and how the page may be cached, any cookies to set (none in this response), privacy information, and so on.

Note the header that sets Content-Type to "text/html". This header tells the browser to render the response content as HTML instead of, say, downloading it as a file. The browser uses the header to decide how to interpret the response, but it also considers other factors, such as the extension in the URL.
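The two header-driven decisions described here, decompressing the body and choosing how to present it, can be sketched with Python's standard `gzip` module. The function names and the three handling labels are simplifications for illustration; real browsers support more encodings and content types and also sniff the content.

```python
import gzip


def decode_body(headers, body):
    """Undo the Content-Encoding, if the response declares gzip."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


def handling_for(headers):
    """Decide how to present the response based on Content-Type."""
    media_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return "render as HTML"
    if media_type.startswith("image/"):
        return "display as image"
    return "offer download"
```

The media type is split off before comparison because the header may carry parameters such as `charset=utf-8` after a semicolon.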

8. The browser begins rendering the HTML

Even before the browser has received the entire HTML document, it begins rendering the page.

9. The browser sends requests for objects embedded in HTML

As the browser renders the HTML, it notices tags that require fetching content from other URLs. The browser then sends a GET request to retrieve each of these files.

Here are a few of the URLs fetched during a visit to facebook.com:

    • Images
      http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
      http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
      ...
    • CSS style sheets
      http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
      http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
      ...
    • JavaScript files
      http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
      http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
      ...

Each of these URLs goes through a process similar to the one the HTML page went through: the browser looks up the domain name in DNS, sends a request to the URL, follows redirects, and so on.

However, unlike dynamic pages, static files can be cached by the browser. Some files may be read straight from the cache without contacting the server at all. The server's response includes an expiration time for each static file, so the browser knows how long to cache it. In addition, each response may contain an ETag header that works like a version number: if the browser already holds a copy with the same ETag, the file does not need to be transferred again (the server can answer a revalidation request with 304 Not Modified).
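The caching behaviour can be sketched as a loader that first checks freshness and then revalidates by ETag. Everything here is a simplified model: `fake_origin`, the 60-second lifetime, and the `"v1"` tag are invented for the example, and real HTTP caching has many more rules.

```python
def fake_origin(url, if_none_match):
    """Stand-in server: returns (status, etag, body, max_age_seconds)."""
    etag, body, max_age = '"v1"', b"body { margin: 0 }", 60
    if if_none_match == etag:
        return 304, etag, b"", max_age   # not modified: no body is sent
    return 200, etag, body, max_age


def load(url, cache, now, fetch):
    """Return (body, source), using the cache when the rules allow it."""
    entry = cache.get(url)
    if entry is not None and now < entry["expires"]:
        return entry["body"], "cache"            # still fresh: no request at all
    known_etag = entry["etag"] if entry is not None else None
    status, etag, body, max_age = fetch(url, known_etag)
    if status == 304:
        entry["expires"] = now + max_age         # same version: extend freshness
        return entry["body"], "revalidated"
    cache[url] = {"body": body, "etag": etag, "expires": now + max_age}
    return body, "network"
```

Three loads at times 0, 30, and 100 illustrate the three outcomes: a full transfer, a pure cache hit, and a revalidation that transfers no body.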

Can you guess what "fbcdn.net" in the URLs stands for? A safe bet is "Facebook content delivery network". Facebook uses a content delivery network (CDN) to distribute static files such as images, style sheets, and JavaScript files. As a result, these files are copied to many CDN data centers around the world.

Static content often accounts for the bulk of a site's bandwidth and can easily be replicated across a CDN. Sites often use a third-party CDN provider. For example, Facebook's static files are hosted by Akamai, the largest CDN provider.

As a demonstration, when you ping static.ak.fbcdn.net, you might get a response from an akamai.net server. Interestingly, if you ping it again, the response may come from a different server, which shows the load balancing going on behind the scenes.

10. The browser sends further asynchronous (AJAX) requests

In the spirit of Web 2.0, the client keeps communicating with the server even after the page has been rendered.

Take Facebook chat as an example: it keeps contacting the server to update the status of your friends (avatars lit up or grayed out) in a timely fashion. To update that status, the JavaScript code running in the browser sends an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request sent to a specific URL. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.

This pattern is usually referred to as "AJAX", short for "Asynchronous JavaScript and XML", even though there is no particular reason why the server has to respond in XML format. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.

Among other things, Fiddler lets you view the asynchronous requests sent by your browser. In fact, you can not only observe these requests passively but also modify and resend them. The fact that AJAX requests are so easy to spoof is a real frustration for developers of online games with scoreboards. (Of course, please don't cheat like that.)

Facebook chat provides an interesting problem case for AJAX: pushing data from the server to the client. Because HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see whether any new messages have arrived.

Long polling is an interesting technique for reducing server load in these scenarios. If the server has no new messages when polled, it simply does not send a response yet. If a message for the client arrives before the request times out, the server finds the outstanding request and returns the message as the response.
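The long-polling idea can be modelled in-process with a blocking queue: the poll call waits until a message arrives or the timeout expires. This is only an analogy under stated assumptions. The class and method names are invented, threads stand in for network clients, and no HTTP is involved.

```python
import queue


class LongPollServer:
    """In-process stand-in for a chat server that holds polls open."""

    def __init__(self):
        self._inbox = queue.Queue()

    def post_message(self, text):
        """Called when a new chat message arrives for the client."""
        self._inbox.put(text)

    def poll(self, timeout):
        """Block until a message arrives; return None if the poll times out."""
        try:
            return self._inbox.get(timeout=timeout)
        except queue.Empty:
            return None
```

A client would call `poll` in a loop, issuing a new request as soon as the previous one returns. Compared with polling every few seconds, an idle client costs one held-open request per timeout period instead of many short ones.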

OK, now we know in detail what happens from entering a URL to getting the requested page back. If you can explain it at this level, I think the interviewer will give you an offer! Ha ha.
Reference http://blog.csdn.net/wdzxl198/article/details/11265475
