Baidu 2015 interview: What exactly happens when you enter a URL in the browser's address bar and press Enter?



As a software developer, you certainly have a high-level picture of how web applications work and which technologies are involved: the browser, HTTP, HTML, web servers, request handlers, and so on.

This article takes a deeper look at what happens behind the scenes when you visit a URL.

1. You enter a URL into the browser

2. The browser looks up the IP address for the domain name

The first step in the navigation is to find the IP address of the domain being visited. The DNS lookup proceeds as follows:

  • Browser cache: the browser caches DNS records for some time. Interestingly, the operating system does not tell the browser the time-to-live of each DNS record, so each browser caches records for a fixed duration (between 2 and 30 minutes, depending on the browser).
  • OS cache: if the browser cache does not contain the required record, the browser makes a system call (gethostbyname on Windows). The operating system has its own DNS cache, which is consulted at this point.
  • Router cache: next, the request goes on to your router, which typically has its own DNS cache.
  • ISP DNS cache: the next place checked is the caching DNS server of your ISP, which naturally keeps a cache of its own.
  • Recursive search: your ISP's DNS server begins a recursive search, from the root name server, through the .com top-level name server, to Facebook's name server. Normally the DNS server already has the addresses of the .com name servers in its cache, so a query to the root name server is not necessary.
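
The "check the cache first, then ask the next level" order in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name CachingResolver, the fixed 120-second lifetime, and the injectable resolve and clock parameters are made up for the example and do not describe how any particular browser is implemented.

```python
import socket
import time


class CachingResolver:
    """Tiny stand-in for a browser-style DNS cache in front of the OS resolver."""

    def __init__(self, resolve=socket.gethostbyname, ttl_seconds=120,
                 clock=time.monotonic):
        self._resolve = resolve   # next level down; the system resolver by default
        self._ttl = ttl_seconds   # assumed fixed lifetime for every record
        self._clock = clock
        self._cache = {}          # hostname -> (ip, expiry time)

    def lookup(self, hostname):
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and entry[1] > now:
            return entry[0]                # cache hit: no system call needed
        ip = self._resolve(hostname)       # cache miss: ask the next level
        self._cache[hostname] = (ip, now + self._ttl)
        return ip
```

Calling CachingResolver().lookup("facebook.com") twice in a row would reach the operating system only once; the second answer comes from the in-memory cache until the entry expires.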

[Figure: recursive DNS lookup]

One worrying thing about DNS is that an entire domain such as wikipedia.org or facebook.com appears to map to a single IP address. Fortunately, there are several ways to mitigate this bottleneck:

  • Round-robin DNS is a solution in which the DNS lookup returns multiple IP addresses rather than just one. For example, facebook.com actually maps to four IP addresses.
  • A load balancer is a piece of hardware that listens on a particular IP address and forwards requests to other servers in a cluster. Large websites typically use expensive, high-performance load balancers.
  • Geographic DNS improves scalability by mapping a domain name to different IP addresses depending on the client's geographic location. This works well for static content, because the different servers do not have to keep shared state in sync.
  • Anycast is a routing technique in which a single IP address maps to multiple physical hosts. Unfortunately, anycast does not fit well with TCP, so it is rarely used in that scenario.
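
As a rough illustration of the round-robin idea from the list above, the sketch below hands out addresses from a fixed pool in rotation. The function name round_robin is hypothetical, and the addresses come from a range reserved for documentation; they are not Facebook's real IP addresses. Real DNS servers usually rotate the order of the records in each answer, which has a similar effect.

```python
import itertools


def round_robin(addresses):
    """Return a function that hands out the given addresses in rotation."""
    rotation = itertools.cycle(addresses)
    return lambda: next(rotation)


# Documentation-only addresses, standing in for a site's four IP addresses.
pick = round_robin(["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"])
first_five = [pick() for _ in range(5)]  # the fifth pick wraps around to the first
```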

Most DNS servers themselves, however, use anycast to make DNS lookups efficient and low-latency.

3. The browser sends an HTTP request to the web server

You can be fairly sure that a dynamic page like the Facebook homepage will not be served from the browser cache, because dynamic pages expire very quickly or even immediately (their expiry date is set in the past).

Therefore, the browser sends the following request to the Facebook server:

    GET http://facebook.com/ HTTP/1.1
    Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
    User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
    Accept-Encoding: gzip, deflate
    Connection: Keep-Alive
    Host: facebook.com
    Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]

The GET request names the URL to fetch: "http://facebook.com/". The browser identifies itself (User-Agent header) and states which types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for later requests.
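
To make the structure of such a request concrete, here is a minimal sketch that assembles the text of a GET request. The function name build_get_request and the User-Agent value are assumptions for the example. The request line here carries only the path, which is the usual form when the client talks to the server directly; the captured request above carries the full URL, a form that is typically used when the request goes through a proxy.

```python
def build_get_request(host, path="/", extra_headers=None):
    """Assemble the text of a minimal HTTP/1.1 GET request."""
    headers = {
        "Host": host,
        "Accept-Encoding": "gzip, deflate",
        "Connection": "Keep-Alive",
    }
    headers.update(extra_headers or {})
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line after the last header marks the end of the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_get_request(
    "facebook.com", extra_headers={"User-Agent": "example-client/0.1"}
)
```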

The request also contains the cookies that the browser has stored for this domain. As you probably already know, cookies are key-value pairs that track the state of a website between page requests. Cookies store, for example, the name of the logged-in user, a secret value assigned to the user by the server, and some user settings. Cookies are stored on the client as text and are sent to the server with every request.
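
The key-value nature of cookies can be seen by parsing a Cookie header with Python's standard library. The header value below is a shortened, hypothetical one shaped like the request above; the elided values from the capture are not reconstructed.

```python
from http.cookies import SimpleCookie

# Hypothetical Cookie header value, not the real one from the capture.
cookie_header = "locale=en_US; c_user=2101"

jar = SimpleCookie()
jar.load(cookie_header)  # splits the header into individual cookies
cookies = {name: morsel.value for name, morsel in jar.items()}
```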

There are many tools for viewing raw HTTP requests and their corresponding responses. The author prefers Fiddler, but there are other tools as well, such as Firebug. These tools are a great help when optimizing a website.

In addition to GET requests, another type of request is the POST request, which is typically used to submit forms. A GET request passes its parameters in the URL (e.g. http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just below the headers.

The trailing slash in "http://facebook.com/" is important. In this case, the browser can safely add the slash. For a URL whose last path segment could be either a folder or a file, however, the browser cannot add a slash automatically. In such cases the browser requests the address without the slash, and the server responds with a redirect, resulting in an unnecessary round trip.
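
The difference between GET and POST parameters described above can be sketched with the standard library. Nothing is sent over the network here; the two Request objects are only constructed. Whether the robozzle.com page accepts POST at all is an assumption made purely for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"id": 85}

# GET: the parameters travel in the query string of the URL.
get_request = Request("http://robozzle.com/puzzle.aspx?" + urlencode(params))

# POST: the same parameters travel in the request body instead.
post_request = Request(
    "http://robozzle.com/puzzle.aspx",
    data=urlencode(params).encode("ascii"),
)
```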

4. The Facebook server responds with a permanent redirect

This is the response that the Facebook server sent back to the browser:

    HTTP/1.1 301 Moved Permanently
    Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Expires: Sat, 01 Jan 2000 00:00:00 GMT
    Location: http://www.facebook.com/
    P3P: CP="DSP LAW"
    Pragma: no-cache
    Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT; path=/; domain=.facebook.com; httponly
    Content-Type: text/html; charset=utf-8
    X-Cnection: close
    Date: Fri, 12 Feb 2010 05:09:51 GMT
    Content-Length: 0

The server returns a 301 (Moved Permanently) response, telling the browser to go to "http://www.facebook.com/" instead of "http://facebook.com/".
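
How a client might pull the redirect target out of such a response can be sketched as follows. The helper names parse_response_head and redirect_target are hypothetical, the parsing is deliberately simplistic, and the sample response is a shortened version of the one above.

```python
def parse_response_head(raw):
    """Split the head of a raw HTTP response into (status code, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def redirect_target(raw):
    """Return the Location of a 301/302 response, or None otherwise."""
    status, headers = parse_response_head(raw)
    if status in (301, 302):
        return headers.get("location")
    return None


raw = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.facebook.com/\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
```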

Why must the server redirect instead of sending the webpage content that the user wants to view directly? There are many interesting answers to this question.

One of the reasons has to do with search engine rankings. If a page has two addresses, say http://www.igoro.com/ and http://igoro.com/, a search engine may treat them as two different sites, each with fewer incoming links and therefore a lower ranking. Search engines understand what a 301 permanent redirect means, so the addresses with and without www are combined into a single ranking.

Another reason is that multiple addresses for the same content are not cache-friendly. When a page has several names, it may appear in caches several times.

5. The browser follows the redirect

Now the browser knows that "http://www.facebook.com/" is the correct address to visit, so it sends another GET request:

    GET http://www.facebook.com/ HTTP/1.1
    Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
    Accept-Language: en-US
    User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
    Accept-Encoding: gzip, deflate
    Connection: Keep-Alive
    Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
    Host: www.facebook.com

The header information has the same meaning as in the previous request.

6. The server "processes" requests

The server receives the request and then processes it and returns a response.

This may seem like a straightforward task, but a lot of interesting things happen along the way, even on a simple site like the author's blog, let alone on a site with traffic as massive as Facebook's.

  • Web server software
    Web server software (such as IIS or Apache) receives the HTTP request and decides which request handler should process it. A request handler is a program (written in ASP.NET, PHP, Ruby, ...) that reads the request and generates the HTML for the response.

    In the simplest case, the request handlers are stored in a file hierarchy that mirrors the URL structure. The address http://example.com/folder1/page1.aspx would map to the file /httpdocs/folder1/page1.aspx. The web server software can also be configured to map addresses to request handlers manually, so that the public address of page1.aspx can be http://example.com/folder1/page1.

  • Request handler
    The request handler reads the request, its parameters, and its cookies. It reads and possibly updates some data stored on the server. Then the request handler generates an HTML response.
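
The two ways of mapping an address to a handler described above (manual routes and a mirrored file hierarchy) can be sketched as follows. The names ROUTES and handler_file_for are hypothetical, and real web servers such as IIS or Apache configure this through their own settings rather than code like this.

```python
import posixpath
from urllib.parse import urlsplit

DOCUMENT_ROOT = "/httpdocs"  # root folder taken from the example above

# Hypothetical manual route table: public path -> handler file.
ROUTES = {"/folder1/page1": "/httpdocs/folder1/page1.aspx"}


def handler_file_for(url):
    """Pick the handler file: explicit routes first, then the file hierarchy."""
    path = urlsplit(url).path or "/"
    if path in ROUTES:
        return ROUTES[path]
    # normpath collapses "." and ".." so the result stays under the root.
    safe_path = posixpath.normpath("/" + path.lstrip("/"))
    return DOCUMENT_ROOT + safe_path
```

With this sketch, both http://example.com/folder1/page1.aspx and http://example.com/folder1/page1 resolve to the same file, the first through the file hierarchy and the second through the route table.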

Every dynamic website faces an interesting challenge: how to store its data. Small websites typically store their data in a single SQL database, but websites that store a large amount of data and/or receive a lot of traffic have to find a way to split the database across multiple machines. Solutions include sharding (splitting a table across multiple databases based on the primary key), replication, and the use of simplified databases with weakened consistency semantics.
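
A minimal sketch of sharding by primary key, assuming four database machines and a simple hash-modulo scheme. This is one common approach chosen for illustration; it is not a description of how Facebook distributes its data.

```python
import zlib

NUM_SHARDS = 4  # assumed number of database machines


def shard_for(primary_key, num_shards=NUM_SHARDS):
    """Map a primary key to a shard index using a stable hash."""
    # crc32 gives the same result on every run, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    return zlib.crc32(str(primary_key).encode("utf-8")) % num_shards
```

Every read or write for a given key is then sent to the same shard. A known drawback of plain modulo sharding is that changing the number of shards moves most keys to a different machine.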

One technique for keeping data updates cheap is to defer some of the work to a batch job. For example, Facebook has to update the news feed in a timely manner, but the data behind the "People you may know" feature may only need to be updated nightly (this is a guess; how the feature is actually implemented is not known). Batch updates leave some less important data stale, but they make data updates much faster and simpler.

7. The server sends back an HTML response

This is the response that the server generated and sent back:

    HTTP/1.1 200 OK
    Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Expires: Sat, 01 Jan 2000 00:00:00 GMT
    P3P: CP="DSP LAW"
    Pragma: no-cache
    Content-Encoding: gzip
    Content-Type: text/html; charset=utf-8
    X-Cnection: close
    Transfer-Encoding: chunked
    Date: Fri, 12 Feb 2010 09:05:55 GMT

    2b3Tn@[...]

The entire response is about 35 kB, most of it in the byte blob at the end, which is truncated here.

The Content-Encoding header tells the browser that the response body is compressed with the gzip algorithm. After decompressing the blob, you see the HTML you would expect:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
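
The compression step can be reproduced with Python's gzip module. The HTML below is a small stand-in, since the real response body is truncated in the capture above.

```python
import gzip

# Stand-in for an HTML body; not the actual Facebook page.
html = b"<!DOCTYPE html><html><body>Hello</body></html>"

compressed = gzip.compress(html)        # what a server sends with Content-Encoding: gzip
restored = gzip.decompress(compressed)  # what the browser does before parsing the HTML
```

On a tiny body like this one the compressed form can even be larger than the original; the savings show up on realistic pages with a lot of repeated markup.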

In addition to compression, the headers specify whether and how to cache this page, which cookies to set (none in this response), and privacy information.

Note that the Content-Type header is set to "text/html". This header tells the browser to render the response content as HTML rather than, say, downloading it as a file. The browser uses the header to decide how to interpret the response, but it also considers other factors, such as the extension in the URL.
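
Reading the media type and character set out of a Content-Type header can be sketched with the standard library's email.message module, which understands this header syntax. The variable render_as_html is a hypothetical name for the decision described above; a real browser weighs more signals than this.

```python
from email.message import Message

header = Message()
header["Content-Type"] = "text/html; charset=utf-8"

media_type = header.get_content_type()   # the type/subtype before the first ";"
charset = header.get_content_charset()   # the charset parameter, lowercased
render_as_html = media_type == "text/html"
```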

8. The browser begins rendering the HTML

Even before the browser has received the entire HTML document, it begins rendering the page.

9. The browser sends requests for objects embedded in the HTML

As the browser renders the HTML, it notices tags that require fetching content from other addresses. The browser sends a GET request for each of these files.

Here are some of the URLs retrieved during a visit to facebook.com:
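
A rough sketch of how embedded resources can be discovered in a page, using Python's built-in HTML parser. The class name ResourceCollector and the page fragment are made up for the example; a real browser handles many more tags and attributes.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs of images, style sheets, and scripts referenced by a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.urls.append(attrs["href"])


# Hypothetical page fragment, not Facebook's actual markup.
page = (
    '<link rel="stylesheet" href="/site.css">'
    '<script src="/app.js"></script>'
    '<img src="/logo.gif">'
)
collector = ResourceCollector()
collector.feed(page)
collector.close()
```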

  • Images
    http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
    http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
    ...
  • CSS style sheets
    http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
    http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
    ...
  • JavaScript files
    http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
    http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
    ...

Each of these URLs goes through a process similar to the one the HTML page went through: the browser looks up the domain name in DNS, sends a request, follows redirects, and so on.

Unlike dynamic pages, however, static files can be cached by the browser. Some files may be served straight from the cache, without contacting the server at all. The response that delivered a static file includes its expiry time, so the browser knows how long it may cache the file. In addition, each response may contain an ETag header, which works like a version number: if the browser sees the ETag of a version of the file that it already has, the transfer can stop immediately.
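
In practice, the ETag check is commonly done with a conditional request: the browser sends the tag it already has in an If-None-Match header, and the server answers 304 Not Modified with an empty body when the tag still matches. The sketch below shows the server side of that exchange; the function names and the way the tag is derived from the file contents are assumptions for the example.

```python
import hashlib


def make_etag(body):
    """Derive a version tag from the file contents (one possible choice)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve_static(body, request_headers):
    """Return (status, headers, body), skipping the body if the client is current."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""  # Not Modified: the cached copy is still good
    return 200, {"ETag": etag}, body
```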

Can you guess what "fbcdn.net" in these addresses stands for? A safe bet is "Facebook content delivery network". Facebook uses a content delivery network (CDN) to distribute static files such as images, style sheets, and JavaScript files. These files are therefore replicated in many CDN data centers around the world.

Static content often accounts for the bulk of a site's bandwidth, and it can easily be replicated across a CDN. Websites often use a third-party CDN provider rather than operating a CDN themselves. For example, Facebook's static files are hosted by Akamai, the largest CDN provider.

As a demonstration, when you ping static.ak.fbcdn.net, you may get a response from an akamai.net server. Interestingly, if you ping it again, the response may come from a different server, which shows the load balancing that happens behind the scenes.

10. The browser sends further asynchronous (AJAX) requests

In the spirit of Web 2.0, the client keeps communicating with the server even after the page has been rendered.

Take Facebook chat as an example: it stays in contact with the server to keep the list of your online (lit) and offline (grayed-out) friends up to date. To update this list, the JavaScript running in the browser sends an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request sent to a special address. In the Facebook example, the client sends a request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of friends who are online.

This pattern is often called "AJAX", which stands for "Asynchronous JavaScript and XML", even though there is no particular reason for the server to respond in XML format. Facebook, for example, returns snippets of JavaScript code in response to asynchronous requests.

Among other things, the Fiddler tool lets you see the asynchronous requests sent by the browser. In fact, you can not only observe these requests passively, but also modify and resend them. The fact that AJAX requests are this easy to spoof causes a lot of grief for developers of online games with scoreboards. (Of course, please don't cheat that way.)

Facebook chat is an example of an interesting problem with AJAX: pushing data from the server to the client. Because HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to check whether any new messages have arrived.

Long polling is an interesting technique for reducing the load on the server in these situations. If the server has no new messages when it is polled, it simply does not send a response back. If a message for this client arrives before the request times out, the server finds the outstanding request and returns the message as the response.
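
The core of long polling, holding a request open until a message arrives or a timeout passes, can be sketched with a blocking queue. The function name long_poll and the use of an in-process queue and timer in place of a real chat server are assumptions made for illustration.

```python
import queue
import threading


def long_poll(inbox, timeout_seconds):
    """Hold the request until a message arrives or the timeout passes."""
    try:
        # Blocks without busy-waiting and returns as soon as a message is queued.
        return inbox.get(timeout=timeout_seconds)
    except queue.Empty:
        return None  # nothing arrived in time; the client simply asks again


inbox = queue.Queue()
# Simulate a chat message arriving shortly after the request was made.
threading.Timer(0.05, inbox.put, args=["hello"]).start()
message = long_poll(inbox, timeout_seconds=2.0)
```

The request returns after roughly 0.05 seconds with the message instead of waiting for the full timeout, which is what lets the client avoid polling every few seconds.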

Conclusion

I hope this gives you a better idea of how the different pieces of the web work together.


Original article: http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/


