Baidu 2015 interview question: What exactly happens when you enter a URL in the browser's address bar and press Enter?
As a software developer, you surely have a high-level picture of how web applications work and which technologies they involve: the browser, HTTP, HTML, web servers, request handlers, and so on.
This article takes a deeper look at what happens behind the scenes when you navigate to a website.
1. You enter a URL into the browser
2. The browser looks up the IP address of the domain name
The first step of the navigation is to find the IP address of the domain being visited. The DNS lookup proceeds as follows:
Figure: Recursive DNS lookup.
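Before a full recursive lookup like the one in the figure, resolvers normally check caches (the browser's, the operating system's) that keep each record for its time-to-live. A minimal sketch of such a cache, assuming a fixed 300-second TTL (real records carry their own TTLs) and using Python's `socket.gethostbyname` as a stand-in for everything behind the cache; the class name is hypothetical:

```python
import socket
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Serve repeated lookups from a TTL cache before asking the real resolver."""

    def __init__(
        self,
        resolve: Callable[[str], str] = socket.gethostbyname,
        ttl: float = 300.0,  # assumed fixed TTL; real DNS records carry their own
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry time)
        self.misses = 0  # how many lookups had to go past the cache

    def lookup(self, name: str) -> str:
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: no DNS traffic at all
        self.misses += 1
        ip = self._resolve(name)  # cache miss: ask the OS resolver
        self._cache[name] = (ip, now + self._ttl)
        return ip
```

Calling `CachingResolver().lookup("facebook.com")` twice in a row would hit the network only once. The resolver and clock are injectable so the behavior can be checked without a network.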
One worrying thing about DNS is that an entire domain such as wikipedia.org or facebook.com seems to map to a single IP address. Fortunately, there are several ways to mitigate this bottleneck.
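One such way is round-robin DNS: the name server publishes several addresses for one name and rotates their order in each answer, so different clients (which usually try the first address) end up on different servers. A minimal sketch, using hypothetical documentation-range addresses:

```python
from collections import deque
from typing import List


class RoundRobinZone:
    """Rotate the A records for one name on every query (round-robin DNS)."""

    def __init__(self, addresses: List[str]) -> None:
        self._records = deque(addresses)

    def answer(self) -> List[str]:
        result = list(self._records)  # clients usually connect to the first address
        self._records.rotate(-1)      # the next query starts with the next server
        return result


zone = RoundRobinZone(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(3):
    print(zone.answer()[0])  # 192.0.2.10, then 192.0.2.11, then 192.0.2.12
```

Round-robin spreads load but knows nothing about server health or client location, which is why it is typically combined with the other techniques.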
Most DNS servers themselves use anycast to achieve efficient, low-latency DNS lookups.
3. The browser sends an HTTP request to the web server
Dynamic pages like the Facebook homepage expire from the browser cache very quickly, or even immediately, so the browser cannot read them from its cache.
Therefore, the browser sends the following request to the server where Facebook is located:
GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]
The GET request names the URL to fetch: "http://facebook.com/". The browser identifies itself (User-Agent header) and states which kinds of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for further requests.
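On the wire, a request like the one above is just lines of text ending in CRLF, closed by a blank line. A minimal sketch that serializes a GET request; the function name is hypothetical, and it uses the origin form (path only) in the request line, whereas the captured request above shows the absolute URL, a form HTTP/1.1 also allows and requires when talking to a proxy:

```python
from typing import Dict
from urllib.parse import urlsplit


def build_get_request(url: str, headers: Dict[str, str]) -> str:
    """Serialize an HTTP/1.1 GET request as it travels over the TCP connection."""
    parts = urlsplit(url)
    target = parts.path or "/"  # origin form: path (and query) only
    if parts.query:
        target += "?" + parts.query
    lines = [f"GET {target} HTTP/1.1", f"Host: {parts.netloc}"]  # Host is mandatory in HTTP/1.1
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the headers; GET has no body


request = build_get_request("http://facebook.com/", {
    "Accept-Encoding": "gzip, deflate",
    "Connection": "Keep-Alive",  # ask the server to keep the TCP connection open
})
print(request)
```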
The request also contains the cookies that the browser has stored for this domain. As you may already know, cookies are key-value pairs that track the state of a website between page requests. So cookies store the name of the logged-in user, a secret number the server assigned to the user, some of the user's settings, and so on. Cookies are stored on the client as text files and sent to the server with every request.
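On the server side, the Cookie header is simply split back into those key-value pairs. A minimal sketch (hypothetical function name) that ignores quoting and other edge cases a production parser would handle:

```python
from typing import Dict


def parse_cookie_header(header: str) -> Dict[str, str]:
    """Split a Cookie request header into name/value pairs (simplified)."""
    cookies: Dict[str, str] = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")  # split at the first "=" only
        if sep:
            cookies[name] = value
    return cookies


print(parse_cookie_header("datr=1265876274-[...]; locale=en_US; lsd=WW[...]"))
```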
There are many tools for viewing raw HTTP requests and their responses. The author prefers Fiddler, but there are others, such as FireBug. These tools are very helpful when optimizing a website.
In addition to GET requests, there are also POST requests, which are typically used to submit forms. A GET request passes its parameters in the URL (e.g.: http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just below the headers.
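The difference is only where the same encoded parameters travel. A minimal sketch using the example URL; the POST request is hypothetical and not a claim about what that site accepts:

```python
from urllib.parse import urlencode

params = {"id": 85}

# GET: the parameters travel in the URL's query string.
get_url = "http://robozzle.com/puzzle.aspx?" + urlencode(params)

# POST: the same encoding goes into the body, after the blank line that ends the headers.
body = urlencode(params)
post_request = (
    "POST /puzzle.aspx HTTP/1.1\r\n"
    "Host: robozzle.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"  # byte length; the body here is ASCII
    "\r\n"
    + body
)
print(get_url)  # http://robozzle.com/puzzle.aspx?id=85
```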
The trailing slash in the URL "http://facebook.com/" is important. In this case, the browser can safely add the slash itself. When the browser cannot tell whether a slash belongs at the end of an address, it requests the address exactly as typed, without adding one; if the server expects the slash, it responds with a redirect, resulting in an unnecessary round trip.
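A minimal sketch of the server side of that redirect, assuming a hypothetical set of known folder paths (Apache's mod_dir, for example, behaves this way by default through its DirectorySlash setting):

```python
from typing import Optional, Set, Tuple


def resolve_path(path: str, directories: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header) for a requested path."""
    if not path.endswith("/") and path + "/" in directories:
        # The path names a folder but lacks the slash: redirect, costing a round trip.
        return 301, path + "/"
    # For brevity, every other path is treated as found.
    return 200, None


folders = {"/", "/folder1/"}  # hypothetical folder layout
print(resolve_path("/folder1", folders))   # (301, '/folder1/')
print(resolve_path("/folder1/", folders))  # (200, None)
```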
4. The Facebook server responds with a permanent redirect
Here is the response that the Facebook server sent back to the browser:
HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT; path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 12 Feb 2010 05:09:51 GMT
Content-Length: 0
The server responds with a 301 Moved Permanently, telling the browser to go to "http://www.facebook.com/" instead of "http://facebook.com/".
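To follow the redirect, the browser only needs the status code and the Location header. A minimal sketch of reading them from a raw response head; the function names are hypothetical, and repeated headers such as Set-Cookie simply overwrite one another here, which is harmless for Location:

```python
from typing import Dict, Optional, Tuple

REDIRECT_CODES = (301, 302, 303, 307, 308)


def parse_response_head(raw: str) -> Tuple[int, Dict[str, str]]:
    """Parse the status line and headers of a raw HTTP/1.x response."""
    head, _, _ = raw.partition("\r\n\r\n")  # headers end at the first blank line
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split(" ", 2)[1])  # "HTTP/1.1 301 Moved Permanently"
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status, headers


def redirect_target(raw: str) -> Optional[str]:
    """Return the URL to visit next, or None if the response is not a redirect."""
    status, headers = parse_response_head(raw)
    if status in REDIRECT_CODES:
        return headers.get("location")
    return None
```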
Why must the server redirect instead of sending the webpage content that the user wants to view directly? There are many interesting answers to this question.
One reason has to do with search engine rankings. If the same page has two addresses, such as http://www.igoro.com/ and the same address without the "www", a search engine may treat them as two different sites, each with fewer incoming links and therefore a lower ranking. Search engines understand what a 301 permanent redirect means, so the addresses with and without www are ranked together as a single site.
Another reason is that multiple addresses make caching less effective: when a page has several names, it may end up in caches several times.
5. The browser follows the redirect
Now the browser knows that "http://www.facebook.com/" is the correct address to visit, so it sends another request:
GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com
The headers have the same meaning as in the first request.
6. The server handles the request
The server receives the request, processes it, and returns a response.
This may seem like a straightforward task, but a lot of interesting things happen here, even on a simple site like the author's blog, let alone on a massively scaled site like Facebook!
In the simplest case, the URL hierarchy maps directly onto the directory hierarchy on the server. So http://example.com/folder1/page1.aspx maps to the file /httpdocs/folder1/page1.aspx. The web server software can also be configured to map URLs to request handlers manually, so the public address of page1.aspx could be http://example.com/folder1/page1.
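Both kinds of mapping fit in a few lines. A minimal sketch, assuming the document root /httpdocs from the example and a hypothetical one-entry route table:

```python
import posixpath
from typing import Dict
from urllib.parse import urlsplit

DOCUMENT_ROOT = "/httpdocs"
# Hypothetical manual route table: public URL path -> file that handles it.
ROUTES: Dict[str, str] = {"/folder1/page1": "/folder1/page1.aspx"}


def map_url_to_file(url: str) -> str:
    path = urlsplit(url).path or "/"
    path = ROUTES.get(path, path)  # a manually configured route takes precedence
    # normpath collapses "." and ".." segments, so the result stays under DOCUMENT_ROOT.
    normalized = posixpath.normpath(path)
    return posixpath.join(DOCUMENT_ROOT, normalized.lstrip("/"))


print(map_url_to_file("http://example.com/folder1/page1.aspx"))  # /httpdocs/folder1/page1.aspx
print(map_url_to_file("http://example.com/folder1/page1"))       # /httpdocs/folder1/page1.aspx
```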
All dynamic websites face an interesting challenge: how to store data. Smaller sites usually have a single SQL database to store their data, but sites that store a large amount of data and/or have heavy traffic have to find some way to split the database across multiple machines. Solutions include sharding (splitting a table across multiple databases based on the primary key), replication, and simplified databases with weaker consistency semantics.
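Sharding by primary key can be sketched as "hash the key, take it modulo the number of shards". A minimal sketch, with plain dicts standing in for separate database servers; the class name and the choice of SHA-1 are illustrative assumptions:

```python
import hashlib
from typing import Any, Dict, List, Optional


class ShardedTable:
    """Spread a table's rows over several databases by hashing the primary key."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one database server.
        self.shards: List[Dict[int, Any]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: int) -> Dict[int, Any]:
        # hashlib gives the same hash on every machine and process,
        # so a given key always lands on the same shard.
        digest = hashlib.sha1(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key: int, row: Any) -> None:
        self._shard_for(key)[key] = row

    def get(self, key: int) -> Optional[Any]:
        return self._shard_for(key).get(key)
```

With plain modulo placement, changing the number of shards moves most keys to a different shard; that is why larger systems tend to use consistent hashing or a lookup directory instead.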
One technique for keeping data updates cheap is to defer some of the work to batch jobs. For example, Facebook has to update the news feed promptly, but the data behind the "People you may know" feature may only need to be updated nightly (this is the author's guess; he does not know how the feature is actually implemented). Batch updates leave some less important data stale, but they can make data updates much faster and simpler.
7. The server sends back an HTML response
Here is the response that the server generated and sent back:
HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 12 Feb 2010 09:05:55 GMT

2b3Tn@[...]
The entire response is 35 kB, most of it in the binary blob at the end, which is trimmed here.
The Content-Encoding header tells the browser that the response body is compressed with the gzip algorithm. After decompressing the blob, you can see the HTML you would expect:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
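The browser's side of Content-Encoding can be sketched with Python's standard `gzip` and `zlib` modules. This assumes the chunked transfer encoding (also announced in the headers) has already been removed, and the function name is hypothetical:

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo the compression named in a Content-Encoding header."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP "deflate" means zlib-wrapped data; raw deflate streams
        # sent by some servers are not handled in this sketch.
        return zlib.decompress(body)
    if encoding in ("", "identity"):
        return body  # not compressed
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
```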
In addition to compression, the headers specify whether and how to cache the page, which cookies to set (none in this response), privacy information, and so on.
Notice that the header sets Content-Type to "text/html". This header tells the browser to render the response content as HTML rather than download it as a file. The browser uses the header to decide how to interpret the response, but it also considers other factors, such as the extension in the URL.
8. The browser begins rendering the HTML
Even before the browser has received the entire HTML document, it begins rendering the page.
9. The browser sends requests for objects embedded in the HTML
As the browser renders the HTML, it notices tags that require fetching content from other addresses. The browser then sends a request to retrieve each of these files.
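Finding those tags amounts to scanning the markup for attributes that point at other resources. A minimal sketch with Python's standard `html.parser`, covering only images, scripts, and style sheets (a real browser handles many more cases); the class name is hypothetical:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, and style sheets in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        url: Optional[str] = None
        if tag in ("img", "script"):
            url = attributes.get("src")
        elif tag == "link":
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                url = attributes.get("href")
        if url:
            # Relative references are resolved against the page's own URL.
            self.resources.append(urljoin(self.base_url, url))
```

Feeding the page with `collector.feed(html_text)` leaves the URLs to fetch in `collector.resources`, in document order. Self-closing tags such as `<img ... />` also reach `handle_starttag`, because the parser's default `handle_startendtag` calls it.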
When accessing facebook.com, the browser has to retrieve many such URLs, including images, CSS style sheets, and JavaScript files.
Each of these URLs goes through a process similar to the one the HTML page went through: the browser looks up the domain name in DNS, sends a request, follows redirects, and so on.
But unlike dynamic pages, static files can be cached by the browser. Some files may be served straight from the cache without contacting the server at all. The browser knows how long to cache a file because the response that delivered it states an expiry time. In addition, each response may contain an ETag header (an entity tag) that works like a version number: if the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.
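A minimal sketch of that cache logic. Instead of aborting a transfer, it uses the standard conditional-request form of the ETag idea: the browser sends the ETag it holds in an If-None-Match header, and the server answers 304 Not Modified with no body. The class names are hypothetical, and expiry times are plain numbers rather than parsed Expires dates:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedFile:
    body: bytes
    etag: Optional[str]
    expires_at: float  # absolute time derived from the Expires header


class StaticFileCache:
    def __init__(self) -> None:
        self.entries: Dict[str, CachedFile] = {}

    def request_headers(self, url: str, now: float) -> Optional[Dict[str, str]]:
        """None means the cached copy is fresh and no request is needed;
        otherwise return the headers for a (possibly conditional) GET."""
        entry = self.entries.get(url)
        if entry is not None and now < entry.expires_at:
            return None
        headers: Dict[str, str] = {}
        if entry is not None and entry.etag:
            headers["If-None-Match"] = entry.etag  # "I already have this version"
        return headers

    def handle_response(self, url: str, status: int, body: bytes,
                        etag: Optional[str], expires_at: float) -> bytes:
        if status == 304:
            # Not Modified: reuse the cached body (a 304 only comes back for
            # URLs that were requested conditionally, so the entry exists).
            entry = self.entries[url]
            entry.expires_at = expires_at
            return entry.body
        self.entries[url] = CachedFile(body, etag, expires_at)
        return body
```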
Static content often accounts for the bulk of a site's bandwidth and can easily be replicated across a CDN. Sites often use third-party CDNs; for example, Facebook's static files are hosted by Akamai, the largest CDN provider.
For example, when you ping static.ak.fbcdn.net, you may get a response from an akamai.net server. Interestingly, if you ping it again, a different server may respond, which shows load balancing at work.
10. The browser sends asynchronous (AJAX) requests
In the spirit of Web 2.0, the client keeps communicating with the server even after the page has been rendered.
Fiddler also lets you see the asynchronous requests your browser sends. In fact, you can not only observe these requests passively but also modify and resend them. The fact that AJAX requests are so easy to spoof causes a lot of grief for developers of online games with scoreboards. (Of course, please don't cheat that way.)
Facebook chat is an interesting example of a problem with AJAX: pushing data from the server to the client. Because HTTP is a request-response protocol, the chat server cannot push new messages to the client on its own. Instead, the client has to poll the server every few seconds to check for new messages.
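The difference between this kind of polling and the long polling that follows comes down to whether the server answers immediately or holds the request open. A minimal sketch of a server-side mailbox for one chat client, using a `threading.Condition`; the class and method names are hypothetical:

```python
import threading
from typing import List


class Mailbox:
    """Server-side queue of chat messages for one client."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._cond = threading.Condition()

    def deliver(self, message: str) -> None:
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()  # wake any request that is being held open

    def poll(self) -> List[str]:
        """Regular polling: answer at once, often with nothing."""
        with self._cond:
            messages, self._messages = self._messages, []
            return messages

    def long_poll(self, timeout: float) -> List[str]:
        """Long polling: hold the request until a message arrives or the timeout expires."""
        with self._cond:
            self._cond.wait_for(lambda: self._messages, timeout=timeout)
            messages, self._messages = self._messages, []
            return messages  # empty only if the timeout expired first
```

With `poll`, a client checking every few seconds produces many empty round trips; with `long_poll`, each request returns as soon as there is something to say, or after the timeout, when the client simply asks again.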
Long polling is an interesting technique for reducing server load in these scenarios. If the server has no new messages when polled, it does not send a response back right away; it holds the request open. If a message for this client arrives before the request times out, the server finds the outstanding request and returns the message as its response.
To sum up
Hopefully this gives you a better idea of how the different pieces of the web work together.
Original article: http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/
Put more informally: you open the browser, type the URL, and press Enter. Your request goes out onto the network, where routers look up their routing tables to forward it toward the server at that address. For a web page the connection uses port 80 by default (other protocols, such as FTP, use other ports). Next comes the handshake, which sets up the connection, and then your computer submits a request saying what it wants: "I'd like the web pages at this address." The web site may ask your computer to show its "ID card" to check who it is. If everything checks out, the site says "OK, here you go," and also hands over a small document of its own: the cookies. The web site turns its page code into a page the user can view and sends it to your computer, and your browser renders the result from that content.
That is roughly how it works.