What will happen when you enter a URL?

From: http://article.yeeyan.org/view/54517/91367

As a software developer, you surely have a layered picture of how web applications work. The technologies these applications rely on include browsers, HTTP, HTML, web servers, request handlers, and so on.

This article takes a deeper look at what happens behind the scenes when you enter a URL.

1. You enter a URL into the browser

2. The browser looks up the IP address for the domain name

The first step in the navigation is to find the IP address of the requested domain name. The DNS lookup proceeds as follows:

  • Browser cache: The browser caches DNS records for some time. Interestingly, the operating system does not tell the browser the time-to-live of each DNS record, so browsers cache records for a fixed duration that varies between browsers (from 2 to 30 minutes).
  • OS cache: If the browser cache does not contain the record, the browser makes a system call (gethostbyname on Windows), which consults the operating system's own cache.
  • Router cache: The query continues on to your router, which typically has its own DNS cache.
  • ISP DNS cache: Next, the cache of the ISP's DNS server is checked.
  • Recursive search: Your ISP's DNS server performs a recursive search, from the root name server, through the .com top-level name server, to Facebook's name server. Normally the DNS server already has the addresses of the .com name servers in its cache, so a query to the root name server is not necessary.

[Figure: a recursive DNS lookup]

One worrying thing about DNS is that an entire domain such as wikipedia.org or facebook.com appears to map to a single IP address. Fortunately, there are several ways to mitigate this bottleneck:
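The lookup order above can be sketched as a chain of caches with a fallback. This is only an illustration under simplifying assumptions: the cache layers are plain dictionaries, expiry (TTL) is ignored, the function names are made up for this example, and the addresses come from the reserved documentation range.

```python
from typing import Callable, Dict, List

def lookup(name: str,
           caches: List[Dict[str, str]],
           recursive_resolve: Callable[[str], str]) -> str:
    """Check each cache in order; fall back to a recursive lookup on a miss."""
    for cache in caches:
        address = cache.get(name)
        if address is not None:
            return address
    address = recursive_resolve(name)
    # Populate every layer so the next lookup is answered locally.
    for cache in caches:
        cache[name] = address
    return address

# Hypothetical cache layers: browser, operating system, router, ISP.
browser, system, router, isp = {}, {}, {}, {"example.com": "192.0.2.10"}
calls = []

def fake_recursive_resolve(name: str) -> str:
    """Stand-in for the real recursive search; records which names reach it."""
    calls.append(name)
    return "192.0.2.99"  # placeholder address from the documentation range

layers = [browser, system, router, isp]
first = lookup("example.com", layers, fake_recursive_resolve)   # ISP cache hit
second = lookup("example.org", layers, fake_recursive_resolve)  # full miss
```

Only the second name reaches the recursive resolver; the first is answered by the ISP layer without going any further.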

  • Round-robin DNS is a solution in which the DNS lookup returns multiple IP addresses rather than just one. For example, facebook.com actually maps to four IP addresses.
  • A load balancer is a piece of hardware that listens on a particular IP address and forwards requests to the servers in a cluster. Major sites typically use expensive high-performance load balancers.
  • Geographic DNS improves scalability by mapping a domain name to different IP addresses depending on the client's location. This works well for static content, where the different servers do not need to keep shared state in sync.
  • Anycast is a routing technique in which a single IP address maps to multiple physical hosts. Unfortunately, anycast does not fit well with TCP, so it is rarely used in that scenario.

Most DNS servers themselves use anycast to provide efficient, low-latency DNS lookups.
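As a rough illustration of round-robin DNS, the sketch below rotates the order of the returned address list on every query, so that clients that simply take the first address end up spread across the pool. The pool and the function name are assumptions for this example; the addresses are placeholders, not Facebook's.

```python
from itertools import cycle

# Hypothetical address pool (documentation range, not real server addresses).
POOL = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]

def round_robin_answers(pool):
    """Yield the address list rotated by one more position for each query."""
    for start in cycle(range(len(pool))):
        yield pool[start:] + pool[:start]

answers = round_robin_answers(POOL)
first_query = next(answers)
second_query = next(answers)
```

Here `first_query` starts with 192.0.2.1 and `second_query` starts with 192.0.2.2.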

3. The browser sends an HTTP request to the web server

You can be fairly sure that the Facebook homepage will not be served from the browser cache, because dynamic pages expire very quickly or even immediately (their expiry date is set in the past).

So the browser sends the following request to the Facebook server:

GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]

The GET request names the URL to fetch: "http://facebook.com/". The browser identifies itself (User-Agent header) and states which types of response it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for further requests.
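To make the structure of such a request concrete, here is a minimal sketch that serializes a GET request by hand. The helper name is made up for this example. It uses the path-only request line that a client typically sends when it talks to the server directly; the capture above shows the full URL in the request line, which is the form a proxy-based capture tool usually sees.

```python
def build_get_request(host: str, path: str, headers: dict) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_get_request(
    "facebook.com",
    "/",
    {"Accept-Encoding": "gzip, deflate", "Connection": "Keep-Alive"},
)
```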

The request also contains the cookies that the browser holds for this domain. As you probably know, cookies are key-value pairs that track the state of a website between page requests. Cookies store things like the name of the logged-in user, a secret token assigned to the user by the server, and some of the user's settings. They are stored as text on the client and sent to the server with every request.
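A Cookie header is just a list of key-value pairs, which the sketch below parses with Python's standard http.cookies module. The cookie names and values are hypothetical, since the captured request elides the real ones.

```python
from http.cookies import SimpleCookie

# Hypothetical header value; not taken from the capture above.
header_value = "locale=en_US; theme=dark"

jar = SimpleCookie()
jar.load(header_value)
cookies = {name: morsel.value for name, morsel in jar.items()}
```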

There are many tools that let you view raw HTTP requests and the corresponding responses. The author prefers Fiddler, but there are other tools such as Firebug. These tools are a great help when optimizing a website.

In addition to GET requests, another type of request is the POST request, typically used to submit forms. A GET request passes its parameters in the URL (e.g. http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, right after the headers.

The trailing slash in "http://facebook.com/" is important. In this case, the browser can safely add the slash. For a URL whose path does not end in a slash, however, the browser cannot tell whether the last segment names a folder or a file, so it cannot add a slash automatically. In such cases the browser requests the address as typed, and the server responds with a redirect, resulting in an unnecessary round trip.
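The difference between the two request types can be shown with the standard urllib modules. The request objects are only constructed here and nothing is sent; whether the example site accepts the parameter via POST is an assumption made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"id": 85}

# GET: the parameters are appended to the URL as a query string.
get_request = Request("http://robozzle.com/puzzle.aspx?" + urlencode(params))

# POST: the same parameters travel in the request body instead.
post_request = Request(
    "http://robozzle.com/puzzle.aspx",
    data=urlencode(params).encode("ascii"),
)
```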
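The "safe" case can be sketched as follows: a slash is added only when the URL has no path at all, and any other path is left alone because the client cannot know whether it names a folder or a file. The function name is an assumption for this example.

```python
from urllib.parse import urlsplit, urlunsplit

def add_root_slash(url: str) -> str:
    """Add "/" only when the URL has no path at all."""
    parts = urlsplit(url)
    if parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)

bare_domain = add_root_slash("http://facebook.com")
with_path = add_root_slash("http://example.com/folder1")
```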

4. The Facebook server responds with a permanent redirect

This is the response that the Facebook server sent back to the browser:

HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT; path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 12 Feb 2010 05:09:51 GMT
Content-Length: 0

The server responds with a 301 Moved Permanently response, telling the browser to go to "http://www.facebook.com/" instead of "http://facebook.com/".

Why does the server insist on a redirect instead of immediately sending the page the user wants to see? There are some interesting reasons.

One reason has to do with search engine rankings. If a page has two addresses, such as http://www.igoro.com/ and http://igoro.com/, a search engine may treat them as two different sites. Search engines understand what a 301 permanent redirect means, so the addresses with and without "www" are ranked as the same site.

Another reason is that multiple addresses for the same content are less cache-friendly. When a page has several names, it may appear in a cache several times.
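A client decides whether to follow a redirect by looking at the status code and the Location header. The sketch below parses a shortened version of the response head shown above; the function names are made up for this example, and real clients handle more cases (relative locations, redirect limits, and so on).

```python
def parse_response_head(raw: str):
    """Split a response head into (status_code, headers)."""
    status_line, *header_lines = raw.strip().split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers

def redirect_target(raw: str):
    """Return the Location URL for a 3xx response, else None."""
    status_code, headers = parse_response_head(raw)
    if 300 <= status_code < 400:
        return headers.get("location")
    return None

head = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.facebook.com/\r\n"
    "Content-Length: 0\r\n"
)
target = redirect_target(head)
```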

5. The browser follows the redirect

The browser now knows that "http://www.facebook.com/" is the correct address to visit, so it sends another GET request:

GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com

The header information has the same meaning as in the previous request.

6. The server "handles" the request

The server receives the GET request, processes it, and sends back a response.

This may seem like a straightforward task, but a lot of interesting things happen here, even on a simple site like the author's blog, let alone on a site with as much traffic as Facebook.

  • Web Server Software

    The web server software (such as IIS or Apache) receives the HTTP request and decides which request handler should process it. A request handler is a program (written in ASP.NET, PHP, Ruby, ...) that reads the request and generates the HTML for the response.

    In the simplest case, the request handlers are stored in a file hierarchy whose structure mirrors the URL structure, so that a URL path such as /folder1/page1.aspx maps to a file of the same name on disk. The web server software can also be configured to map addresses to request handlers manually, so that the public address of page1.aspx could be http://example.com/folder1/page1.

  • Request handler

    The request handler reads the request, its parameters, and its cookies. It reads and possibly updates some data stored on the server. Then the request handler generates an HTML response.
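The manual mapping from addresses to request handlers can be pictured as a lookup table. This is a toy sketch, not how IIS or Apache is actually configured; the handler, the route table, and the parameter name are all hypothetical.

```python
from html import escape
from typing import Callable, Dict

def page1(params: Dict[str, str]) -> str:
    """Hypothetical request handler that renders a tiny HTML page."""
    name = escape(params.get("name", "guest"))
    return "<html><body>Hello, {}</body></html>".format(name)

# Manual mapping from public URL paths to request handlers.
ROUTES: Dict[str, Callable[[Dict[str, str]], str]] = {
    "/folder1/page1": page1,
}

def dispatch(path: str, params: Dict[str, str]):
    """Return (status, body) from the handler registered under the path."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, "<html><body>Not found</body></html>"
    return 200, handler(params)

found = dispatch("/folder1/page1", {"name": "Ada"})
missing = dispatch("/folder1/page2", {})
```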

Every dynamic website faces one interesting challenge: how to store data. Most small sites have a single SQL database for their data, but sites that store a large amount of data and/or receive a lot of traffic have to find a way to split the database across multiple machines. Solutions include sharding (splitting a table across multiple databases based on the primary key), replication, and simplified databases with weakened consistency semantics.

Deferring work to a batch job is one technique for keeping data updates cheap. For example, Facebook has to update the news feed in a timely manner, but the data behind the "People you may know" feature may only need to be updated nightly (this is the author's guess; how the feature is actually implemented is not stated). Batch updates leave some less important data stale, but they can make data updates much faster and simpler.
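Sharding by primary key can be as simple as a modulo operation, as in the sketch below. The shard count and the function name are assumptions for this example, and the scheme is deliberately naive: changing the number of shards would move most keys to a different shard.

```python
def shard_for(user_id: int, shard_count: int) -> int:
    """Pick a shard from an integer primary key using modulo."""
    return user_id % shard_count

SHARD_COUNT = 4  # hypothetical number of database servers
placement = {user_id: shard_for(user_id, SHARD_COUNT) for user_id in (7, 8, 42)}
```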

7. The server sends back an HTML response

Here is the response that the server generated and sent back:

HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 12 Feb 2010 09:05:55 GMT

2b3��������T�n�@����[...]

The entire response is 35 kB, most of it in the byte blob at the end, which is trimmed here.

The Content-Encoding header tells the browser that the response body is compressed using the gzip algorithm. After decompressing the blob, you see the HTML you would expect:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html [...] lang="en" id="facebook" class=" no_js">
...
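The two encodings in this response can be undone in order: first the chunked transfer encoding, then gzip. In a chunked body each chunk is preceded by its size in hexadecimal, which is presumably what the "2b3" at the start of the blob is (0x2b3 = 691 bytes). The sketch below builds a small chunked, gzip-compressed body and decodes it again; the helper names and the sample HTML are made up for this example, and chunk extensions and trailers are not handled.

```python
import gzip

def decode_chunked(body: bytes) -> bytes:
    """Join the chunks of a Transfer-Encoding: chunked body."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Each chunk starts with its size in hexadecimal.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return bytes(out)
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk

def encode_chunked(payload: bytes, chunk_size: int) -> bytes:
    """Demo helper: split a payload into chunks of chunk_size bytes."""
    pieces = []
    for i in range(0, len(payload), chunk_size):
        chunk = payload[i:i + chunk_size]
        pieces.append(b"%x\r\n" % len(chunk) + chunk + b"\r\n")
    return b"".join(pieces) + b"0\r\n\r\n"

html = b"<html><body>hello</body></html>"
wire_body = encode_chunked(gzip.compress(html), chunk_size=16)
decoded = gzip.decompress(decode_chunked(wire_body))
```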

In addition to compression, the headers specify whether and how to cache the page, which cookies to set (none in this response), privacy information, and so on.

Note the header that sets Content-Type to "text/html". It tells the browser to render the response content as HTML rather than, say, downloading it as a file. The browser uses this header to decide how to interpret the response, but it also considers other factors, such as the extension in the URL.
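A Content-Type value has a media type and optional parameters such as the charset. The sketch below pulls them apart with the standard email.message module, which understands this header syntax; the function name and the render decision are simplifications for illustration, not what a real browser does.

```python
from email.message import Message

def describe_content_type(header_value: str):
    """Return (media_type, charset) parsed from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()

media_type, charset = describe_content_type("text/html; charset=utf-8")
# Simplified decision: render HTML, otherwise offer a download.
should_render = media_type == "text/html"
```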

8. The browser begins rendering the HTML

Even before the browser has received the entire HTML document, it begins rendering the page.

9. The browser sends requests for objects embedded in the HTML

As the browser renders the HTML, it notices tags that require fetching other addresses. The browser sends a GET request to retrieve each of these files.

Here are a few of the URLs that a visit to facebook.com retrieved:
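Finding those tags can be sketched with Python's standard HTML parser. This only illustrates the idea: a real browser discovers resources in many more ways, and the page and URLs below are placeholders.

```python
from html.parser import HTMLParser

class ResourceCollector(HTMLParser):
    """Collect URLs referenced by <img>, <script>, and stylesheet <link> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.urls.append(attributes["src"])
        elif tag == "link" and attributes.get("rel") == "stylesheet":
            if attributes.get("href"):
                self.urls.append(attributes["href"])

# Hypothetical page; the URLs are placeholders.
page = """
<html><head>
<link rel="stylesheet" href="http://static.example.com/site.css" />
<script src="http://static.example.com/app.js"></script>
</head><body><img src="http://static.example.com/logo.gif" /></body></html>
"""

collector = ResourceCollector()
collector.feed(page)
```

After feeding the page, `collector.urls` lists the style sheet, the script, and the image in document order.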

  • Images

    http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif

    http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif

    ...
  • CSS style sheets

    http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css

    http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css

    ...
  • JavaScript files

    http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js

    http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js

    ...

Each of these addresses goes through a process similar to the one the HTML page went through. So the browser looks up the domain name in DNS, sends a request, follows redirects, and so on.

But unlike dynamic pages, static files can be cached by the browser. Some files may be served straight from the cache without contacting the server at all. The response that delivered a file includes its expiry information, so the browser knows how long it may cache the file. In addition, each response may contain an ETag header (an entity tag for the requested resource) that works like a version number. If the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.

Can you guess what "fbcdn.net" in the addresses stands for? A safe bet is "Facebook content delivery network". Facebook uses a content delivery network (CDN) to distribute static files such as images, style sheets, and JavaScript files. These files are therefore replicated in many CDN data centers around the world.

Static content often accounts for most of a site's bandwidth, and it can easily be replicated across a CDN. Sites usually use a third-party CDN. For example, Facebook's static files are hosted by Akamai, the largest CDN provider.

As a demonstration, when you ping static.ak.fbcdn.net, you may get a response from an akamai.net server. Interestingly, if you ping it again, a different server may respond, which shows the load balancing that happens behind the scenes.
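One common way the ETag check plays out is a conditional request: the browser sends the tag it has in an If-None-Match header, and the server answers 304 Not Modified with an empty body if the file has not changed. The sketch below models only the server's decision; the function name, the tag value, and the file content are hypothetical.

```python
def respond(request_headers: dict, current_etag: str, body: bytes):
    """Return (status, headers, body) for a static file request."""
    if request_headers.get("If-None-Match") == current_etag:
        # The browser's cached copy is current: send no body.
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag}, body

etag = '"v1"'  # hypothetical version tag
css = b"body { color: black }"

first = respond({}, etag, css)
repeat = respond({"If-None-Match": first[1]["ETag"]}, etag, css)
```

The first request gets the full file with status 200; the repeat request carries the tag and gets an empty 304 response.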

10. The browser sends further asynchronous (AJAX) requests

In the spirit of Web 2.0, the client keeps communicating with the server even after the page has been rendered.

Take Facebook chat as an example: it keeps updating the list of your online friends as they come and go. To update the list, the JavaScript running in the browser has to send an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request that goes to a special address. In the Facebook example, the client sends a request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of friends who are online.

This pattern is often called "AJAX", which stands for "Asynchronous JavaScript and XML", even though there is no particular reason why the server has to format the response as XML. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.

Among other things, Fiddler lets you view the asynchronous requests sent by your browser. In fact, you can not only observe the requests passively, but also modify and resend them. The fact that AJAX requests are this easy to spoof causes a lot of grief for developers of online games with scoreboards. (Obviously, please don't cheat that way.)

Facebook chat illustrates an interesting problem with AJAX: pushing data from the server to the client. Because HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to check whether any new messages have arrived.

Long polling is an interesting technique for reducing the load on the server in these scenarios. If the server has no new messages when it is polled, it simply does not send a response back. If a message for this client arrives before the request times out, the server finds the outstanding request and returns the message as the response.
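The server side of long polling can be sketched with a blocking queue: the handler waits for a message up to a timeout instead of answering right away. This is a simplified model with no real HTTP involved; the function name and the 200/204 status codes are illustrative assumptions.

```python
import queue

def long_poll(inbox: queue.Queue, timeout_seconds: float):
    """Hold the request until a message arrives or the timeout passes."""
    try:
        return 200, inbox.get(timeout=timeout_seconds)
    except queue.Empty:
        # Nothing arrived in time; the client is expected to poll again.
        return 204, None

inbox = queue.Queue()
inbox.put("hi there")
delivered = long_poll(inbox, timeout_seconds=0.1)  # message already waiting
timed_out = long_poll(inbox, timeout_seconds=0.1)  # inbox is now empty
```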

To sum up

Hopefully this gives you a better idea of how the different pieces of the web work together.
